Glimpses of Daniel's world

M101JS Week 6

On August 12th the first M101JS (MongoDB for Node.js developers) course started. Previously I completed both the M101J (MongoDB for Java developers) and M102 (MongoDB for DBAs) courses available through 10gen's education site.

When I finish this course I will have refreshed my MongoDB knowledge and got some experience with Node.js in the form of a blog application. I decided to document my progress in (at least) weekly summaries. This will be the sixth of seven. You can read part five here.

For this series of blog posts I will structure each post into an introduction (you just passed it) and two sections. The first section, Lectures, will summarize what I learned from the videos and related examples. In the second section, Homework, I will mention what I learned from practicing the homework assignments and anything I might have done extra because of it.

This week I scanned the homework assignments before I started with the lectures. Two out of five assignments I could answer instantly, but the others made me realize I needed to watch the lectures to be sure of the answers I give.

Lectures

I expected the lectures to cover the different levels of write concern on write operations and the read preferences on read operations in replicated environments. For sharding, I expected them to be about the shard key and the implications of using a particular key.

Replication

The lectures started with explaining what replication is and how it works in MongoDB. Developers won't need a lot of details on how to set it up and maintain it.

As a developer there are a few things you should know:

Failover with Node.js driver

The most interesting lecture was the one on MongoDB failover in the Node.js driver. It had been a while since I last saw code that connects to MongoDB. In the previous weeks the focus was less on connecting to the database and more on making the code work for the homework assignment.

The Node.js MongoDB driver hides the waiting for a failover from the application. Requests go into a buffer until the failover completes. The application code can keep sending its asynchronous requests, and their callbacks will be called at some point in the future, once a new primary is available.
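To make the buffering idea concrete, here is a toy model in plain JavaScript. It is not the driver's code and does not talk to MongoDB; the class `BufferingClient` and its methods are names I made up for illustration. It only shows requests being parked while there is no primary and released, in order, once one is elected.

```javascript
// Toy model only: NOT the driver's implementation, just the buffering idea.
// All names here (BufferingClient, primaryLost, primaryElected) are hypothetical.
class BufferingClient {
  constructor() {
    this.primaryAvailable = true;
    this.pending = []; // requests parked while no primary is available
  }

  // Run the operation now, or park it until a primary is available again.
  request(operation, callback) {
    if (this.primaryAvailable) {
      callback(null, operation());
    } else {
      this.pending.push({ operation, callback });
    }
  }

  // The primary went away: from now on requests are buffered.
  primaryLost() {
    this.primaryAvailable = false;
  }

  // A new primary was elected: flush the parked requests in order.
  primaryElected() {
    this.primaryAvailable = true;
    const parked = this.pending;
    this.pending = [];
    for (const { operation, callback } of parked) {
      callback(null, operation());
    }
  }
}

const client = new BufferingClient();
const results = [];

client.request(() => 'insert 1', (err, res) => results.push(res));
client.primaryLost();
client.request(() => 'insert 2', (err, res) => results.push(res));
console.log(results); // only the first request has completed so far

client.primaryElected();
console.log(results); // the parked request has completed as well
```

From the application's point of view nothing special happens: the second callback simply fires later than usual.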

The lectures left me with some questions and situations I would like to explore, perhaps in a future blog post:

Write Concern and Read Preference

When dealing with data you want to make sure a write persists and the data you read is up to date. MongoDB allows you to tweak these levels to balance safety against speed. You can set a global preference on the connection or use custom write concern and read preference settings for each operation.

Write concern instructs the driver to wait until the data has passed a certain point of persistence. The basic levels go from the write being in the journal on the primary node to the write being on n nodes, on a majority of the nodes, or on m nodes, where n ≤ majority < m. When you're not using a replica set it doesn't make much sense to set the write concern any higher than 1: the driver waits until the concern is satisfied, and there are no other nodes that could satisfy it.
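The arithmetic behind these levels can be sketched in a few lines of plain JavaScript. This is my own simplification, not driver code: the helper names `majorityOf` and `isSatisfied` are hypothetical, and I count every member of the set towards the majority.

```javascript
// Simplified sketch; helper names are hypothetical and every member counts.

// Number of nodes that form a majority of a replica set.
function majorityOf(memberCount) {
  return Math.floor(memberCount / 2) + 1;
}

// Has a write acknowledged by `acks` nodes satisfied write concern `w`?
// `w` is either a number of nodes or the string 'majority'.
function isSatisfied(w, acks, memberCount) {
  const required = w === 'majority' ? majorityOf(memberCount) : w;
  return acks >= required;
}

// A three-node replica set where only the primary has the write so far.
console.log(isSatisfied(1, 1, 3));          // w: 1 is already satisfied
console.log(isSatisfied('majority', 1, 3)); // a majority of 3 is 2, so not yet

// A standalone server: one node can never satisfy w: 2.
console.log(isSatisfied(2, 1, 1));
```

The last call is the standalone case from the paragraph above: with a single node the number of acknowledgements can never reach 2.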

By default reads can only be done against the primary. The data on the primary is the most up to date, so it makes sense that it is the default for reads. In that scenario the secondary nodes only serve as failover. By setting a read preference it is possible to read from secondary nodes as well, for example to balance the load in situations where eventually consistent or stale data is acceptable.
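As a rough illustration of what the read preference modes mean, here is a small selection function in plain JavaScript. The mode names (`primary`, `primaryPreferred`, `secondary`, `secondaryPreferred`, `nearest`) are the ones MongoDB uses; everything else, including the function `pickNode`, the node objects and always taking the first eligible secondary, is my own simplification and not how the driver actually chooses.

```javascript
// Simplified sketch; pickNode and the node shape are hypothetical.
function pickNode(mode, nodes) {
  const up = nodes.filter((n) => n.up);
  const primary = up.find((n) => n.role === 'primary') || null;
  const secondary = up.find((n) => n.role === 'secondary') || null;

  switch (mode) {
    case 'primary':
      return primary; // the default: only the primary will do
    case 'primaryPreferred':
      return primary || secondary;
    case 'secondary':
      return secondary;
    case 'secondaryPreferred':
      return secondary || primary;
    case 'nearest':
      // Lowest ping wins, regardless of role.
      return up.slice().sort((a, b) => a.pingMs - b.pingMs)[0] || null;
    default:
      throw new Error('Unknown read preference mode: ' + mode);
  }
}

const nodes = [
  { name: 'a', role: 'primary', up: true, pingMs: 20 },
  { name: 'b', role: 'secondary', up: true, pingMs: 5 },
  { name: 'c', role: 'secondary', up: false, pingMs: 1 },
];

console.log(pickNode('primary', nodes).name);            // 'a'
console.log(pickNode('secondaryPreferred', nodes).name); // 'b'
console.log(pickNode('nearest', nodes).name);            // 'b' ('c' is down)
```

Note that with `primary` a read has nowhere to go while the primary is down, whereas `primaryPreferred` falls back to a secondary.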

Tag sets

A topic not covered during this course is tag sets. Tag sets allow for advanced configuration of write concerns and read preferences by considering tags on nodes. More details can be found in the documentation.

Sharding

The final lectures of the course are about the way you scale horizontally in MongoDB. Horizontal scaling is done by splitting the collection into shards that are distributed over replica sets.

There isn't much for developers to know about sharding, just that it exists and how to design your application to allow horizontal scaling. The most important thing is that to make shards you will need a shard key.

For efficient use of shards the majority of operations on the collection must use the shard key to find the documents to insert, read or update. If they don't, the query can't be sent to the one shard that holds the data and you end up waiting for a response from all shards.
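The difference between a targeted query and one that goes to every shard can be sketched in plain JavaScript. The chunk ranges, the shard names, the `studentId` key and the function `targetShards` are all made up for this example, and the sketch only handles an exact-match value on the shard key.

```javascript
// Simplified sketch; chunk map, key name and targetShards are hypothetical.

// Each chunk covers a range of shard key values and lives on one shard.
const chunks = [
  { min: -Infinity, max: 1000, shard: 'shard0' },
  { min: 1000, max: 2000, shard: 'shard1' },
  { min: 2000, max: Infinity, shard: 'shard2' },
];

// Which shards have to be asked to answer this query?
function targetShards(query, shardKey, chunkMap) {
  if (!(shardKey in query)) {
    // No shard key in the query: every shard has to be asked.
    return [...new Set(chunkMap.map((c) => c.shard))];
  }
  const value = query[shardKey];
  const chunk = chunkMap.find((c) => value >= c.min && value < c.max);
  return [chunk.shard];
}

console.log(targetShards({ studentId: 1500 }, 'studentId', chunks));
// [ 'shard1' ]
console.log(targetShards({ name: 'Daniel' }, 'studentId', chunks));
// [ 'shard0', 'shard1', 'shard2' ]
```

The first query carries the shard key and lands on a single shard; the second has to wait for all three.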

Picking the shard key

The best way to pick a shard key is to figure out the usage pattern of your data.

Considerations when picking the shard key are:

Homework

This week's homework assignments consisted of four questions and one practical assignment. Two questions covered replication in MongoDB and the other two tested the sharding knowledge. I don't really get the point yet of the practical assignment, which makes you set up a sharded environment and verifies you have done so properly, unless it is something that will be used in the final exam.

I had no problem with the questions that asked how to ensure a certain persistence or which shard would be queried for a certain read operation. However, when asked about replication I gave too much credit to the mongo shell. I also wasn't sure about the implications of sharding and in particular what influence it has on uniqueness.

The last assignment was setting up the sharded environment. I didn't have much trouble with that, because I knew it from my M102 course and the script was pretty much handed to me.

This week concluded all the regular lectures. In week 7 there will be non-curriculum lectures and of course the final exam. It's almost over and time for my holiday!