M101JS Week 6

September 23, 2013 19:37
m101js, mongodb, nodejs

On August 12th the first M101JS (MongoDB for Node.js developers) course started. Previously I completed both the M101J (MongoDB for Java developers) and M102 (MongoDB for DBAs) courses available through 10gen's education site.

When I finish this course I will have refreshed my MongoDB knowledge and gained some experience with Node.js in the form of a blog application. I decided to document my progress in (at least) weekly summaries. This is the sixth of seven. You can read part five here.

For this series of blog posts I will structure each post into an introduction (you just passed it) and two sections. The first section, Lectures, will summarize what I learned from the videos and related examples. In the second section, Homework, I will mention what I learned from practicing the homework assignments and anything I might have done extra because of it.

This week I scanned the homework assignments before I started with the lectures. I could answer two of the five assignments instantly, but the others made me realize I needed to watch the lectures to be sure of my answers.

Lectures

I assumed the lectures would cover the different levels of write concerns on write operations and the read preferences on read operations in replicated environments. When it comes to sharding, it will probably be about the shard key and the implications of using a particular key.

Replication

The lectures started by explaining what replication is and how it works in MongoDB. Developers don't need many details on how to set it up and maintain it.

As a developer there are a few things you should know:

A replica set is a set of nodes with one primary and several secondary nodes.
Only the primary node receives writes from clients, and these writes propagate to the secondaries.
You connect to a replica set by feeding a list of known nodes to the driver; the driver discovers the other nodes on its own.
When the primary node is unresponsive for some time, failover occurs and a new primary node is elected.
If writes haven't propagated beyond the old primary, they might be rolled back once that node rejoins the replica set.
Electing a new primary node takes some time.
As always, network errors might occur and should be handled gracefully.
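To make the seed-list idea concrete, here is a minimal sketch that builds a standard MongoDB connection string from a list of known nodes. The host names, the `blog` database and the replica set name `rs0` are made-up examples, and `buildReplicaSetUri` is a hypothetical helper, not part of the driver.

```javascript
// Build a MongoDB connection string from a seed list of known replica set members.
// Hypothetical helper for illustration: host names and the set name are examples only.
function buildReplicaSetUri(seeds, database, replicaSetName) {
  if (seeds.length === 0) {
    throw new Error("At least one seed node is required");
  }
  // Each seed is "host:port"; the driver only needs some members to discover the rest.
  const hosts = seeds.join(",");
  return `mongodb://${hosts}/${database}?replicaSet=${encodeURIComponent(replicaSetName)}`;
}

const uri = buildReplicaSetUri(
  ["mongo1.example.com:27017", "mongo2.example.com:27017"],
  "blog",
  "rs0"
);
console.log(uri);
// mongodb://mongo1.example.com:27017,mongo2.example.com:27017/blog?replicaSet=rs0
```

A string like this would typically be handed to `MongoClient.connect(uri, callback)`; listing more than one seed means the application can still connect when one of the listed nodes happens to be down.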

Failover with Node.js driver

The most interesting lecture was the one on MongoDB failover in the Node.js driver. It had been a while since I had seen code that connects to MongoDB. In the previous weeks, the focus was less on connecting to the database and more on getting the code to work for the homework assignments.

The Node.js MongoDB driver hides the waiting for failover from the application. Requests go into a buffer until the failover completes. The application code can keep sending its asynchronous requests, and they will complete at some point in the future.
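To get a feel for what that buffering means, here is a toy model written in the driver's callback style. It is my own simplification, not the Node.js driver's real implementation; `FailoverBuffer`, `primaryLost` and `primaryElected` are hypothetical names.

```javascript
// Toy model of buffering operations while a replica set elects a new primary.
// NOT the Node.js driver's real implementation; all names are hypothetical.
class FailoverBuffer {
  constructor() {
    this.primary = null; // current primary, or null while an election is running
    this.pending = [];   // operations waiting for a new primary
  }

  // Run the operation now if there is a primary, otherwise queue it.
  // The callback follows the usual Node.js (err, result) convention.
  execute(operation, callback) {
    const run = () => {
      let result;
      try {
        result = operation(this.primary);
      } catch (err) {
        return callback(err);
      }
      callback(null, result);
    };
    if (this.primary === null) {
      this.pending.push(run);
    } else {
      run();
    }
  }

  // The primary became unreachable: an election starts.
  primaryLost() {
    this.primary = null;
  }

  // A new primary was elected: flush the buffered operations in order.
  primaryElected(node) {
    this.primary = node;
    const queued = this.pending;
    this.pending = [];
    queued.forEach((run) => run());
  }
}

const buffer = new FailoverBuffer();
buffer.primaryElected("node-a");
buffer.primaryLost(); // failover in progress
buffer.execute((primary) => `inserted on ${primary}`, (err, result) => {
  if (err) return console.error(err);
  console.log(result); // "inserted on node-b"
});
console.log(buffer.pending.length); // 1: the insert is waiting
buffer.primaryElected("node-b");    // now the callback fires
```

The model also shows where my open questions below come from: everything sitting in `pending` is lost if the process crashes, and the application has no direct view of which writes have actually reached the server.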

The lectures left me with some questions and situations I would like to explore in a future blog post:

How do you deal with a crash while the failover is still in progress?
Even with the driver buffering requests, do you still need to handle "too many open files" errors?
Which writes finished successfully, since some might still be buffered?

Write Concern and Read Preference

When dealing with data, you want to make sure a write persists and the data you read is up to date. MongoDB allows you to tweak these levels to balance safety against speed. You can set a global default or use custom write concern and read preference settings for each operation on a connection.

Write concern instructs the driver to wait until the data has reached a certain level of persistence. The basic levels range from the write being in the journal on the primary node to the write being acknowledged by n nodes, by a majority of nodes, or by m nodes, where n ≤ majority < m. When you're not using a replica set, it doesn't make much sense to set the write concern any higher than 1: the driver waits until the concern is satisfied, and a standalone server has no other nodes that could acknowledge the write.
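The relation between n, majority and m is easiest to see with numbers. The sketch below is my own toy check, not the server's algorithm: it assumes every member votes, and `majorityOf` and `isWriteConcernSatisfied` are hypothetical helpers. Only the `{ w, j }` shape mirrors a real write concern document.

```javascript
// Toy model of write concern checks; not the server's actual algorithm.
// Assumes every replica set member is a voting member.
function majorityOf(memberCount) {
  return Math.floor(memberCount / 2) + 1;
}

// writeConcern mirrors a MongoDB write concern document, e.g. { w: 2, j: true }.
// acks: number of nodes (including the primary) that have applied the write.
// journaled: whether the write is in the primary's journal.
function isWriteConcernSatisfied(writeConcern, memberCount, acks, journaled) {
  const w = writeConcern.w === undefined ? 1 : writeConcern.w;
  const required = w === "majority" ? majorityOf(memberCount) : w;
  if (required > memberCount) {
    // Can never be satisfied, e.g. w: 2 against a standalone server.
    return false;
  }
  if (writeConcern.j && !journaled) {
    return false;
  }
  return acks >= required;
}

// In a five-member set the majority is 3, so n could be 2 and m could be 4 or 5.
console.log(majorityOf(5)); // 3
console.log(isWriteConcernSatisfied({ w: "majority" }, 5, 2, true));          // false
console.log(isWriteConcernSatisfied({ w: "majority", j: true }, 5, 3, true)); // true
console.log(isWriteConcernSatisfied({ w: 2 }, 1, 1, true));                   // false: standalone
```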

By default, reads go to the primary only. The data on the primary is the most up to date, so it makes sense as the default for reads. In that scenario, secondary nodes only serve as failover candidates. By setting a read preference, it is possible to read from secondary nodes as well, for example to balance the load in situations where eventual consistency is acceptable and reading stale data is permitted.
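The mode names below (`primary`, `primaryPreferred`, `secondary`, `secondaryPreferred`, `nearest`) are the read preference modes MongoDB defines; the selection logic is my own simplified sketch. Real drivers pick randomly among members within a latency window and also honor tag sets, whereas this version just takes the lowest ping. `selectMember` and the member objects are hypothetical.

```javascript
// Simplified model of read preference selection. Real drivers use a latency
// window and tag sets; this sketch just picks the lowest-latency candidate.
// Each member looks like { name: "a", state: "PRIMARY" | "SECONDARY", pingMs: 12 }.
function selectMember(members, mode) {
  const primary = members.find((m) => m.state === "PRIMARY") || null;
  const secondaries = members.filter((m) => m.state === "SECONDARY");
  // Lowest-latency member of a list, or null if the list is empty.
  const fastest = (list) =>
    list.length === 0 ? null : list.reduce((a, b) => (b.pingMs < a.pingMs ? b : a));

  switch (mode) {
    case "primary":
      return primary;
    case "primaryPreferred":
      return primary || fastest(secondaries);
    case "secondary":
      return fastest(secondaries);
    case "secondaryPreferred":
      return fastest(secondaries) || primary;
    case "nearest":
      return fastest(members);
    default:
      throw new Error(`Unknown read preference: ${mode}`);
  }
}

const members = [
  { name: "a", state: "PRIMARY", pingMs: 30 },
  { name: "b", state: "SECONDARY", pingMs: 5 },
  { name: "c", state: "SECONDARY", pingMs: 12 },
];
console.log(selectMember(members, "primary").name);   // a
console.log(selectMember(members, "secondary").name); // b

// During failover there is no primary: "primary" finds nothing, "primaryPreferred" falls back.
const duringFailover = members.filter((m) => m.state !== "PRIMARY");
console.log(selectMember(duringFailover, "primary"));               // null
console.log(selectMember(duringFailover, "primaryPreferred").name); // b
```

The failover case is why reading from secondaries can be attractive beyond load balancing: with the default `primary` mode, reads have to wait for the election just like writes do.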

Tag sets

A topic not covered during this course is tag sets. Tag sets allow advanced configuration of write concerns and read preferences based on tags assigned to nodes; more details can be found in the documentation.
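Since the course skipped tag sets, this is only my reading of the documented read preference behavior, sketched under assumptions: tag sets are tried in order, a member matches when it carries every tag in the set, the first set with any match wins, and an empty set `{}` matches every member. The tag names `dc` and `use` and both helper functions are hypothetical.

```javascript
// Sketch of narrowing read candidates with an ordered list of tag sets.
// Tag names (dc, use) and helper names are hypothetical examples.
function matchesTagSet(member, tagSet) {
  const tags = member.tags || {};
  // The member must carry every key/value pair of the tag set; {} matches everyone.
  return Object.keys(tagSet).every((key) => tags[key] === tagSet[key]);
}

function filterByTagSets(members, tagSets) {
  for (const tagSet of tagSets) {
    const matching = members.filter((m) => matchesTagSet(m, tagSet));
    if (matching.length > 0) {
      return matching; // first tag set with at least one match wins
    }
  }
  return [];
}

const secondaries = [
  { name: "b", tags: { dc: "east", use: "reporting" } },
  { name: "c", tags: { dc: "west" } },
];
// Prefer a reporting node in the east; otherwise accept anything in the west.
const preferred = filterByTagSets(secondaries, [{ dc: "east", use: "reporting" }, { dc: "west" }]);
console.log(preferred.map((m) => m.name).join(", ")); // b
// No node is tagged "north", so the empty fallback tag set accepts every member.
const fallback = filterByTagSets(secondaries, [{ dc: "north" }, {}]);
console.log(fallback.map((m) => m.name).join(", ")); // b, c
```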

Sharding

The final lectures of the course are about the way you scale horizontally in MongoDB. Horizontal scaling is done by partitioning a collection across shards, each of which is typically a replica set.

Developers don't need to know much about sharding, beyond the fact that it exists and how to design an application that allows horizontal scaling. The most important thing is that sharding a collection requires a shard key.

For efficient use of shards, the majority of operations on the collection should use the shard key to locate the documents they insert, read or update. Otherwise, instead of routing the operation to the one shard that holds the data, the query router (mongos) has to send it to all shards and wait for every one of them to respond.
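A toy model makes the difference between a targeted query and a scatter-gather query visible. This assumes a range-sharded collection with a numeric shard key `userId`; the chunk boundaries, shard names and the `targetShards` helper are made up for illustration and are not how mongos is implemented.

```javascript
// Toy model of query routing for a range-sharded collection.
// Chunk boundaries, shard names and the userId shard key are made-up examples.
// Each chunk covers shard key values in [min, max).
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 5000, shard: "shard1" },
  { min: 5000, max: Infinity, shard: "shard2" },
];

// Returns the shards a query has to be sent to.
function targetShards(query, shardKey, chunkList) {
  const value = query[shardKey];
  if (typeof value !== "number") {
    // No shard key in the query (or an operator expression this sketch does not
    // analyze): scatter-gather to every shard. A real router can narrow ranges.
    return [...new Set(chunkList.map((c) => c.shard))];
  }
  // Equality on the shard key: exactly one chunk, and therefore one shard, can match.
  const chunk = chunkList.find((c) => value >= c.min && value < c.max);
  return [chunk.shard];
}

console.log(targetShards({ userId: 4200 }, "userId", chunks).join(","));    // shard1
console.log(targetShards({ title: "Week 6" }, "userId", chunks).join(",")); // shard0,shard1,shard2
```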

Picking the shard key

The best way to pick a shard key is to figure out the usage pattern of your data.

Considerations when picking the shard key:

Once you have picked a shard key, you can't change it. It's immutable.
The shard key must be present in each document.
The shard key is backed by an index. If you want to shard a collection that already contains data, you must create that index first.
The shard key should have enough variation (cardinality) to let MongoDB split the data into smaller chunks.
Unique indexes in a sharded environment need to include the shard key. Effectively, this means that only the shard key guarantees uniqueness across the whole cluster. Other unique indexes are only unique within a single shard.
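Several of these considerations can be checked against sample data before committing to a key. The sketch below is a hypothetical pre-flight check: `analyzeShardKey` counts documents missing the field and the number of distinct values (a rough cardinality measure), and `uniqueIndexAllowed` encodes the documented rule that a cluster-wide unique index must have the shard key as its prefix. The `author` and `permalink` fields are example data, not the course's schema.

```javascript
// Rough pre-flight checks for a shard key candidate on sample documents.
// Hypothetical helpers; they only look at a single top-level field with primitive values.
function analyzeShardKey(sampleDocs, field) {
  const missing = sampleDocs.filter((doc) => !(field in doc)).length;
  const distinct = new Set(
    sampleDocs.filter((doc) => field in doc).map((doc) => doc[field])
  ).size;
  // missing > 0 breaks "present in each document"; a low distinct count means low cardinality.
  return { missing, distinct };
}

// A unique index is only enforced cluster-wide when the shard key fields
// form a prefix of the index fields (the shard key index itself included).
function uniqueIndexAllowed(shardKeyFields, indexFields) {
  return shardKeyFields.every((field, i) => indexFields[i] === field);
}

const posts = [
  { author: "alice", permalink: "week-5" },
  { author: "alice", permalink: "week-6" },
  { author: "bob", permalink: "hello" },
  { permalink: "orphan" },
];
console.log(analyzeShardKey(posts, "author")); // { missing: 1, distinct: 2 }
console.log(uniqueIndexAllowed(["author"], ["author", "permalink"])); // true
console.log(uniqueIndexAllowed(["author"], ["permalink"]));           // false
```

With `author` as the shard key, a unique index on `permalink` alone could only be unique per shard, which is exactly the uniqueness trap the homework tested.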

Homework

This week's homework assignments consisted of four questions and one practical assignment. Two questions covered replication in MongoDB and the other two tested sharding knowledge. I don't yet see the point of the practical assignment, which has you set up a sharded environment and verifies that you did so correctly, unless it will be used in the final exam.

I had no problem with the questions that asked how to ensure a certain level of persistence or which shard would be queried for a certain read operation. However, on the replication questions I gave too much credit to the mongo shell. I also wasn't sure about the implications of sharding, in particular its influence on uniqueness.

The last assignment was setting up the sharded environment. That wasn't much trouble, because I knew how to do it from the M102 course and the script was pretty much handed to me.

This week concluded all the regular lectures. In week 7 there will be non-curriculum lectures and, of course, the final exams. It's almost over, and then it's time for my holiday!
