Glimpses of Daniel's world

M202 July 2014 Second week

My assumption was that the second week would be all about tuning and sizing the system. When I glanced through the video titles, it seemed a lot of configuration and its impact would be presented.

Judging by the content of the syllabus, completing this week would probably leave me with insights into, among other things, the performance impact of the oplog, virtualization, geography, file systems and hardware on MongoDB.

While watching the videos of the second week, I noticed there are lots of pointers to external information on specific hardware configurations or file systems. There was much emphasis on the usefulness of the (latest) MongoDB production notes when configuring your systems.

Perhaps I will look into more of those informational sources at some later time, but the pressure I put on myself to finish the week didn't allow for delving deeply into that external material. My decision to finish the first week's videos first, and some private troubles with staying awake, made it harder to actually finish the week on time.

MongoDB memory

The video modules had some clear chapters, the first being about MongoDB itself. The videos in this section mainly covered the usage of memory by MongoDB and how it displays in MMS.

The most important part of resident memory (all memory allocated to MongoDB) is the working set: the active data used by MongoDB, i.e. its collections and indexes. Aside from that there's process data, which holds things like connections, journaling buffers, etc.

MongoDB uses mmap to load data into memory, and on top of that Linux usually likes to load as much as possible into the file system cache as well. When something isn't found in memory it causes a hard page fault, as opposed to a soft page fault, which occurs when the data is in memory but not yet mapped into MongoDB's address space. Hard page faults generally cause performance issues, which is why there are some videos on how to pre-heat memory, either by touching a collection or, more efficiently, by running targeted queries that load a good chunk of the working set into memory.
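As a sketch of the two pre-heating approaches, assuming a hypothetical `records` collection with a `created` field (names are examples, not from the course):

```javascript
// Blunt approach: the touch command pages the whole collection and/or
// its indexes into memory, including data you may never read.
db.runCommand({ touch: "records", data: true, index: true })

// Targeted approach: run queries that cover the hot part of the working
// set, e.g. documents created in the last 24 hours. itcount() forces the
// cursor to be fully iterated so the documents are actually paged in.
db.records.find({ created: { $gt: new Date(Date.now() - 86400000) } }).itcount()
```

The targeted variant is more efficient precisely because it avoids loading cold data that would only push other hot pages out of memory.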

Choosing disks

There were some videos on the benefits and choices to make when selecting the storage solution(s) for your data. Most data access in MongoDB is random, which means SSDs (and, to a lesser degree, RAID striping) increase performance thanks to lower seek times. Capped collections, which include the special oplog collection, are written sequentially and can be stored efficiently on spinning disks. Separating the data over different types of storage is possible by specifying parameters on the mongod process.

Network storage usually has penalties because of the extra overhead of the network on whatever storage media used to back it.
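A sketch of how that separation can be set up in practice — the paths are examples, and mongod must be stopped while moving directories around:

```shell
# Put the sequentially-written journal on a spinning disk by symlinking
# it out of the dbpath; mongod follows the symlink on startup.
mv /data/db/journal /mnt/spinning/journal
ln -s /mnt/spinning/journal /data/db/journal

# Alternatively, --directoryperdb gives every database its own
# subdirectory under dbpath, and each subdirectory can be a mount
# point backed by a different storage device (e.g. SSD vs. HDD).
mongod --dbpath /data/db --directoryperdb
```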

Choosing file systems

MongoDB tends to allocate a lot of space which it might not use right away. EXT4 or XFS are the recommended file systems because they support a low level call to allocate space, while EXT3 requires zero-ing out space to actually allocate it. There was a brief mention of BtrFS and ZFS, but these file systems aren't thoroughly tested by MongoDB Inc. at the moment.

There's one video explaining why and how to use swap partitions. In short: you want to configure swap properly on Linux just to prevent mongod processes from being killed by the kernel's OOM killer when it detects the system is running out of memory.
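A minimal configuration sketch along those lines — the swappiness value is a common starting point, not an official recommendation from the course:

```shell
# Verify that swap space exists at all
free -m

# Tell the kernel to keep application memory resident and only swap
# under real memory pressure; the swap space then acts as a safety
# valve against the OOM killer rather than as working memory.
sudo sysctl vm.swappiness=1
```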

Configuring file systems

Even if you pick a recommended file system, it still needs some tweaking to perform optimally with MongoDB. In particular you should disable atime on the mount, because updating access times slows down performance and serves no benefit in the case of MongoDB.
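For illustration, a hypothetical `/etc/fstab` entry for a dedicated data volume (device and mount point are examples):

```shell
# Mount the MongoDB data volume with noatime so reads no longer
# trigger access-time metadata writes.
# <device>   <mount point>  <fs>  <options>          <dump> <pass>
/dev/sdb1    /data/db       ext4  defaults,noatime   0      2
```

The option can also be applied to an already-mounted volume with `mount -o remount,noatime /data/db`.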

Managing file systems

Some of the videos covered how MongoDB uses disk space — allocating ahead of using it to avoid some penalties — and how to reclaim it.

By default a mongod process will easily start allocating in excess of 3 GB of disk space without actually putting data in it. There are options to configure this according to the situation, which in some cases is advisable.
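A sketch of the relevant startup options as they existed in the 2.x series — useful on a test box, but generally not advisable in production:

```shell
# --smallfiles caps data files at 512 MB instead of 2 GB and shrinks
# the default journal; --oplogSize limits the oplog to 128 MB instead
# of roughly 5% of free disk space.
mongod --dbpath /data/db --smallfiles --oplogSize 128
```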

As a collection is written to, the data in the data files might get fragmented. This can lead to additional disk usage, and to performance degradation as well when memory is restricted. One of the ways to remedy fragmentation is compacting the fragmented collection(s), including their existing indexes. MongoDB then blocks activity for the entire database and starts moving data around. A possible side effect is that the data files use more disk space after completion.
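Compaction is a single command per collection; the collection name below is a hypothetical example:

```javascript
// Defragment the collection and rebuild its indexes. This blocks all
// operations on the containing database while it runs, so in a replica
// set it is typically run on secondaries one at a time.
db.runCommand({ compact: "records" })
```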

The curriculum only mentions using repairDatabase() on a stand-alone MongoDB or a rolling rebuild of nodes in a replica set. I believe mongoexport is an option as well.
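The stand-alone variant is a one-liner, though it comes with heavy caveats:

```javascript
// Rewrite all data files of the current database, reclaiming unused
// space. It blocks the server and needs free disk space roughly equal
// to the data set size plus some headroom, so it only suits stand-alone
// servers that can tolerate downtime.
db.repairDatabase()
```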

Final videos

The last videos covered the topics of virtualization and replica sets bigger than three nodes.

Although support for common virtualization systems such as VMware and Xen is well established, the options of jails and containers have recently gained popularity. These last two options are at the moment less explored and less officially supported. The main point made in the videos was that virtualization gives you a way to split up hardware into more manageable chunks. If needed, you get the benefit of relocating particularly busy virtualized systems to dedicated hardware, should your virtualization option support this.

Bigger replica sets

Official documentation recommends replica sets of three nodes in either primary-secondary-secondary or primary-secondary-arbiter configuration. Of course it's possible and documented to use bigger replica sets, but there are limits to the total number of members (12 at the time) and the number of voters (7) in a set. The last two videos cover replica sets bigger than three nodes and why you might want to use those configurations.

As an example the course went through the implications of a geographically distributed replica set. It's key to consider network latency and failover scenarios in this case, as well as the option of chaining in the replica set, which was new to me. Although the option to specify which replica set member to synchronize from has been available since version 2.2, and my first MongoDB version was 2.4, I had never come across it before. This was also the first time I heard that new members in a replica set pick the closest member to start synchronizing from.
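Chaining can be steered from the shell; the hostname below is a placeholder:

```javascript
// On a secondary: replicate from a specific member instead of the one
// MongoDB picked automatically (available since version 2.2).
rs.syncFrom("mongodb-dc2.example.net:27017")

// Chaining can also be disabled for the whole set, forcing every
// secondary to sync directly from the primary:
cfg = rs.conf()
cfg.settings = cfg.settings || {}
cfg.settings.chainingAllowed = false
rs.reconfig(cfg)
```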

This second week was a lot to take in, but not overwhelming. I guess the most benefit is, as always, in going through the materials and concepts referenced in the videos. Read more, practice more and actually try to delve deeper into the resources than you would by simply completing the course.