Data Persistence in MongoDB

We have talked about the data persistence in Redis in my previous article, and we had come out a conclusion.

There is no way to ensure no data loss in the single instance Redis even the strictest setting is turned on.

Because Redis is designed as a in-memory data storage handling the high-throughput user scenarios, aka a cache not a database. Hence, the persistence only needs to be usable not reliable.

On the other hand, MongoDB as a NoSQL database is designed to deal with high-volume data and can be scaled out as needed. How about the data persistence in MongoDB?

WiredTiger

We are going to talk about MongoDB with the most popular storage engine, WiredTiger. First, I have to tell that MongoDB is able to persist data. However, if you don't use MongoDB correctly, data would still be lost.

The story starts from the journaling. By default, MongoDB stores its data into the memory to improve the performance. After reaching the criteria, MongoDB flushes data to the disk. Until then, the data is finally persisted. The criteria is:

  1. At every 100 milliseconds (can be adjusted by storage.journal.commitIntervalMs)
  2. Every 100 MB of data

The journaling is like Redis AOF, it uses WAL to write data to the disk.

Write Concern

If the behavior of MongoDB is storing data in the memory first, how should I make sure the data is durable? Fortunately, MongoDB provides the write concern to accomplish the data persistence. You can set the write concern to "{j: true}" to make MongoDB acknowledge clients after the data is stored on the disk. Or, you can use "{w: majority}", this implies "{j: true}" as well as replicates data to most slaves.

The connect string also supports the journaling, journal=true. This provides some convenience if you don’t want to assign the write concern every time.

Conclusion

The data persistence plays a very important role in the system design. In fact, many NoSQL databases have some tricks in them. When you are using them, you should not only use them carefully but also understand the implementations behind them.

MongoDB provides the data persistence as long as the write concern is used properly. In my opinion, "{w: majority}" is the best choice when you are considering the write concern. Because, we usually would like our data can be persisted in addition to persist to replicas.

15