Skip to main content

MongoDB Write Concern Performance

MongoDB is a popular NoSQL database which scales to very significant volumes through sharding and can provide resiliency through replication sets. MongoDB doesn't support the sort or transaction isolation that you might get with a more traditional database (read committed, dirty reads etc.) and works at the document level as an atomic transaction (it's either inserted/updated, or it's not) - you cannot have a transaction spanning multiple documents.

What MongoDB does provide is called "Write-Concern" which provides some assurance over whether the transaction was safely written or not.

You can store a document and request "acknowledgement" (or not), whether the document was replicated to any replica-sets (for resiliency), whether the document was written to the transaction log etc. There's a very good article on the details of Write-Concern over on the MongoDB site. Clearly the performance will vary depending on the options chosen and the Java driver supports a wide range of these:


  • ACKNOWLEDGED




  • ERRORS_IGNORED




  • FSYNCED




  • FSYNC_SAFE




  • JOURNALED




  • JOURNAL_SAFE




  • MAJORITY




  • NONE




  • NORMAL




  • REPLICAS_SAFE




  • REPLICA_ACKNOWLEDGED




  • SAFE




  • UNACKNOWLEDGED




So for a performance comparison I fired up a small 3 node MongoDB cluster (2 database servers, 1 arbitrator) and ran a script to store 100 documents in the database using the various methods available to see what the difference is. The database was cleaned down each time (to zero - so overall is very small).

**WARNING: Performance testing is highly dependent upon the environment in which it is run. These results are based on a dev/test environment running x3 guests on the same host node and may not be representative for you and only exist to provide a comparison. **

The results for all modes are shown below and shows x3 relatively distinct clusters.

[caption id="attachment_163" align="alignnone" width="480"]Write Concern - All Modes Write Concern - All Modes[/caption]

Note: The initial run in all cases incurs a start up cost and hence appears slower than normal. This dissipates quickly though and performance can be seem to improve after this first run.

The slowest of these are FSYNCED, FSYNC_SAFE, JOURNALED and JOURNALED_SAFE (with JOURNALED_SAFE being the slowest).

[caption id="attachment_166" align="alignnone" width="480"]Write-Concern Cluster 3 - Slowest Write-Concern Cluster 3 - Slowest[/caption]

These options all require the data to be written to disk which explains why they are significantly slower than other options though the contended nature of the test environment likely makes the results appear worse than they would be in a production environment. FSYNC modes are mainly useful for backups and the like so shouldn't be used in code. JOURNALED modes depend on the journal commit interval (default 30 or 100ms) as well as the performance of your disks. Interestingly JOURNAL_SAFE is the supposedly the same as JOURNALED so seems a little odd that I can see a relatively significant reduction in performance consistently.

The second cluster improves performance significantly (from 3.5s overall to 500ms). This group covers the MAJORITY, REPLICAS_SAFE and REPLICAS_ACKNOWLEDGED options.

[caption id="attachment_165" align="alignnone" width="480"]write-concern-c2 Write-Concern Cluster 2 - Mid[/caption]

These options are all related to data replication to secondary nodes. REPLICA_ACKNOWLEDGED waits for x2 servers to have stored the data whilst MAJORITY waits for the majority to have stored and in this test since there are only x2 database servers it's unsurprising that the results are similar. As the number of database servers increases then MAJORITY may be safer than REPLICA_ACKNOWLEDGED but will suffer some performance degradation. This though isn't a linearly scaled performance drop since replication will generally occur in parallel. REPLICA_SAFE is supposedly the same as REPLICA_ACKNOWLEDGED and in this instance the results seem to back this up.

The fastest options cover everything else; ACKNOWLEDGED, SAFE, NORMAL, NONE, ERRORS_IGNORED and UNACKNOWLEDGED.

[caption id="attachment_164" align="alignnone" width="480"]Write-Concern Cluster 1 - Fastest Write-Concern Cluster 1 - Fastest[/caption]

In theory I was expecting SAFE and ACKNOWLEDGED to be similar with NORMAL, NONE, ERRORS_IGNORED and UNACKNOWLEDGED quicker still since this last set shouldn't wait for any acknowledgement from the server - once written to socket then assume all ok. However, the code I used was an older library I developed some time back which returns the object ID once stored. Since this has to read some data back, some sort of acknowledgement is implicit and so unsurprisingly they all perform similarly.

ERRORS_IGNORED and NONE are deprecated and shouldn't be used anymore whilst NORMAL seems an odd name as the default for MonoDB itself is ACKNOWLEDGED!?

In summary. For raw speed ACKNOWLEDGED should do though if you want fire-and-forget then specific code and UNACKNOWLEDGED should be faster still. A performance drop will occur if you want the assurance that the data has been replicated to another server via REPLICA_ACKNOWLEDGED and this will depend on your network performance and locations so is worth testing for your specific needs. Finally, if you want to know it's gone to disk then it's slower still with the JOURNALED option, especially if you've contention on the disks as I do. For the truly paranoid there should be a REPLICA_JOURNALED option which would confirm both replicated and journaled.

Finally, if you insist on a replica acknowledging as well then it needs to be online and your code may hang if a replica is not available. If you've lots of replicas then this may be acceptable but if you've only 1 (as in this test case) then it's bad enough to bring the application down immediately.

 

Comments

  1. Alice Bevan-McGregorFri Nov 14, 03:13:00 am UTC

    It is more efficient to generate ObjectIds client-side and include them in your inserts; this helps eliminate the extra round-trip. It would be interesting to see this article updated with this slight change in approach.

    ReplyDelete
  2. Thanks Alice. I've archived off the environment so it'll take me a little time to spin it back up but I'll try that out and see what difference it makes...

    ReplyDelete

Post a comment

Popular posts from this blog

An Observation

Much has changed in the past few years, hell, much has changed in the past few weeks, but that’s another story... and I’ve found a little time on my hands in which to tidy things up. The world of non-functionals has never been so important and yet remains irritatingly ignored by so many - in particular by product owners who seem to think NFRs are nothing more than a tech concern. So if your fancy new product collapses when you get get too many users, is that ok? It’s fair that the engineering team should be asking “how many users are we going to get?”,   or “how many failures can we tolerate?” but the only person who can really answer those questions is the product owner.   The dumb answer to these sort of question is “lots!”, or “none!” because at that point you’ve given carte-blanche to the engineering team to over engineer... and that most likely means it’ll take a hell of a lot longer to deliver and/or cost a hell of a lot more to run. The dumb answer is also “only a couple” and “

Inter-microservice Integrity

A central issue in a microservices environment is how to maintain transactional integrity between services. The scenario is fairly simple. Service A performs some operation which persists data and at the same time raises an event or notifies service B of this action. There's a couple of failure scenarios that raise a problem. Firstly, service B could be unavailable. Does service A rollback or unpick the transaction? What if it's already been committed in A? Do you notify the service consumer of a failure and trigger what could be a cascading failure across the entire service network? Or do you accept long term inconsistency between A & B? Secondly, if service B is available but you don't commit in service A before raising the event then you've told B about something that's not committed... What happens if you then try to commit in A and find you can't? Do you now need to have compensating transactions to tell service B "oops, ignore that previous messag

Equifax Data Breach Due to Failure to Install Patches

"the Equifax data compromise was due to their failure to install the security updates provided in a timely manner." Source: MEDIA ALERT: The Apache Software Foundation Confirms Equifax Data Breach Due to Failure to Install Patches Provided for Apache® Struts™ Exploit : The Apache Software Foundation Blog As simple as that apparently. Keep up to date with patching.