Most of my experience as a developer is with MongoDB. In a lot of ways, I regret this. SQL is the de facto standard for databases, data analysis, and data science. There’s just no way around it, and it’s for good reason - SQL queries are incredibly expressive, and SQL databases tend to be well-engineered, ACID-compliant machines that require applications to get their shit together and normalize their data before getting started.
That said, MongoDB is a beast too. It is very young compared to PostgreSQL and MySQL, but in recent years it has matured to the point where it can rival them in almost every feature. Some examples:
- It’s ACID compliant (multi-document transactions have been supported since version 4.0)
- It does allow for schema definition and enforcement
- It can handle what SQL would call joins (making it technically relational, albeit a little bit less relational by design)
- It provides all sorts of other features, like profiling and caching
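To make the joins point concrete: the piece that does this is the `$lookup` aggregation stage. Here is a minimal Python sketch of what such a pipeline looks like as plain data. The `orders` and `customers` collections and their field names are made up for illustration, and nothing here talks to a real database.

```python
# Hypothetical data model: each document in `orders` has a `customer_id`
# that references `_id` in a separate `customers` collection.

def build_join_pipeline(from_collection, local_field, foreign_field, as_field):
    """Return an aggregation pipeline that joins in another collection."""
    return [
        {
            # $lookup behaves like a left outer join: matching documents
            # from `from_collection` are attached as an array under `as_field`.
            "$lookup": {
                "from": from_collection,
                "localField": local_field,
                "foreignField": foreign_field,
                "as": as_field,
            }
        },
        # $unwind emits one output document per array element, so documents
        # with no match (empty array) are dropped by default.
        {"$unwind": "$" + as_field},
    ]


pipeline = build_join_pipeline("customers", "customer_id", "_id", "customer")
```

Assuming a pymongo `Database` handle named `db`, something like `db.orders.aggregate(pipeline)` would run it.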
Of course it’s not the same.
- It allows documents to be embedded, which can lead to both wins and messes in data architecture
- It’s easier to shard
- It’s usually not as ACID as, say, Postgres, at least not when it comes to sharding, but that tradeoff is mathematically unavoidable in a distributed system anyway (see the CAP theorem)
- Queries and configuration are in JSON, making it easy for JS engineers to pick up
- Other devs usually don’t think you’re a real engineer if you use it
It has a lot of differences, and as MongoDB has evolved, they’ve become increasingly subtle and situational. And I am still trying to understand them! That said, there are a ton of things I love about it, so I thought I’d list them out here.
Dope Features
Time Series Collections - a collection type that can be useful for collecting and analyzing time-based data.
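As a rough sketch of what setting one up involves, here is the `create` database command for a time series collection, built as a Python dict. The collection name, field names, and sample reading are assumptions for illustration only.

```python
from datetime import datetime, timezone


def timeseries_create_command(name, time_field, meta_field, granularity="seconds"):
    """Build the `create` command document for a time series collection.

    granularity is one of "seconds", "minutes" or "hours" and should roughly
    match how often each source reports a measurement.
    """
    return {
        "create": name,
        "timeseries": {
            "timeField": time_field,   # required: the date field on each measurement
            "metaField": meta_field,   # optional: identifies the data source
            "granularity": granularity,
        },
    }


command = timeseries_create_command("weather", "timestamp", "sensor", "minutes")

# Hypothetical measurement document that fits the collection defined above.
reading = {
    "timestamp": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    "sensor": {"id": 42, "type": "temperature"},
    "value": 21.5,
}
```

Assuming a pymongo `Database` handle named `db`, `db.command(command)` followed by `db.weather.insert_one(reading)` would be one way to try it against a real server.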
Geospatial Queries - useful query type for geospatial data, and I guess potentially any data that can be distributed across multiple dimensions.
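For a feel of what these look like, here is a small Python sketch that builds a `$near` filter around a GeoJSON point. The `location` field name and the coordinates are made up for illustration.

```python
def near_query(field, longitude, latitude, max_distance_m):
    """Build a $near filter: results come back nearest-first, capped at
    max_distance_m metres from the given point."""
    return {
        field: {
            "$near": {
                # GeoJSON puts longitude first, then latitude.
                "$geometry": {"type": "Point", "coordinates": [longitude, latitude]},
                "$maxDistance": max_distance_m,
            }
        }
    }


query = near_query("location", -73.9857, 40.7484, 500)

# $near needs a geospatial index on the queried field; this is the key
# specification for a 2dsphere index in pymongo's list-of-tuples form.
index_spec = [("location", "2dsphere")]
```

Assuming a pymongo `Database` handle named `db` and a hypothetical `places` collection, `db.places.create_index(index_spec)` and then `db.places.find(query)` would run the search.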
MongoDB Schemas (see “MongoDB Schema Design: Data Modeling Best Practices”) - MongoDB is not famous for having schemas - that’s a SQL thing. But. It can. And they can be expressed and enforced thoroughly and granularly. It’s possible, for example, to define regular expressions that govern field data. With tools like the schema analysis tool in Compass (MongoDB’s official GUI) and the ability to make schema violations log warnings instead of throwing errors, there appears to be considerable support for incrementally migrating non-schema’d collections into ones with schema protection. This is potentially an amazing solution for nimble application design, where the schema requirements start out unknown and become more concrete as development evolves.
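Here is a minimal sketch of the warn-first approach, written as a `collMod` command in Python. The `users` collection, its fields, and the (deliberately loose) email pattern are assumptions for illustration, not a recommended schema.

```python
import re

# Intentionally simple pattern; real email validation is messier than this.
EMAIL_PATTERN = r"^[^@\s]+@[^@\s]+\.[^@\s]+$"


def schema_command(collection, strict=False):
    """Build a collMod command that attaches a $jsonSchema validator.

    With strict=False, violations are only logged as warnings, which makes it
    possible to see what would break before switching to hard errors.
    """
    return {
        "collMod": collection,
        "validator": {
            "$jsonSchema": {
                "bsonType": "object",
                "required": ["email"],
                "properties": {
                    "email": {"bsonType": "string", "pattern": EMAIL_PATTERN},
                    "age": {"bsonType": "int", "minimum": 0},
                },
            }
        },
        # "moderate" leaves existing documents that already violate the schema alone.
        "validationLevel": "moderate",
        "validationAction": "error" if strict else "warn",
    }


command = schema_command("users")

# Local sanity check of the pattern. Python's regex engine is not the same as
# the server's, so treat this as a rough check only.
looks_valid = re.match(EMAIL_PATTERN, "dev@example.com") is not None
```

Once the warnings go quiet, re-running the same command with `strict=True` would flip the collection over to rejecting bad writes.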
Aggregation Operations - a very powerful tool for transforming data using the database’s own compute, before sending it over the network. Because aggregations are expressed as arrays of JSON, it’s easy to define them dynamically in JavaScript, which makes them even more flexible. They can be a little bit more confusing to reason about than SQL queries, but they are very expressive, and maybe Turing complete, who knows.
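The "define them dynamically" part is easier to show than to describe. Here is a small sketch in Python (the same idea carries over directly to JavaScript) that assembles a pipeline from optional arguments. The `status`, `region`, and `amount` fields are hypothetical.

```python
def build_sales_pipeline(status=None, group_by="region", limit=None):
    """Assemble an aggregation pipeline from optional pieces."""
    pipeline = []

    # Only filter when the caller asked for a specific status.
    if status is not None:
        pipeline.append({"$match": {"status": status}})

    # Group by whichever field was requested and total up the amounts.
    pipeline.append(
        {
            "$group": {
                "_id": "$" + group_by,
                "total": {"$sum": "$amount"},
                "orders": {"$sum": 1},
            }
        }
    )

    # Biggest totals first.
    pipeline.append({"$sort": {"total": -1}})

    if limit is not None:
        pipeline.append({"$limit": limit})

    return pipeline


top_regions = build_sales_pipeline(status="shipped", limit=5)
everything = build_sales_pipeline()
```

Since a pipeline is just a list of dicts, stages can be added, removed, or reordered with ordinary list operations before anything is sent to the database.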
Atlas Charts - a configurable, Tableau-style data visualization dashboard, offered by the MongoDB organization, with granular permissioning and programmatic configuration. It creates charts and other (dynamically updating and embeddable) graphics from MongoDB data. Aggregations can be used as an intermediary to transform data before visualization. For convenience, aggregations can also be saved and re-used across multiple visualizations as fields that represent abstractions of other fields and update in realtime. Kind of OP.
Profiling and Alerts - I lumped these together because they are both just good tools for improving performance at scale for slow operations. MongoDB’s query profiler is configurable and logs queries that take longer than a certain threshold, along with information about those queries. There is an Atlas service (Atlas is the suite of services offered by the for-profit part of the MongoDB organization) that allows these logs to selectively trigger alerts, thereby allowing developers to identify key bottlenecks in queries.
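As a rough sketch of the profiling side, here are the `profile` command and a filter for digging through the resulting `system.profile` collection, both built as Python dicts. The threshold values and the `shop.orders` namespace are made up for illustration.

```python
def profile_command(slow_ms, level=1):
    """Build the `profile` command document.

    Level 0 turns the profiler off, level 1 records only operations slower
    than slow_ms milliseconds, and level 2 records every operation.
    """
    return {"profile": level, "slowms": slow_ms}


def slow_ops_filter(min_millis, namespace=None):
    """Build a filter for system.profile entries that took at least min_millis."""
    query = {"millis": {"$gte": min_millis}}
    if namespace is not None:
        # `ns` is "<database>.<collection>" on each profile entry.
        query["ns"] = namespace
    return query


command = profile_command(100)
query = slow_ops_filter(250, "shop.orders")
```

Assuming a pymongo `Database` handle named `db`, `db.command(command)` would switch profiling on, and `db["system.profile"].find(query).sort("ts", -1)` would list the slowest recent offenders, newest first.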
WiredTiger - WiredTiger is MongoDB’s default storage engine. It works out of the box with MongoDB Atlas, or on a server you run yourself, and it keeps frequently used data in a configurable in-memory cache.
Two Repos for trying mongo out
- Mongoose Lab (Mongoose is a popular client library / ODM for using Mongo)
- Mongo Query Optimization (data seeding and writing performance research)