MongoDB

mongod, mongo, mongosh, mongos, what now?

As you might know, I joined MongoDB last month as a Keynote Storyteller, and, as you might expect, in order to work on the keynotes it’s important that I understand the world of MongoDB to the best of my ability.

As I progressed through my technical onboarding, I became aware that I was following some instructions without really understanding why. There is always a time and a place for this approach, but this wasn’t it. In particular, I realised I didn’t exactly know when I would type these into my terminal:

  • mongod
  • mongo
  • mongosh
  • mongos

Yes, I know there are many more like mongodump, snigger, but I want to focus on these four first because they are the ones that I found myself using the most in my technical onboarding.

What is MongoDB?

This is as good a place to start as any. If I ask you what is MongoDB you’ll probably say “It’s a Database, Helen!” And you’d be right, but this is an excellent time for me to point out that it’s So Much More than just a database.

This is a good time to introduce MongoDB Atlas into the equation. MongoDB Atlas is a MongoDB database as a service and it has a ton of cool services that you can check out at your own pace with a free tier.

I’m starting here because the rest of this post makes references to MongoDB terminology. You will need a MongoDB cluster, which is made up of multiple database servers: one is the primary and the rest are secondaries (replicas). Together they form a replica set. This means that if your primary server is unavailable or unreachable, one of the secondaries can take over and your data can still be accessed. Your cluster also has clients that can connect to it and manipulate your data. Now, down to business.

mongod

No, mongod is not some kind of database deity. In fact, it’s usually pronounced mongo-dee and it’s the MongoDB daemon, also known as the server for MongoDB. If you don’t start your server you have no cluster and then, well, you got nothing. You run mongod to start your server.

By default, when you run mongod with no options, the MongoDB server listens for connections from clients on port 27017 and stores data in the /data/db directory. However, you can start your MongoDB server with different parameters if required:

mongod --port 12345 --dbpath /srv/mongodb/
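If you want to try the command above on your own machine, here is a minimal sketch of how it might be wrapped in a small script. The run helper, the DRY_RUN variable and the start_mongod function are my own illustrative names, not part of MongoDB. By default the script only prints the commands, so it is safe to run even without MongoDB installed.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default here) prints each command instead of executing it.
# Printed commands lose their quoting, so they are for reading, not pasting.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the mongod command shown above.
start_mongod() {
  local port="$1" dbpath="$2"
  # mongod will not create the data directory for you, so make it first.
  run mkdir -p "$dbpath"
  run mongod --port "$port" --dbpath "$dbpath"
}

start_mongod 12345 /srv/mongodb
```

Run as is, this prints the mkdir and mongod commands. Setting DRY_RUN=0 would execute them for real, which assumes mongod is on your PATH and that you have permission to write to the data directory.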

That said, MongoDB Atlas runs mongod for you, so you don’t need to run the server yourself. It’s just useful to know from an architectural perspective at this stage especially as it might pop up in training.

The MongoDB daemon process that you start with mongod (or that MongoDB Atlas starts for you) manages your data access and any requests you might make such as add this, query that, or change that.


mongo and mongosh

So what is mongo? It’s the shell, it’s the client, it’s a JavaScript interface that you can use to interact with the MongoDB server (mongod). For example:

mongo --host mongodb0.example.com:27017

However, as of June 2020, it was superseded by the new Mongo Shell, called, wait for it, mongosh! Bold for emphasis, it genuinely took me more than a few seconds to realise why it was called mongosh – you’re welcome. mongosh has improved syntax highlighting, command history and logging in case you were wondering! For example:

mongosh --host mongodb0.example.com:27017
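Once you can connect, a quick way to confirm the server is answering is to send it a ping. The sketch below assumes the same example host as above and uses the mongosh --quiet and --eval options to run one expression and exit. The run helper, DRY_RUN and ping_server are illustrative names of my own, and the script only prints the command unless you opt in.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default here) prints the command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical function: ask the server at host:port whether it is alive.
ping_server() {
  local host="$1"
  # --quiet hides the connection banner; --eval runs one expression and exits.
  run mongosh --host "$host" --quiet --eval 'db.runCommand({ ping: 1 })'
}

ping_server mongodb0.example.com:27017
```

With DRY_RUN=0 and a reachable server, the ping command should report ok: 1. The host here is the placeholder from the example above, so swap in your own.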

When should you use mongo or mongosh? If you are able, you should run mongosh when you want to interact with or change the data. Of course, if you prefer you can still use mongo where it’s available, but you won’t get the new features introduced in the newer MongoDB shell (mongosh), and the legacy mongo shell was deprecated in MongoDB 5.0 and removed from MongoDB 6.0. Don’t forget that the shell, whichever you use, is just a way to communicate with your database cluster.


mongos

So that’s the daemon, or server, for MongoDB covered, plus two client options. That just leaves mongos for this blog post. This does not spin up multiple MongoDB clients, sadly. It’s just a proxy that sits between the client application (mongo/mongosh) and a sharded database cluster, that is, a cluster whose data is split across multiple mongod replica sets.

It’s what your clients (mongo or mongosh) interact with when you run queries against sharded data. It does not replace your database server; it runs in addition to it. You need to point it at your config server replica set, which holds the metadata about where your sharded data lives, for example:

mongos --configdb cfg/127.0.0.1:27017

Now your first question should be why do I even need a proxy between my client and server just because my data is sharded? That is an excellent question! You need mongos to route the queries and any write operations to the shard(s) as required.

The mongos proxy is like a map that your client can use to query or make changes in your cluster when your data is sharded. This in-between proxy is required because your client doesn’t know which shard(s) that data exists on, but the mongos proxy does, because it reads that information from the config servers. It also knows which shard to insert data into, if that’s your requirement.
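To see how the pieces fit, here is a rough sketch of a one-machine sharded sandbox for training purposes only: one config server replica set, one shard, and one mongos in front. The replica set name cfg and the address 127.0.0.1:27017 come from the command above. Everything else (the /tmp/sandbox paths, the shard1 name, ports 27018 and 27020, and the run helper) is an assumption of mine for illustration. It prints the commands by default rather than running them.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default here) prints each command instead of executing it.
# Printed commands lose their quoting, so they are for reading, not pasting.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

start_sharded_sandbox() {
  run mkdir -p /tmp/sandbox/cfg /tmp/sandbox/shard1

  # 1. Config server replica set "cfg": stores which shard holds which data.
  run mongod --configsvr --replSet cfg --port 27017 \
    --dbpath /tmp/sandbox/cfg --fork --logpath /tmp/sandbox/cfg.log
  run mongosh --port 27017 --quiet --eval \
    'rs.initiate({ _id: "cfg", configsvr: true, members: [{ _id: 0, host: "127.0.0.1:27017" }] })'

  # 2. One shard, itself a (single-member) replica set that stores the data.
  run mongod --shardsvr --replSet shard1 --port 27018 \
    --dbpath /tmp/sandbox/shard1 --fork --logpath /tmp/sandbox/shard1.log
  run mongosh --port 27018 --quiet --eval \
    'rs.initiate({ _id: "shard1", members: [{ _id: 0, host: "127.0.0.1:27018" }] })'

  # 3. The mongos router, pointed at the config server replica set.
  run mongos --configdb cfg/127.0.0.1:27017 --port 27020 \
    --fork --logpath /tmp/sandbox/mongos.log

  # 4. Tell the cluster about the shard by connecting through mongos.
  run mongosh --port 27020 --quiet --eval 'sh.addShard("shard1/127.0.0.1:27018")'
}

start_sharded_sandbox
```

Notice that clients connect to the mongos port (27020 in this sketch), never to the shard directly. If you run it for real with DRY_RUN=0, you may need to pause between steps while each replica set elects its primary.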


Putting it all together

This blog post has been a deliberately simplistic viewpoint to illustrate the components. However, I like to know what’s going on beneath various abstractions, which is why I wrote it. That said, do remember that MongoDB Atlas manages this for you!

In summary, from the CLI, run mongod to start your MongoDB server for your cluster in a training environment. Note that in the real world (which again MongoDB Atlas manages for you), you would need to run at least three mongod instances. You should never run just one in a production environment. You can then use either the legacy mongo or the newer mongosh shell commands to interact with your cluster.
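As a sketch of what "at least three mongod instances" could look like, the script below starts three servers as one replica set and then initiates it from mongosh. It runs all three on one machine, which is fine for training but defeats the purpose in production, where each would live on its own host. The replica set name rs0, the /tmp/rs0 paths, the ports and the run helper are all my own assumptions. It prints the commands by default rather than running them.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default here) prints each command instead of executing it.
# Printed commands lose their quoting, so they are for reading, not pasting.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

start_replica_set() {
  local port
  # Start three mongod processes that all belong to replica set "rs0".
  for port in 27017 27018 27019; do
    run mkdir -p "/tmp/rs0/$port"
    run mongod --replSet rs0 --port "$port" \
      --dbpath "/tmp/rs0/$port" --fork --logpath "/tmp/rs0/$port.log"
  done

  # Tell the members about each other; one will then be elected primary.
  run mongosh --port 27017 --quiet --eval \
    'rs.initiate({ _id: "rs0", members: [{ _id: 0, host: "127.0.0.1:27017" }, { _id: 1, host: "127.0.0.1:27018" }, { _id: 2, host: "127.0.0.1:27019" }] })'
}

start_replica_set
```

After running it for real with DRY_RUN=0, calling rs.status() from mongosh should list one primary and two secondaries.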

If your data is sharded you will need to additionally run at least one mongos process to ensure that your queries are routed to the right shard(s). Again I’m talking about a training environment here rather than a production environment!

I got one of the real MongoDB developer advocates (thanks Mark) to help me draw a diagram for you:

Diagram of the commands as architecture

Finally, for those of you wondering where you can get your hands on some MongoDB training, go check out MongoDB Basics and see where your learning journey takes you.

And that is mongod, mongo, mongosh, and mongos!