Defending Against Query Selector Injection Attacks

In case you haven’t come across Petko Petkov’s post on injection attacks against MongoDB and NodeJS yet, it’s definitely worth a careful read. In this article, he explains a pretty simple exploit that I suspect affects a fair number of applications, including some that I’ve implemented.

The general idea behind Petko’s exploit is that, typically, when you want to find the document whose username is equal to the user-provided username, you may do something like this:

User.findOne({ username: req.body.username }, function(err, user) {
  // Handler code here
});

However, let’s say you’ve exposed a JSON-based API and I’m a malicious user that sends you the following body JSON:

{ "username": { "$gt": "" } }

The query that will get sent to MongoDB then looks like this:

{ username: { $gt: "" } }

Assuming your usernames are strings, that query matches every user with a non-empty username, so findOne() hands back whichever one MongoDB finds first!

Even if you’re using URL encoding instead of JSON for your API, you may not be safe. ExpressJS’s body parser middleware, by default, uses the qs module to parse URL-encoded HTTP request bodies. The qs module is designed to parse URL-encoded strings in a way that makes decoding objects easier, so parsing the string username[$gt]= gives you a nested object { username: { $gt: '' } }. This is really bad news bears.
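
To make the bracket syntax concrete, here is a minimal sketch of that style of parsing. It is a simplified stand-in, not the qs source: parseNested is a hypothetical helper that only handles one level of nesting, and it relies on the URLSearchParams global that ships with recent versions of Node.

```javascript
// Simplified stand-in for qs-style bracket parsing (one nesting level only).
// parseNested is a hypothetical helper, not part of qs or Express.
function parseNested(body) {
  var result = {};
  new URLSearchParams(body).forEach(function(value, key) {
    // Match keys of the form outer[inner], e.g. username[$gt]
    var match = /^([^\[\]]+)\[([^\[\]]+)\]$/.exec(key);
    if (!match) {
      result[key] = value;
      return;
    }
    var outer = match[1];
    if (outer === '__proto__') {
      return; // never touch the prototype chain with user input
    }
    if (typeof result[outer] !== 'object') {
      result[outer] = {};
    }
    result[outer][match[2]] = value;
  });
  return result;
}

console.log(parseNested('username=alice'));
console.log(parseNested('username[$gt]='));
```

The first call produces a plain string for username, while the second produces an object with a $gt key whose value is an empty string, which is exactly the shape the attack needs.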

Thankfully, query selector injection attacks are pretty easy to defend against, so no need to throw your Express JSON API out the window. Here are two strategies to make sure you’re not vulnerable.

Remove keys that start with $ from user input

One of the cruxes of Petko’s exploit is that, in the above example, MongoDB determines the query selector by scanning the req.body.username object for a key that matches a query selector. There are two ways you can avoid this. The first, and probably most obvious, is to make sure req.body.username is a string rather than an object. JavaScript’s toString function should be sufficient:

User.findOne({ username: (req.body.username || "").toString() }, function(err, user) {
  // Handler code here
});
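
If you would rather reject bad input than coerce it, a stricter variation is to check the type up front. This is only a sketch: requireString is a hypothetical helper name, and how you report the error to the client is up to your application.

```javascript
// requireString is a hypothetical helper: it returns the value only when it
// is already a string and throws otherwise, instead of coercing objects.
function requireString(value, fieldName) {
  if (typeof value !== 'string') {
    throw new TypeError(fieldName + ' must be a string');
  }
  return value;
}

// Intended use inside a route handler (not executed here):
// User.findOne({ username: requireString(req.body.username, 'username') }, callback);
```

With this approach a body like { "username": { "$gt": "" } } never reaches the query at all, because the helper throws before findOne() is called.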

However, in some cases, you may want to query on user-provided objects, and so casting to a string isn’t sufficient. Since all MongoDB query selectors start with $, you can check if req.body.username is an object, and, if so, remove any keys from the object that start with $. I put together a really simple npm module called mongo-sanitize (see it on GitHub) that does this for you, in case you don’t want to implement this yourself.

var sanitize = require('mongo-sanitize');

// The sanitize function will strip out any keys that start with '$' in the input,
// so you can pass it to MongoDB without worrying about malicious users overwriting
// query selectors.
var clean = sanitize(req.params.username);

Users.findOne({ name: clean }, function(err, doc) {
  // ...
});
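
If you do want to implement this yourself, the idea can be sketched in a few lines. The function below is written for illustration and is not the mongo-sanitize source; stripDollarKeys is a hypothetical name, and the sketch assumes the input is plain JSON-parsed data (objects, arrays, strings, numbers, booleans, null).

```javascript
// stripDollarKeys is a hypothetical helper written for illustration; it is
// not the mongo-sanitize source. It returns a copy of the input with every
// key that starts with '$' removed, at any depth.
function stripDollarKeys(value) {
  if (Array.isArray(value)) {
    return value.map(stripDollarKeys);
  }
  if (value !== null && typeof value === 'object') {
    var clean = {};
    Object.keys(value).forEach(function(key) {
      if (key.charAt(0) !== '$') {
        clean[key] = stripDollarKeys(value[key]);
      }
    });
    return clean;
  }
  return value; // strings, numbers, booleans, null and undefined pass through
}

console.log(stripDollarKeys({ $gt: '' }));
console.log(stripDollarKeys('alice'));
```

Note that the injected selector collapses to an empty object, so a query like { username: {} } simply fails to match string usernames instead of matching all of them.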

If this approach doesn’t work for you for whatever reason, don’t worry, there’s another way.

Explicitly specify the query selector when querying with untrusted data

The other crux of Petko’s exploit is that, typically, you don’t specify a query selector when you want to find a document where username is exactly equal to the user input. As a matter of fact, MongoDB doesn’t have a fully supported $eq query selector just yet (although the core server team is working on it). In lieu of $eq, however, you can use the $in selector:

User.findOne({ username: { $in: [req.body.username] } }, function(err, user) {
  // Handler code here
});

This is slightly more verbose, but if a malicious user tried a query selector injection attack, the query passed would look like this:

{ username: { $in: [{ $gt: "" }] } }

Assuming that your usernames were all strings, this query would return no results, as expected.
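
To avoid forgetting the wrapper, you could pull it into a tiny function. This is a sketch under the assumption that you only need exact matches; exactly is a hypothetical helper name, not something mongoose or MongoDB provides.

```javascript
// exactly() is a hypothetical helper name. It wraps a value in $in so the
// value is always treated as data to compare against, never as a selector.
function exactly(value) {
  return { $in: [value] };
}

var safe = { username: exactly('alice') };
var attacked = { username: exactly({ $gt: '' }) };

console.log(JSON.stringify(safe));
console.log(JSON.stringify(attacked));
```

In the attacked case the injected object ends up inside the $in array, where MongoDB compares against it as a literal value rather than interpreting it as a query selector.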

Conclusion

Query selector injection attacks are pretty insidious and it’s easy to be vulnerable, especially if you’ve been happily implementing JSON REST APIs. Thankfully, using one of the above principles, either by using mongo-sanitize or by explicitly specifying a query selector for untrusted data, you can avoid the query selector injection pitfall without having to give up the ease-of-use of JSON APIs. If you want more details on securing your MongoDB application, check out the security checklist and MongoDB’s blog post on security design and configuration.

 

The Future of MongooseJS

Two weeks ago marked a big milestone: mongoose 3.9.0 was released. Be warned, mongoose’s versioning practice is that even-numbered branches are stable and odd-numbered branches are unstable. While all our tests check out on 3.9.0, I would recommend sticking to 3.8.x releases in production for now. 3.9.0 was mongoose’s first unstable release since October 2013. While the changes in 3.9.0 were relatively minor, they open the door to getting some interesting features into 4.0. Here are some of the high-level features I think should make it into 4.0:

1) Update() with Validators

Mongoose right now doesn’t run validators on calls to Model.update(). I’ve often found that it’s more elegant and performant to call update() directly instead of loading the document, modifying it, and then saving it. Mongoose should have better support for this paradigm in the future.

2) Browser-friendly and browserify-friendly schema validation module

Currently, there’s no good way to send your schemas to the browser to do client-side validation. While introducing an API endpoint for validation is quite possible, hooking up mongoose schema validation directly to a tool like AngularJS in the browser can open up some incredibly cool opportunities.

3) Better integration with Koa.js and Harmony in general

Fair warning, I’m not well versed in the particulars of ES6 or Koa just yet, but I have noticed some people opening GitHub issues related to these subjects. As more people start moving to ES6, mongoose needs to have its A-game ready.

4) Per-document events

The general idea is that mongoose doesn’t scope document events to a particular document; that is, doc1.on('event') will get triggered by doc2.emit('event') if doc1 and doc2 are instances of the same model. This is expected behavior now, but it’s very counterintuitive. At the very least, in 4.0 doc1.on('event') will get triggered by doc2.emit('event') only if doc1 and doc2 are the same JS object. However, we may introduce behavior where doc1.on('event') will also get triggered by doc2.emit('event') if doc1 and doc2 have the same _id.
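
The difference between the two behaviors can be illustrated with Node’s built-in EventEmitter. This is a toy model and makes no claim about mongoose’s internals: makeSharedDoc and makeScopedDoc are hypothetical names, and the shared emitter simply stands in for "all documents of one model".

```javascript
var EventEmitter = require('events').EventEmitter;

// Toy model, not mongoose internals: every "document" created by
// makeSharedDoc forwards on/emit to one emitter shared by the whole model.
var sharedEmitter = new EventEmitter();
function makeSharedDoc() {
  return {
    on: function(name, fn) { sharedEmitter.on(name, fn); },
    emit: function(name) { sharedEmitter.emit(name); }
  };
}

// Per-document alternative: each document is its own emitter.
function makeScopedDoc() {
  return new EventEmitter();
}

var sharedCalls = 0;
var doc1 = makeSharedDoc();
var doc2 = makeSharedDoc();
doc1.on('event', function() { sharedCalls++; });
doc2.emit('event'); // reaches doc1's listener through the shared emitter

var scopedCalls = 0;
var doc3 = makeScopedDoc();
var doc4 = makeScopedDoc();
doc3.on('event', function() { scopedCalls++; });
doc4.emit('event'); // doc4 has no listeners, so nothing fires
var scopedCallsAfterOtherDoc = scopedCalls;
doc3.emit('event'); // fires doc3's own listener

console.log(sharedCalls, scopedCallsAfterOtherDoc, scopedCalls);
```

In the shared version a listener registered on one document hears events emitted on another, which is the surprising part; in the scoped version a listener only hears events emitted on the object it was registered on.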

5) Reworking Population

Populate is extremely useful, but also has some very unfortunate dark corners and counter-intuitive behavior that I’d like to rework. There are numerous features, such as caching integration, manual population, and populating on fields other than _id that the current implementation makes very difficult. I’m hoping to get all these features into 4.0.

I’m still very much in the planning stages for mongoose 4.0, so comments, concerns, and feature suggestions are very much welcome. Feel free to open up issues on GitHub with features you’d like to see in 4.0.

What’s New in Mongoose 3.8.9

I have an important announcement to make: over the last couple of weeks I’ve been taking over maintenance of mongoose, the popular MongoDB/NodeJS ODM. I have some very big shoes to fill: Aaron Heckmann has done an extraordinary job building mongoose into an indispensable part of the NodeJS ecosystem. As an avid user of mongoose over the last two years, I look forward to continuing mongoose’s storied tradition of making dealing with data elegant and fun. However, mongoose isn’t perfect, and I’m already looking forward to the next major stable release, 4.0.0. Suggestions are most welcome, but please be patient; I’m still trying to catch up on the backlog of issues and pull requests.

On to what’s new in 3.8.9

On that note, Mongoose 3.8.9 was (finally) released yesterday. This was primarily a maintenance release; the major priority was to clean up several test failures against the new stable version of the MongoDB server, 2.6.x, without any backward-breaking API changes. I’m proud to say that 3.8.9 should be compatible with MongoDB 2.2.x, 2.4.x, and 2.6.x. In addition, I added improved support for a couple of key MongoDB 2.6 features:

Support for Text Search in MongoDB 2.6.x

As I mentioned in my post on text search, mongoose 3.8.8 didn’t quite support text search yet: mongoose prevented you from sorting by text score. This commit, which went into mquery 0.6.0, allows you to use the new $meta operator in sort() calls. Here’s an example of how you would use text search with sorting in mongoose:

/* Blog post collection with two documents:
 * { title : 'text search in mongoose' }
 * { title : 'searching in mongoose' }
 * and a text index on the 'title' field */
BlogPost.
  find(
    { $text : { $search : 'text search' } },
    { score : { $meta: "textScore" } }
  ).
  sort({ score : { $meta : 'textScore' } }).
  limit(2).
  exec(function(error, documents) {
    assert.ifError(error);
    assert.equal(2, documents.length);
    assert.equal('text search in mongoose', documents[0].title);
    assert.equal('searching in mongoose', documents[1].title);
    db.close();
    done();
  });

The relevant test case can be found here (there’s also test coverage for text search without sorting). Please note that you’re responsible for making sure you’re running MongoDB 2.6.0 or later; running text queries against older versions of MongoDB will not give you the expected behavior. MongoDB’s docs about text search can be found here.

Aggregation Helper for $out

As I mentioned in my post about the aggregation framework’s $out pipeline stage (which pipes the aggregation output to a collection), mongoose’s aggregate() function doesn’t prevent you from using $out. However, mongoose also supports syntactic sugar for chaining helper functions onto aggregate() for building an aggregation pipeline:

MyModel.aggregate()
  .group(group.$group)
  .project(project.$project)
  .exec(function (err, res) {
  });

This commit adds a .out() helper function that you can use to add a $out stage to your pipeline. Note that you’re responsible for making sure that the .out() function is the last stage of your pipeline, because the MongoDB server will return an error if it isn’t. The relevant test case can be found here. Here’s how the new helper function looks in action:

var outputCollection = 'my_output_collection';

MyModel.aggregate()
  .group(group.$group)
  .project(project.$project)
  .out(outputCollection)
  .exec(function(error, result) {
  });
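
Since the ordering is your responsibility, one option is a small guard that checks a raw pipeline array before it is sent to the server. This is a sketch only: assertOutIsLast is a hypothetical helper that is not part of mongoose, and the $group and $project stages below use made-up field names.

```javascript
// assertOutIsLast is a hypothetical guard, not part of mongoose. It takes a
// raw pipeline array and throws if a $out stage appears anywhere but the end.
function assertOutIsLast(pipeline) {
  pipeline.forEach(function(stage, index) {
    var hasOut = Object.prototype.hasOwnProperty.call(stage, '$out');
    if (hasOut && index !== pipeline.length - 1) {
      throw new Error('$out must be the last stage (found at index ' + index + ')');
    }
  });
  return pipeline;
}

// Illustrative pipeline; the field names are assumptions for this example.
var pipeline = [
  { $group: { _id: '$author', posts: { $sum: 1 } } },
  { $project: { posts: 1 } },
  { $out: 'my_output_collection' }
];

assertOutIsLast(pipeline); // passes: $out is the final stage
```

Failing fast on the client like this gives you a clearer error message than waiting for the server to reject the pipeline.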

A Minor Caveat For 2.6.x Compatibility

There is still one unfortunate edge case remaining in 3.8.9 which only affects MongoDB 2.6.x. MongoDB 2.6.x unfortunately no longer allows empty $set operators to be passed to update() and findAndModify(). This change only affects mongoose in the case where you set the upsert flag to true. This commit attempts to mitigate this API inconsistency, but there is still one case where you will get an error on MongoDB 2.6.x but not in 2.4.x: if the query passed to your findAndModify() only includes an _id field. For example,

MyModel.findOneAndUpdate(
  { _id: 'MY_ID' },
  {},
  { upsert: true },
  function(error, document) {
  });

will return a server error on MongoDB 2.6.1 but not 2.4.10. Right now, there is no good way to handle this case in both 2.4 and 2.6 without either doing an if-statement on the version or breaking the existing API. You can track the progress of this issue on GitHub.

Conclusion

Hope y’all are as excited about mongoose’s future as I am. There are lots of exciting ideas that I’m looking forward to getting into mongoose 4.0. You’re more than welcome to add suggestions for new features or behavior changes on GitHub issues. I’m looking forward to seeing what y’all can come up with for improving mongoose and what y’all will be able to do with future versions.