MongoDB aggregations are easy

Building aggregation pipelines sometimes just seems so hard, especially when you're working with NoSQL database such as MongoDB, since there is no defined schema, and there are just so many nested fields

Well in this article i'm not going to go deep into what aggregation query is, since i think MongoDB Docs explained this perfectly, but instead i will go through a kind of seeming-complex aggregation query, which is you probably going to face if you're currently with MongoDB and i will use this example here to just show you how aggregation pipelines are just logically easy, and it's always depends on how you think while building it

The example

Well for the sake of our example, let's consider that you have this silly products collection which has these 2 documents:

{
  "_id": {
    "$oid": "55c30ff62cfa09af198b465a"
  },
  "name": "Awesome Tshirt",
  "currency": "usd",
  "one_size": "xl",
  "variants": [
    {
      "type": "color",
      "base_sku": 132145,
      "items": [
        {
          "color": "Grey Melange",
          "price": 80,
          "sku": 1243252369
        },
        {
          "color": "Bottle Green",
          "price": 90,
          "sku": 1243252368
        },
        {
          "color": "Deep Charcoal Grey",
          "price": 80,
          "sku": 1243252376
        },
        {
          "color": "White",
          "price": 80,
          "sku": 1243252363
        },
        {
          "color": "Black",
          "price": 80,
          "sku": 1243252362
        }
      ]
    }
  ]
},
{
  "_id": {
    "$oid": "55c30ff62cfa09af198b465c"
  },
  "name": "Hacker Tshirt",
  "currency": "usd",
  "one_size": false,
  "variants": [
    {
      "type": "color",
      "base_sku": 132155,
      "items": [
        {
          "color": "Black",
          "price": 100,
          "sku": 87987963
        }
      ]
    },
    {
      "type": "size",
      "base_sku": 342434,
      "items": [
        {
          "size": "sm",
          "price": 100,
          "sku": 97896796
        },
        {
          "size": "xl",
          "price": 100,
          "sku": 43534534
        },
        {
          "size": "xxl",
          "price": 100,
          "sku": 76576532
        }
      ]
    }
  ]
}

and now our goal is to get the price of all grey T-shirts, so since as you can see that each product has it's price vary based on the variant itself, so we have to get both from variants items item object. Yes in this collection its so silly that somehow you have to choose of either having black T-shirt or XXl tshirt, but not both :D

Stage #1

The very first step when you create a pipeline is to always match your query, this to narrow down querying the whole collection to just a limited number of documents that match your criteria

You always want to narrow down the number of documents you're searching within at the very beginning of the aggregation pipeline, this will lead to faster queries

So let's do this, first we want to use only the documents that has color variant, and it has also grey color inside it's variant items. So this is how we're translating this:

{
        '$match': {
            'variants': {
                '$elemMatch': {
                    'type': 'color', 
                    'items': {
                        '$elemMatch': {
                            'color': /grey/i
                        }
                    }
                }
            }
        }
    }

$elemMatch operator matches documents that contain an array field with at least one element that matches all the specified query criteria.

We're using $elemMatch here to find a variant of type color first, and then we're using it again to find an element of color that contains grey and its case insensitive - notice the regex /grey/i -

So this was our first step, notice that only 1 item will be returned in this case that has ID 55c30ff62cfa09af198b465a, since it is the only one with variants of type color that has grey color

Document 55c30ff62cfa09af198b465c it has variants of type color, but it has only black color

Stage #2

Now we still need to query a nested object that is inside variants (array) and also inside items (array), so its more like this variants -> items -> {color, price}

This seems complicated, and since we're dealing with array of objects here, what could make it easier? .. to just deal with this array as an object instead, so we can just leverage the dot notation syntax in Mongo

Well we can do that by just unwinding the array! simple as that, just flatten this array into objects by using $unwind

$unwind Deconstructs an array field from the input documents to output a document for each element

$unwind is so simple that the only required parameter to pass is path which is the path of the array you want to flatten

{
    '$unwind': {
      'path': '$variants'
    }
  }

Notice the dollar sign before variants, we have to prefix the field name with it so Mongo can interpret it, it just tells Mongo to inject the actual value of variants

Now by then we will have also 1 document still, since variants array has only 1 element

Notice that variants is now object instead of array

Stage #3

Now what? we still need to query variants.items elements which is the same case as variants before we unwind it. So i guess we will have to flatten variants.items too, so next stage will be

{
    '$unwind': {
      'path': '$variants.items'
    }
  }

Notice now that we can easily access items with dot notation, since variants is an object and not an array anymore, now these are the the new documents returned after this stage

items is an object now with 1 document per items element which is exactly what we need, but did you notice something strange?

Stage #4

Now we have documents with items.color value that does not contains grey we have Black, White, and Bottle Green as well, why is that?

Well that would be because our first $match stage was only getting the documents that have items with grey color, having this does not necessary means that it will magically just filter other colors from items, this our job to do now

So now we will need to get only the documents that has variants.items.color with greyish color, looks like another $match query, right?

{
    '$match': {
    'variants.type': 'color', 
    'variants.items.color': /grey/i
  }

It is so simple now, here we're just saying, out of the result of stage 3 we just want the documents which has variants of type color and any item that has grey with case insensitive, that will return us these documents:

Stage #5

Now we have the documents, but we don't really care about all of these properties right? We only care about the color and the price

So since we're having a couple levels of nesting here variants -> items and we only care about price and color properties, then we need to focus/project these properties only and ignore any other stuff we don't need

$project Passes along the documents with the requested fields to the next stage in the pipeline.

$project is so simple also, you'll probably just need to know that:

It takes field name as a key and 1 or 0 as value (1 to show, 0 to hide)

_id is shown by default, unless you specify to hide it

You have to specify at least 1 output field

So our stage implementation will be simple as

{
    '_id': 0,
    'color': '$variants.items.color',
    'price': '$variants.items.price'
  }

Notice that we passed '_id': 0 because we don't really care about the document ID - at least not in this example, normally you'll need it though - so we just hid it

So now this will be the final result

The aggregation query

db.getCollection('products').aggregate([
  {
    '$match': {
      'variants': {
        '$elemMatch': {
          'type': 'color', 
          'items': {
            '$elemMatch': {
              'color': new RegExp('grey', 'i')
            }
          }
        }
      }
    }
  },
  {
    '$unwind': {
      'path': '$variants'
    }
  },
  {
    '$unwind': {
      'path': '$variants.items'
    }
  },
  {
    '$match': {
      'variants.type': 'color', 
      'variants.items.color': new RegExp('grey', 'i')
    }
  },
  {
    $project:  {
      '_id': 0,
      'color': '$variants.items.color',
      'price': '$variants.items.price'
    }
  }
])

Final words

As you can see it's pretty straight forward, and it's quite easy and seems quite logical too, building aggregation pipelines stages is just like actually talking to rubber duck while coding, instead this time you're just talking to Mongo shell

This post was originally published on my blog blog.mrg.sh