MongoDB Aggregation

·

4 min read

Definitions

  • Aggregation: Collection and summary of data

  • Stage: One of the built-in methods that can be completed on the data, but does not permanently alter it

  • Aggregation pipeline: A series of stages completed on the data in order

Structure of an Aggregation Pipeline

aggregate() performs an aggregation operation by using an aggregation pipeline.

aggregate() takes an array of aggregation stages to form the pipeline.

db.collection.aggregate([
    {
        $stage1: {
            { expression1 },
            { expression2 }...
        },
        $stage2: {
            { expression1 }...
        }
    }
])

Using $match and $group Stages

$match

The $match stage filters for documents that match specified conditions and passes those documents to the next stage of the pipeline.

{
  $match: {
     "field_name": "value"
  }
}

When using a $match stage, place it as early as possible in the pipeline so it can use indexes. Because it filters, it reduces the number of documents, which lessens the amount of processing required.

$group

The $group stage groups documents by a group key.

The output of the $group stage is one document for each unique value of the group key.

{
  $group:
    {
      _id: <expression>, // Group key
      <field>: { <accumulator> : <expression> }
    }
 }

$match and $group in an Aggregation Pipeline

The following aggregation pipeline finds the documents with a field named "state" that matches a value "CA" and then groups those documents by the group key "$city" and shows the total number of zip codes in the state of California.

db.zips.aggregate([
{   
   $match: { 
      state: "CA"
    }
},
{
   $group: {
      _id: "$city",
      totalZips: { $count : { } }
   }
}
])

Using $sort and $limit Stages

$sort

The $sort stage sorts all input documents and returns them to the pipeline in sorted order.

We use 1 to represent ascending order, and -1 to represent descending order.

{
    $sort: {
        "field_name": 1
    }
}

$limit

The $limit stage returns only a specified number of records.

{
  $limit: 5
}

$sort and $limit in an Aggregation Pipeline

The following aggregation pipeline sorts the documents in descending order, so the documents with the greatest pop value appear first, and limits the output to only the first five documents after sorting.

db.zips.aggregate([
{
  $sort: {
    pop: -1
  }
},
{
  $limit:  5
}
])

Using $project, $count, and $set Stages

$project

The $project stage specifies the fields of the output documents.

1 means that the field should be included, and 0 means that the field should be suppressed.

The field can also be assigned a new value (new value specified for new fields and existing fields being given a new value).

{
    $project: {
        state:1, 
        zip:1,
        population:"$pop",
        _id:0
    }
}

It should usually be the last stage because it specifies the exact fields in the output.

$set

The $set stage creates new fields or changes the value of existing fields, and then outputs the documents with the new fields.

{
    $set: {
        place: {
            $concat:["$city",",","$state"]
        },
        pop:10000
     }
  }

This is useful when we want to change existing fields in the pipeline, or add new ones to be used in upcoming pipeline stages without having to specify all the existing fields.

💡
$set and $project can both create and assign values to fields, but only $project can be used to reshape the data.

$count

The $count stage creates a new document, with the number of documents at that stage in the aggregation pipeline assigned to the specified field name.

{
  $count: "total_zips"
}

Using the $out Stage

The $out aggregation operator writes documents that are returned by an aggregation pipeline into a collection. So, $out must be the last stage in the pipeline.

$out: {
    db: "<db>",
    coll: "<newcollection>"
}

If the database or the collection that you specified doesn't exist, it's created.

Another way to use the $out stage is to provide only the collection name. In this case, the $out stage uses the same database used in the aggregation.

{ $out: "<newcollection>" }

You should be aware that $out creates a new collection if it does not already exist. But if the collection exists, $out replaces the existing collection with new data. So be mindful of what you name this new collection unless you want an existing collection to be overwritten.

MongoDB Aggregation in Node.js

const client = new MongoClient(uri)
const dbname = "bank";
const collection_name = "accounts";
const accountsCollection = client.db(dbname).collection(collection_name);
const pipeline = [
  // Stage 1: match the accounts with a balance less than $1,000
  { $match: { balance: { $lt: 1000 } } },
  // Stage 2: Calculate average balance and total balance
  {
    $group: {
      _id: "$account_type",
      total_balance: { $sum: "$balance" },
      avg_balance: { $avg: "$balance" },
    },
  },
]
const main = async () => {
  try {
    await client.connect()
    console.log(`Connected to the database 🌍. \nFull connection string: ${safeURI}`)
    let result = await accountsCollection.aggregate(pipeline)
    for await (const doc of result) {
      console.log(doc)
    }
  } catch (err) {
    console.error(`Error connecting to the database: ${err}`)
  } finally {
    await client.close()
  }
}

main()