MongoDB Aggregation
Definitions
Aggregation: Collection and summary of data
Stage: One of the built-in methods that can be completed on the data, but does not permanently alter it
Aggregation pipeline: A series of stages completed on the data in order
Structure of an Aggregation Pipeline
aggregate()
performs an aggregation operation by using an aggregation pipeline.
aggregate()
takes an array of aggregation stages to form the pipeline.
db.collection.aggregate([
{
$stage1: {
{ expression1 },
{ expression2 }...
},
$stage2: {
{ expression1 }...
}
}
])
Using $match
and $group
Stages
$match
The $match
stage filters for documents that match specified conditions and passes those documents to the next stage of the pipeline.
{
$match: {
"field_name": "value"
}
}
When using a $match
stage, place it as early as possible in the pipeline so it can use indexes. Because it filters, it reduces the number of documents, which lessens the amount of processing required.
$group
The $group
stage groups documents by a group key.
The output of the $group
stage is one document for each unique value of the group key.
{
$group:
{
_id: <expression>, // Group key
<field>: { <accumulator> : <expression> }
}
}
$match
and $group
in an Aggregation Pipeline
The following aggregation pipeline finds the documents with a field named "state" that matches a value "CA" and then groups those documents by the group key "$city" and shows the total number of zip codes in the state of California.
db.zips.aggregate([
{
$match: {
state: "CA"
}
},
{
$group: {
_id: "$city",
totalZips: { $count : { } }
}
}
])
Using $sort
and $limit
Stages
$sort
The $sort
stage sorts all input documents and returns them to the pipeline in sorted order.
We use 1 to represent ascending order, and -1 to represent descending order.
{
$sort: {
"field_name": 1
}
}
$limit
The $limit
stage returns only a specified number of records.
{
$limit: 5
}
$sort
and $limit
in an Aggregation Pipeline
The following aggregation pipeline sorts the documents in descending order, so the documents with the greatest pop
value appear first, and limits the output to only the first five documents after sorting.
db.zips.aggregate([
{
$sort: {
pop: -1
}
},
{
$limit: 5
}
])
Using $project
, $count
, and $set
Stages
$project
The $project
stage specifies the fields of the output documents.
1 means that the field should be included, and 0 means that the field should be suppressed.
The field can also be assigned a new value (new value specified for new fields and existing fields being given a new value).
{
$project: {
state:1,
zip:1,
population:"$pop",
_id:0
}
}
It should usually be the last stage because it specifies the exact fields in the output.
$set
The $set
stage creates new fields or changes the value of existing fields, and then outputs the documents with the new fields.
{
$set: {
place: {
$concat:["$city",",","$state"]
},
pop:10000
}
}
This is useful when we want to change existing fields in the pipeline, or add new ones to be used in upcoming pipeline stages without having to specify all the existing fields.
$set
and $project
can both create and assign values to fields, but only $project
can be used to reshape the data.$count
The $count
stage creates a new document, with the number of documents at that stage in the aggregation pipeline assigned to the specified field name.
{
$count: "total_zips"
}
Using the $out
Stage
The $out
aggregation operator writes documents that are returned by an aggregation pipeline into a collection. So, $out
must be the last stage in the pipeline.
$out: {
db: "<db>",
coll: "<newcollection>"
}
If the database or the collection that you specified doesn't exist, it's created.
Another way to use the $out
stage is to provide only the collection name. In this case, the $out
stage uses the same database used in the aggregation.
{ $out: "<newcollection>" }
You should be aware that $out
creates a new collection if it does not already exist. But if the collection exists, $out
replaces the existing collection with new data. So be mindful of what you name this new collection unless you want an existing collection to be overwritten.
MongoDB Aggregation in Node.js
const client = new MongoClient(uri)
const dbname = "bank";
const collection_name = "accounts";
const accountsCollection = client.db(dbname).collection(collection_name);
const pipeline = [
// Stage 1: match the accounts with a balance less than $1,000
{ $match: { balance: { $lt: 1000 } } },
// Stage 2: Calculate average balance and total balance
{
$group: {
_id: "$account_type",
total_balance: { $sum: "$balance" },
avg_balance: { $avg: "$balance" },
},
},
]
const main = async () => {
try {
await client.connect()
console.log(`Connected to the database 🌍. \nFull connection string: ${safeURI}`)
let result = await accountsCollection.aggregate(pipeline)
for await (const doc of result) {
console.log(doc)
}
} catch (err) {
console.error(`Error connecting to the database: ${err}`)
} finally {
await client.close()
}
}
main()