Aggregation Operators and Grouping

1. Back to Counting

In the last lesson, we learned how to translate the implicit stages of a query to aggregation stages. Now, let's dip our toes into more advanced aggregation capabilities.

2. Field paths

Aggregation stages can use expressions that contain field paths. To see this in action, first let's clarify some terminology. An expression object has the form {field1: <expression1>, ...}. It's what you pass to an aggregation stage. Here, we pass an expression object to a "$project" stage. The object has one key, "prizes.share", with a corresponding expression value of 1. In contrast, here we project a field that we call "n_prizes". The field takes the value of the expression {"$size": "$prizes"}. The string "$prizes" is a field path. It takes the value of the prizes field for each document processed at that stage by the pipeline. Note that you can create new fields, or overwrite old ones, during aggregation.
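To make the terminology concrete, here is a minimal sketch of the two expression objects described above, written as plain Python dictionaries. The sample document and the helper resolve_field_path are assumptions for illustration only; the helper is a plain-Python stand-in for what the server does with a top-level field path, not part of any MongoDB API.

```python
# Hypothetical sample document (an assumption, not taken from the course data).
laureate = {
    "firstname": "Example",
    "prizes": [
        {"category": "physics", "share": "4"},
        {"category": "chemistry", "share": "1"},
    ],
}

# Expression object with one key whose expression value is 1 (include that field).
project_shares = {"$project": {"prizes.share": 1}}

# Expression object that defines a new field, n_prizes, from an expression.
# The string "$prizes" is a field path.
project_n_prizes = {"$project": {"n_prizes": {"$size": "$prizes"}}}


def resolve_field_path(doc, field_path):
    """Illustrative stand-in: resolve a top-level field path such as "$prizes"."""
    if not field_path.startswith("$"):
        raise ValueError("a field path starts with '$'")
    return doc.get(field_path[1:])


# What {"$size": "$prizes"} would evaluate to for this one document.
n_prizes = len(resolve_field_path(laureate, "$prizes"))
print(n_prizes)
```

With a running server and PyMongo, either stage would go inside a pipeline list passed to a collection's aggregate() method, for example db.laureates.aggregate([project_n_prizes]), where the collection name laureates is assumed here.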

3. Operator expressions

The other new concept here is the operator expression, which treats an operator as a function. The expression applies the operator to one or more arguments and returns a value. Here, the $size operator takes the field path "$prizes" as an argument. Thus, the expression object assigns the size of the prizes array to the field n_prizes. We could also write the operator expression as taking a list of one element, and we get the same result. For convenience, when an operator has only one parameter, we can omit the brackets as above.
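The equivalence of the two spellings can be sketched in plain Python. The helper split_operator_expression below is hypothetical and deliberately simplistic: it only shows the idea that a bare argument is read as a list of one, and it is not how the server parses expressions.

```python
size_bare = {"$size": "$prizes"}    # single argument, brackets omitted
size_list = {"$size": ["$prizes"]}  # same argument, as a one-element list


def split_operator_expression(expression):
    """Return (operator, list of arguments) for a one-key operator expression."""
    if len(expression) != 1:
        raise ValueError("an operator expression has exactly one key")
    operator, args = next(iter(expression.items()))
    if not isinstance(args, list):
        args = [args]  # treat a bare argument as a list of one
    return operator, args


# Both spellings name the same operator applied to the same argument.
print(split_operator_expression(size_bare) == split_operator_expression(size_list))
```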

4. One more example: a multi-parameter operator

Many operators available in query filters have counterparts for aggregation. For example, here I use the $in operator, which takes two parameters. To get the array of prize shares for a laureate, I use a field path. I then project a new field, "solo winner", which is true if and only if the array of prize shares contains the string value "1".
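As a rough sketch, the stage described here might look like the dictionary below. The underscore spelling solo_winner and the sample documents are assumptions; the two helper functions are plain-Python stand-ins that mirror what the field path and the $in expression compute for a single document.

```python
# Two-parameter operator expression: is "1" an element of the shares array?
project_solo = {"$project": {"solo_winner": {"$in": ["1", "$prizes.share"]}}}


def shares(doc):
    """Stand-in for the field path "$prizes.share": one share value per prize."""
    return [prize["share"] for prize in doc["prizes"]]


def is_solo_winner(doc):
    """Stand-in for {"$in": ["1", "$prizes.share"]} on one document."""
    return "1" in shares(doc)


# Hypothetical laureate documents, for illustration only.
shared_only = {"prizes": [{"share": "2"}, {"share": "4"}]}
one_solo = {"prizes": [{"share": "3"}, {"share": "1"}]}

print(is_solo_winner(shared_only), is_solo_winner(one_solo))
```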

5. Implementing .distinct()

Now we know a bit about expressions and field paths. Let's translate the "distinct" collection method to an aggregation. Here I use a new stage, $group. A group stage takes an expression object that must map the _id field. As for any MongoDB document, the _id field must be unique. In this case, each output document will have as its _id a distinct value of the bornCountry field. All bornCountry values get captured because no match stage precedes the group stage. Thus, our list comprehension collecting _id values collects all distinct bornCountry values. This includes the value None, which appears when the field is not present in a document.
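A minimal sketch of this idea follows. The pipeline is the kind of one-stage list described above; group_by_field is a hypothetical plain-Python stand-in for the group stage so the behavior can be seen without a server, and the sample documents are made up. Note that the stand-in keeps first-seen order, whereas the order of documents coming out of a real group stage should not be relied on.

```python
# One group stage keyed on the bornCountry field path.
pipeline = [{"$group": {"_id": "$bornCountry"}}]


def group_by_field(docs, field_path):
    """Stand-in for a group stage with only an _id: one output doc per distinct value."""
    field = field_path[1:]  # drop the leading "$"
    ids = []
    for doc in docs:
        value = doc.get(field)  # None when the field is absent
        if value not in ids:
            ids.append(value)
    return [{"_id": value} for value in ids]


# Hypothetical documents; the last one has no bornCountry field.
docs = [
    {"surname": "A", "bornCountry": "France"},
    {"surname": "B", "bornCountry": "Poland"},
    {"surname": "C", "bornCountry": "France"},
    {"surname": "D"},
]

# The list comprehension that collects the _id values.
distinct_countries = [
    doc["_id"] for doc in group_by_field(docs, pipeline[0]["$group"]["_id"])
]
print(distinct_countries)
```

Against a live database, the same comprehension would iterate over the cursor returned by aggregate(pipeline) on the collection instead of the stand-in.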

6. How many prizes have been awarded in total?

Let's combine a group stage with a project stage. How many prizes has the Nobel committee awarded? The project stage is familiar to us from a few slides back, but what about this group stage? The _id gets mapped to None for every document. This means one and only one document will emerge from the group stage. This one document maps a new field, n_prizes_total, to an operator expression. Some operators, like $sum here, act as accumulators in a group stage. This means they don't operate only on one document. Rather, they have state and will accumulate a value as one document after another of a group gets passed to them. Here, we compute the sum of lengths of all prizes arrays across all laureates. We do this without sending a single laureate document down the wire. Aggregations like this can save a lot of time and bandwidth for very large collections.
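One plausible shape for this pipeline is sketched below, assuming the project stage runs first and feeds n_prizes into the group stage. The function run_sketch is a hypothetical plain-Python stand-in that mimics the two stages on made-up documents; it keeps a running total the way an accumulator keeps state across the documents of a group.

```python
pipeline = [
    # Stage 1: compute the size of each laureate's prizes array.
    {"$project": {"n_prizes": {"$size": "$prizes"}}},
    # Stage 2: a single group (_id is None) that sums n_prizes.
    {"$group": {"_id": None, "n_prizes_total": {"$sum": "$n_prizes"}}},
]


def run_sketch(docs):
    """Stand-in for the pipeline above, run locally on a list of documents."""
    projected = [{"n_prizes": len(doc["prizes"])} for doc in docs]
    if not projected:
        return []  # no input documents, so no group is produced
    total = 0  # accumulator state
    for doc in projected:
        total += doc["n_prizes"]
    return [{"_id": None, "n_prizes_total": total}]


# Hypothetical laureate documents with 1, 2, and 1 prizes.
docs = [
    {"prizes": [{"share": "1"}]},
    {"prizes": [{"share": "4"}, {"share": "1"}]},
    {"prizes": [{"share": "2"}]},
]

print(run_sketch(docs))
```

On a real collection, the summing happens on the server, so only the single result document travels back to the client.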

7. Let's practice!

Okay, time to practice using field paths, operator expressions, and group stages for aggregation.