Zoom into Array Fields

1. Zoom into Array Fields with $unwind

Documents can have array-valued fields, and aggregation stages can introduce them. In this lesson, we'll learn a tool to access array elements during aggregation.

2. Sizing and summing

Let's say we want the number of laureates for each prize. One way to do this is to project a field using the dollar-size operator. We can then add a stage to group by prize category, producing a count of laureates per category. I remove the projection of year in this second pipeline, as there is no need for it. Then, I reset the n_laureates field to be the sum of n_laureates values over each category. Finally, I sort by descending count.
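The pipeline just described might look roughly like the sketch below. The field names (`category`, `laureates`, `n_laureates`) follow the narration; the sample documents, their ids, and the helper `count_laureates_by_size` are hypothetical, and the helper only imitates the three stages in plain Python so the logic can be checked without a database.

```python
from collections import Counter

# Pipeline as it might be passed to PyMongo, e.g. db.prizes.aggregate(pipeline).
# The collection name "prizes" is taken from the lesson; nothing is executed here.
pipeline = [
    {"$project": {"category": 1, "n_laureates": {"$size": "$laureates"}}},
    {"$group": {"_id": "$category", "n_laureates": {"$sum": "$n_laureates"}}},
    {"$sort": {"n_laureates": -1}},
]

# Hypothetical sample documents shaped like the prizes collection.
sample_prizes = [
    {"year": "2018", "category": "physics",
     "laureates": [{"id": "p1"}, {"id": "p2"}, {"id": "p3"}]},
    {"year": "2017", "category": "physics",
     "laureates": [{"id": "p4"}, {"id": "p5"}]},
    {"year": "2018", "category": "peace",
     "laureates": [{"id": "p6"}, {"id": "p7"}]},
    {"year": "2018", "category": "economics",
     "laureates": [{"id": "p8"}]},
]


def count_laureates_by_size(prizes):
    """Plain-Python stand-in for the $project / $group / $sort stages."""
    totals = Counter()
    for prize in prizes:
        # $size: length of the laureates array; $sum: accumulate per category
        totals[prize["category"]] += len(prize["laureates"])
    # $sort with -1: descending by count
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [{"_id": category, "n_laureates": n} for category, n in ranked]


size_counts = count_laureates_by_size(sample_prizes)
```

On the sample data, `size_counts` lists physics first with 5 laureates, then peace with 2, then economics with 1.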

3. How to $unwind

How might we use individual elements of the laureates array? One powerful option is the dollar-unwind stage. This outputs one pipeline document per array element. Here, we unwind the laureates field across three documents.
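To make "one pipeline document per array element" concrete, here is a small plain-Python imitation of the stage. The `unwind` helper and the sample prize are hypothetical; the helper follows the default behavior of `$unwind`, which drops documents whose field is missing, null, or an empty array and treats a non-array value as a single element.

```python
# The stage itself, as it would appear in a pipeline list.
unwind_stage = {"$unwind": "$laureates"}


def unwind(docs, field):
    """Yield one copy of each document per element of doc[field]."""
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # missing or null: no output by default
        if not isinstance(value, list):
            value = [value]  # a non-array value acts like a one-element array
        for element in value:
            # Same document, but the array is replaced by a single element.
            yield {**doc, field: element}


# Hypothetical prize with three laureates.
prize = {
    "year": "2018",
    "category": "physics",
    "laureates": [{"id": "p1"}, {"id": "p2"}, {"id": "p3"}],
}

unwound = list(unwind([prize], "laureates"))
```

Each of the three resulting documents keeps `year` and `category` and carries a single laureate in `laureates`.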

4. Renormalization, anyone?

We can use stages following an unwind to recompress data. What if we want to normalize our data and track only laureate ids for each prize? After all, we can fetch more information from the laureates collection. Here, we get a list of laureate ids for each prize. After unwinding the laureates array, we project year, category, and laureate id. Year and category together identify a prize, so we can group by a concatenation of those values. I use the addToSet operator in the group stage to collect laureate ids for each prize, and there you have it. I could also have grouped by underscore-id, but the category-year combo is more readable, and I introduced you to a new operator!
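A sketch of this unwind, project, group sequence follows. It assumes `year` and `category` are stored as strings (the `$concat` operator needs string inputs); the `":"` separator, the output field name `laureate_ids`, the sample data, and the helper `laureate_ids_by_prize` are illustrative choices, not taken from the lesson.

```python
# Pipeline as it might be passed to db.prizes.aggregate(pipeline).
pipeline = [
    {"$unwind": "$laureates"},
    {"$project": {"_id": 0, "year": 1, "category": 1, "laureates.id": 1}},
    {"$group": {
        # Category plus year identifies a prize (assumes both are strings).
        "_id": {"$concat": ["$category", ":", "$year"]},
        "laureate_ids": {"$addToSet": "$laureates.id"},
    }},
]

# Hypothetical sample documents.
sample_prizes = [
    {"year": "2018", "category": "physics",
     "laureates": [{"id": "p1", "surname": "A"}, {"id": "p2", "surname": "B"}]},
    {"year": "2018", "category": "peace",
     "laureates": [{"id": "p3", "surname": "C"}]},
]


def laureate_ids_by_prize(prizes):
    """Plain-Python stand-in: one set of laureate ids per category:year key."""
    grouped = {}
    for prize in prizes:
        key = f"{prize['category']}:{prize['year']}"
        for laureate in prize["laureates"]:  # the unwind step
            grouped.setdefault(key, set()).add(laureate["id"])  # like $addToSet
    return grouped


ids_by_prize = laureate_ids_by_prize(sample_prizes)
```

A Python set mirrors `$addToSet` in that duplicates are dropped and element order is not guaranteed.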

5. $unwind and count 'em, one by one

Here's another way to understand the unwind operator. Before, we used the size operator to project the number of laureates per prize. This projection fed into a group stage to output counts by category. Instead of projecting sizes and summing over them, we can unwind and count documents. The group stage here counts the documents per category fed to it by the unwind stage. The two pipelines shown produce the same result.
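The two approaches can be put side by side as below. This is a sketch on hypothetical sample data, with plain-Python helpers standing in for the server-side stages. One caveat worth labeling as my own note rather than the lesson's: the equivalence assumes every prize has a non-empty laureates array, since `$unwind` by default emits nothing for an empty array.

```python
from collections import Counter

# Approach 1: project array sizes, then sum them per category.
size_pipeline = [
    {"$project": {"category": 1, "n_laureates": {"$size": "$laureates"}}},
    {"$group": {"_id": "$category", "n_laureates": {"$sum": "$n_laureates"}}},
    {"$sort": {"n_laureates": -1}},
]

# Approach 2: unwind, then count one per document.
unwind_pipeline = [
    {"$unwind": "$laureates"},
    {"$group": {"_id": "$category", "n_laureates": {"$sum": 1}}},
    {"$sort": {"n_laureates": -1}},
]

# Hypothetical sample documents.
sample_prizes = [
    {"category": "physics", "laureates": [{"id": "p1"}, {"id": "p2"}, {"id": "p3"}]},
    {"category": "physics", "laureates": [{"id": "p4"}]},
    {"category": "peace", "laureates": [{"id": "p5"}, {"id": "p6"}]},
]


def count_by_size(prizes):
    totals = Counter()
    for prize in prizes:
        totals[prize["category"]] += len(prize["laureates"])
    return dict(totals)


def count_by_unwind(prizes):
    totals = Counter()
    for prize in prizes:
        for _laureate in prize["laureates"]:  # one "document" per element
            totals[prize["category"]] += 1
    return dict(totals)


by_size = count_by_size(sample_prizes)
by_unwind = count_by_unwind(sample_prizes)
```

Both helpers return the same per-category totals on this sample.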

6. $lookup

Finally, let's see a stage that often accompanies unwinding: dollar-lookup. This stage pulls in documents from another collection via what's termed a left outer join. Let's collect countries of birth for economics laureates. From the prizes collection, we first unwind the laureates array. Each pipeline document now has a single laureates-dot-id. Then, we query the laureates collection for documents with the same value for id. For each one we find, we push it into an array we name "laureate bios". Next, we collect the distinct laureate bornCountry values. We want to feed single bornCountry values, not arrays, to the $addToSet operator. Hence, we unwind before the group stage.

Is there an easier way to do this? Sure! MongoDB doesn't enforce a normalized schema. Thus, you can tailor a collection's schema to support query simplicity and efficiency. We know the laureates collection stores info on prize categories as well. So, this one-liner produces the same result as the five-stage aggregation pipeline above. Even so, it's good to know that you can perform server-side joins in a pinch.
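For reference, a five-stage pipeline matching this description, and the shorter alternative, might look like the sketch below. The leading `$match` stage, the array name `laureate_bios`, the output field `bornCountries`, and the `prizes.category` path in the laureates collection are assumptions inferred from the narration; the sample data and helper functions are hypothetical and only imitate the join in plain Python.

```python
# Five stages, as they might be passed to db.prizes.aggregate(pipeline).
pipeline = [
    {"$match": {"category": "economics"}},
    {"$unwind": "$laureates"},
    {"$lookup": {
        "from": "laureates",
        "localField": "laureates.id",
        "foreignField": "id",
        "as": "laureate_bios",
    }},
    {"$unwind": "$laureate_bios"},
    {"$group": {"_id": None,
                "bornCountries": {"$addToSet": "$laureate_bios.bornCountry"}}},
]
# The shorter alternative, assuming laureates embed their prizes:
#   db.laureates.distinct("bornCountry", {"prizes.category": "economics"})

# Hypothetical sample collections.
prizes = [
    {"category": "economics", "laureates": [{"id": "e1"}, {"id": "e2"}]},
    {"category": "physics", "laureates": [{"id": "p1"}]},
]
laureates = [
    {"id": "e1", "bornCountry": "Country A", "prizes": [{"category": "economics"}]},
    {"id": "e2", "bornCountry": "Country B", "prizes": [{"category": "economics"}]},
    {"id": "p1", "bornCountry": "Country C", "prizes": [{"category": "physics"}]},
]


def born_countries_via_join(prizes, laureates):
    """Imitates match, unwind, lookup, unwind, group with $addToSet."""
    bios_by_id = {}
    for bio in laureates:
        bios_by_id.setdefault(bio["id"], []).append(bio)
    countries = set()
    for prize in prizes:
        if prize["category"] != "economics":
            continue
        for entry in prize["laureates"]:               # first unwind
            for bio in bios_by_id.get(entry["id"], []):  # lookup, second unwind
                if "bornCountry" in bio:
                    countries.add(bio["bornCountry"])
    return countries


def born_countries_direct(laureates):
    """Imitates the distinct() call on the laureates collection."""
    return {
        bio["bornCountry"]
        for bio in laureates
        if "bornCountry" in bio
        and any(p["category"] == "economics" for p in bio.get("prizes", []))
    }


via_join = born_countries_via_join(prizes, laureates)
direct = born_countries_direct(laureates)
```

On the sample data both routes yield the same set of countries, which is the point the narration makes about choosing a schema that keeps the query simple.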

7. Time to unwind... with exercises!

Sometimes, it feels good to unwind. Let's practice.