1. Something Extra: $addFields to Aid Analysis
It's time to round out our aggregation know-how and wrap up the course. In this lesson, we'll learn how to add fields in a pipeline without having to project existing fields.
2. A somber $project
For Nobel laureates who have died, I want to know the number of years they were alive.
I can start a pipeline to compute this by projecting out the "died" and "born" fields. Skimming the MongoDB documentation, I found a handy operator, dateFromString. It converts a date string into a date value, which is what we need in order to subtract the date of birth from the date of death.
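Here is a minimal sketch of that first attempt. The pipeline is a plain Python list of stage dictionaries, the form PyMongo's aggregate method accepts. The collection handle db.laureates in the comment is an assumption about where the data lives.

```python
# First attempt: parse both date strings into date values.
# $dateFromString turns a string such as "1867-11-07" into a date.
pipeline = [
    {"$project": {
        "died": {"$dateFromString": {"dateString": "$died"}},
        "born": {"$dateFromString": {"dateString": "$born"}},
    }},
]

# With a live connection (assumed collection name) this would run as:
# docs = list(db.laureates.aggregate(pipeline))
```

Projecting "died" and "born" under their own names replaces the string values with parsed dates in the stage's output.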
But wait! Some laureate documents have an invalid date of all zeroes. Why? This encodes that their date of birth is not recorded.
To overcome this, let's insert a match stage at the start of our pipeline. Now, we include only laureates with reasonable years of both birth and death.
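One way to write that filter is sketched below. The cut-off string "1700" is an assumed threshold, chosen only because every real birth year in this kind of data sorts after it. The comparison is on strings, so it is lexicographic.

```python
# Assumed threshold: any date string that sorts after "1700" counts as reasonable.
CUTOFF = "1700"

pipeline = [
    # Drop documents whose "died" or "born" string does not sort after the cut-off.
    {"$match": {"died": {"$gt": CUTOFF}, "born": {"$gt": CUTOFF}}},
    {"$project": {
        "died": {"$dateFromString": {"dateString": "$died"}},
        "born": {"$dateFromString": {"dateString": "$born"}},
    }},
]

# The same string comparison in plain Python shows why the filter works:
# the all-zero placeholder fails it, an ordinary date string passes.
samples = ["0000-00-00", "1867-11-07"]
passes = [s > CUTOFF for s in samples]
print(passes)  # [False, True]
```

Placing the match stage first means later stages never see the placeholder dates.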
Darn! It looks like some laureates have only their year of birth recorded. How can we accommodate this?
3. $split and $cond-itionally correct (with $concat)
Here's one way we can choose to accommodate a date of birth that is only a year.
First, we can use a new stage, addFields, to provide new array fields split on the hyphen in the date strings. This gives us year, month, and day as the array elements. Why use addFields rather than project? Simple. We do not need to specify all the other fields we want to pass along in the pipeline.
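A sketch of that stage follows. The new field names bornArray and diedArray are illustrative choices; the plain-Python line underneath only mirrors what split does to one made-up date string.

```python
# Add two array fields; every existing field passes through untouched.
split_stage = {"$addFields": {
    "bornArray": {"$split": ["$born", "-"]},
    "diedArray": {"$split": ["$died", "-"]},
}}

# Python's str.split does the same thing to a single string.
born_array = "1867-11-07".split("-")
print(born_array)  # ['1867', '11', '07']
```

Had this been a project stage, every other field needed downstream would have to be listed explicitly.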
This enables us to use the existing born field in this next stage, also an addFields stage. Here, I rewrite the born field if the string "00" appears in bornArray, which means the month or day is missing. I fix it to be a real date by concatenating the year element of bornArray with the string suffix for January 1st, "-01-01". The conditional expression operator, cond, is a ternary operator. It evaluates the first expression and, if it's true, returns the value of the second expression. Otherwise, it returns the value of the third expression.
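The conditional fix can be sketched as follows. The stage uses the array form of cond (condition, value if true, value if false). The helper fix_born is a hypothetical plain-Python mirror of the same logic, included only so the behaviour can be checked without a database.

```python
# Overwrite "born" only when its split form contains the placeholder "00".
fix_stage = {"$addFields": {
    "born": {"$cond": [
        {"$in": ["00", "$bornArray"]},                                # condition
        {"$concat": [{"$arrayElemAt": ["$bornArray", 0]}, "-01-01"]}, # if true
        "$born",                                                      # otherwise
    ]},
}}


def fix_born(born: str) -> str:
    """Hypothetical local mirror of the $cond expression above."""
    born_array = born.split("-")
    if "00" in born_array:
        return born_array[0] + "-01-01"
    return born


print(fix_born("1901-00-00"))  # 1901-01-01
print(fix_born("1867-11-07"))  # 1867-11-07
```

Because this is an addFields stage, naming an existing field replaces its value and leaves all other fields alone.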
Now, at last, we are able to compute the number of years each laureate was alive.
4. A $bucket list
Now, let's compute the number of years between the died and born dates. I show only the last stage of our pipeline so far.
First, we subtract the dates. This produces a value in milliseconds. Next, we divide by the approximate number of milliseconds in an average year. Finally, we floor the value, rounding down to a whole number of years. At this point I'd verify that this stage works. I may add a limit stage to inspect a few output documents.
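As a rough sketch, the stage and its arithmetic might look like this. It assumes "died" and "born" have already been converted to date values earlier in the pipeline, takes an average year to be 365.25 days, and uses "years" as an assumed output field name. The function years_alive is a hypothetical local mirror, and the two dates passed to it are made up.

```python
import math
from datetime import datetime

# 1000 ms * 60 s * 60 min * 24 h * 365.25 days (assumed average year length)
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25

years_stage = {"$project": {
    "years": {"$floor": {"$divide": [
        {"$subtract": ["$died", "$born"]},  # date minus date gives milliseconds
        MS_PER_YEAR,
    ]}},
}}


def years_alive(born: datetime, died: datetime) -> int:
    """Hypothetical local mirror of the subtract / divide / floor chain."""
    elapsed_ms = (died - born).total_seconds() * 1000
    return math.floor(elapsed_ms / MS_PER_YEAR)


# Made-up dates, roughly 50 years and 5 months apart.
print(years_alive(datetime(1900, 1, 1), datetime(1950, 6, 1)))  # 50
```

Appending {"$limit": 3} after this stage is a cheap way to inspect a few output documents.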
I want to show you one last tool to get a sense of the distribution of "year" values across laureates. MongoDB's bucket stage groups documents into buckets defined by a sequence of boundaries. Each bucket includes its lower boundary and excludes its upper one.
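A bucket stage could be sketched as below. The field name "years" and the ten-year boundaries from 30 to 110 are assumptions for illustration, and the ages in the example are made up. The helper bucket_counts is a hypothetical local mirror of how values are assigned to buckets.

```python
from bisect import bisect_right
from collections import Counter

# Assumed boundaries: [30, 40), [40, 50), ..., [100, 110)
BOUNDARIES = list(range(30, 120, 10))

bucket_stage = {"$bucket": {
    "groupBy": "$years",       # assumed field name
    "boundaries": BOUNDARIES,  # must be sorted in ascending order
}}


def bucket_counts(values, boundaries):
    """Hypothetical local mirror: count values per bucket, keyed by lower bound."""
    counts = Counter()
    for v in values:
        if not boundaries[0] <= v < boundaries[-1]:
            raise ValueError(f"{v} falls outside the boundaries")
        counts[boundaries[bisect_right(boundaries, v) - 1]] += 1
    return dict(counts)


# Made-up ages, not taken from the laureates data.
print(bucket_counts([35, 62, 67, 101], BOUNDARIES))  # {30: 1, 60: 2, 100: 1}
```

By default, each output document carries the bucket's lower boundary as its _id along with a count. A value outside every bucket raises an error unless the stage is given a "default" bucket.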
Here, we see that one laureate died before the age of 40, and two lived to be over a hundred years old!
5. Practice $addFields
Let's solidify your understanding of the addFields stage. I'll be sure to fold in some of what you learned before.