
Survey Distinct Values

1. The distinct() method

In this lesson, we'll learn to use the "distinct" method on Mongo collections. Using this method, we can collect the set of values assigned to a field across all documents.

2. An exceptional laureate

We found an exceptional laureate in the last chapter. This laureate has received three Nobel prizes, more than any other laureate. Here we can see that this is the International Committee of the Red Cross. You may not have known that organizations can win Nobel prizes, but they can. For example, 23 employees of Lawrence Berkeley Lab shared in the 2007 Nobel Peace Prize as part of the Intergovernmental Panel on Climate Change. A future exercise will be about the proportion of Nobel prizes awarded to immigrants. Keep in mind that the idea of immigration doesn't apply to some laureates. In this document, I see that the "gender" field has a value of "org", presumably short for "organization". What values does this field store across the documents in this collection? MongoDB provides a built-in collection method for this kind of aggregation.

3. Using .distinct()

Here, we call the "distinct" method on the laureates collection. We pass a single argument, "gender". MongoDB collects the distinct values that this field takes across the collection. We see that there are three and only three values for the "gender" field across the collection.

You may be wondering where this method comes from, or how you can define a similar operation yourself. The "distinct" method is a convenience for a common aggregation. The "count_documents" method we have been using is a similar convenience. An aggregation processes data across a collection and produces a computed result. In the last chapter of this course, we'll learn how to create custom aggregations.

You may also be wondering about the efficiency of aggregations in MongoDB. You can register so-called "indexes" on fields for MongoDB to maintain. These indexes can make queries and aggregations efficient. In some cases, a query can be answered from an index alone, without scanning the documents in the collection. We will learn how to create indexes in the next chapter.

But if we're not working with a lot of data, indexes are generally not needed. The laureates collection we're using in this course fits in memory. It weighs in at under a megabyte and holds on the order of a thousand documents or fewer. It doesn't matter much if you use an inefficient algorithm to sort a list of a few hundred items. Likewise, a full collection scan isn't a big deal for this aggregation.
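To make concrete what "distinct" computes, here is a minimal pure-Python sketch of a similar operation over an in-memory list. The sample documents and the helper name `distinct_values` are hypothetical and only loosely shaped like the laureates collection; this is not how MongoDB implements the method.

```python
# Hypothetical sample documents, loosely shaped like the laureates collection.
laureates = [
    {"firstname": "Marie", "gender": "female"},
    {"firstname": "Albert", "gender": "male"},
    {"firstname": "International Committee of the Red Cross", "gender": "org"},
    {"firstname": "Linus", "gender": "male"},
]


def distinct_values(docs, field):
    """Collect the distinct values of a top-level field, in first-seen order."""
    seen = []
    for doc in docs:
        # Documents that lack the field contribute nothing.
        if field in doc and doc[field] not in seen:
            seen.append(doc[field])
    return seen


genders = distinct_values(laureates, "gender")
print(genders)  # ['female', 'male', 'org']

# With PyMongo and a running server, the equivalent call would look like this
# (the database name "nobel" is an assumption, not taken from this lesson):
#   from pymongo import MongoClient
#   db = MongoClient().nobel
#   db.laureates.distinct("gender")
```

This sketch visits every document, which is the same full scan the lesson describes as harmless for a collection this small.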

4. .distinct() with dot notation

You can use dot notation to specify fields embedded deeper than the root level of a document. This applies in query methods like "find" and "find_one", and it applies to aggregations as well. The dot-two in the filter here denotes index two of an array field. Thus, this is a laureate for whom a third element exists in the "prizes" array. I notice that each subdocument in the "prizes" array has a "category" field. Let's fetch the distinct values of this field. We see, as expected, that there is a value for each category of Nobel prize.
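The two uses of dot notation described here, an array index such as "prizes.2" and a subdocument field such as "prizes.category", can be imitated in plain Python. This is a simplified sketch under stated assumptions: the sample documents and the helpers `resolve` and `distinct_path` are hypothetical, and MongoDB's real path-matching rules have more cases than this handles.

```python
def resolve(value, parts):
    """Yield every value reached by following the path segments in `parts`."""
    if not parts:
        # End of the path: an array contributes each element separately.
        if isinstance(value, list):
            yield from value
        else:
            yield value
        return
    head, rest = parts[0], parts[1:]
    if isinstance(value, dict):
        if head in value:
            yield from resolve(value[head], rest)
    elif isinstance(value, list):
        if head.isdigit():
            # A numeric segment such as "2" picks one array position.
            index = int(head)
            if index < len(value):
                yield from resolve(value[index], rest)
        else:
            # A field name applies to every subdocument in the array.
            for item in value:
                yield from resolve(item, parts)


def distinct_path(docs, path):
    """Distinct values found at a dotted path, in first-seen order."""
    seen = []
    for doc in docs:
        for found in resolve(doc, path.split(".")):
            if found not in seen:
                seen.append(found)
    return seen


# Hypothetical sample documents with an embedded "prizes" array.
laureates = [
    {"surname": "Curie", "prizes": [
        {"category": "physics", "year": "1903"},
        {"category": "chemistry", "year": "1911"},
    ]},
    {"surname": "ICRC", "prizes": [
        {"category": "peace", "year": "1917"},
        {"category": "peace", "year": "1944"},
        {"category": "peace", "year": "1963"},
    ]},
]

# Imitates a filter along the lines of {"prizes.2": {"$exists": True}}.
has_third_prize = [d for d in laureates if list(resolve(d, ["prizes", "2"]))]
categories = distinct_path(laureates, "prizes.category")

print([d["surname"] for d in has_third_prize])  # ['ICRC']
print(categories)  # ['physics', 'chemistry', 'peace']
```

With PyMongo, the corresponding call on the collection would be `db.laureates.distinct("prizes.category")`, where `db` is assumed to be a handle to the course database.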

5. Let's practice!

Let's use the distinct method to answer some questions about our Nobel Prize data.
