Enforcing a schema

1. Enforcing a schema

Welcome! In this final chapter, we’ll start by exploring how we can enforce a schema. Let’s see why this matters.

2. From flexible to validated schema

MongoDB gives you the freedom to store documents without a predefined schema. This is convenient when you're building an application from scratch and still figuring out how to store your data: MongoDB won't get in the way. But people make mistakes, like typos or missing fields. For example, here we accidentally use the field year instead of release_year. MongoDB accepts the document, but when we later query using the correct field name, nothing is returned, and we get no warning. Oops. That's why, once the structure of your documents is settled, it's important to set up validation that prevents mistakes and inconsistencies. We'll explore two ways to do this: using pydantic, and using MongoDB's built-in schema validation.

3. Enforce a schema with pydantic

pydantic is a popular data validation library. We can use pydantic's BaseModel to create a Movie class and specify what each document should look like with Python's type annotations. Every movie should have a title that is a string, genre should be a list of strings, and so on. won_oscar is an optional field that should be a boolean if it's set. With this BaseModel, we define a blueprint, or schema, for every movie document.
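Here is a minimal sketch of such a model, assuming pydantic is installed. Only title, genre and won_oscar are described above; release_year (assumed to be an int) comes from later in this chapter, and the other fields hidden behind "and so on" are left out. Note the `= None` default on won_oscar: in pydantic v2, `Optional` alone makes a field nullable but still required, so the default is what lets you omit it.

```python
from typing import List, Optional

from pydantic import BaseModel


class Movie(BaseModel):
    """Blueprint for a movie document (field set partly assumed)."""

    title: str                        # must be a string
    genre: List[str]                  # must be a list of strings
    release_year: int                 # assumed type: an integer year
    won_oscar: Optional[bool] = None  # may be omitted; a bool (or None) if set


# Creating an instance checks every field against its annotation.
movie = Movie(title="Inception", genre=["Action", "Sci-Fi"], release_year=2010)
print(movie.won_oscar)  # the omitted optional field falls back to None
```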

4. Inserting with typed data models

Before, we defined new movies with a plain dictionary like this, which didn't check the data format at all. Let's now use our Movie model to define the new movie instead. This verifies that all specified fields conform to the schema. That's not the case here! We used year rather than release_year, so pydantic raises a ValidationError. This is exactly what we want: typos and missing fields are caught before the document ever reaches the collection!
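A sketch of the failing case, assuming the same illustrative Movie fields as above (release_year as an int is an assumption). By default pydantic silently ignores the unknown `year` key; the error comes from the required `release_year` being absent. Setting pydantic's `extra="forbid"` option would also flag the stray key.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class Movie(BaseModel):
    title: str
    genre: List[str]
    release_year: int                 # assumed type
    won_oscar: Optional[bool] = None


# The typo: "year" instead of "release_year".
new_movie_data = {"title": "Inception", "genre": ["Action", "Sci-Fi"], "year": 2010}

missing = []
try:
    new_movie = Movie(**new_movie_data)
except ValidationError as exc:
    # Each error dict names the offending field in its "loc" tuple.
    missing = [err["loc"] for err in exc.errors()]
    print("Rejected:", missing)
```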

5. Fixing our mistake

Let's fix our mistake by adjusting the new movie definition to use release_year. It works! Now we can insert the movie into the collection with peace of mind. Notice that I had to wrap new_movie in dict() to convert the model back into a regular dictionary before inserting it. Also notice that I didn't set won_oscar, and that's not a problem: the schema marks it as Optional. (In pydantic v2, an Optional field also needs a default, such as None, before it can be left out.)
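The validate-then-insert flow can be sketched as below. `insert_movie` and `FakeCollection` are hypothetical names: the fake stands in for a pymongo collection so the sketch runs without a MongoDB server, and any object with an `insert_one` method (such as a real pymongo `Collection`) can be passed instead. The Movie fields are the same illustrative assumptions as before.

```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel


class Movie(BaseModel):
    title: str
    genre: List[str]
    release_year: int                 # assumed type
    won_oscar: Optional[bool] = None


def insert_movie(collection: Any, data: Dict[str, Any]):
    """Hypothetical helper: validate `data` against Movie, then insert it."""
    new_movie = Movie(**data)         # raises ValidationError on bad data
    document = dict(new_movie)        # model -> plain dict for the driver
    return collection.insert_one(document)


class FakeCollection:
    """Stand-in that records inserts instead of talking to MongoDB."""

    def __init__(self):
        self.docs = []

    def insert_one(self, document):
        self.docs.append(document)
        return document               # pymongo would return an InsertOneResult


movies = FakeCollection()
insert_movie(movies, {"title": "Inception", "genre": ["Action", "Sci-Fi"], "release_year": 2010})
print(movies.docs[0])
```

One design detail: `dict(new_movie)` keeps `won_oscar` with a value of None, so the stored document gets an explicit null. If you'd rather leave the field out, pydantic can drop None values when exporting (`model_dump(exclude_none=True)` in v2).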

6. MongoDB's built-in schema validation

OK, so pydantic is great, but did you know that MongoDB also offers built-in schema validation? You can configure it when creating a collection, as shown here for the movies_v2 collection. Can you decipher the schema definition? It expresses the same rules as our pydantic model, but this time using MongoDB's JSON Schema validation.
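A sketch of what such a definition might look like, mirroring the illustrative Movie model (field names and types beyond those in the text are assumptions). pymongo's `Database.create_collection` passes extra keyword arguments on to MongoDB's create command, which accepts a `validator` option. `create_movies_v2` is a hypothetical helper; call it with a real `Database`, for example one obtained from `MongoClient()`.

```python
# $jsonSchema validator mirroring the Movie model (field set assumed).
movie_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["title", "genre", "release_year"],
        "properties": {
            "title": {"bsonType": "string"},
            "genre": {"bsonType": "array", "items": {"bsonType": "string"}},
            "release_year": {"bsonType": "int"},
            # Optional[bool] in pydantic means "bool or None", so allow null too.
            "won_oscar": {"bsonType": ["bool", "null"]},
        },
    }
}


def create_movies_v2(db):
    """Hypothetical helper: create movies_v2 with the validator attached."""
    # Extra keyword arguments are forwarded to MongoDB's create command.
    return db.create_collection("movies_v2", validator=movie_validator)
```

won_oscar is left out of `required`, which is how JSON Schema expresses "optional".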

7. Testing MongoDB's built-in schema validation

Let's put it to the test and try to insert our ill-defined movie, the one that used year instead of release_year. As expected, we get an error, because the required release_year field is missing. The main difference from pydantic-based validation is that this validation happens in the database itself rather than in your Python code. This is particularly useful when multiple applications access the same database, or when you want to ensure data consistency no matter how the data is inserted.

8. Summary

To summarize, you can use application-side validation like pydantic's BaseModel, and you can also use database-side validation like MongoDB's built-in schema validation. Both are great for preventing mistakes and enforcing structure.

9. Let's practice!

Let’s put schema validation into action!
