Data serving

1. Data serving

Now, let's start by understanding what data serving and the serving layer are.

2. Serving layer

The serving layer normally refers to where we're going to store our processed data and the protocols or mechanisms we provide to our consumers to access such data. So, after our data is processed and has gone through data quality checks, enrichment, and other processes, we're ready to present it to our consumers. Thus, we will need to identify how it will be consumed to find a proper way to store and serve it.

3. What data do we have?

To create a good serving layer, we need to understand what data we have and how the business will consume it. For instance, is our data structured? If so, we could decide to use a data warehouse to take advantage of the structure itself and enable use cases like dashboarding, BI, or querying. Thus, a data warehouse as part of the serving layer is a great decision! However, what happens if we have unstructured data? Or if our data is a time series? There are better technologies for such use cases, such as blob storage, a time series database, or even a NoSQL database. It's important to note that a serving layer is not a single data store but a set of them.
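The questions above can be sketched as a small decision helper. This is only an illustration: the `DatasetProfile` class, the `suggest_store` function, and the order of the checks are assumptions made for this example, and a real decision would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetProfile:
    """Hypothetical description of one dataset we want to serve."""
    name: str
    structured: bool
    time_series: bool = False


def suggest_store(profile: DatasetProfile) -> str:
    """Return a candidate store type for a dataset (illustrative rules only)."""
    if profile.time_series:
        return "time series database"
    if profile.structured:
        return "data warehouse"
    # Unstructured data such as images or raw documents.
    return "blob storage"


def plan_serving_layer(profiles):
    """Map each dataset name to a store; the result is a set of stores, not one."""
    return {p.name: suggest_store(p) for p in profiles}


plan = plan_serving_layer([
    DatasetProfile("sales", structured=True),
    DatasetProfile("sensor_readings", structured=True, time_series=True),
    DatasetProfile("product_images", structured=False),
])
print(plan)
```

Note how the plan ends up with several stores side by side, which matches the idea that a serving layer is a set of data stores.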

4. How will data be consumed?

Now that we know our data, we need to understand how it's going to be consumed. Are we going to build machine learning models? Which applications will consume our data? And will these applications request summarized information? Or maybe request individual records? Not every data storage system is a good option for every query, which is why it's important to address these questions. For instance, if applications frequently request individual records, let's say just one specific user's data, a data warehouse may struggle under that traffic and perform poorly. In that case, we should probably consider adding an RDBMS and exposing it via an API so it can handle such traffic, even if this means replicating data.
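To make the two access patterns concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for an RDBMS. The `users` table, its columns, and the sample rows are made up for this example.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, made-up users table."""
    conn = sqlite3.connect(":memory:")
    # The primary key gives the database an index for fast single-row lookups.
    conn.execute(
        "CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT, spend REAL)"
    )
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(1, "ES", 10.0), (2, "ES", 30.0), (3, "FR", 5.0)],
    )
    conn.commit()
    return conn


def get_user(conn: sqlite3.Connection, user_id: int):
    """Individual-record request: fetch one specific user's data."""
    return conn.execute(
        "SELECT user_id, country, spend FROM users WHERE user_id = ?",
        (user_id,),
    ).fetchone()


def spend_by_country(conn: sqlite3.Connection) -> dict:
    """Summarized request: aggregate over all rows."""
    rows = conn.execute(
        "SELECT country, SUM(spend) FROM users GROUP BY country ORDER BY country"
    ).fetchall()
    return dict(rows)


conn = build_demo_db()
print(get_user(conn, 2))
print(spend_by_country(conn))
```

The first query touches a single row, which is the pattern a relational database handles well. The second scans and aggregates the whole table, which is closer to the kind of workload a data warehouse is built for.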

5. Serving your data depending on your use case

Let's look at different systems and the use cases that fit them best, starting with data warehouses. We already know that they are good with structured data and that we can use them for BI, reporting, and querying. Next, we have blob storage, which allows us to store all types of data. It is also great for archiving data that we do not plan to access frequently, due to its lower cost. The next options, NoSQL and relational databases, are stores that we have already discussed in previous videos, so let's look at their role in the serving layer. For instance, they can act as a single source of truth that transactional applications consume by searching for individual records. These applications may connect through an API that exposes the data via a defined protocol, usually HTTP, so applications can get data in a well-defined way.
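As a rough sketch of exposing individual records over HTTP, the example below uses only Python's standard library. The `/users/<id>` route, the `USERS` dictionary standing in for a database, and the helper names are assumptions made for illustration; a production API would normally use a web framework and a real data store.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory records standing in for a database.
USERS = {"42": {"user_id": 42, "name": "Ada"}}


def handle_path(path: str):
    """Resolve a request path such as /users/42 to (status code, payload)."""
    parts = path.strip("/").split("/")
    if len(parts) == 2 and parts[0] == "users":
        record = USERS.get(parts[1])
        if record is not None:
            return 200, record
    return 404, {"error": "not found"}


class UserHandler(BaseHTTPRequestHandler):
    """Serve GET requests by returning the matching record as JSON."""

    def do_GET(self):
        status, payload = handle_path(self.path)
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8000) -> HTTPServer:
    """Build the server; call serve_forever() on the result to start it."""
    return HTTPServer(("127.0.0.1", port), UserHandler)
```

Calling `make_server().serve_forever()` would start the server locally, and a consumer could then request `http://127.0.0.1:8000/users/42` without knowing anything about how the data is stored behind the API.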

6. Serving vs. Consuming

Finally, it's important to differentiate between serving and consuming. Even though we consider how data is consumed when designing our serving layer, we do not care how the consumption actually happens. Thus, the serving layer does not care which exact dashboarding tool or machine learning model we will build. It cares about delivering the data we need in the best possible way. It can also help optimize performance by caching frequently accessed data and using indexing to speed up queries.
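The caching idea can be sketched with `functools.lru_cache` from Python's standard library. The `SlowStore` class and the `make_cached_reader` helper are hypothetical names invented for this example, and a real cache would also need a strategy for refreshing data that changes in the underlying store.

```python
from functools import lru_cache


class SlowStore:
    """Hypothetical backing store that counts how often it is read."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._rows.get(key)


def make_cached_reader(store: SlowStore, maxsize: int = 128):
    """Wrap store.get so repeated requests for the same key skip the store."""

    @lru_cache(maxsize=maxsize)
    def read(key):
        return store.get(key)

    return read


store = SlowStore({1: "alice", 2: "bob"})
read = make_cached_reader(store)

read(1)  # first request goes to the store
read(1)  # second request is answered from the cache
print(store.reads)
```

Consumers keep calling `read` the same way in both cases, so the optimization stays inside the serving layer.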

7. Let's practice!

Now that you know how to serve your data, let's practice!