1. Real-world use case: streaming music service
Nice work on the previous lesson! In earlier chapters we covered the concepts behind streaming data processes, and in the last lesson we looked at some popular tools. For the remainder of this course, we will focus on some real-world examples. Let's begin with a streaming music service.
2. Streaming music
When we look at any of these use cases, we want to make sure to consider the scenario and what we're actually interested in. This means having an idea of the data or processes we want to monitor before we design our system.
Most importantly for our analysis, we are not focusing on the actual music data being streamed. We'll discuss why in a moment.
We are more interested in the actual users of our service, that is, in the information they provide to us rather than the data we send to them.
This includes the user's interactions with our service,
their music preferences,
and some other details we'll cover shortly.
3. Interactions
The analytics department is mainly interested in a few primary questions of
what,
when,
and where. This means what did the user do, when did they do it, and where (in the app) did they do so?
This can include things such as when the user selects like or don't play, or when they change their selection.
It could include options for next, previous, or skip.
We may be interested in a user selecting a given channel or playlist.
We might also wish to know when they add or remove songs from a playlist.
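To make the what / when / where questions concrete, here is a minimal sketch of how a single interaction might be represented. The class name, field names, and sample values are all hypothetical and chosen for illustration; the course does not prescribe a schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class InteractionEvent:
    """One user interaction (hypothetical schema for illustration)."""
    user_id: str                     # who performed the interaction
    action: str                      # what: "like", "skip", "add_to_playlist", ...
    timestamp: datetime              # when the interaction happened
    screen: str                      # where in the app it happened
    target_id: Optional[str] = None  # song, channel, or playlist affected, if any


# Example: a user skips a song from the "now playing" screen.
event = InteractionEvent(
    user_id="user-123",
    action="skip",
    timestamp=datetime(2024, 1, 15, 8, 30, tzinfo=timezone.utc),
    screen="now_playing",
    target_id="song-42",
)

# Convert to a plain dictionary, which is convenient for logging.
record = asdict(event)
```

Keeping the event small and flat like this is one possible design choice; it makes each interaction easy to write out as a single log line.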
4. How to store data
Now that we've discussed some of our data interests, let's consider how we're going to store this information.
Our best option is to archive the data in a log format. A log is straightforward, allows for multiple file formats, and should scale reasonably well.
Note that the number of interactions will vary considerably between users - some will change songs constantly, others may launch a playlist and let it play throughout the day. We also do not know when any given user will start using our service, nor when they'll stop.
The logged data then allows for further analysis outside of the pipeline (such as in batched analytic tasks).
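As a rough sketch of what logging could look like, the example below appends each interaction as one JSON object per line (often called JSON Lines). This is only one of the possible file formats, and the helper names `append_event` and `read_events` as well as the event fields are assumptions made for illustration. An in-memory buffer stands in for a real log file so the example is self-contained.

```python
import io
import json


def append_event(log_file, event: dict) -> None:
    """Append one interaction to the log as a single JSON line."""
    log_file.write(json.dumps(event, sort_keys=True) + "\n")


def read_events(log_file):
    """Yield events back from the log, one per non-empty line."""
    for line in log_file:
        line = line.strip()
        if line:
            yield json.loads(line)


# An in-memory buffer stands in for a real, append-only log file.
log = io.StringIO()

append_event(log, {"user_id": "user-123", "action": "play",
                   "timestamp": "2024-01-15T08:30:00+00:00"})
append_event(log, {"user_id": "user-123", "action": "skip",
                   "timestamp": "2024-01-15T08:31:10+00:00"})

# A later batch job can rewind and read the whole log.
log.seek(0)
events = list(read_events(log))
```

Because each line is independent, the writer never needs to know how many interactions a user will produce or when a session ends; it simply appends as events arrive.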
5. Analytics
You may notice that some of the data we're interested in was not included directly in the logs.
This includes some of the users' preferences,
but these can actually be derived from our logged data.
This can include the favorite genres, bands, and so forth.
We can also determine the user's favorite time of day to use our service. All of this can be derived from the logged interactions and their supporting details.
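Here is a small sketch of how preferences that are not logged directly might be derived from logged interactions. The event fields (`genre`, `action`, `timestamp`), the sample data, and the choice to treat "like" actions as the signal for a favorite genre are all assumptions for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical logged interactions.
events = [
    {"user_id": "user-123", "action": "like", "genre": "jazz",
     "timestamp": "2024-01-15T08:30:00+00:00"},
    {"user_id": "user-123", "action": "like", "genre": "jazz",
     "timestamp": "2024-01-15T08:45:00+00:00"},
    {"user_id": "user-123", "action": "like", "genre": "rock",
     "timestamp": "2024-01-15T20:10:00+00:00"},
    {"user_id": "user-123", "action": "skip", "genre": "pop",
     "timestamp": "2024-01-16T08:05:00+00:00"},
    {"user_id": "user-456", "action": "like", "genre": "rock",
     "timestamp": "2024-01-15T21:00:00+00:00"},
]


def favorite_genre(events, user_id):
    """Return the genre the user liked most often, or None if no likes."""
    counts = Counter(
        e["genre"] for e in events
        if e["user_id"] == user_id and e["action"] == "like"
    )
    return counts.most_common(1)[0][0] if counts else None


def favorite_hour(events, user_id):
    """Return the hour of day (0-23) with the most interactions, or None."""
    counts = Counter(
        datetime.fromisoformat(e["timestamp"]).hour
        for e in events if e["user_id"] == user_id
    )
    return counts.most_common(1)[0][0] if counts else None


genre = favorite_genre(events, "user-123")
hour = favorite_hour(events, "user-123")
```

In this sample data, `user-123` liked jazz twice and rock once, and three of their four interactions fall in the 08:00 hour. A real analysis would likely also account for time zones and weight different actions differently.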
We also mentioned some other details of interest:
this includes determining the most popular app platform, app and platform versions, devices, etc.
We'd also love to analyze the location data provided by the users (from the app / IP addresses) and determine what insights we can glean.
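As an illustration of the platform question, the sketch below counts distinct users per platform rather than raw events. The field names and sample data are hypothetical. Counting users instead of events is a deliberate choice here: because the number of interactions varies considerably between users, a raw event count would let a few very active users skew the result.

```python
from collections import defaultdict

# Hypothetical logged interactions with supporting device details.
events = [
    {"user_id": "u1", "platform": "ios", "app_version": "3.2.0"},
    {"user_id": "u1", "platform": "ios", "app_version": "3.2.0"},
    {"user_id": "u1", "platform": "ios", "app_version": "3.2.0"},
    {"user_id": "u2", "platform": "android", "app_version": "3.1.4"},
    {"user_id": "u3", "platform": "android", "app_version": "3.2.0"},
]


def users_per_platform(events):
    """Count distinct users seen on each platform."""
    users = defaultdict(set)
    for e in events:
        users[e["platform"]].add(e["user_id"])
    return {platform: len(ids) for platform, ids in users.items()}


counts = users_per_platform(events)

# The platform with the most distinct users.
most_popular = max(counts, key=counts.get)
```

In this sample, iOS produces more events (three) but Android has more distinct users (two), so Android is reported as the most popular platform. The same pattern could be applied to app versions or devices by changing the key.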
6. Let's practice!
We just had a quick look at the data a streaming music service would produce and how it could be analyzed. Let's consider this scenario further in the exercises ahead.