1. Machine learning and data pyramid
Hi there, and welcome to the first lesson of the machine learning for business course! My name is Karolis and I lead a machine learning team at Amazon. I am excited to have you here.
2. Machine Learning applications
Machine learning means applying different methods to data to achieve three things. The first is to draw causal insights, answering the "Why?" questions. The second is to predict future events better, and the third is to understand patterns in data. Let's see an example of each. A good causal insight project would be trying to understand what is causing customers to cancel their subscriptions - we might hypothesize that satisfaction, content quality and selection play a role, but there are hundreds of data points and it's impossible to conclude anything just by looking at the data. In contrast, an example of prediction is identifying WHICH customers are likely to cancel their subscription. In this case we are only interested in identifying the customers at risk. Finally, pattern discovery is very important - a good example is customer segmentation, where we uncover groups of customers who behave in a similar way; these groups can then be used to customize marketing and other activities.
3. Data hierarchy of needs
Now, let me introduce you to the concept of the data hierarchy of needs, also called the data needs pyramid. It is a general model of an organization's data needs and the order in which they should be prioritized.
4. Collection
The basic level is data collection - the organization needs to invest in its infrastructure and systems, and ensure that the required data is captured and can be extracted from them.
5. Storage
Next, the organization needs to store this data, hence it needs to invest in reliable and accessible data storage.
6. Preparation
After that, the organization needs to focus on organizing and cleaning the data so it can be used for insights and other use cases. This includes outlier detection, data quality processes and other mechanisms to ensure the data reflects reality.
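To make outlier detection a little more concrete, here is a minimal Python sketch, assuming the common interquartile range (IQR) rule. The function name `iqr_outliers`, the multiplier `k` and the sample order amounts are all made up for illustration; a real data quality process would likely combine several checks like this one.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical order amounts; 9999 stands in for a likely data-entry error.
order_amounts = [20, 22, 19, 25, 21, 23, 24, 18, 9999, 22]
outliers = iqr_outliers(order_amounts)
print(outliers)
```

Flagged values would normally be reviewed rather than deleted automatically, since an extreme value can also be a genuine large order.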
7. Analysis
Then, these datasets can be analyzed to understand business trends, both overall and at the cohort, geographic, demographic and other levels, and to get insights into distributions (for example, how many of our customers are in the top 10% of spenders vs the bottom 10%). These activities result in dashboards, business scorecards, and ad hoc and in-depth analyses of the business.
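As a rough illustration of the distribution example, the Python sketch below compares the top and bottom 10% of spenders. It assumes one reading of that comparison, namely the share of total spend each group contributes. The function name `decile_spend_share` and the customer figures are hypothetical.

```python
def decile_spend_share(spend_by_customer):
    """Return (top 10% share, bottom 10% share) of total spend."""
    ranked = sorted(spend_by_customer.values(), reverse=True)
    # Size of a 10% group, with at least one customer in it.
    n = max(1, len(ranked) // 10)
    total = sum(ranked)
    return sum(ranked[:n]) / total, sum(ranked[-n:]) / total

# Hypothetical yearly spend per customer.
spend_by_customer = {
    "c01": 500, "c02": 120, "c03": 90, "c04": 80, "c05": 70,
    "c06": 60, "c07": 40, "c08": 20, "c09": 15, "c10": 5,
}
top_share, bottom_share = decile_spend_share(spend_by_customer)
print(f"top 10%: {top_share:.1%}, bottom 10%: {bottom_share:.1%}")
```

A number like this would typically end up on a dashboard or business scorecard, tracked over time.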
8. Model prototyping and testing
Once the business understands the data and trends well, it's time to start building machine learning models to gain insights into the causal drivers of the desired outcomes (like customer satisfaction, high spend and retention) and to run experiments to make sure those insights are actionable. In this step we also build prototype models to predict the desired outcomes and run experiments to make sure they can be used to drive the metrics up - for example, running customer retention campaigns based on a churn prediction model, targeting customers at risk with certain incentives to reduce their probability of churn.
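The retention campaign example could be sketched in Python roughly as follows. This is only an illustration under several assumptions: the churn probabilities are taken as given from some prototype model that is not shown, the 0.6 threshold is arbitrary, and all function names and customer IDs are hypothetical.

```python
import random

def select_at_risk(churn_scores, threshold=0.6):
    """Pick customers whose predicted churn probability meets the threshold."""
    return sorted(cid for cid, p in churn_scores.items() if p >= threshold)

def split_treatment_control(customer_ids, seed=0):
    """Randomly split customers into a treatment and a control group."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(customer_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def churn_rate(customer_ids, churned):
    """Fraction of the given customers who ended up churning."""
    if not customer_ids:
        return 0.0
    return sum(cid in churned for cid in customer_ids) / len(customer_ids)

# Hypothetical scores from a prototype churn model.
churn_scores = {"c1": 0.91, "c2": 0.15, "c3": 0.72,
                "c4": 0.64, "c5": 0.30, "c6": 0.88}
at_risk = select_at_risk(churn_scores)
treatment, control = split_treatment_control(at_risk)
# Only the treatment group would receive the incentive; after the campaign,
# churn_rate(treatment, churned) is compared with churn_rate(control, churned).
```

Holding back a control group is what lets the business tell whether the incentive, rather than chance, reduced churn.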
9. ML in production
And finally, once the models have been tested and confirmed to work, the business focuses on automating them and deploying them into production systems such as the CRM, the website, the mobile application and other tools.
10. Focus
In this course, we will mostly focus on understanding the last two steps of the data pyramid. It's worth mentioning that all parts of the pyramid are performed simultaneously - the first three steps ensure that the data is captured, is correct and can be used by the last three steps, the business applications. If the first three fail, we get the so-called "garbage in, garbage out" effect: no matter how sophisticated the analysis or machine learning model, if the data is incorrect, the outputs will also be wrong and can lead to very expensive mistakes.
11. Let's practice!
Lots of information to digest! Let's test our knowledge!