Special topics in Machine Learning

1. Special topics in Machine Learning

Previously, we learned about two general areas of Machine Learning: Supervised Machine Learning and Unsupervised Machine Learning. In this lesson, we'll investigate two special fields of machine learning: time series prediction and natural language processing.

2. Time series forecasting

Time series forecasting refers to any type of Supervised Machine Learning where time is an important feature. A good time series forecast will account for recent behavior as well as weekly, monthly, or yearly trends.
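One common way to give a supervised model access to both recent behavior and a weekly pattern is to turn earlier values into features. The sketch below is only an illustration: the `daily_sales` numbers are made up, and `make_lag_features` is a hypothetical helper, not part of any library.

```python
# Made-up daily values for three weeks (illustration only).
daily_sales = [20, 22, 21, 23, 30, 35, 33,
               21, 23, 22, 24, 31, 36, 34,
               22, 24, 23, 25, 32, 37, 35]


def make_lag_features(series, lags=(1, 7)):
    """Pair each value with the values observed `lags` steps earlier.

    With lags=(1, 7), each row holds yesterday's value (recent behavior)
    and the value from the same weekday last week (weekly pattern).
    """
    max_lag = max(lags)
    features, targets = [], []
    for t in range(max_lag, len(series)):
        features.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return features, targets


features, targets = make_lag_features(daily_sales)
print(features[0], targets[0])
```

Each row of `features` could then be passed to any supervised model, with `targets` as the value to predict.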

3. Seasonality

Time series forecasting can help us catch periodic events, known as "seasonality". Seasonality can happen on any timescale. For example, television viewership is lower on Friday nights because many folks choose to go out rather than stay in and watch TV. This is a weekly pattern. Certain spending can spike at the end of the month when people receive a paycheck. This is a monthly pattern. Ice cream sales are lower in the winter because people tend to eat less cold food when it's cold outside. This is an annual pattern.
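A quick way to look for weekly seasonality is to average the values for each day of the week. The sketch below uses made-up viewership numbers in which Friday is deliberately the lowest day; `mean_by_weekday` is a hypothetical helper written for this example.

```python
from collections import defaultdict
from datetime import date, timedelta

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# Made-up viewership (in millions) for four weeks, starting on a Monday.
start = date(2024, 1, 1)
weekly_pattern = [10, 10, 11, 10, 6, 8, 12]  # Monday through Sunday
viewers = weekly_pattern * 4


def mean_by_weekday(start_date, values):
    """Average the values that fall on each day of the week."""
    by_day = defaultdict(list)
    for offset, value in enumerate(values):
        day = start_date + timedelta(days=offset)
        by_day[WEEKDAY_NAMES[day.weekday()]].append(value)
    return {name: sum(vals) / len(vals) for name, vals in by_day.items()}


averages = mean_by_weekday(start, viewers)
lowest_day = min(averages, key=averages.get)
print(lowest_day)  # Friday
```

Real data would be noisier, so the dip would show up as a lower average rather than an identical value every week.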

4. Natural Language Processing

Another special field of machine learning is Natural Language Processing, also known as NLP. NLP refers to any machine learning problem where the dataset is text. Possible inputs include customer reviews, Tweets, medical records, or email subjects. Understanding text is difficult to define and more difficult to do in practice, but NLP can accomplish many simpler tasks, such as classifying the sentiment of customer reviews or clustering medical records with similar pathologies. Successful NLP depends on having a specific question, and creating a good set of features from the input text.

5. Word counts

Previously, the features for our machine learning problems have been numbers or categories. What do we do when our data is text? A simple option is to count the number of times important words appear in each piece of text. Suppose we wanted to analyze the following two sentences: "The Texans are a great football team" and "The Giants are a great football team". We might end up with the word counts shown in the table.
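As a minimal sketch, the word counts for these two sentences can be computed with Python's standard library. For simplicity this example counts every word rather than only the "important" ones, and it lowercases the text first; both are assumptions made for the illustration.

```python
from collections import Counter

sentences = [
    "The Texans are a great football team",
    "The Giants are a great football team",
]


def word_counts(text):
    """Count how often each lowercased, whitespace-separated word appears."""
    return Counter(text.lower().split())


# One column per distinct word, one row per sentence.
vocabulary = sorted({word for s in sentences for word in word_counts(s)})
rows = [[word_counts(s)[word] for word in vocabulary] for s in sentences]

print(vocabulary)
for row in rows:
    print(row)
```

The two rows differ only in the "giants" and "texans" columns, which is the only difference between the two sentences.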

6. Problems with word counts: negation

Although word counts are commonly used in NLP, there are a few obvious limitations. First, word counts don't take into account negation. Consider the sentence "The Giants are not a great football team". Although "great" is present in this sentence, "not" means that we don't actually mean "great".
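The negation problem is easy to see in a small sketch: the count for "great" is identical in the positive and the negated sentence, and the only difference between the two sets of counts is the word "not". The `word_counts` helper is the same simple assumption as before (lowercase, split on whitespace).

```python
from collections import Counter

positive = "The Giants are a great football team"
negated = "The Giants are not a great football team"


def word_counts(text):
    """Count how often each lowercased, whitespace-separated word appears."""
    return Counter(text.lower().split())


# "great" appears once in both sentences, despite the opposite meaning.
print(word_counts(positive)["great"], word_counts(negated)["great"])

# The only word that distinguishes the two sentences is "not".
extra_words = word_counts(negated) - word_counts(positive)
print(extra_words)
```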

7. Word counts and synonyms

Another problem is that word counts don't help us consider synonyms. For example, there are many words that all mean "blue", such as "sky-blue", "aqua" and "cerulean". Ideally, we would like to group these as a single feature.
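To illustrate what "grouping as a single feature" could look like, the sketch below maps a few synonyms to one word before counting. The `SYNONYM_GROUPS` table and the example sentence are hand-written assumptions for this illustration; maintaining such a table by hand does not scale.

```python
from collections import Counter

# Hypothetical hand-written synonym table (illustration only).
SYNONYM_GROUPS = {"sky-blue": "blue", "aqua": "blue", "cerulean": "blue"}

text = "an aqua shirt and a cerulean scarf"


def grouped_counts(text):
    """Count words after replacing known synonyms with their group name."""
    words = text.lower().split()
    return Counter(SYNONYM_GROUPS.get(word, word) for word in words)


raw = Counter(text.lower().split())
grouped = grouped_counts(text)

print(raw["blue"], grouped["blue"])
```

Plain counts treat "aqua" and "cerulean" as unrelated features, while the grouped counts record two mentions of "blue".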

8. Word embeddings

One solution to these problems is word embeddings, a way of creating features that groups similar words together. Word embeddings would create similar features for the various shades of blue. They have another interesting property: they are mathematical representations of words that obey intuitive rules. For example, if we take the features for "king", subtract the features for "man", and add the features for "woman", we get a set of features that is very close to those of "queen".
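The arithmetic can be sketched with toy vectors. The two-dimensional values below are invented so that the example works out cleanly; real embeddings are learned from large amounts of text, have hundreds of dimensions, and only approximately satisfy this relationship.

```python
import math

# Toy 2-dimensional "embeddings", invented for illustration only.
# Dimension 0 loosely encodes royalty, dimension 1 loosely encodes gender.
toy_vectors = {
    "king": [0.9, 0.9],
    "queen": [0.9, -0.9],
    "man": [0.1, 0.9],
    "woman": [0.1, -0.9],
    "apple": [-0.8, 0.0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_word(vector, exclude=()):
    """Return the word whose toy vector is most similar to `vector`."""
    candidates = [w for w in toy_vectors if w not in exclude]
    return max(candidates, key=lambda w: cosine_similarity(vector, toy_vectors[w]))


# king - man + woman
result = [k - m + w for k, m, w in zip(toy_vectors["king"],
                                       toy_vectors["man"],
                                       toy_vectors["woman"])]
closest = nearest_word(result, exclude=("king", "man", "woman"))
print(closest)  # queen
```

The input words are excluded from the search, which is a common convention when evaluating this kind of analogy.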

9. Review

Let's review what we've learned. Time series forecasting is a special area of machine learning where time is an important feature. Time series forecasting helps us account for periodic patterns in our data, called seasonality. Another special area is Natural Language Processing (or NLP), which uses text as input data. Two important ways of turning text into features are word counts, which are simple but imprecise, and word embeddings, which are difficult to implement but can be more precise.

10. Let's practice!

Now that we've learned about time series forecasting and NLP, let's practice!