Competitions overview

1. Competitions overview

Hi all! Welcome to the course on Kaggle competitions! In this course, you will develop the overall pipeline for successful participation in Machine Learning competitions. Also, you will learn some practical tips and tricks that can be used in any Machine Learning project.

2. Instructor

I will be your instructor for this course. My name is Yauhen Babakhin. I have a Master’s Degree in Applied Data Analysis and over 5 years of working experience in Data Science. I'm also a Kaggle Competitions Grandmaster with gold medals in both classic Machine Learning and Deep Learning competitions.

3. Kaggle

First of all, let's discuss what Kaggle actually is. Kaggle is a web platform for Data Science and Machine Learning competitions. It allows us to solve Data Science challenges and compete with other participants in building the best predictive models.

4. Kaggle benefits

The list of Kaggle benefits is pretty long. Note that this platform can be useful for everyone, from beginners in Data Science to experienced professionals. We can gain practical skills working with real-world datasets, develop our own pet projects, meet and grow with the great Kaggle community, get experience in a new domain or model type, and keep up to date with the best-performing machine learning methods.

5. Competition process

The general competition process consists of three major stages. First, Kaggle gives us a problem definition and the data to solve this problem.

6. Competition process

Then, we develop a Machine Learning model and prepare a submission file that is uploaded to Kaggle.

7. Competition process

Finally, our submission is shown on the so-called "Leaderboard" together with our position relative to other competitors.

8. How to participate

To start competing on Kaggle, we should perform three simple steps. First, go to the Kaggle website. Second, select any active competition we're interested in. Third, download the data available in the competition. That's it! Now we're ready to start exploring the data and building Machine Learning models.

9. New York City taxi fare prediction

As an example, we will work with a past Kaggle playground competition called New York City Taxi Fare Prediction. The goal of this challenge is to predict the fare amount for a taxi ride in New York City given the pickup and dropoff locations.

10. Train and Test data

The typical data structure in Kaggle competitions consists of two major parts: train and test datasets. Our goal is to train a model on the train dataset, which comes with labels. Afterwards, we should make predictions on the test set. Let's read the train dataset from the New York taxi competition using the pandas library and look at the columns available there. The first column is an ID variable called 'key'. The 'fare_amount' column is the target variable we'd like to predict. The rest of the columns are features we could use to build the model. Now, let's move on to the test set. It has the same list of columns except for 'fare_amount', as this is the column we should predict.
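
To make the train/test comparison concrete, here is a minimal sketch of reading both datasets with pandas and checking which columns differ. The two in-memory CSV snippets stand in for the downloaded files; the feature column names and every value in them are assumptions made up for illustration, and only 'key' and 'fare_amount' come from the description above. With the real files we would pass the file paths to pd.read_csv instead.

```python
import io

import pandas as pd

# Tiny stand-ins for the competition files. All values are invented,
# and the feature column names are assumptions for illustration.
train_csv = io.StringIO(
    "key,fare_amount,pickup_datetime,pickup_longitude,pickup_latitude,"
    "dropoff_longitude,dropoff_latitude,passenger_count\n"
    "ride_1,4.5,2009-06-15 17:26:21,-73.84,40.72,-73.84,40.71,1\n"
    "ride_2,16.9,2010-01-05 16:52:16,-74.01,40.71,-73.97,40.78,1\n"
)
test_csv = io.StringIO(
    "key,pickup_datetime,pickup_longitude,pickup_latitude,"
    "dropoff_longitude,dropoff_latitude,passenger_count\n"
    "ride_3,2015-01-27 13:08:24,-73.97,40.76,-73.98,40.74,1\n"
)

# With the downloaded data this would be pd.read_csv("train.csv") etc.
train = pd.read_csv(train_csv)
test = pd.read_csv(test_csv)

# Look at the columns available in each dataset.
print("Train columns:", train.columns.tolist())
print("Test columns:", test.columns.tolist())

# Columns present in train but absent from test: only the target is expected.
target_only = [col for col in train.columns if col not in test.columns]
print("Only in train:", target_only)
```

Comparing the column lists this way is a quick sanity check that we have identified the target correctly before building any model.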

11. Sample submission

After we have built a model, we can make predictions on the test set and save them as a .csv file. This .csv file can then be submitted to Kaggle. Every Kaggle competition provides a sample submission file, which shows the correct format and structure of the submission. Let's take a look at the head of the sample submission in the taxi fare prediction challenge. As expected, it consists of two columns: the ID column and the 'fare_amount' column we're predicting.
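
As a rough sketch of that last step, the snippet below builds a submission in the same two-column format and writes it out as CSV. The small data frames are hypothetical stand-ins with invented values, and the "model" is only a placeholder that predicts the mean training fare for every ride; it is not the approach used in the course. The text is written to an in-memory buffer here, whereas in practice we would pass a file name to to_csv.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the competition data (values are invented).
train = pd.DataFrame({"key": ["ride_1", "ride_2"], "fare_amount": [4.0, 16.0]})
test = pd.DataFrame({"key": ["ride_3", "ride_4"]})
sample_sub = pd.DataFrame({"key": ["ride_3", "ride_4"], "fare_amount": [0.0, 0.0]})

# Inspect the expected format, as we would with the real sample submission.
print(sample_sub.head())

# Placeholder "model": predict the mean training fare for every test ride.
submission = test[["key"]].copy()
submission["fare_amount"] = train["fare_amount"].mean()

# The submission should have the same columns, in the same order, as the sample.
same_format = list(submission.columns) == list(sample_sub.columns)
print("Matches sample format:", same_format)

# Write without the pandas index so the file contains only the two columns.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing index=False matters: otherwise pandas adds an extra unnamed index column, and the file no longer matches the sample submission format.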

12. Let's practice!

All right, now let's explore the train and test datasets from another Kaggle competition.
