1. Final tips
All right, we're almost done. In this lesson, we'll cover a few tips that haven't come up earlier in the course.
2. Save information
The first tip is saving all the information we can.
To begin, save the fold assignments to files. Our goal is to track the validation score during the competition, and that score is only comparable between models if it is always calculated on the same folds.
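As a minimal sketch of saving fold assignments, the helpers below shuffle row indices with a fixed seed, assign each row to a fold, and store the result as JSON. The function names, the JSON format, and the default of 5 folds are assumptions made for illustration; the course itself may use a library splitter instead.

```python
import json
import random
from pathlib import Path


def make_folds(n_rows, n_folds=5, seed=42):
    """Return a list where element i is the fold number of row i."""
    indices = list(range(n_rows))
    # A fixed seed makes the shuffle, and therefore the folds, repeatable.
    random.Random(seed).shuffle(indices)
    fold_of_row = [0] * n_rows
    for position, row in enumerate(indices):
        fold_of_row[row] = position % n_folds
    return fold_of_row


def save_folds(fold_of_row, path):
    """Write the fold assignment to a JSON file."""
    Path(path).write_text(json.dumps(fold_of_row))


def load_folds(path):
    """Read a fold assignment previously written by save_folds."""
    return json.loads(Path(path).read_text())
```

Calling make_folds once, saving the result, and loading it with load_folds in every later experiment keeps all validation scores on identical folds.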
Another thing worth saving is the model runs. That allows us to reproduce our experiments or roll back if needed. One option is to create a separate git commit for each model run or submission.
It is also a good idea to save model predictions. If we start saving validation and test predictions from the very beginning of the competition, building model ensembles near the end becomes simple, because the stored predictions can be averaged directly for blending or used as features for stacking.
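To illustrate how stored predictions make blending easy, here is a minimal sketch that writes each model's predictions to a CSV file and later averages several such files. The file layout (columns id and prediction) and the function names are assumptions for this example, not a format prescribed by Kaggle or by the course.

```python
import csv


def save_predictions(path, ids, preds):
    """Write one model's predictions as rows of (id, prediction)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "prediction"])
        writer.writerows(zip(ids, preds))


def load_predictions(path):
    """Return a dict mapping id (as a string) to the predicted value."""
    with open(path, newline="") as f:
        return {row["id"]: float(row["prediction"]) for row in csv.DictReader(f)}


def blend(paths, weights=None):
    """Weighted average of saved predictions, aligned by id.

    Raises KeyError if a file lacks an id that the first file contains.
    """
    weights = weights or [1.0] * len(paths)
    total = sum(weights)
    loaded = [load_predictions(p) for p in paths]
    return {
        row_id: sum(w * preds[row_id] for w, preds in zip(weights, loaded)) / total
        for row_id in loaded[0]
    }
```

The same saved validation predictions could also be joined by id and fed to a second-level model, which is the stacking case mentioned above.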
Finally, we should keep a log of model results to track how performance progresses. This can be done in the git commit messages or as notes in a separate document.
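One possible form of such a log is a CSV file with one row per model run. The sketch below appends a row each time it is called; the column names and the log_result helper are hypothetical and chosen only for illustration.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical column layout for the experiment log.
LOG_FIELDS = ["timestamp", "model", "cv_score", "public_lb_score", "notes"]


def log_result(path, model, cv_score, public_lb_score=None, notes=""):
    """Append one model run to the CSV log, writing a header for a new file."""
    path = Path(path)
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "model": model,
            "cv_score": cv_score,
            # Left empty until the model has actually been submitted.
            "public_lb_score": public_lb_score,
            "notes": notes,
        })
```

Keeping the local validation score and the Public Leaderboard score side by side in one file also helps later, when choosing the final submissions.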
3. Kaggle forum and kernels
Now let's talk about the Kaggle forum and kernels. They are among the strongest sources of knowledge on Kaggle.
4. Kaggle forum and kernels
Each competition has an open forum where any participant can start topics to share thoughts and ideas, ask questions, and so on.
5. Kaggle forum and kernels
Kaggle kernels are another source of knowledge. They are scripts and notebooks that participants share during the competition. So, we have an opportunity not only to discuss the competition, but also to look at the code.
Moreover, kernels provide a computational environment: an interactive session running in a Docker container with pre-installed packages, the ability to mount data sources, use GPU resources, and more.
6. Forum and kernels usage
The forum and kernels can help us at every stage of a competition.
Suppose we decide to join one of the current Kaggle competitions. First of all, it is useful to find similar past competitions on Kaggle. Usually, top teams share their approaches on the forum once a competition has finished. This lets us read through the best-performing solutions and learn what could work for similar problem types.
During the rest of the competition, we should closely follow all the forum topics and the most popular kernels. This keeps us up to date and exposes us to lots of new ideas and approaches.
Finally, once the competition is over, it's time to learn from the top participants. Usually, winners share their solutions a couple of days after the competition finishes. It's very valuable information that we should use to work out what we could have done better.
7. Select final submissions
The last few words are devoted to the final submissions. Kaggle competitions have different durations, but they generally last about 2 or 3 months. As we already know, we have a limited number of submissions to the Leaderboard each day.
So, if we have a 2-month competition with 5 submissions per day, we could make up to 300 submissions to the Public Leaderboard.
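The arithmetic behind that figure can be written out as a short sketch. It assumes a 2-month competition lasts 60 days; the function name is made up for this example.

```python
def max_submissions(days, per_day):
    """Upper bound on submissions when the daily limit is used every day."""
    return days * per_day


# Assumption: 2 months is taken as 60 days.
print(max_submissions(days=60, per_day=5))  # 300
```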
8. Select final submissions
However, for the final evaluation on the Private Leaderboard, we have to choose only 2 submissions. We mark them in the list of all submissions made.
9. Select final submissions
Only these two are used for the final standings. Our result is the better of the two scores.
10. Final submissions
The suggested strategy that works pretty well is to select one submission that is the best on the local validation,
and another submission that is the best on the Public Leaderboard.
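The strategy above can be sketched as follows, given a list of submissions with their local validation and Public Leaderboard scores. The record fields (name, cv_score, public_lb_score) are hypothetical. The sketch also makes one assumption the lesson does not spell out: if the same submission is best on both criteria, the second pick is the best Public Leaderboard score among the remaining submissions.

```python
def select_final_submissions(submissions, higher_is_better=True):
    """Pick two submissions: best local validation, then best public score.

    Expects at least two submissions, each a dict with the keys
    "name", "cv_score" and "public_lb_score".
    """
    pick = max if higher_is_better else min
    best_local = pick(submissions, key=lambda s: s["cv_score"])
    # Exclude the first pick so the two choices are always different.
    remaining = [s for s in submissions if s is not best_local]
    best_public = pick(remaining, key=lambda s: s["public_lb_score"])
    return best_local["name"], best_public["name"]


submissions = [
    {"name": "sub_a", "cv_score": 0.80, "public_lb_score": 0.78},
    {"name": "sub_b", "cv_score": 0.79, "public_lb_score": 0.81},
    {"name": "sub_c", "cv_score": 0.75, "public_lb_score": 0.76},
]
print(select_final_submissions(submissions))  # ('sub_a', 'sub_b')
```

For metrics where lower is better, such as RMSE, pass higher_is_better=False.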
11. Let's practice!
Let's now review these final tips before saying good-bye!