Exercise

word2vec

You have been scraping job titles from the web and are unsure whether you need to scrape more for your analysis. So far, you have collected over 13,000 job titles in a dataset called job_titles. You have read that word2vec generally performs best when the model has enough data to train properly; if a word does not appear often enough in your data, the model may not learn a useful representation for it.

In this exercise, you will test how helpful additional data is by running your model three times, with each run using more of the data than the last.

Instructions 1/3
  1. Using 33% of the available data, print a list of synonyms for the word teacher.
  2. Update the code to use 66% of the available data.
  3. Update the code to use 100% of the available data.
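
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the exercise's own solution: the exercise environment and its library are not shown here, so the sketch assumes Python with gensim (version 4.0 or later) for the word2vec step. The small job_titles list is a made-up stand-in for the real dataset, and the helper names take_fraction, word_count, and synonyms are hypothetical. The counting part uses only the standard library and shows how often "teacher" appears in each slice, which is the quantity the exercise text says matters.

```python
from collections import Counter

# Hypothetical stand-in for the job_titles dataset (the real one has 13,000+ rows).
job_titles = [
    "high school teacher",
    "substitute teacher",
    "software engineer",
    "math teacher",
    "senior software engineer",
    "teacher assistant",
]


def take_fraction(titles, fraction):
    """Return the first `fraction` of the titles, tokenized into lowercase words."""
    n_rows = int(len(titles) * fraction)
    return [title.lower().split() for title in titles[:n_rows]]


def word_count(sentences, word):
    """Count how often `word` occurs across the tokenized sentences."""
    return Counter(token for sentence in sentences for token in sentence)[word]


def synonyms(sentences, word, topn=5):
    """Train word2vec on `sentences` and return the nearest neighbours of `word`.

    Assumes gensim >= 4.0 is installed; it is imported lazily so the
    counting helpers above still work without it. The hyperparameters
    are illustrative choices, not values taken from the exercise.
    """
    from gensim.models import Word2Vec

    model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, seed=1)
    return model.wv.most_similar(word, topn=topn)


# Compare how much evidence each slice of the data holds for the target word.
counts = {}
for fraction in (0.33, 0.66, 1.0):
    sentences = take_fraction(job_titles, fraction)
    counts[fraction] = word_count(sentences, "teacher")
    print(f"{fraction:.0%} of data: {len(sentences)} titles, "
          f"'teacher' appears {counts[fraction]} times")
```

With gensim installed, calling synonyms(take_fraction(job_titles, 0.33), "teacher") and then repeating with 0.66 and 1.0 would mirror the three runs. On a toy list this small the neighbours are not meaningful; the comparison only becomes informative on the full dataset.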