Exercise

Stem Spanish reviews

Recall that in a previous chapter we used a language detection package to determine the language of different Amazon product reviews. In this exercise, you will first detect the language of each review in non_english_reviews. The reviews are in multiple languages, but you will select ONLY those in Spanish. Feel free to go back to the video on foreign language detection if you have forgotten some of the concepts.

In the second step, you will create word tokens from the Spanish reviews and stem them using a Snowball stemmer for the Spanish language. Unfortunately, the language detection package is not perfect, so the detected language will sometimes be incorrect.
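The tokenize-and-stem step described above could look roughly like the sketch below. It is only an illustration, not the exercise's reference solution: the names stem_reviews and nltk_spanish_tools are hypothetical, and the NLTK helper assumes that nltk is installed and that its Punkt tokenizer data has already been downloaded.

```python
def stem_reviews(reviews, tokenize, stem):
    """Tokenize each review, then stem every token.

    `tokenize` maps a string to a list of tokens and `stem` maps one token
    to its stem. Both are passed in so the loop itself stays library-agnostic.
    """
    return [[stem(token) for token in tokenize(review)] for review in reviews]


def nltk_spanish_tools():
    """Return (tokenize, stem) callables backed by NLTK (hypothetical helper).

    Assumption: nltk is installed and the Punkt tokenizer data is available.
    """
    from nltk.tokenize import word_tokenize
    from nltk.stem.snowball import SnowballStemmer

    stemmer = SnowballStemmer("spanish")

    def tokenize(text):
        # Ask the tokenizer to use its Spanish sentence model.
        return word_tokenize(text, language="spanish")

    return tokenize, stemmer.stem
```

With NLTK available, calling tokenize, stem = nltk_spanish_tools() and then stem_reviews(spanish_reviews, tokenize, stem) would return one list of stemmed tokens per review, where spanish_reviews stands for whatever list of Spanish review strings the first step produced.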

Instructions 1/2

  • Import the langdetect package.
  • Iterate over the rows of non_english_reviews using the len() and range() functions.
  • Use detect_langs() to detect the language of each review in the for loop.
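
The three instructions above can be sketched as follows. This is a minimal illustration under stated assumptions, not the official solution: it treats the reviews as a plain list of strings (a DataFrame would need something like .iloc to reach the text column), and the function names and the optional detector parameter are hypothetical, added so the loop can be tried with a stand-in detector when langdetect is not installed.

```python
def detect_review_languages(reviews, detector=None):
    """Return the most probable language code for each review."""
    if detector is None:
        # Third-party package (pip install langdetect), imported lazily.
        from langdetect import detect_langs
        detector = detect_langs

    languages = []
    # Index-based iteration with range() and len(), as the instructions ask.
    for row in range(len(reviews)):
        # detect_langs() returns candidate languages, each with .lang and .prob.
        candidates = detector(reviews[row])
        best = max(candidates, key=lambda candidate: candidate.prob)
        languages.append(best.lang)
    return languages


def select_spanish(reviews, languages):
    """Keep only the reviews whose detected language code is 'es'."""
    return [review for review, lang in zip(reviews, languages) if lang == "es"]
```

Because detection is probabilistic, a few of the reviews kept by select_spanish may not actually be Spanish, and some Spanish reviews may be missed.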