Get startedGet started for free

Stem Spanish reviews

You will recall that in a previous chapter we used a language detection package to determine the language of different Amazon product reviews. In this exercise, you will first detect the languages in the non_english_reviews. The reviews are in multiple languages but you will select ONLY those in Spanish. Feel free to go back to the video discussing foreign language detection if you have forgotten some of the concepts.

In the second step, you will create word tokens from the Spanish reviews and will stem them using a SnowBall stemmer for the Spanish language. The language detection package is not perfect, unfortunately. Therefore, it is possible that sometimes the detected language is not correct.

This exercise is part of the course

Sentiment Analysis in Python

View Course

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import the language detection package
import ____

# Loop over the rows of the dataset and append  
languages = [] 
for i in ____(____(non_english_reviews)):
    languages.append(____.____(non_english_reviews.iloc[i, 1]))

# Clean the list by splitting     
languages = [str(lang).split(':')[0][1:] for lang in languages]
# Assign the list to a new feature 
non_english_reviews['language'] = languages

# Select the Spanish ones
filtered_reviews = non_english_reviews[non_english_reviews.language == 'es']
Edit and Run Code