Stem Spanish reviews
You will recall that in a previous chapter we used a language detection package to determine the language of different Amazon product reviews. In this exercise, you will first detect the languages in the non_english_reviews dataset. The reviews are in multiple languages, but you will select ONLY those in Spanish. Feel free to go back to the video on foreign language detection if you have forgotten some of the concepts.
In the second step, you will create word tokens from the Spanish reviews and stem them using a Snowball stemmer for the Spanish language. Unfortunately, the language detection package is not perfect, so the detected language will sometimes be incorrect.
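As a rough illustration of the second step, the sketch below stems the tokens of a single Spanish sentence. It assumes the nltk package is installed. The sample sentence is made up, and whitespace splitting stands in for a proper tokenizer so that no extra NLTK data has to be downloaded. The exercise itself works on the filtered reviews rather than a hard-coded string.

```python
from nltk.stem.snowball import SnowballStemmer

# Hypothetical sample review; the exercise uses the Spanish rows
# selected from non_english_reviews instead.
review = "Los libros son muy buenos y llegaron rápido"

# Simple whitespace tokenization (an assumption made for this sketch;
# a dedicated word tokenizer would also split off punctuation).
tokens = review.lower().split()

# Snowball stemmer configured for Spanish
stemmer = SnowballStemmer("spanish")

# Stem every token
stems = [stemmer.stem(token) for token in tokens]

for token, stem in zip(tokens, stems):
    print(token, "->", stem)
```

Each stem is a truncated form of its token, so inflected variants of the same word tend to collapse to one entry.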
This exercise is part of the course
Sentiment Analysis in Python
Hands-on interactive exercise
Try this exercise by completing the sample code.
# Import the language detection package
import ____
# Loop over the rows of the dataset and append
languages = []
for i in ____(____(non_english_reviews)):
    languages.append(____.____(non_english_reviews.iloc[i, 1]))
# Clean the list by splitting
languages = [str(lang).split(':')[0][1:] for lang in languages]
# Assign the list to a new feature
non_english_reviews['language'] = languages
# Select the Spanish ones
filtered_reviews = non_english_reviews[non_english_reviews.language == 'es']
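The list comprehension in the cleaning step is easier to follow with a small stand-alone example. The sketch below assumes that each detection result is a list of candidate languages whose string form looks like `[es:0.99]` (a language code, a colon, then a probability). The `FakeLanguage` class is a hypothetical stand-in written only for this illustration; it is not part of the language detection package.

```python
class FakeLanguage:
    """Hypothetical stand-in for one detected language (not a library class)."""

    def __init__(self, lang, prob):
        self.lang = lang
        self.prob = prob

    def __repr__(self):
        # Assumed display format: "<code>:<probability>"
        return f"{self.lang}:{self.prob}"


# Two made-up detection results, each a list of candidates
detections = [
    [FakeLanguage("es", 0.99)],
    [FakeLanguage("fr", 0.57), FakeLanguage("es", 0.43)],
]

# str(result) gives e.g. "[es:0.99]"; splitting on ':' keeps "[es",
# and the [1:] slice drops the leading bracket.
cleaned = [str(result).split(':')[0][1:] for result in detections]

print(cleaned)  # ['es', 'fr']
```

Only the first candidate in each result survives the cleaning, so a review with mixed signals is labeled with its top-ranked language.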