TF-IDF representation of product reviews

You are working with a customer support team at a home automation company. They have collected user reviews of several smart devices and want to identify which words stand out in each review. You suggest using TF-IDF to highlight the most relevant terms in each piece of feedback. Let's help them get started!

A preprocess() function is preloaded; it takes a text and returns a processed version. It applies lowercasing, tokenization, and punctuation removal. pandas has been imported as pd, and the TfidfVectorizer class is ready to use.
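
The exercise does not show the preloaded preprocess() function. As a rough illustration of what such a helper might look like, here is a minimal sketch. Its details (whitespace tokenization, punctuation stripped via string.punctuation, tokens rejoined into a single string) are assumptions, not the course's actual implementation.

```python
import string


def preprocess(text):
    """Hypothetical stand-in for the preloaded helper (an assumption)."""
    # Lowercase the whole text
    text = text.lower()
    # Tokenize naively on whitespace
    tokens = text.split()
    # Strip punctuation characters from each token
    table = str.maketrans("", "", string.punctuation)
    tokens = [token.translate(table) for token in tokens]
    # Drop tokens that were punctuation only, then rejoin into one string
    return " ".join(token for token in tokens if token)


print(preprocess("The smart speaker is incredible. Clear sound and fast responses!"))
```

Returning a single string rather than a token list keeps the output directly usable by TfidfVectorizer, which expects raw text documents by default.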

This exercise is part of the course

Natural Language Processing (NLP) in Python

Exercise instructions

Initialize a TF-IDF vectorizer.
Transform the cleaned reviews into a tfidf_matrix.
Create a DataFrame df for the tfidf_matrix, using the vocabulary words as columns.
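
For reference, the weighting that scikit-learn's TfidfVectorizer applies with its default settings (smooth_idf=True, norm="l2") can be written as below, where n is the number of documents, df(t) is the number of documents containing term t, and tf(t, d) is the count of t in document d. The exercise does not say which settings it relies on, so the defaults are assumed here.

```latex
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1,
\qquad
\operatorname{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)
```

Each document's row of weights is then scaled to unit Euclidean length. Under these assumptions, a word that appears in all three reviews gets idf = ln(4/4) + 1 = 1, the smallest possible value, while a word found in only one review gets idf = ln(4/2) + 1, roughly 1.69.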

Hands-on interactive exercise

Try this exercise by completing this sample code.

reviews = ["The smart speaker is incredible. Clear sound and fast responses!",
           "I am disappointed with the smart bulb. It stopped working in a week.",
           "The thermostat is okay. Not too smart, but functional."]
cleaned_reviews = [preprocess(review) for review in reviews]

# Initialize the vectorizer
vectorizer = ____
# Transform the cleaned reviews
tfidf_matrix = ____
# Create a DataFrame for TF-IDF
df = pd.DataFrame(
  tfidf_matrix.toarray(),
  columns=vectorizer.____
)
print(df.head())

Skill level: Intermediate


Learn the essentials of text processing in Natural Language Processing (NLP). Master techniques such as tokenization, stop word and punctuation removal, and text normalization with lowercasing, stemming, and lemmatization to prepare text data for further analysis and insight extraction.

Exercise 1: Introduction to natural language processing
Exercise 2: Sentence and word tokenization
Exercise 3: NLP workflow
Exercise 4: Stop words and punctuation handling
Exercise 5: Removing stop words
Exercise 6: Removing punctuation
Exercise 7: Text normalization techniques
Exercise 8: Lowercasing
Exercise 9: Stemming
Exercise 10: Lemmatization

Transform raw text into powerful numerical features. Create Bag-of-Words and TF-IDF representations to capture word importance across documents, then explore word embeddings like Word2Vec and GloVe to uncover deep semantic patterns. Visualize frequency, relevance, and similarity to bring your text data to life.

Exercise 1: Bag-of-Words representation
Exercise 2: Building the vocabulary from customer reviews
Exercise 3: Transforming text into numbers with BoW
Exercise 4: Frequency analysis of product reviews
Exercise 5: Visualizing word frequencies
Exercise 6: TF-IDF vectorization
Exercise 7: TF-IDF representation of product reviews (current exercise)
Exercise 8: Comparing BoW and TF-IDF representations
Exercise 9: Embeddings
Exercise 10: Exploring word relationships with embeddings
Exercise 11: Visualizing and comparing word embeddings

Harness the power of pre-trained models to perform advanced text classification tasks. Use Hugging Face pipelines for sentiment analysis, topic classification, and natural language inference. Evaluate semantic similarity and grammatical correctness with state-of-the-art models, all without building anything from scratch.

Exercise 1: Hugging Face pipelines for sentiment analysis
Exercise 2: Analyzing the sentiment of a review
Exercise 3: Batch classifying multiple reviews
Exercise 4: Comparing models on labeled review data
Exercise 5: Zero-shot classification and QNLI
Exercise 6: Zero-shot classification of support tickets
Exercise 7: Does the text answer the question?
Exercise 8: Question similarity and grammatical correctness
Exercise 9: Detecting duplicate questions
Exercise 10: Checking grammatical correctness

Dive into the core of modern NLP applications with token classification and text generation techniques. Learn to extract meaningful entities and grammatical structures using NER and PoS tagging. Master both extractive and abstractive question answering, and explore advanced generation tasks including summarization, translation, and language modeling using Hugging Face pipelines.

Exercise 1: Token classification
Exercise 2: Identifying named entities in news headlines
Exercise 3: Part of Speech tagging for text analysis
Exercise 4: Question answering
Exercise 5: Answering questions from product descriptions
Exercise 6: Generating natural answers with abstractive QA
Exercise 7: Sequence generation tasks
Exercise 8: Summarizing news articles for quick insights
Exercise 9: Translating customer reviews to French
Exercise 10: Building a search completion system
Exercise 11: Congratulations