Word vector projection

You can visualize word vectors in a scatter plot to better understand how the words in a vocabulary are grouped. To plot word vectors, you first have to project them onto a two-dimensional space. One way to do this is to extract the first two principal components with Principal Component Analysis (PCA).
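The projection described above can be sketched with NumPy alone. This is a minimal illustration, not the exercise solution: the helper name project_to_2d and the random toy_vectors are assumptions made for this example, and the signs of the resulting components may differ from what sklearn's PCA returns.

```python
import numpy as np

def project_to_2d(vectors):
    """Project row vectors onto their first two principal components."""
    X = np.asarray(vectors, dtype=float)
    # PCA works on mean-centered data
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Keep the coordinates along the first two directions
    return centered @ vt[:2].T

# Hypothetical data: six random 5-dimensional vectors stand in for word vectors
rng = np.random.default_rng(0)
toy_vectors = rng.normal(size=(6, 5))
projected = project_to_2d(toy_vectors)
print(projected.shape)
```

Each row of projected holds the x and y coordinates you would pass to a scatter plot.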

In this exercise, you will practice extracting word vectors and projecting them onto a two-dimensional space using the PCA class from sklearn.

A short list of words is stored in the list words, and the en_core_web_md model is available, loaded as nlp. All required libraries and packages have already been imported for you (PCA, numpy as np).

This exercise is part of the course

Natural Language Processing with spaCy


Exercise instructions

Extract the word IDs of the given words and store them in the list word_ids.
Extract the first five elements of each word's vector, then stack them vertically into word_vectors using np.vstack().
Given a pca object, calculate the transformed word vectors using the .fit_transform() method of the pca object.
Print the first component of the transformed word vectors using [:, 0] indexing.
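The stacking and indexing steps in the instructions can be tried without a spaCy model. In this sketch, toy_lookup is a hypothetical dictionary of made-up 8-dimensional vectors that stands in for the model's vector table; only the np.vstack() call and the [:, 0] indexing mirror the exercise.

```python
import numpy as np

# Hypothetical 8-dimensional vectors standing in for real word vectors
toy_lookup = {
    "tiger": np.arange(1, 9, dtype=float),       # 1.0, 2.0, ..., 8.0
    "bird": np.arange(1, 9, dtype=float) * 2.0,  # 2.0, 4.0, ..., 16.0
}
words = ["tiger", "bird"]

# Keep the first five elements of each vector and stack them as rows
word_vectors = np.vstack([toy_lookup[w][:5] for w in words])
print(word_vectors.shape)

# [:, 0] selects the first column: one value per word
first_column = word_vectors[:, 0]
print(first_column)
```

The result has one row per word and one column per retained element, which is the layout .fit_transform() expects as input.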

Hands-on interactive exercise

Try this exercise by completing the sample code below.

words = ["tiger", "bird"]

# Extract word IDs of given words
word_ids = [nlp.____.____[w] for w in words]

# Extract word vectors and stack the first five elements vertically
word_vectors = np.vstack([nlp.____.____[i][:5] for i in word_ids])

# Calculate the transformed word vectors using the pca object
pca = PCA(n_components=2)
word_vectors_transformed = pca.____(____)

# Print the first component of the transformed word vectors
print(____[:, 0])

