LoslegenKostenlos loslegen

Match and split

Some of the tweets in your dataset were downloaded incorrectly. Instead of having spaces to separate words, they have strange characters. You decide to use regular expressions to handle this situation. You print some of these tweets to understand which pattern you need to match.

You notice that the sentences are always separated by a special character, followed by a number, the word break, and after that, another special character, e.g &4break!. The words are always separated by a special character, the word new, and a normal random character, e.g #newH.

The variable sentiment_analysis containing the text of one tweet, as well as the re module were already loaded in your session. You can use print(sentiment_analysis) to view it in the IPython Shell.

Diese Übung ist Teil des Kurses

Regular Expressions in Python

Kurs anzeigen

Interaktive Übung

Versuche dich an dieser Übung, indem du diesen Beispielcode vervollständigst.

# Write a regex to match pattern separating sentences
regex_sentence = ____"____"
Code bearbeiten und ausführen