Practicing regular expressions: re.split() and re.findall()

Now you'll get a chance to write some regular expressions to match digits, strings and non-alphanumeric characters. Take a look at my_string first by printing it in the IPython Shell, to decide how best to match each part the steps below ask for.

Note: It's important to prefix your regex patterns with r so that they are interpreted the way you intend. Otherwise, you may run into problems with escape sequences in strings. For example, "\n" in Python indicates a new line, but with the r prefix, r"\n" is the raw string made of the character "\" followed by the character "n", not a new line.
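To make the note concrete, here is a small sketch comparing plain and raw strings. The variable names and the sample text are made up for illustration and are not part of the exercise.

```python
import re

plain = "\n"    # one character: a newline
raw = r"\n"     # two characters: a backslash followed by "n"

print(len(plain))  # 1
print(len(raw))    # 2

# In a plain string, "\b" is a backspace character. In a raw string,
# the backslash survives and the regex engine sees a word boundary.
text = "cat concat cat"  # hypothetical sample text
without_prefix = re.findall("\bcat\b", text)
with_prefix = re.findall(r"\bcat\b", text)

print(without_prefix)  # []
print(with_prefix)     # ['cat', 'cat']
```

Without the prefix, the pattern looks for literal backspace characters around "cat" and finds nothing, which is the kind of silent failure the r prefix helps avoid.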

The regular expression module re has already been imported for you.

Remember from the video that the syntax for the regex library is to always pass the pattern first and the string second.
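As a quick illustration of the argument order, the sketch below calls re.findall() and re.split() with the pattern first. The sample sentence is hypothetical. The last call swaps the arguments to show that nothing fails loudly: the text is simply treated as the pattern.

```python
import re

pattern = r"\d+"
text = "Order 66 shipped 3 crates"  # hypothetical sample text

# Correct order: pattern first, string second
found = re.findall(pattern, text)
pieces = re.split(pattern, text)
print(found)   # ['66', '3']
print(pieces)  # ['Order ', ' shipped ', ' crates']

# Swapped order: the text is used as the pattern and searched for
# inside the three-character string "\d+", so nothing matches.
swapped = re.findall(text, pattern)
print(swapped)  # []
```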

This exercise is part of the course

Introduction to Natural Language Processing in Python

Exercise instructions

- Split my_string on each sentence ending. To do this:
  - Write a pattern called sentence_endings to match sentence endings (.?!).
  - Use re.split() to split my_string on the pattern and print the result.
- Find and print all capitalized words in my_string by writing a pattern called capitalized_words and using re.findall().
  - Remember the [a-z] pattern shown in the video to match lowercase groups? Modify that pattern appropriately in order to match uppercase groups.
- Write a pattern called spaces to match one or more spaces (r"\s+") and then use re.split() to split my_string on this pattern, keeping all punctuation intact. Print the result.
- Find all digits in my_string by writing a pattern called digits (r"\d+") and using re.findall(). Print the result.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Write a pattern to match sentence endings: sentence_endings
sentence_endings = r"[___]"

# Split my_string on sentence endings and print the result
print(re.____(____, ____))

# Find all capitalized words in my_string and print the result
capitalized_words = r"[___]\w+"
print(re.____(____, ____))

# Split my_string on spaces and print the result
spaces = r"___"
print(re.____(____, ____))

# Find all digits in my_string and print the result
digits = r"___"
print(re.____(____, ____))
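For reference, here is one way the four steps might look once filled in. This is a sketch rather than the official solution. It uses a made-up string, sample_string, as a stand-in because the contents of my_string are not shown here, and the result variable names are also illustrative.

```python
import re

# Hypothetical stand-in for my_string
sample_string = "Hello there! Can you count 3 apples? I have 12."

# Character class matching any one sentence-ending mark
sentence_endings = r"[.?!]"
sentences = re.split(sentence_endings, sample_string)
print(sentences)
# ['Hello there', ' Can you count 3 apples', ' I have 12', '']

# An uppercase letter followed by one or more word characters
capitalized_words = r"[A-Z]\w+"
capitalized = re.findall(capitalized_words, sample_string)
print(capitalized)
# ['Hello', 'Can']

# One or more whitespace characters; punctuation stays attached
spaces = r"\s+"
tokens = re.split(spaces, sample_string)
print(tokens)
# ['Hello', 'there!', 'Can', 'you', 'count', '3', 'apples?', 'I', 'have', '12.']

# One or more consecutive digits
digits = r"\d+"
numbers = re.findall(digits, sample_string)
print(numbers)
# ['3', '12']
```

Two details are worth noticing. Splitting on sentence endings leaves a trailing empty string when the text ends with a punctuation mark. The capitalized-word pattern also skips the one-letter word "I", because \w+ requires at least one more character after the uppercase letter.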
