Creating a bag from saved text
This time your colleague has saved the reviews to some text files. There are multiple files and multiple reviews in each file. Each review is on a separate line of the text file.
You want to load these into Dask lazily so you can use parallel processing to analyze them more quickly.
dask.bag has been imported for you as db.
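To make "lazily" concrete, here is a minimal sketch of what happens when a bag is built from text files. The file names and review strings are hypothetical stand-ins written to a temporary directory, not the course data. The scheduler="synchronous" argument is only there so the sketch runs as a plain script; it is not part of the exercise.

```python
import os
import tempfile

import dask.bag as db

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in files: one review per line
    samples = {
        "part1.txt": ["nice hotel", "great location"],
        "part2.txt": ["room was small"],
    }
    for name, lines in samples.items():
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write("\n".join(lines) + "\n")

    # Building the bag only records what to read; no file contents are loaded yet
    bag = db.read_text(os.path.join(tmp, "*.txt"))
    n_partitions = bag.npartitions  # one partition per file by default

    # The files are read only when compute() is called
    reviews = bag.map(str.strip).compute(scheduler="synchronous")

print(n_partitions)
print(reviews)
```

Each element of the bag is one line of text, including its trailing newline, which is why the sketch strips the lines before collecting them.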
This exercise is part of the course
Parallel Programming with Dask in Python
Exercise instructions
- Use the read_text() function to load in all of the .txt files inside the directory data/tripadvisor_hotel_reviews.
- Count the number of reviews in the bag.
- Use the bag's .compute() method to print the answer.
Interactive exercise
Try this exercise by completing this sample code.
# Load in all the .txt files inside data/tripadvisor_hotel_reviews
review_bag = ____
# Count the number of reviews in the bag
review_count = review_bag.____
# Compute and print the answer
print(____)
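The same load, count, compute pattern can be sketched end to end on stand-in data. The helper count_reviews, the file names, and the review strings below are hypothetical and used only for illustration. The sketch passes scheduler="synchronous" so that it runs as a standalone script, whereas the exercise calls .compute() with no arguments.

```python
import os
import tempfile

import dask.bag as db


def count_reviews(directory):
    """Count the lines across all .txt files in `directory` (hypothetical helper)."""
    # A glob pattern loads every matching file into one lazy bag
    review_bag = db.read_text(os.path.join(directory, "*.txt"))
    # count() is lazy as well; it returns a Dask object, not a number
    review_count = review_bag.count()
    # compute() triggers the actual reading and counting
    return review_count.compute(scheduler="synchronous")


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in data: two files holding 2 and 3 reviews
    samples = {
        "a.txt": ["good stay", "friendly staff"],
        "b.txt": ["noisy room", "great breakfast", "would return"],
    }
    for name, lines in samples.items():
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write("\n".join(lines) + "\n")

    total = count_reviews(tmp)

print(total)
```

Because each review sits on its own line, counting the elements of the bag is the same as counting the reviews.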