Loading a dataframe from a parquet file
A dataframe file called sherlock_sentences.parquet is available in your workspace. Each row of this dataframe contains a single clause. Each clause is a sequence of words that is separated from other clauses by punctuation, such as periods, quotes, and other natural language delimiters that signify a sentence or sentence fragment. Your mission, if you choose to accept it, is to load this file.
This exercise is part of the course
Introduction to Spark SQL in Python
Instructions
- Load sherlock_sentences.parquet.
- Filter on "id > 70", and show the first 5 rows.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Load the dataframe
df = ____('sherlock_sentences.parquet')
# Filter and show the first 5 rows
df.where('id > 70').____(____, truncate=False)
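
For reference, here is a minimal sketch of the same load-and-filter pattern written as a standalone script. It is one possible way to do it, not necessarily the exercise's expected answer. The helper names `load_and_filter` and `main`, and the app name "sherlock", are hypothetical; the sketch also assumes a local PySpark install and that sherlock_sentences.parquet sits in the working directory.

```python
def load_and_filter(spark, path="sherlock_sentences.parquet", condition="id > 70"):
    """Read a parquet file into a DataFrame and keep rows matching `condition`.

    `spark` is expected to be a SparkSession (hypothetical helper, not part of
    the exercise).
    """
    # spark.read returns a DataFrameReader; parquet() loads the file.
    df = spark.read.parquet(path)
    # where() is an alias of filter() and accepts a SQL expression string.
    return df.where(condition)


def main():
    # Imported here so the helper above can be used without importing PySpark
    # at module load time.
    from pyspark.sql import SparkSession

    # Assumption: no session exists yet, so one is created (or reused).
    spark = SparkSession.builder.appName("sherlock").getOrCreate()

    # Show the first 5 matching rows without truncating long strings.
    load_and_filter(spark).show(5, truncate=False)

    spark.stop()
```

Calling `main()` runs the whole sketch. In an environment where a `spark` session is already defined, `load_and_filter(spark).show(5, truncate=False)` should be enough on its own.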