Exercise

Loading a dataframe from a parquet file

A parquet file called sherlock_sentences.parquet is available in your workspace. Each row of the dataframe it holds contains a single clause: a sequence of words separated from other clauses by punctuation, such as periods, quotation marks, and other natural-language delimiters that mark a sentence or sentence fragment. Your mission, if you choose to accept it, is to load this file.

Instructions

  • Load sherlock_sentences.parquet.
  • Filter on "id > 70", and show the first 5 rows.