Leveraging micro-partitions and data clustering
During a quick chat in the hall with your Lead Data Engineer, she shared with you that Snowflake is using data clustering to sort data within micro-partitions by the year
field in the olympic_medals
table. You have a few queries that you regularly execute against this table, which you'd like to update to better take advantage of Snowflake's micro-partitions and data clustering.
The create_engine
function from the sqlalchemy
module has been imported, and a connection object has been created and stored in the variable conn
.
This exercise is part of the course
Introduction to NoSQL
Exercise instructions
- Update the Snowflake query to only return records for games that took place in 2000 later.
- Return the results of the Snowflake query as a
pandas
DataFrame
, and print the result set.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Leverage the existing micro-partitions and data clustering
query = """
SELECT
team,
year,
sport,
event,
medal
FROM olympic_medals
____ year >= ____;
"""
# Execute the query, print the results
results = conn.cursor().____(query).fetch_pandas_all()
print(____)