Bringing it all together (1)
You got your first taste of writing your own functions in the previous exercises. You learned how to add parameters to your own function definitions, return a single value or multiple values with tuples, and call the functions you've defined.
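As a quick recap of those ideas, here is a minimal sketch of a function that takes two parameters and returns two values packed in a tuple. The name `shout_all` and its behaviour are purely illustrative and are not part of this exercise.

```python
def shout_all(word1, word2):
    """Append '!!!' to each word and return both results as a tuple."""
    shout1 = word1 + '!!!'
    shout2 = word2 + '!!!'
    # Returning two values separated by a comma packs them into a tuple
    return shout1, shout2


# Call the function and unpack the returned tuple into two names
yell1, yell2 = shout_all('congratulations', 'you')
print(yell1)  # congratulations!!!
print(yell2)  # you!!!
```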
In this and the following exercise, you will bring all these concepts together and apply them to a simple data science problem. You will load a dataset and write code that extracts simple insights from the data.
For this exercise, your goal is to recall how to load a dataset into a DataFrame. The dataset contains Twitter data, and you will iterate over the entries in a column to build a dictionary in which the keys are the names of languages and the values are the number of tweets in each language. The file tweets.csv is available in your current directory.
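The counting idea itself does not depend on pandas. The sketch below applies it to a hard-coded list of language codes (made-up sample values, not taken from tweets.csv) so the shape of the resulting dictionary is easy to see.

```python
# Made-up language codes standing in for the entries of the 'lang' column
sample_langs = ['en', 'et', 'en', 'und', 'en', 'et']

langs_count = {}
for entry in sample_langs:
    if entry in langs_count:
        # Seen before: increment the existing count
        langs_count[entry] += 1
    else:
        # First occurrence: create the key with a count of 1
        langs_count[entry] = 1

print(langs_count)  # {'en': 3, 'et': 2, 'und': 1}
```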
Be aware that this is real data from Twitter, so there is always a risk that it contains profanity or other offensive content. This applies to this exercise and to any following exercises that also use real Twitter data.
This exercise is part of the course Introduction to Functions in Python.
Exercise instructions
- Import the pandas package with the alias pd.
- Import the file 'tweets.csv' using the pandas function read_csv(). Assign the resulting DataFrame to df.
- Complete the for loop by iterating over col, the 'lang' column in the DataFrame df.
- Complete the bodies of the if-else statements in the for loop: if the key is in the dictionary langs_count, add 1 to the value corresponding to this key in the dictionary; else add the key to langs_count and set the corresponding value to 1. Use the loop variable entry in your code.
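Putting the four steps together, one possible completed version looks like the sketch below. So that it runs without the course files, it reads a tiny in-memory CSV through io.StringIO instead of tweets.csv; the rows are invented, and only the 'lang' column name comes from the exercise. With the real file you would call pd.read_csv('tweets.csv') instead.

```python
import io

import pandas as pd

# Invented stand-in for tweets.csv; only the 'lang' column matters here
csv_text = """id,text,lang
1,hello,en
2,tere,et
3,hi again,en
4,???,und
"""

# Import Twitter data as DataFrame: df
# (with the real file this would be: df = pd.read_csv('tweets.csv'))
df = pd.read_csv(io.StringIO(csv_text))

# Initialize an empty dictionary: langs_count
langs_count = {}

# Extract column from DataFrame: col
col = df['lang']

# Iterate over lang column in DataFrame
for entry in col:
    # If the language is in langs_count, add 1
    if entry in langs_count:
        langs_count[entry] += 1
    # Else add the language to langs_count, set the value to 1
    else:
        langs_count[entry] = 1

# Print the populated dictionary
print(langs_count)  # {'en': 2, 'et': 1, 'und': 1}
```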
Have a go at this exercise by completing this sample code.
# Import pandas
# Import Twitter data as DataFrame: df
df = ____
# Initialize an empty dictionary: langs_count
langs_count = {}
# Extract column from DataFrame: col
col = df['lang']
# Iterate over lang column in DataFrame
for entry in ____:
    # If the language is in langs_count, add 1
    if entry in langs_count.keys():
        ____
    # Else add the language to langs_count, set the value to 1
    else:
        ____
# Print the populated dictionary
print(langs_count)
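The if-else pattern is what the exercise asks for, but the same tally can be written more compactly. The sketch below compares it with dict.get() and collections.Counter from the standard library on a made-up list of language codes; both variants produce the same counts as the if-else loop.

```python
from collections import Counter

# Made-up language codes standing in for the 'lang' column
entries = ['en', 'en', 'et', 'und', 'en']

# Variant 1: dict.get() returns 0 for a missing key, so no if-else is needed
counts_get = {}
for entry in entries:
    counts_get[entry] = counts_get.get(entry, 0) + 1

# Variant 2: Counter performs the whole tally in one call
counts_counter = Counter(entries)

print(counts_get)                    # {'en': 3, 'et': 1, 'und': 1}
print(dict(counts_counter))          # {'en': 3, 'et': 1, 'und': 1}
print(counts_get == counts_counter)  # True
```

If pandas is already loaded, df['lang'].value_counts() gives a similar tally as a Series sorted by count, which is often the most convenient choice when the data already lives in a DataFrame.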