Working with multiple datasets
You can load as many different datasets as you’d like from data.world and work with them together. Here we’ve used the load_dataset
method to bring in two separate datasets, assigning them each to a variable.
We’ll leave it to you to create a dataframe for each using the dataframes
property, and then merge the two dataframes together on the state
and stusab
fields.
After merging them together, add a new citystate
field to your merged dataset, populating it with the concatenated values of the city
and state_name
fields, separated by ,
resulting in a city, state
format.
This exercise is part of the course
Intro to data.world in Python
Exercise instructions
- Create a
police_shootings
dataframe from thefatal_police_shootings_data
table inint_dataset
- Create a
state_abbrvs
dataframe from thestatesfipscodes
table infipsCodes_dataset
- Merge the two dataframes together on the
state
andstusab
fields using the merge() function. Assign tomerged_dataframe
. - Add a
citystate
column to your merged dataframe, populating it with the concatinated values from thecity
andstate_name
columns, separated by ', '. - Print first 5 rows of merged dataframe
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# datadotworld module has been imported as dw
import datadotworld as dw
# We've loaded two datasets to use 'int_dataset' and 'fipsCodes_dataset'
int_dataset = dw.load_dataset('https://data.world/jonloyens/intermediate-data-world')
fipsCodes_dataset = dw.load_dataset('https://data.world/uscensusbureau/fips-state-codes')
## Create two dataframes: police_shootings from the 'fatal_police_shootings_data' table of int_dataset and state_abbrvs, from the 'statesfipscodes' table of fipsCodes_dataset
## Merge the two datasets together on the state and stusab fields. Assign to a merged_dataframe variable.
merged_dataframe = ____.____(____, how = 'left', left_on = '____', right_on='____')
## Add a 'citystate' column to the merged_dataframe dataframe, populating it with the concatinated values from the 'city' and 'state_name' columns, separated by ', '.
## Print first 5 rows of merged_dataframe