Restructuring a dictionary
Now you want to clean up the politician data and move it into a Dask DataFrame. However, the politician data is nested, so you will need to process it some more before it fits into a DataFrame.
One particular piece of data you want to extract is buried a few layers inside the dictionary. This is a link to a website for each politician. The example below shows how it is stored inside the dictionary.
record = {
    ...
    'links': [{'note': '...',
               'url': '...'},],  # Stored here
    ...
}
The bag of politician data is available in your environment as dict_bag.
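To make the nesting concrete, here is a minimal sketch of reading a value that sits inside a dictionary, inside a list, inside another dictionary. The record below is hypothetical: the 'name' field and the note and URL values are made up for illustration and are not taken from the real politician data.

```python
# Hypothetical record; the field values are invented for illustration only.
sample_record = {
    "name": "Example Politician",
    "links": [
        {"note": "official site", "url": "https://example.org/politician"},
    ],
}

# Step 1: look up the 'links' key, which holds a list of dictionaries.
links = sample_record["links"]
# Step 2: take the item at the zeroth position of that list.
first_link = links[0]
# Step 3: read the 'url' key from that inner dictionary.
url = first_link["url"]

# The same three lookups can be chained into a single expression.
chained_url = sample_record["links"][0]["url"]
print(url == chained_url)
```

Each pair of square brackets moves one level deeper, so the order of the lookups mirrors the order of the nesting.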
This exercise is part of the course Parallel Programming with Dask in Python.
Exercise instructions
- Complete the extract_url() function to extract the value stored under the 'url' key, which is in the dictionary at the zeroth position of the list under the 'links' key, and assign it to the top-level key 'url'.
- Run the extract_url() function across all elements of the bag.
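The idea of running one function across every element can be sketched without Dask at all. The example below uses Python's built-in map() on a plain list as a stand-in; a Dask bag offers a .map() method that plays the same role, but evaluates lazily. The helper name add_first_link_url and the sample records are hypothetical, and unlike the exercise function this helper returns a copy rather than modifying its input.

```python
def add_first_link_url(record):
    # Hypothetical helper: copy the first link's URL to a top-level 'url' key.
    flattened = dict(record)  # shallow copy so the input record is left unchanged
    flattened["url"] = record["links"][0]["url"]
    return flattened


# Made-up records standing in for the politician data.
sample_records = [
    {"name": "A", "links": [{"note": "site", "url": "https://example.org/a"}]},
    {"name": "B", "links": [{"note": "site", "url": "https://example.org/b"}]},
]

# Apply the helper to every record; list() forces the lazy map object.
flattened_records = list(map(add_first_link_url, sample_records))
print(flattened_records[0]["url"])
```

Returning a copy keeps the original records intact, which can make it easier to rerun a step while experimenting. Modifying the record in place, as the exercise template does, is also a reasonable choice.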
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
def extract_url(x):
    # Extract the url and assign it to the key 'url'
    x['url'] = x[____][____][____]
    return x

# Run the function on all elements in the bag.
dict_bag = ____
print(dict_bag.take(1))