Joining multiple tables
You now want to pursue a different avenue and map player positions during punts. You may recall that the NextGenStats (NGS) system captures player positions and orientations 10 times per second for all players, on every play. It's a lot of data!
You'll be joining three data frames to prepare for analysis. Below are their names and descriptions.
games
: high-level data by GameKey
punts
: play-level data by GameKey and PlayId
ngs
: position data by GameKey, PlayId, GSISID (player id), and Time
A member of your team has provided you with a list comprehension on line 2 to print the index of each data frame in one line of code. For more information about list comprehensions, check out Python Data Science Toolbox Part 2.
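As a rough illustration of what that list comprehension produces, here is a minimal sketch on two small made-up frames. The column names Season and Quarter and all values are assumptions for illustration only; the real exercise data is not reproduced here.

```python
import pandas as pd

# Hypothetical toy data standing in for the exercise frames.
games = pd.DataFrame(
    {"GameKey": [1, 2], "Season": [2016, 2017]}
).set_index("GameKey")

punts = pd.DataFrame(
    {"GameKey": [1, 1, 2], "PlayId": [10, 11, 10], "Quarter": [1, 2, 1]}
).set_index(["GameKey", "PlayId"])

# For each frame, collect the names of its index levels.
index_names = [[n for n in df.index.names] for df in [games, punts]]
print(index_names)
```

The result is one inner list per data frame, so a frame indexed on a single key yields a one-element list and a frame with a multi-level index yields one name per level.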
This exercise is part of the course
Pandas Joins for Spreadsheet Users
Exercise instructions
- Inner join the data frames on index using games as the primary data frame.
- View the first 10 rows of the resulting data frame.
- Ensure the index of the new frame has no duplicates.
- Ensure the index of the new frame has no duplicates.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# List the index of each data frame
print([[n for n in df.index.names] for df in [games, punts, ngs]])
# Inner join the data frames
games_all = ____.____([punts, ____], how=____)
# View first 10 rows of new frame
print(____.head(10))
# Check index for duplicates
print(____.index.____.sum())
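The same pattern can be sketched on toy data without giving away the blanks above. This example joins only two made-up frames, one indexed by GameKey and one by GameKey and PlayId, and then counts duplicated index entries. The frame contents, the Season and Quarter columns, and the name games_punts are assumptions for illustration, not the exercise's data or solution.

```python
import pandas as pd

# Hypothetical toy data; GameKey 3 has no punts and GameKey 4 has no game row.
games = pd.DataFrame(
    {"GameKey": [1, 2, 3], "Season": [2016, 2016, 2017]}
).set_index("GameKey")

punts = pd.DataFrame(
    {
        "GameKey": [1, 1, 2, 4],
        "PlayId": [10, 11, 10, 10],
        "Quarter": [1, 2, 1, 3],
    }
).set_index(["GameKey", "PlayId"])

# Inner join on the shared index level (GameKey): only keys present
# in both frames survive.
games_punts = games.join(punts, how="inner")

# View the first rows of the joined frame.
print(games_punts.head(10))

# Count repeated index entries; 0 means every index entry is unique.
n_duplicates = int(games_punts.index.duplicated().sum())
print(n_duplicates)
```

Because the join is an inner join, rows for GameKey 3 and GameKey 4 drop out, and the duplicate count confirms that each remaining index entry identifies exactly one row.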