
Joining multiple tables

You now want to pursue a different avenue and map player positions during punts. You may recall that the NextGenStats (NGS) system captures player positions and orientations 10 times per second for all players, on every play. It's a lot of data!

You'll be joining three data frames to prepare for analysis. Below are their names and descriptions.

  • games: high-level data by GameKey
  • punts: play-level data by GameKey and PlayId
  • ngs: position data by GameKey, PlayId, GSISID (player id), and Time

A member of your team has provided a list comprehension on line 2 of the sample code that prints the index level names of each data frame in a single line. For more information about list comprehensions, check out Python Data Science Toolbox Part 2.
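As a small illustration of what that list comprehension produces, here is a sketch on toy frames. The column names and values are invented for the example; only the index level names follow the descriptions above.

```python
import pandas as pd

# Toy stand-ins for the exercise data; columns and values are made up.
games = pd.DataFrame(
    {"Season": [2016, 2016]},
    index=pd.Index([1, 2], name="GameKey"),
)
punts = pd.DataFrame(
    {"Quarter": [1, 3]},
    index=pd.MultiIndex.from_tuples([(1, 10), (2, 20)], names=["GameKey", "PlayId"]),
)

# Build one inner list of index level names per data frame
index_names = [[n for n in df.index.names] for df in [games, punts]]
print(index_names)  # [['GameKey'], ['GameKey', 'PlayId']]
```

Each inner list corresponds to one data frame, so differences in index granularity are visible at a glance.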

This exercise is part of the course

Pandas Joins for Spreadsheet Users


Exercise instructions

  • Inner join the data frames on index using games as the primary data frame.
  • View the first 10 rows of the resulting data frame.
  • Ensure the index of the new frame has no duplicates.
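The general pattern behind these steps can be sketched on toy data. This is not the exercise solution: the frames `plays`, `roles`, and `scores` are hypothetical, their values are invented, and all three are assumed to share the same (GameKey, PlayId) index so that the behavior is easy to follow.

```python
import pandas as pd

# Hypothetical toy frames that all share the same (GameKey, PlayId) index.
idx_names = ["GameKey", "PlayId"]
plays = pd.DataFrame(
    {"Quarter": [1, 2, 4]},
    index=pd.MultiIndex.from_tuples([(1, 10), (1, 11), (2, 20)], names=idx_names),
)
roles = pd.DataFrame(
    {"Role": ["P", "PR"]},
    index=pd.MultiIndex.from_tuples([(1, 10), (2, 20)], names=idx_names),
)
scores = pd.DataFrame(
    {"HomeScore": [7, 14, 21]},
    index=pd.MultiIndex.from_tuples([(1, 10), (2, 20), (3, 30)], names=idx_names),
)

# Passing a list to join() combines all frames on their indexes;
# how="inner" keeps only index entries present in every frame.
joined = plays.join([roles, scores], how="inner").sort_index()
print(joined.head(10))

# duplicated() flags repeated index entries; summing the flags counts them.
n_duplicates = joined.index.duplicated().sum()
print(n_duplicates)
```

Only the index entries (1, 10) and (2, 20) appear in all three toy frames, so only those two rows survive the inner join. A duplicate count of zero means every index entry in the joined frame is unique.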

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# List the index of each data frame
print([[n for n in df.index.names] for df in [games, punts, ngs]])

# Inner join the data frames
games_all = ____.____([punts, ____], how=____)

# View first 10 rows of new frame
print(____.head(10))

# Check index for duplicates
print(____.index.____.sum())