Get startedGet started for free

Working with time data

1. Working with time data

In this video we'll cover pandas functions designed for merging time series and other ordered data.

2. Joining with .merge_ordered()

Pandas has a special function called merge_ordered. It behaves similarly to pandas merge for ordered data such as dates. In this example we have two data frames, each with a column for game date. Although you can't tell by looking at it, the game dates are encoded by pandas as time series data in a way that ensures proper time order. Calling the merge_ordered function performs an outer merge by default. Note this is different from regular pandas merge, which defaults to an inner merge. It’s not a hard constraint though - you can use the ‘how’ parameter to change it.

3. Joining with .merge_ordered()

Under the default behavior of outer merge, the resulting data frame will contain Not-a-number, or NaN, values for rows not present in both input data frames.

4. Interpolating data

So why not just use pandas merge and sort values by the key column? Well, merge ordered has a useful parameter called ‘fill-method’. We can use this parameter to interpolate missing data. Here we see the same data joined as before with fill-method set to f-fill, which stands for "forward fill". For each instance of NaN, pandas will take the last known value and fill forward over the Nan. In our example, the resulting data frame contains values for all game dates except for the first temperature for Dallas.

5. Interpolating data

Let's look more closely at a specific instance. In the Cleveland data frame, index 2 shows a game played in Cleveland on 2016-9-18. There was not a game played in Dallas that day. An outer merge would normally return NaN for Temperature-Dallas. But with forward fill, the last observed value on 2016-9-11 carries forward. You can see in the result table that index 4, 2016-9-18, shows 85 degrees. Although it may have been warmer or cooler, having an imputed value can often be better than nothing.

6. Merging to nearest date-times

At times, you might need to join one dataset with another by date-time, even though the dates don't match. Pandas has a useful function called 'merge_asof' that can help. Merge_asof is similar to a sorted left-join except the match is on the nearest date rather than exact dates. In fact, it's like a VLOOKUP formula when range-lookup is set to TRUE. As long as both data frames are sorted by date and time, merge-asof will return the row based on the nearest match. Merge_asof has a 'direction' parameter that sets the logic for matching. There are three options. backward forward and nearest. The default value, 'backward', selects the row in the right-hand data frame with the closest date that is earlier than the target date. 'Forward' selects the closest row in the right data frame whose date is on or later than the target date. And 'Nearest' selects the row in the right Data Frame whose date is closest to the target date, regardless of time order.

7. Merge_asof Example

Here we have a data frame on the left with all game dates and temperatures, and another with only the games in which a concussion occurred. This second data frame includes a cumulative count of concussions for the season. You can use merge-asof to get total concussions as of each game date in the left table. Direction = backward is the best option here to return the cumulative count before each new game date.

8. Let's practice!

That last slide was a lot to take in. Let’s practice and see how it works for real.