1. Additional datetime methods in Pandas
In this final lesson, we will cover some additional Pandas methods for working with dates and times. By the end of this lesson, you will understand how to handle timezones in Pandas, as well as other common datetime operations.
2. Timezones in Pandas
First, a reminder of the importance of timezones. If we ask Pandas to tell us the smallest ride duration in seconds, using the dt-dot-total_seconds() method and then the dot-min() method, we get -3346 seconds, or nearly -56 minutes. Yikes! Something is wrong, since our ride durations shouldn't ever be negative.
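As a rough sketch of that check, here is a tiny made-up rides table. The two rows and the column names Start date, End date, and Duration are assumptions for illustration, not the lesson's full data set. The second ride ends "before" it starts on the clock, which is what produces the negative minimum.

```python
import pandas as pd

# Hypothetical two-ride sample; column names are assumed to mirror the lesson's data.
rides = pd.DataFrame({
    "Start date": pd.to_datetime(["2017-10-01 15:23:25", "2017-11-05 01:56:50"]),
    "End date": pd.to_datetime(["2017-10-01 15:26:26", "2017-11-05 01:01:04"]),
})

# Subtracting two datetime columns gives a timedelta column.
rides["Duration"] = rides["End date"] - rides["Start date"]

# Convert each timedelta to seconds, then take the smallest value.
min_seconds = rides["Duration"].dt.total_seconds().min()
print(min_seconds)  # -3346.0
```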
3. Timezones in Pandas
The answer, as it was when we looked at this data set in standard Python, is Daylight Saving.
Just like with standard Python, these datetime objects start off as timezone-naive. They're not tied to any absolute time with a UTC offset. Let's see the first three Start dates so we can see how they're displayed and check that there is no UTC offset.
To start, we want to put those same three datetimes into a timezone. The method for this in Pandas is dt-dot-tz_localize(). Now when we look at the localized datetimes, we can see that they have a UTC offset.
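Here is a minimal sketch of localizing a few naive datetimes. The three timestamps are made up for illustration, and America/New_York is used as the timezone because the lesson uses it for the full column.

```python
import pandas as pd

# Hypothetical start times; these are timezone-naive when first parsed.
starts = pd.Series(pd.to_datetime([
    "2017-10-01 15:23:25",
    "2017-10-01 15:42:57",
    "2017-10-02 06:37:10",
]))
print(starts.dt.tz)  # None, since no timezone is attached yet

# Attach a timezone without changing the wall-clock times.
localized = starts.dt.tz_localize("America/New_York")
print(localized)

# Early October falls in daylight time, so the offset is UTC-04:00.
offset_hours = localized.iloc[0].utcoffset().total_seconds() / 3600
print(offset_hours)  # -4.0
```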
4. Timezones in Pandas
However, if we try to localize our entire Start date column to the America/New_York timezone, Pandas will throw an AmbiguousTimeError. As expected, we have one datetime that occurs during the Daylight Saving shift.
Following the advice of the error message, we can set the ambiguous argument in the dt-dot-tz_localize() method. By default, it raises an error, as we saw before. We can also pass the string 'NaT', which says that if the converter gets confused, it should set the ambiguous result to NaT, or Not a Time. Pandas is smart enough to skip over NaTs when it sees them, so our dot-min() and other methods will just ignore this one row.
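The sketch below shows both behaviors on three made-up timestamps, one of which falls in the repeated hour on November 5, 2017. The exact exception class raised by default can differ between Pandas versions, so the example catches a generic Exception. That choice is an assumption made for portability, not something the lesson prescribes.

```python
import pandas as pd

# Hypothetical start times; the middle one falls in the repeated 1 a.m. hour.
starts = pd.Series(pd.to_datetime([
    "2017-11-05 00:30:00",
    "2017-11-05 01:56:50",
    "2017-11-05 03:10:00",
]))

# Default behavior (ambiguous='raise'): localizing fails on the ambiguous time.
try:
    starts.dt.tz_localize("America/New_York")
    raised = False
except Exception as err:  # exact exception class depends on the Pandas version
    raised = True
    print(type(err).__name__)

# With ambiguous='NaT', the ambiguous entry becomes NaT instead of raising.
localized = starts.dt.tz_localize("America/New_York", ambiguous="NaT")
print(localized)

# Aggregations such as min() skip the NaT entry.
earliest = localized.min()
print(earliest)
```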
5. Timezones in Pandas
Now that we've fixed the timezones, we should recalculate our durations, in case any rides spanned a Daylight Saving boundary.
This time, when we take the Duration column, convert it to seconds, and take the minimum, we get a much more sensible 116-point-0 seconds, or about two minutes.
6. Timezones in Pandas
Just to know what we're looking at, let's pull up our problematic row. Here, both the start and end time were ambiguous, so they've been set to NaT. As a result, our Duration, since it's the difference of two undefined times, is also NaT.
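To make the recalculation concrete, here is a small sketch on a made-up two-ride table. The rows and column names are assumptions, so the minimum here is 181 seconds rather than the lesson's 116. The second ride starts and ends inside the ambiguous hour, so both of its times, and therefore its Duration, come out as NaT.

```python
import pandas as pd

# Hypothetical two-ride sample; the second ride sits inside the repeated hour.
rides = pd.DataFrame({
    "Start date": pd.to_datetime(["2017-10-01 15:23:25", "2017-11-05 01:56:50"]),
    "End date": pd.to_datetime(["2017-10-01 15:26:26", "2017-11-05 01:01:04"]),
})

# Localize both columns, turning ambiguous times into NaT.
for col in ["Start date", "End date"]:
    rides[col] = rides[col].dt.tz_localize("America/New_York", ambiguous="NaT")

# Recalculate durations from the timezone-aware columns.
rides["Duration"] = rides["End date"] - rides["Start date"]

# min() ignores the NaT duration, so only the valid ride counts.
min_seconds = rides["Duration"].dt.total_seconds().min()
print(min_seconds)  # 181.0

# The problematic row: start, end, and duration are all NaT.
print(rides.loc[1])
```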
7. Other datetime operations in Pandas
There are other datetime operations you should know about too.
The simplest are ones you're already familiar with: .year, .month, and so on. In Pandas, these are accessed with dt-dot-year, dt-dot-month, etc. For example, here is the year of the first three rows.
There are other useful things that Pandas gives you, some of which are not available in standard Python. For example, the method dt-dot-day_name() gives you the day of the week for each element in a datetime Series. You can even specify if you want weekday names in a language other than English.
These results can be aggregated with a dot-groupby() call, to summarize data by year, month, day of the week, and so on.
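As a quick sketch of these accessors and the groupby step, the example below uses a made-up three-ride table. The column names Duration seconds and Weekday are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rides with precomputed durations in seconds.
rides = pd.DataFrame({
    "Start date": pd.to_datetime([
        "2017-10-01 15:23:25",
        "2017-10-02 06:37:10",
        "2017-10-08 09:00:00",
    ]),
    "Duration seconds": [181.0, 240.0, 419.0],
})

# Datetime attributes are reached through the .dt accessor.
years = rides["Start date"].dt.year
print(years.tolist())  # [2017, 2017, 2017]

# day_name() returns the weekday name for each element.
rides["Weekday"] = rides["Start date"].dt.day_name()
print(rides["Weekday"].tolist())  # ['Sunday', 'Monday', 'Sunday']

# Aggregate by weekday: mean ride length for each day of the week.
mean_by_day = rides.groupby("Weekday")["Duration seconds"].mean()
print(mean_by_day)
```

day_name() also accepts a locale argument for weekday names in another language. It is left out here because the result depends on which locales are installed on the machine.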
8. Other parts of Pandas
Pandas also lets you shift rows up or down with the dot-shift() method. Here we've shifted the rides one row forward so that our zeroth row is now NaT, and our first row has the same value that our zeroth row had before.
This is useful if you want to, for example, line up the end times of each row with the start time of the next one. Now you can answer questions about how each ride compares to the previous one! You'll cover this in an exercise shortly.
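Here is a minimal sketch of that idea on made-up data. The three rides and the column name Time since previous ride are assumptions. Shifting End date down by one row puts each ride's start next to the previous ride's end.

```python
import pandas as pd

# Hypothetical rides, already sorted by start time.
rides = pd.DataFrame({
    "Start date": pd.to_datetime([
        "2017-10-01 15:23:25",
        "2017-10-01 15:42:57",
        "2017-10-02 06:37:10",
    ]),
    "End date": pd.to_datetime([
        "2017-10-01 15:26:26",
        "2017-10-01 17:49:59",
        "2017-10-02 06:42:53",
    ]),
})

# Shift end times one row forward: row 0 becomes NaT, row 1 gets row 0's value.
shifted_end = rides["End date"].shift(1)

# Gap between each ride's start and the previous ride's end.
rides["Time since previous ride"] = rides["Start date"] - shifted_end
gap_seconds = rides["Time since previous ride"].dt.total_seconds()
print(gap_seconds.iloc[1])  # 991.0
```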
9. Additional datetime methods in Pandas
In this lesson, we looked at additional methods in Pandas that are relevant to working with datetimes. Hopefully, this gave you a good taste of all the things Pandas is capable of! Time to try them out in the exercises.