1. Modifying imports: parsing dates
You've now worked with numeric, string, and Boolean data. This lesson will focus on one last data type you're likely to encounter: datetimes.
2. Date and Time Data
How computers handle dates and times is a rich topic, but what you need to know for now is that Python stores them as a special data type, datetime.
Datetimes can be translated into myriad text representations,
and there is a common set of codes used to describe how datetimes are formatted as strings.
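As a minimal sketch of that idea, the snippet below uses Python's standard datetime module to render one datetime value as several different strings. The sample moment and the variable names are made up for illustration.

```python
from datetime import datetime

# One moment in time, stored as a datetime object (illustrative value)
moment = datetime(1999, 12, 31, 23, 59, 59)

# The same value translated into different text representations
iso_style = moment.strftime("%Y-%m-%d %H:%M:%S")
us_style = moment.strftime("%m/%d/%Y")
compact = moment.strftime("%m%d%y")

print(iso_style)  # 1999-12-31 23:59:59
print(us_style)   # 12/31/1999
print(compact)    # 123199
```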
3. pandas and Datetimes
By default, pandas loads datetime data as objects. If you want to arrange records by time, select within a timespan, or calculate intervals, though, you'll need datetime columns.
We use the parse_dates keyword argument, not dtype, to specify datetime columns.
parse_dates accepts a list of column names or numbers to parse.
It also accepts a list of lists, where each sub-list is a group of columns that should be combined and parsed as one, such as separate day, month, and year columns.
Finally, to combine columns, parse them, and store the result as a new column, you can supply a dictionary, where each key is a new column name and each value is a list of columns to parse.
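Here is a rough sketch of the simplest form, a list of column names or numbers. It assumes a small made-up CSV sample and uses read_csv with an in-memory buffer so that it runs without a file; read_excel takes a parse_dates argument as well. Only the column names come from the lesson; the rows are invented.

```python
import io
import pandas as pd

# Hypothetical two-row sample; the real survey file is not reproduced here
csv_text = (
    "Part1StartTime,Part1EndTime,Score\n"
    "2016-03-29 21:23:13,2016-03-29 21:45:18,7\n"
    "2016-03-29 21:24:53,2016-03-29 21:40:10,9\n"
)

# Without parse_dates, the timestamp columns come back as text
plain = pd.read_csv(io.StringIO(csv_text))

# Columns to parse can be listed by name...
by_name = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["Part1StartTime", "Part1EndTime"],
)

# ...or by position
by_position = pd.read_csv(io.StringIO(csv_text), parse_dates=[0, 1])

print(plain.dtypes)
print(by_name.dtypes)
```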
4. pandas and Datetimes
Let's see this with the New Developer Survey data, whose datetime columns have been modified for demonstration purposes.
5. pandas and Datetimes
Part1StartTime and Part1EndTime have data in standard year-month-day-hour-minute-second format.
6. pandas and Datetimes
Part2StartTime's data has been split into date and time columns.
7. pandas and Datetimes
Part2EndTime is in a nonstandard format.
8. Parsing Dates
To parse the dates in standard format, we pass the column names in a list
to read_excel's parse_dates argument.
9. Parsing Dates
When we check the dtypes of the timestamp columns, we see the two columns were parsed successfully.
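Once the columns are parsed, the time-based operations mentioned earlier become possible. The sketch below assumes a tiny invented sample loaded with read_csv from an in-memory buffer in place of the survey spreadsheet; the Part1Duration column name is hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample standing in for the survey data
raw = io.StringIO(
    "Part1StartTime,Part1EndTime\n"
    "2016-03-30 08:00:00,2016-03-30 08:12:30\n"
    "2016-03-29 21:23:13,2016-03-29 21:45:18\n"
)
survey = pd.read_csv(raw, parse_dates=["Part1StartTime", "Part1EndTime"])

# Check that both columns are now datetimes
print(survey.dtypes)

# Arrange records by time
ordered = survey.sort_values("Part1StartTime")

# Select records within a timespan (here: before March 30)
cutoff = pd.Timestamp("2016-03-30")
before_cutoff = survey[survey["Part1StartTime"] < cutoff]

# Calculate intervals between two datetime columns
survey["Part1Duration"] = survey["Part1EndTime"] - survey["Part1StartTime"]
print(survey["Part1Duration"])
```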
10. Parsing Dates
To parse the split-up timestamp columns,
we can add a list within the list, containing Part2StartDate and Part2StartTime,
and pass that to parse_dates.
pandas creates a new combined datetime column, Part2StartDate_Part2StartTime.
11. Parsing Dates
But to control the column names, let's create a dictionary,
pass that instead,
and view the resulting column.
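Newer pandas releases have started deprecating the nested-list and dictionary forms of parse_dates, so here is a hedged alternative sketch that swaps in a different technique: it joins the split columns after import and parses the result with pd.to_datetime. The sample rows and the Part2Start column name are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical rows with the timestamp split across two text columns
survey = pd.DataFrame({
    "Part2StartDate": ["2016-03-29", "2016-03-30"],
    "Part2StartTime": ["21:24:57", "09:05:00"],
})

# Join date and time with a space, then parse the combined strings
combined = survey["Part2StartDate"] + " " + survey["Part2StartTime"]
survey["Part2Start"] = pd.to_datetime(combined, format="%Y-%m-%d %H:%M:%S")

print(survey["Part2Start"])
```

This keeps full control over the new column's name, much as the dictionary form does.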
12. Non-Standard Dates
However, parse_dates only works if the data is in a format that pandas understands. If you try to parse unusually formatted dates with parse_dates, like 123199 for December 31, 1999, you'll get the columns back as strings.
Instead, convert nonstandard dates after import with pandas' to_datetime function.
to_datetime takes the dataframe and column to convert, plus a format argument containing a string that describes how the data is formatted.
13. Datetime Formatting
Datetime formatting is described with a set of codes.
strftime.org is a valuable reference for them all.
14. Datetime Formatting
Some important codes are the ones for four-digit year (%Y),
zero-padded month (%m),
zero-padded day (%d),
hour on the 24-hour clock (%H),
zero-padded minute (%M),
and zero-padded second (%S).
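To see these six codes at work, here is a small sketch using the standard library's datetime.strptime, which reads the same codes as pandas' format argument. The timestamp string is an invented example.

```python
from datetime import datetime

# Year, month, day, hour, minute, second, in that order
fmt = "%Y-%m-%d %H:%M:%S"

# Parse an example string into a datetime using the format codes
parsed = datetime.strptime("2016-03-29 21:24:57", fmt)
print(parsed.year, parsed.month, parsed.day)       # 2016 3 29
print(parsed.hour, parsed.minute, parsed.second)   # 21 24 57

# Formatting with the same codes reproduces the original string
round_trip = parsed.strftime(fmt)
print(round_trip)  # 2016-03-29 21:24:57
```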
15. Parsing Non-Standard Dates
Let's use to_datetime to parse Part2EndTime.
We review the data to build the format string: %m%d%Y %H:%M:%S, meaning month, day, and four-digit year run together, then a space, then hour, minute, and second separated by colons.
We pass the dataframe and column to pd.to_datetime, and supply the format string to the format keyword argument. We reassign the result back to Part2EndTime.
16. Parsing Non-Standard Dates
When we check the column, we see the dates were parsed correctly.
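The steps above can be sketched roughly as follows. The two sample values are invented to match the format described in the lesson; the real survey data is not reproduced here.

```python
import pandas as pd

# Hypothetical values in month-day-year order with no separators in the date
survey = pd.DataFrame({
    "Part2EndTime": ["03292016 21:24:57", "03292016 21:34:25"],
})

# Describe the layout with format codes and convert the column in place
format_string = "%m%d%Y %H:%M:%S"
survey["Part2EndTime"] = pd.to_datetime(
    survey["Part2EndTime"],
    format=format_string,
)

print(survey["Part2EndTime"])
```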
17. Let's practice!
That's a lot of ways to parse a date! Now, it's your turn to practice. Good luck!