Get startedGet started for free

Modifying imports: parsing dates

1. Modifying imports: parsing dates

You've now worked with numeric, string, and Boolean data. This lesson will focus on one last data type you're likely to encounter: datetimes.

2. Date and Time Data

How computers handle dates and times is a rich topic, but what you need to know for now is that Python stores them as a special data type, datetime. Datetimes can be translated into myriad text representations, and there is a common set of codes used to describe how datetimes are formatted as strings.

3. pandas and Datetimes

By default, pandas loads datetime data as objects. If you want to arrange records by time, select within a timespan, or calculate intervals, though, you'll need datetime columns. We use the parse dates keyword argument, not dtype, to specify datetime columns. Parse dates accepts a list of column names or numbers to parse. It also accepts a list of lists, where each sub-list is a group of columns that should be combined and parsed as one, such as separate day, month, and year columns. Finally, to combine columns, parse them, and store the result as a new column, you can supply a dictionary, where each key is a new column name and each value is a list of columns to parse.

4. pandas and Datetimes

Let's see this with the New Developer Survey data, whose datetime columns have been modified for demonstration purposes.

5. pandas and Datetimes

Part1StartTime and Part1Endtime have data in standard year-month-day-hour-minute-second format.

6. pandas and Datetimes

Part2StartTime's data has been split into date and time columns.

7. pandas and Datetimes

Part2EndTime is in a nonstandard format.

8. Parsing Dates

To parse the dates in standard format, we pass the column names in a list to read Excel's parse dates argument.

9. Parsing Dates

When we check the dtypes of the timestamp columns, we see the two columns were parsed successfully.

10. Parsing Dates

To parse the split-up timestamp columns, we can add a list within the list, containing Part2StartDate and Part2StartTime, and pass that to parse dates. pandas creates a new combined datetime column, Part2StartDate underscore Part2StartTime.

11. Parsing Dates

But to control the column names, let's create a dictionary, pass that instead, and view the resulting column.

12. Non-Standard Dates

However, parse dates only works if the data is in a format that pandas understands. If you try to parse unusually-formatted dates with parse dates, like 123199 for December 31, 1999, you'll get the columns back as strings. Instead, convert nonstandard dates after import with pandas' to datetime method. To datetime takes the dataframe and column to convert, plus a format argument containing a string that describes how the data is formatted.

13. Datetime Formatting

Datetime formatting is described with a set of codes. strftime.org is a valuable reference for them all.

14. Datetime Formatting

Some important codes are the ones for four-digit year, zero-padded month, zero-padded day, hour on the 24-hour clock, zero-padded minute, and zero-padded second.

15. Parsing Non-Standard Dates

Let's use to datetime to parse Part2EndTime. We review the data to build the string format description: percent lowercase m, percent lowercase d, percent uppercase Y, space, percent uppercase H, colon, percent uppercase M, colon, percent uppercase S. We pass the dataframe and column to pd to datetime, and supply the format string to the format keyword argument. We reassign the result back to Part2EndTime.

16. Parsing Non-Standard Dates

When we check the column, we see the dates were parsed correctly.

17. Let's practice!

That's a lot of ways to parse a date! Now, it's your turn to practice. Good luck!