1. Extracting date and time components
Now that we have a good understanding of our data,
2. Data analysis process
it's time to begin the next phase of the data analysis process: cleaning and preparing the data for analysis.
3. Cleaning and preparing data
It's often said that data professionals spend 80% of their time cleaning data, and only 20% analyzing it.
A clean dataset has a structure and contents that can be easily processed during analysis, and will return valid conclusions.
Preparing data for analysis can be a tricky and time-consuming process, but a well-cleaned and prepared dataset will save much more time during analysis.
In this lesson, we'll look specifically at cleaning date and time data, and later in the chapter, we'll investigate text data.
4. Dates and times
Date and time data is often collected when our question or problem relates to measurements over a time period.
Date and time data is usually continuous, which means that it can take any value.
In many analyses, we may need to convert dates and times into discrete values, so they can only take a finite number of categories.
5. Discrete dates and times
Imagine you are the owner of a coffee shop, and want to analyze coffee purchases across each day of the week. For this analysis, we don't need to know the date and time of every coffee purchase, only which day of the week it was purchased on, so we can clean the data to extract only this component.
6. Extracting the year component
To extract the year component from a date, we can use the YEAR function.
Let's apply this to dates of manned moon landings.
7. Extracting the year component
We pass the reference of the first date, A2, to YEAR,
8. Extracting the year component
which returns 1969.
Copying the formula downwards,
9. Extracting the year component
populates the remaining cells.
10. Extracting the month component
We can extract the month using the MONTH function in the same way,
11. Extracting the month component
which returns the month as a number.
This might be the desired format, but in many cases, we may want the month displayed as text, like Jan or January.
12. Extracting the short month name component
We'll need to use the TEXT function to extract the component in this format. TEXT is a flexible function used to convert numerical data into different formats.
We start by passing the cell reference to TEXT,
13. Extracting the short month name component
then pass three lowercase m's in quotes to the format argument.
14. Extracting the short month name component
This produces the short month name format,
15. Extracting the short month name component
which we can copy downwards.
16. Extracting the long month name component
To extract the long month name format, we specify four m's instead of three.
17. Extracting the long month name component
This looks great! Let's try something that isn't as evident from the original date: the day of the week!
18. Extracting the weekday component
The WEEKDAY function returns the day of the week as a number.
Different cultures designate different days as the start of the week, the most common being Sunday and Monday. The type argument allows us to specify the first day of the week, and which number it should represent.
The default type value is one, which sets Sunday as the first day of the week with a value of one, and derives the other days by counting onward, so Monday is two, and so on.
Other formats can be used by specifying different numbers.
Let's use WEEKDAY on the moon landings, using the default numbering system.
19. Extracting the weekday component
The first moon landing returned four, which corresponds to a Wednesday. Our analysis might require numerical weekdays, but it might be useful to extract the weekday names.
20. Extracting the short weekday name
The TEXT function comes to the rescue once again!
Like with months, we can extract the short day name by passing a string of three d's to the format argument,
21. Extracting the long weekday name
and long weekday names using four d's.
22. Extracting time components
Although we've focused on dates, there are also many functions to extract time components, like HOUR, MINUTE, and SECOND.
23. Let's practice!
Grab your mop and bucket, it's time to get cleaning!