
Basics of time series data transformation

1. Basics of time series data transformation

Welcome back! The topic of this session will be how to transform and clean time series data.

2. Context and importance

This session is foundational to working with time series data. More and more data sources are created each day, and most will be updated with some kind of time component. Each of these systems will have different ways of handling time, which might not mesh with other datasets.

3. ISO as the international standard

Before diving into transforming data, we should start with the different formats that dates can appear in. The most commonly used formats comply with the ISO 8601 standard, an international guideline on displaying and formatting dates so that time can be understood easily. Dates following these guidelines can be translated into different locales, time zones, and other personalizations using the FORMAT function in DAX. The examples on the right show how the same date looks different in each locale, while all follow ISO 8601 guidelines.
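As a minimal sketch of locale-aware formatting (the specific date, format strings, and locale codes below are illustrative assumptions, not taken from the slide), FORMAT accepts an optional locale argument:

US Date = FORMAT ( DATE ( 2020, 4, 30 ), "dddd, mmmm d, yyyy", "en-US" )   -- Thursday, April 30, 2020
UK Date = FORMAT ( DATE ( 2020, 4, 30 ), "dd mmmm yyyy", "en-GB" )         -- 30 April 2020
ISO Date = FORMAT ( DATE ( 2020, 4, 30 ), "yyyy-mm-dd" )                   -- 2020-04-30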

4. UNIX for high-precision timestamps

Another date format that will likely appear in your analysis is UNIX time. UNIX time represents the number of seconds that have elapsed since the Unix epoch, the beginning of 1970 (UTC). UNIX time can also be represented in milliseconds and describes a very specific point in time.
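As a rough sketch of converting UNIX time in DAX (the Events table and UnixSeconds column are assumed names for illustration), you can add the elapsed days to the epoch, since DAX stores datetimes as day counts:

Event DateTime =
    DATE ( 1970, 1, 1 ) + Events[UnixSeconds] / 86400    -- 86,400 seconds in a day
    -- for millisecond timestamps, divide by 86400000 instead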

5. Manipulating dates in Power BI

In DAX, there are two main functions that manipulate date field values. DATEADD allows you to move any number of time intervals from a reference date; here we have two examples of using DATEADD, showing how to move forward 30 days and backward 3 hours, respectively. DATEDIFF calculates the time that has passed between two date values. In this example, we are counting how many months have passed between the first day of February 2019 and the last day of April 2020, which is fourteen months.
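A minimal sketch of both functions (the Sales table and 'Date'[Date] column are assumed names, and wrapping DATEADD in CALCULATE is just one common way to use the shifted dates):

Sales 30 Days Ago =
    CALCULATE ( SUM ( Sales[Sales] ), DATEADD ( 'Date'[Date], -30, DAY ) )

Months Elapsed =
    DATEDIFF ( DATE ( 2019, 2, 1 ), DATE ( 2020, 4, 30 ), MONTH )    -- returns 14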

6. Summarizing data with DAX

Most algorithms and approaches to time series analysis require that the data be continuous throughout the time period. Many datasets do not come to us that way, so we have a couple of techniques to remedy that problem. One approach is to aggregate our dataset to a higher-order time interval. In Power BI, this uses the SUMMARIZE function. By doing this, we can still analyze trends at this higher-order time frame without running into errors from time gaps. In the example on the right, we see how we could aggregate a sales table by quarter and region and calculate the total sales. By doing this, we could analyze the trend in sales for each quarter and region, even if some days of data were missing.
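A hedged sketch of that aggregation as a calculated table (the Sales table and its column names are assumptions, not from the slide):

Quarterly Sales =
SUMMARIZE (
    Sales,
    Sales[Quarter],
    Sales[Region],
    "Total Sales", SUM ( Sales[Sales] )
)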

7. Handling missing data

Apart from summarizing data, we have two other techniques to handle missing data. Imputation uses context (like the mean, mode, etc.) to "fill in the blanks" left by missing data. For example, if we have data for the weekdays, we can likely impute values for the weekends. Dropping data removes either rows or columns. We generally only resort to dropping data when a significant amount is missing, usually at least 50%.
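As one possible sketch of simple mean imputation (the Sales[Sales] column is an assumed name, and this is not necessarily how the exercises implement it), a calculated column can fill blank values with the column average:

Sales Filled =
    IF (
        ISBLANK ( Sales[Sales] ),
        AVERAGE ( Sales[Sales] ),    -- column-wide mean used to fill the blank
        Sales[Sales]
    )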

8. Superstore dataset

In the following exercises, we are going to work with the Superstore dataset, a high-quality dataset that represents sales from several locations around the United States. This dataset will provide the groundwork to practice manipulating time series data in a hands-on way. Here we can see a brief screenshot of the dataset in Power BI. We have information about the sales for each location's orders and when they shipped to customers. We will use what we learned in this session to manipulate the data to meet our analytical needs.

9. Let's practice!

With all of that, let's go ahead and dive into the Superstore dataset and practice some time series transformations!
