Get startedGet started for free

The data preparation journey

1. The data preparation journey

Welcome to this course on data preparation in Excel! My name is Iason, and I will be your instructor.

2. Excel knowledge

This course will help build on some of Excel's basic functions and features while also introducing you to how Excel can be used to prepare data for analysis. Therefore, we recommend that you complete the following course before you start.

3. What is data preparation?

Data preparation involves cleaning, transforming, and organizing any raw data you will use to perform data analysis. To provide good quality analysis, you must ensure your data is high quality and "clean". This will ensure that your final results and outcomes are accurate. Performing this process at the beginning ensures that you are working efficiently and reduces any possible surprises during the analysis stages of your work.

4. Doing laundry

You can think of data preparation in Excel, like doing laundry. There are certain steps you have to go through to complete your task. When you want to do laundry, you will collect your worn and unclean clothes and sort them by color or fabric type. You would then select the correct settings for the washing machine, like the temperature or the spin cycle speed. You can then start the cleaning process by turning on the washing machine. Now that your dirty clothes are clean, you can dry your clothes, then fold and organize them accordingly.

5. Doing data preparation

Data preparation also has certain steps to go through. Once you have your raw data in Excel, you can sort or filter it to view and focus on the data more easily. You then want to set the correct formats for your data and the correct data types. Cleaning data is a key step in data preparation. It can involve removing duplicates, correcting errors in the data, and even filling in missing data. Finally, you can check the quality of your data and find ways to summarize your data better.

6. Gathering raw data

We must get raw data into Excel to carry out any data preparation! Different data sources can export information as Excel files with data in a tabular format ready for preparation processes. You can also enter data into Excel manually, but dealing with hundreds or thousands of records is very time-consuming. One easy option is to import data into your Excel file. Several ways exist, including bringing data in from CSV, text, and even web files. This process becomes quicker if the data in these files is in tabular format.

7. Removing duplicates

Once your raw data is ready, you can start your data preparation processes. Removing duplicates is an important part of cleaning raw data. Excel has a feature that lets you do this with great ease. You can identify and select the columns to remove duplicates from your data. When removing duplicates, you want to remove records added more than once in error to avoid counting them when performing analysis.

8. Fill options

Fill features, such as flash-fill and advanced Fill Series, can also be used for data preparation in Excel to ensure all the necessary data is available and correct. Flash fill lets you easily fill your data when it senses a pattern. This works best when there is existing data so that Excel can identify patterns easier. This feature is great for combining first and last name columns into a full name column. There are some some basic fill series features in Excel for filling in data. However, more advanced Fill Series features allow you to select either rows or columns to fill, the type of data, and even the intervals between the values. For example, you can enter the date in the first cell of a new column and then use Fill Series to populate the whole column with weekdays up to your desired end date.

9. Dataset

In the course exercises, you will deal with a dataset from a fictitious supply chain named DataCo. You have recently been hired as an analyst within the company, and your first project involves cleaning and preparing raw data for an analysis task. The data will be provided in different file types and contains information about products, orders, and customers.

10. Let's practice!

Now it's time to put some of this knowledge to the test!