Get startedGet started for free

Load and clean data

1. Load and clean data

Before diving into the Excel work, let’s introduce the data we will use for these videos. We will be using data from a car dealership business based in India. Our initial file has information on different cars sold by the dealership. It includes the car make and model, fuel type, and other important details the business captures. However, we still need two important fields - the year the car was manufactured and the price it was sold at. The additional information is stored in a separate text file. To import data into our Excel file, we go to the Data menu, and under the Get & Transform Data section, we click on From Text/CSV. A new window appears that allows us to select the file we want to load into the file. Once we’ve selected our file, a new Import window will appear with options for the import process. At the top, we can see some of the different settings for the file origin, the delimiter used in the value to separate the values, and the data type detection. Excel has automatically identified the different options, so we will leave the settings as they are. We can see in the preview that the data we are bringing in has three columns. The preview of the data looks good, so let's click Load. Perfect. We have our additional data imported into our file. We can close the Queries & Connections pane that has opened on the right to show the connection we created. The data has been imported into Excel as a table, but we can format it to make it easier to view. First, we will unwrap the text in the columns by clicking Wrap Text in the home menu and resizing the columns. Now that we have brought the data into one file, let’s start cleaning it. In the new sheet, we can see that there are several duplicate records. Select a cell in the table, then click on data on the menu ribbon at the top of the file, and find the option to Remove Duplicates. Click on it. We get a new window appearing that lists all the columns which have been selected. If we only selected one column, Excel would delete rows only where duplicates are in that one column. We will remove duplicate values based on all columns. Perfect, we have removed duplicate rows only. Now let's fix this Car column. We have the make and model all together, but we want them separated. In D1, let’s type a header - Car Make. As the data is formatted as a table, the new column is formatted like the rest of the table. Let’s type the car make for row 2 - which is Tata. In Column A, anything before the comma is the car make. So for row 3, let's type Maruti Suzuki. We have over 1000 rows - it would take ages to type every row. So let's select a cell in Column D, click on Data at the top, and click Flash-fill. How handy is that? When dealing with data in sheets that can be used to look up additional information, it is good practice to have a unique identifier. Let’s insert a new column at the start and give it the header Car ID. In A2, we will enter one as the first ID number. We want to fill the column with numbers that increase by two every time. So let's select the first cell and then all the cells we want to be filled, and on the Home menu, click the Fill button and select Series. Here we can select different options to fill our column. First, we will select Columns. The data type has been pre-selected as Linear. The step value will be two - so the number will increase by two each time. We don’t need a stop value as we have a certain number of rows to fill this time. We can now click ok. Perfect, we now have a unique ID for each car! Now it’s your turn to try things out.

2. Let's practice!