
Introduction to data analysis

1. Introduction to data analysis

Hello, and welcome to Data Analysis in Excel. My name is Nick Edwards. I will be your instructor, and I am very excited to have you join me in this course.

2. What to expect

In this course, we will review intermediate Excel functionality that you will find extremely useful in data analysis. Because this material is intermediate, you should already be familiar with the basics; we therefore recommend that you take these prerequisite courses before starting this one.

3. That's Bananas!

Throughout this course, we will act as consultants for a fictional SaaS start-up company, Bananas. SaaS stands for "software as a service", and Bananas offers subscriptions to their online platform for project management. They need help analyzing their sales data to find trends and predict sales. To do this, we will explore Bananas' sales data from their beginning in July 2019 through June 2021, which is organized by Customer ID and Sales Month.

4. Where be the data, matey?

The first and most important step in data analysis is to understand the dataset. Exploratory Data Analysis, or EDA, is the process of understanding the dataset. Just like exploring a treasure map to find interesting places and understand hidden secrets, in EDA, we explore the data to find interesting patterns and hidden relationships. Without doing EDA first, analysts are prone to making more mistakes, so always explore the data before beginning any analysis.

5. Performing exploratory data analysis

Performing Exploratory Data Analysis (EDA) typically involves three basic steps. Start by preparing the data for analysis. This includes collecting the relevant data required for analysis and cleaning it by handling missing values, outliers, and inconsistencies.
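To make the cleaning step concrete, here is a minimal Python sketch rather than anything taken from the course itself. The sales figures are made up, `clean_sales` is a hypothetical helper, and the "more than three times the median" outlier rule is an arbitrary threshold chosen only for illustration.

```python
from statistics import median

# Hypothetical monthly sales figures; None marks a missing value.
raw_sales = [120.0, 135.0, None, 128.0, 131.0, 990.0, 126.0]

def clean_sales(values):
    """Drop missing entries and set aside suspected outliers.

    Assumption: a value more than 3x the median is treated as a
    suspected outlier. This is an illustrative rule, not a standard.
    """
    present = [v for v in values if v is not None]
    threshold = 3 * median(present)
    kept = [v for v in present if v <= threshold]
    flagged = [v for v in present if v > threshold]
    return kept, flagged

kept, flagged = clean_sales(raw_sales)
print(kept)     # values retained for analysis
print(flagged)  # values set aside for a closer look
```

Flagged values are set aside rather than silently deleted, since an apparent outlier may turn out to be a real and important data point.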

6. Performing exploratory data analysis

Once the data is prepped, we explore the data by learning about each variable, calculating summary statistics to find correlations and trends, and visualizing the data.

7. Performing exploratory data analysis

Finally, once we have a good understanding of the underlying data, we formulate initial hypotheses or ideas about relationships or patterns in the data based on the observations and insights gained during exploration. For example, we could hypothesize that sales increase over time and then test whether this holds. This, in turn, could lead to more questions about the data and prompt further analysis. All of this is to gain a deeper understanding of the dataset.
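One simple way to check a hypothesis such as "sales increase over time" is to fit a straight line through the monthly totals and look at the sign of its slope. The sketch below is only an illustration: the numbers are invented, `trend_slope` is a hypothetical helper, and a positive slope is a rough first check rather than proof of a trend.

```python
def trend_slope(values):
    """Least-squares slope of values against their position (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator

# Hypothetical monthly sales totals, oldest month first.
monthly_sales = [100, 110, 105, 120, 130, 128]
slope = trend_slope(monthly_sales)

# A positive slope is consistent with the hypothesis; a negative one is not.
print("upward trend" if slope > 0 else "no upward trend")
```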

8. Summary statistics

Summary statistics are measures that summarize the main characteristics and properties of a dataset. Mean, median, mode, and range are basic statistical measurements that can be used in EDA. The first three are measures of central tendency, which are values that describe the middle of a dataset. Knowing these values helps uncover which values should be expected within the dataset and which values are outliers. The mean is the average value of the dataset, calculated as the sum of all values divided by the total number of values. The median is the middle value in an ordered dataset. It divides the data into two equal halves, separating the higher values from the lower ones; when the dataset has an even number of values, the median is the average of the two middle values. The mode is the most frequently occurring value in a dataset. The range is the difference between the maximum and minimum values in a dataset. Unlike the other three, it is not a measure of central tendency; it provides a simple measure of the spread of the data.
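As a small worked example of these four measures, here is a sketch in Python using its standard `statistics` module. The values are invented for illustration. The comments name the Excel worksheet functions that compute the same quantities.

```python
from statistics import mean, median, mode

# Hypothetical values, e.g. the number of subscriptions sold per day.
values = [4, 8, 8, 5, 10]

summary = {
    "mean": mean(values),                # Excel: AVERAGE
    "median": median(values),            # Excel: MEDIAN
    "mode": mode(values),                # Excel: MODE.SNGL
    "range": max(values) - min(values),  # Excel: MAX minus MIN
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Here the sum is 35 over 5 values, so the mean is 7; the ordered values are 4, 5, 8, 8, 10, so the median is 8; 8 appears twice, so it is the mode; and the range is 10 minus 4, which is 6.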

9. Let's practice!

Alright, let's put these ideas to use!