1. Diving into data
Great job on cleaning the data! Now, let's dive into data integration and analysis with Alteryx.
2. Recap Refresh
In the first chapter, you imported datasets, scanned and cleaned data, removed duplicate rows, and checked data completeness. You even calculated the average daily steps! Nice work!
3. Journey Highlights
In our journey so far, we found that all fields had the V_String data type. In WeightLogInfo, 65 NULL values were found in the Fat column. The count of unique IDs showed that DailySteps had 24, WeightLogInfo had 8, and SleepDay had 33.
SleepDay exhibited three duplicate rows. The average step count exceeded 8,000 for only 11 days.
4. Combine Datasets
Before we continue our analysis, we will join the datasets into a single table so that we no longer have to work with them separately. We will use unique identifiers (IDs) and dates as the join keys. IDs match corresponding records across datasets, and dates align events chronologically, which lets us see trends over time.
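Outside Alteryx, the same idea can be sketched in a few lines of Python. This is only an illustration of joining on ID and date, not the course's workflow: the field names (`Id`, `Date`, `StepTotal`, `MinutesAsleep`), the helper `inner_join`, and the sample records are all hypothetical.

```python
from datetime import date

# Illustrative records only: field names and values are made up for this sketch.
daily_steps = [
    {"Id": 1, "Date": date(2016, 4, 12), "StepTotal": 9000},
    {"Id": 1, "Date": date(2016, 4, 13), "StepTotal": 7000},
    {"Id": 2, "Date": date(2016, 4, 12), "StepTotal": 4000},
]
sleep_day = [
    {"Id": 1, "Date": date(2016, 4, 12), "MinutesAsleep": 420},
    {"Id": 2, "Date": date(2016, 4, 12), "MinutesAsleep": 380},
]


def inner_join(left, right, keys=("Id", "Date")):
    """Keep rows whose key values appear in both inputs and merge their fields."""
    # Index the right-hand rows by their key tuple for fast lookup.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**row, **match})
    return joined


combined = inner_join(daily_steps, sleep_day)
```

Only rows that share both an ID and a date end up in `combined`, so the record for user 1 on the second day, which has no sleep entry, is left out.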
5. Calculate Body Mass Index (BMI)
One of the metrics we will be calculating is Body Mass Index (BMI). BMI categorizes individuals into different weight status categories, such as underweight, normal weight, overweight, and obese. The BMI formula is weight in kilograms divided by the square of the height in meters.
But why is BMI relevant to our analysis? Understanding BMI allows us to gain insights into the health and fitness profiles of the customer base.
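As a quick illustration of the formula, here is a minimal Python sketch. The function names `bmi` and `bmi_category` are hypothetical, and the cut-offs are the standard adult BMI categories published by the CDC; the course exercises may label or round the categories differently.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight in kilograms divided by height in meters squared."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Map a BMI value to a weight status category (standard adult cut-offs)."""
    if value < 18.5:
        return "Underweight"
    if value < 25:
        return "Normal weight"
    if value < 30:
        return "Overweight"
    return "Obese"


# Example: a person weighing 70 kg who is 1.75 m tall.
example_bmi = bmi(70, 1.75)
example_category = bmi_category(example_bmi)
```

For the example person, 70 divided by 1.75 squared gives a BMI just under 23, which falls in the normal weight category.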
6. Fitness Insights
To determine fitness behaviors, we will refer to the guidelines provided by the Centers for Disease Control and Prevention (CDC) and the National Institutes of Health (NIH). We will use sitting thresholds to evaluate the health risks associated with prolonged sitting. Additionally, we will calculate a lifestyle index to assess each user's lifestyle based on average daily steps. These metrics will help us understand a user's fitness behaviors.
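To make the lifestyle index idea concrete, here is a rough Python sketch that averages a user's daily steps and assigns a label. The step bands below follow a commonly cited step-count scale and are an assumption for illustration; the thresholds used in the course exercises may differ. The names `STEP_BANDS` and `lifestyle_index` are hypothetical.

```python
from statistics import mean

# Assumed step bands (upper bound, label); not confirmed as the course's thresholds.
STEP_BANDS = [
    (5000, "Sedentary"),
    (7500, "Low active"),
    (10000, "Somewhat active"),
    (12500, "Active"),
]


def lifestyle_index(daily_steps):
    """Classify a user by the average of their daily step counts."""
    average = mean(daily_steps)
    for upper_bound, label in STEP_BANDS:
        if average < upper_bound:
            return label
    return "Highly active"


# Example: two days of made-up step counts for one user.
example_label = lifestyle_index([8000, 9000])
```

Keeping the bands in a single list makes it easy to swap in whichever thresholds the guidelines you are following specify.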
7. Aggregate and Compare
Once the datasets are integrated and metrics are calculated, the focus shifts to aggregation and analysis. We will compare our aggregated figures with standardized norms from reliable sources such as the CDC and NIH. This comparative analysis will provide insights into our customers’ health and fitness habits.
8. Flexibility in Alteryx
Alteryx is a flexible tool, and there are often several ways to solve the same problem. Your solution and the provided solution may not match exactly, but they can still produce the same output. We encourage you to experiment with different methods throughout the course.
9. Let's practice!
It's time to get back to some exercises. You'll start by joining the datasets! Happy analyzing!