Data preparation
1. Data preparation
Now it's time to focus on data preparation and how we can carry out related tasks in Alteryx Designer Desktop.
2. What is data preparation?
So, what exactly is data preparation, and why is it necessary? Data preparation is a crucial step in data analysis that involves cleaning, transforming, and organizing raw data. This "clean", higher-quality data is essential for accurate and meaningful analysis results.
3. What is data preparation?
Tasks we could perform to prepare and clean data include ensuring it's free from missing values, typos, and duplicate entries, and confirming that the data is relevant. It is also essential to apply the correct data types to columns and to use short, descriptive names for columns and tables. By preparing data at the outset, you avoid surprises during analysis and work more efficiently, laying the foundation for insightful conclusions and effective decision-making.
4. DC High School library
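Alteryx Designer handles these cleaning tasks through its graphical tools, so the following is not Alteryx code. It is only a minimal Python sketch of the logic behind three of the tasks mentioned earlier: dropping rows with missing values, dropping duplicate entries, and applying the correct data type. The sample rows, the column names, and the `clean` helper are all made up for illustration.

```python
# Hypothetical sample rows; names and scores are invented for illustration.
raw_rows = [
    {"student": "Ana", "score": "88"},
    {"student": "Ben", "score": ""},    # missing value
    {"student": "Ana", "score": "88"},  # duplicate entry
    {"student": "Cy", "score": "73"},
]


def clean(rows):
    """Drop rows with missing values and exact duplicates, then type the score."""
    seen = set()
    cleaned = []
    for row in rows:
        # Skip any row that has an empty (missing) value.
        if any(value.strip() == "" for value in row.values()):
            continue
        # Skip any row that is an exact copy of one already kept.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        # Apply the correct data type: the score is a number, not text.
        cleaned.append({"student": row["student"], "score": int(row["score"])})
    return cleaned


clean_rows = clean(raw_rows)
print(clean_rows)
```

Once the score is stored as a number rather than text, calculations such as an average become possible, which is the point of fixing data types early.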
Data preparation processes are similar to some of the librarian's tasks at DC High School's library.
5. DC High School library
When adding new books and journals to the library's collection, certain processes are required to organize everything. The new books must be classified based on criteria like author and genre. They must also be tagged and cataloged so they can be easily identified and sorted into the appropriate library areas for display. Just as a library needs to be put in order, we need to put data in order to make sense of it.
6. The tools
Alteryx Designer has numerous tools that can help us perform data preparation tasks, and they are easily accessible from the preparation toolset.
8. The tools
The main tools we will cover in this chapter are the Select tool, the Sort tool, and the Sample tool.
9. Data types in Alteryx Designer
Before looking at the tools, it is important to cover data types as part of our data preparation. We should always select the correct data type for each column in our dataset. The data type can play an important role in any analysis within our workflow, and the right one allows additional calculations to be carried out using the column. We can also better profile a column when it has the correct data type applied.
10. Data types in Alteryx Designer
The five main categories of data types utilized in Alteryx Designer are Boolean, Numeric, String, DateTime, and Spatial.
15. Data types in Alteryx Designer
The dataset from DC High School contains only text and numbers, so we will focus only on those in this course.
16. Data types in Alteryx Designer
Numeric data types include Byte, Integer, Fixed Decimal, and Double. These data types apply to numbers and allow datasets to store both whole numbers and numbers with decimals correctly. A Byte holds whole numbers from 0 to 255. So, a column containing whole-number test results out of 100 should be a numeric data type, and specifically a Byte.
17. Data types in Alteryx Designer
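To make the Byte range concrete, here is a small Python sketch (not Alteryx code) that checks whether every value in a column is a whole number from 0 to 255. The function name `fits_in_byte` and the sample columns are hypothetical and exist only for this illustration.

```python
def fits_in_byte(values):
    """Return True if every value is a whole number from 0 to 255."""
    return all(
        isinstance(v, int) and not isinstance(v, bool) and 0 <= v <= 255
        for v in values
    )


# Hypothetical columns, invented for illustration.
test_scores = [0, 57, 88, 100]  # whole-number results out of 100
enrollment = [120, 300]         # 300 is above the Byte maximum
averages = [61.5, 70.2]         # decimals need a decimal-capable type

print(fits_in_byte(test_scores))
print(fits_in_byte(enrollment))
print(fits_in_byte(averages))
```

Only the test scores pass the check; the other two columns would need a wider Integer type or a decimal type such as Fixed Decimal or Double.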
String refers to a sequence of characters used to represent text. In Alteryx Designer, these can be classified as String, V_String, and V_WString. V_String is used for variable-length text data, so it can hold anything from a few characters to very large amounts of text. It is important to note that strings can include letters, numbers, symbols, and spaces.
18. The tools
Now that we have covered data types, let’s return to the tools. The Select tool is extremely useful for data preparation. Not only does it let us choose which columns we keep in our workflow, but it also allows us to select the appropriate data type for each column, rename columns, and add descriptions.
19. The tools
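The Select tool is configured in the Designer interface, not in code. As a rough analogy only, the Python sketch below mirrors three of its jobs: keeping some columns, renaming them, and applying a data type. The column names, the `selection` mapping, and the `select` helper are assumptions made up for this example.

```python
# Hypothetical input rows; every value arrives as text.
rows = [
    {"StudentName": "Ana", "TestScore": "88", "Notes": "n/a"},
    {"StudentName": "Cy", "TestScore": "73", "Notes": "n/a"},
]

# Hypothetical configuration: old column name -> (new name, type to apply).
# Columns not listed here (such as "Notes") are dropped.
selection = {
    "StudentName": ("student", str),
    "TestScore": ("score", int),
}


def select(rows, selection):
    """Keep, rename, and retype the configured columns of each row."""
    return [
        {
            new_name: cast(row[old_name])
            for old_name, (new_name, cast) in selection.items()
        }
        for row in rows
    ]


selected = select(rows, selection)
print(selected)
```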
As the name suggests, the Sort tool allows you to sort your data by the values of one or more columns in your dataset.
20. The tools
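Sorting by more than one column means the first column decides the order and the next column breaks ties. The Python sketch below illustrates that idea with made-up rows; it is an analogy for what the Sort tool does, not Alteryx code.

```python
from operator import itemgetter

# Hypothetical rows, invented for illustration.
rows = [
    {"grade": 10, "student": "Cy"},
    {"grade": 9, "student": "Ben"},
    {"grade": 10, "student": "Ana"},
]

# Ascending by grade, with ties broken by student name.
sorted_rows = sorted(rows, key=itemgetter("grade", "student"))

# Mixed directions: sort by the tie-breaking column first, then by the
# main column. Python's sort is stable, so tied rows keep their order.
by_student = sorted(rows, key=itemgetter("student"))
grade_desc = sorted(by_student, key=itemgetter("grade"), reverse=True)

print([r["student"] for r in sorted_rows])
print([r["student"] for r in grade_desc])
```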
The Sample tool allows you to create a sample of your dataset within the workflow, with a range of options available for determining the sample, such as the first 100 rows or the last 20 rows.
21. Let's practice!
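The two sampling options mentioned for the Sample tool, the first 100 rows and the last 20 rows, can be pictured with simple list slicing in Python. This is only an illustration of the idea, using a stand-in list of 500 row numbers in place of a real dataset.

```python
# Stand-in dataset: 500 rows represented by their row numbers.
rows = list(range(1, 501))

first_100 = rows[:100]  # the first 100 rows
last_20 = rows[-20:]    # the last 20 rows

print(len(first_100), first_100[0], first_100[-1])
print(len(last_20), last_20[0], last_20[-1])
```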
With the key concepts of data preparation covered, time for some exercises.