Introduction to Tablesaw
1. Introduction to Tablesaw
So far, we have been looking at concepts using standard Java. Let's now introduce Tablesaw, a popular Java library for data analysis.
2. Importing Tablesaw
Tablesaw is a powerful Java library for data manipulation. To use Tablesaw, add import tech.tablesaw.api.* to your code. The * is a wildcard, so this imports every class in the package, including Table, Row, and the typed column classes. To import a specific column type, use import tech.tablesaw.api.DoubleColumn or import tech.tablesaw.api.StringColumn. Note that the general Column interface lives in a separate package, tech.tablesaw.columns. For statistical operations, use import tech.tablesaw.aggregate.*. These imports let us create tables, manipulate data, and analyze datasets efficiently.
3. Tabular format
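The import statements just described can be sketched as follows. This is a minimal example that assumes Tablesaw is on the classpath; the class name ImportsSketch and the salary figures are made up for illustration.

```java
// Table and the typed columns come from tech.tablesaw.api.
// A wildcard (import tech.tablesaw.api.*;) would also work.
import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.Table;
// Aggregate functions such as mean are static members of AggregateFunctions.
import static tech.tablesaw.aggregate.AggregateFunctions.mean;

public class ImportsSketch {
    // Builds a one-column table and averages it (hypothetical data).
    public static double averageSalary() {
        DoubleColumn salary = DoubleColumn.create("Salary", 50000.0, 60000.0, 70000.0);
        Table employees = Table.create("employees", salary);
        // Apply the imported aggregate function to the table's column.
        return mean.summarize(employees.doubleColumn("Salary"));
    }

    public static void main(String[] args) {
        System.out.println(averageSalary());
    }
}
```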
Tablesaw is designed for tabular data - data in rows and columns, like a spreadsheet. Each column represents a variable, and each row an observation. Tablesaw provides an intuitive framework for working with such data.
4. Tabular format
In Java, we often use arrays or collections to store data, as we have in this example on screen now, but these can get complex and hard to keep track of very quickly. This course will cover creating Tablesaw tables from scratch and from external files.
5. Creating a table
Tablesaw provides intuitive structures mirroring how we think about our data: tables with named columns containing specific data types. There are several ways to build a table: from scratch, from an external file, or from existing columns. When creating from scratch, we define the table structure and add data directly in our code. We can also build tables from existing Column objects, which is useful when we need to combine or transform data from other tables. You will notice two new methods in these examples: addColumns(), which adds columns to a table, and create(), which creates the columns themselves.
6. Table metadata
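As a sketch of the table-creation approaches just described, the example below builds one table from scratch and a second from a column of the first. The class name, column names, and values are invented for illustration and are not the slides' data.

```java
import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;

public class CreateTableSketch {
    // From scratch: create() builds each column, addColumns() attaches them.
    public static Table buildEmployees() {
        StringColumn name = StringColumn.create("Name", "Ana", "Ben", "Cho");
        DoubleColumn salary = DoubleColumn.create("Salary", 52000.0, 61000.0, 58000.0);
        return Table.create("employees").addColumns(name, salary);
    }

    // From existing columns: copy a column out of another table.
    public static Table namesOnly(Table source) {
        return Table.create("names_only", source.stringColumn("Name").copy());
    }

    public static void main(String[] args) {
        Table employees = buildEmployees();
        System.out.println(employees);
        System.out.println(namesOnly(employees));
    }
}
```

Copying the column before reuse keeps the two tables independent, so later changes to one do not show up in the other.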
Understanding our data starts with exploring its metadata, that is, information about our data's structure. Tablesaw offers several methods for this, including shape(), columnNames(), structure(), first(), and last(). The shape() method returns the dimensions of our table, the number of rows and columns, as a short piece of text, as we can see in the output here.
7. Table metadata
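A minimal sketch of checking a table's dimensions is shown below, using a small made-up employees table rather than the one on the slides. It also prints rowCount() and columnCount(), which give the same dimensions as numbers.

```java
import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;

public class ShapeSketch {
    // Hypothetical sample data: three rows, two columns.
    public static Table employees() {
        return Table.create("employees",
                StringColumn.create("Name", "Ana", "Ben", "Cho"),
                DoubleColumn.create("Salary", 52000.0, 61000.0, 58000.0));
    }

    public static void main(String[] args) {
        Table t = employees();
        System.out.println(t.shape());        // dimensions as text
        System.out.println(t.rowCount());     // number of rows
        System.out.println(t.columnCount());  // number of columns
    }
}
```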
The columnNames() method gives a list of all column headers, which is essential when working with unfamiliar datasets. The structure() method provides information about each column, including its data type, and you can see the output of both of these methods on the screen.
8. Table metadata
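The two methods just described can be tried out with a sketch like the following. The table contents are invented for illustration; structure() itself returns a table, with one row per column of the original.

```java
import java.util.List;
import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;

public class ColumnInfoSketch {
    // Hypothetical sample data.
    public static Table employees() {
        return Table.create("employees",
                StringColumn.create("Name", "Ana", "Ben", "Cho"),
                DoubleColumn.create("Salary", 52000.0, 61000.0, 58000.0));
    }

    public static void main(String[] args) {
        Table t = employees();
        List<String> names = t.columnNames();  // column headers, in order
        System.out.println(names);
        Table structure = t.structure();       // one row per column, with its type
        System.out.println(structure);
    }
}
```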
Finally, the first() and last() methods let us preview portions of large tables. These tools help us quickly get familiar with new datasets before diving into deeper analysis.
9. Adding columns
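Previewing with first() and last() might look like the sketch below. Both take the number of rows to return; the four-row table here is a made-up stand-in for a large dataset.

```java
import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;

public class PreviewSketch {
    // Hypothetical sample data.
    public static Table employees() {
        return Table.create("employees",
                StringColumn.create("Name", "Ana", "Ben", "Cho", "Dev"),
                DoubleColumn.create("Salary", 52000.0, 61000.0, 58000.0, 64000.0));
    }

    public static void main(String[] args) {
        Table t = employees();
        System.out.println(t.first(2));  // the first two rows
        System.out.println(t.last(1));   // the final row
    }
}
```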
Modifying your table structure is a common task in data analysis. To add columns, use the addColumns() method, which accepts one or more Column objects. In this example, we add a column called "Bonus" to our employees table.
10. Removing and renaming columns
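A sketch of adding a "Bonus" column with addColumns() follows. The employee data and bonus amounts are assumptions made for illustration; the new column must have the same number of rows as the table.

```java
import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;

public class AddColumnSketch {
    public static Table withBonus() {
        // Hypothetical starting table.
        Table employees = Table.create("employees",
                StringColumn.create("Name", "Ana", "Ben", "Cho"),
                DoubleColumn.create("Salary", 52000.0, 61000.0, 58000.0));
        // One bonus value per existing row.
        DoubleColumn bonus = DoubleColumn.create("Bonus", 5200.0, 6100.0, 5800.0);
        // addColumns() changes the table in place and returns it.
        return employees.addColumns(bonus);
    }

    public static void main(String[] args) {
        System.out.println(withBonus());
    }
}
```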
For removing columns, removeColumns() takes column names as arguments, so here we remove a column called StartDate. For renaming columns, use the column's setName() method, so here we rename the Salary column to AnnualSalary. To get the type of a column, we use the type() method. We will discuss column types in more detail in the next chapter. Remember that addColumns() and removeColumns() change the table in place and return that same table rather than a new one, which allows us to chain operations. Similarly, setName() returns the renamed column, while type() simply returns the column's type. Chaining is particularly useful when transforming data through multiple steps.
11. Summary
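Removing, renaming, and checking a column's type can be sketched as below. The table is made up for illustration, and StartDate is stored as plain text here only to keep the example short.

```java
import tech.tablesaw.api.ColumnType;
import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;

public class ModifyColumnsSketch {
    public static Table tidy() {
        // Hypothetical starting table.
        Table employees = Table.create("employees",
                StringColumn.create("Name", "Ana", "Ben"),
                StringColumn.create("StartDate", "2021-03-01", "2019-07-15"),
                DoubleColumn.create("Salary", 52000.0, 61000.0));
        // removeColumns() changes the table in place and returns it.
        employees.removeColumns("StartDate");
        // setName() is called on the column, not on the table.
        employees.column("Salary").setName("AnnualSalary");
        return employees;
    }

    public static void main(String[] args) {
        Table employees = tidy();
        ColumnType type = employees.column("AnnualSalary").type();
        System.out.println(employees.columnNames());
        System.out.println(type);
    }
}
```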
This table shows all the syntax we covered, which may be useful for you to refer back to.
12. Let's practice!
Now that you've seen the basics of tables in Tablesaw, let's practice!