Introduction to Tablesaw

1. Introduction to Tablesaw

So far, we have been looking at concepts using standard Java. Let's now introduce Tablesaw, a popular Java library for data analysis.

2. Importing Tablesaw

Tablesaw is a powerful Java library for data manipulation. To use it, add import tech.tablesaw.api.* to your code. The * is a wildcard that imports every class in that package, including Table, Row, and the typed column classes. To import a specific column type instead, use import tech.tablesaw.api.DoubleColumn or import tech.tablesaw.api.StringColumn. The general Column interface lives in a separate package, tech.tablesaw.columns. For statistical operations, use import tech.tablesaw.aggregate.*. These imports let us create tables, manipulate data, and analyze datasets efficiently.
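As a rough sketch of how these imports fit together, the snippet below imports a few classes explicitly and uses one aggregate function. The class name ImportsDemo, the table name, and the sample values are illustrative assumptions, not part of the course material. It uses a static import of AggregateFunctions.mean, which is one common way to reach the aggregate package.

```java
import static tech.tablesaw.aggregate.AggregateFunctions.mean;

import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;

public class ImportsDemo {

    // Builds a tiny two-row table (hypothetical data) and returns the mean salary.
    static double meanSalary() {
        Table employees = Table.create("Employees",
                StringColumn.create("Name", "Ana", "Ben"),
                DoubleColumn.create("Salary", 50000.0, 70000.0));

        // summarize(...).apply() returns a one-row table holding the aggregate value.
        Table summary = employees.summarize("Salary", mean).apply();
        return summary.doubleColumn(0).getDouble(0);
    }

    public static void main(String[] args) {
        System.out.println(meanSalary());
    }
}
```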

3. Tabular format

Tablesaw is designed for tabular data - data in rows and columns, like a spreadsheet. Each column represents a variable, and each row an observation. Tablesaw provides an intuitive framework for working with such data.

4. Tabular format

In Java, we often use arrays or collections to store data, as we have in this example on screen now, but these can get complex and hard to keep track of very quickly. This course will cover creating Tablesaw tables from scratch and from external files.
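The on-screen example is not reproduced in this transcript. As a minimal sketch of the kind of array-based storage being described, assume employee names and salaries are kept in two separate arrays that must stay aligned by index. All names and values here are hypothetical.

```java
public class ParallelArraysDemo {

    // Looks up a salary by scanning the names array and reading the
    // same index from the salaries array. Nothing enforces that the
    // two arrays have the same length or stay in the same order.
    static double salaryOf(String[] names, double[] salaries, String name) {
        for (int i = 0; i < names.length; i++) {
            if (names[i].equals(name)) {
                return salaries[i];
            }
        }
        throw new IllegalArgumentException("Unknown employee: " + name);
    }

    public static void main(String[] args) {
        String[] names = {"Ana", "Ben", "Chloe"};
        double[] salaries = {52000.0, 61000.0, 58000.0};

        System.out.println(salaryOf(names, salaries, "Ben"));
    }
}
```

Every extra variable (start date, department, bonus) would need another array and more bookkeeping, which is the complexity a table structure is meant to remove.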

5. Creating a table

Tablesaw provides intuitive structures mirroring how we think about our data: tables with named columns containing specific data types. There are several ways to build a table: from scratch, from an external file, or from existing columns. When creating from scratch, we define the table structure and add data directly in our code. We can also build tables from existing Column objects, which is useful when we need to combine or transform data from other tables. You will notice two new methods in these examples: addColumns(), which adds columns to a table, and create(), which creates the columns themselves.
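A minimal sketch of two of these approaches, from scratch and from existing columns, might look like the following. The class name, table names, and sample values are assumptions made for illustration. The second method copies columns before reusing them so that the two tables do not share column objects.

```java
import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;

public class CreateTableDemo {

    // From scratch: create an empty named table, then add columns to it.
    static Table fromScratch() {
        Table employees = Table.create("Employees");
        employees.addColumns(
                StringColumn.create("Name", "Ana", "Ben", "Chloe"),
                DoubleColumn.create("Salary", 52000.0, 61000.0, 58000.0));
        return employees;
    }

    // From existing columns: reuse (copies of) columns from another table.
    static Table fromExistingColumns(Table source) {
        return Table.create("Salaries",
                source.column("Name").copy(),
                source.column("Salary").copy());
    }

    public static void main(String[] args) {
        Table employees = fromScratch();
        Table salaries = fromExistingColumns(employees);
        System.out.println(employees.print());
        System.out.println(salaries.print());
    }
}
```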

6. Table metadata

Understanding our data starts with exploring its metadata, information about our data's structure. Tablesaw offers several methods for this, including shape(), columnNames(), structure(), first(), and last(). The shape method returns the dimensions of our table as rows and columns, as we can see in the output here.

7. Table metadata

The columnNames method gives a list of all column headers, which is essential when working with unfamiliar datasets. The structure method provides information about each column, including its data type, and you can see the output of both of these methods on the screen.
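Since the on-screen output is not part of this transcript, here is a small sketch of calling shape, columnNames, and structure. The employees table and its values are hypothetical. The printed output is not shown because its exact formatting may vary between Tablesaw versions.

```java
import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;

public class MetadataDemo {

    // Hypothetical sample table used only for this illustration.
    static Table employees() {
        return Table.create("Employees",
                StringColumn.create("Name", "Ana", "Ben", "Chloe"),
                DoubleColumn.create("Salary", 52000.0, 61000.0, 58000.0));
    }

    public static void main(String[] args) {
        Table employees = employees();

        // shape() describes the dimensions as text; rowCount() and
        // columnCount() give the same information as numbers.
        System.out.println(employees.shape());
        System.out.println(employees.rowCount() + " rows, "
                + employees.columnCount() + " columns");

        // columnNames() returns the headers as a List<String>.
        System.out.println(employees.columnNames());

        // structure() returns another table with one row per column.
        System.out.println(employees.structure().print());
    }
}
```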

8. Table metadata

Finally, the first and last methods let us preview portions of large tables. These tools help us quickly get familiar with new datasets before diving into deeper analysis.
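As a brief sketch, first and last each take the number of rows to keep and return a smaller table. The five-row table below is made up for illustration.

```java
import tech.tablesaw.api.IntColumn;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;

public class PreviewDemo {

    // Hypothetical five-row table.
    static Table employees() {
        return Table.create("Employees",
                StringColumn.create("Name", "Ana", "Ben", "Chloe", "Dev", "Eve"),
                IntColumn.create("Age", 34, 41, 29, 52, 38));
    }

    public static void main(String[] args) {
        Table employees = employees();

        // Preview the first two and the last two rows.
        System.out.println(employees.first(2).print());
        System.out.println(employees.last(2).print());
    }
}
```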

9. Adding columns

Modifying your table structure is a common task in data analysis. To add columns, use the addColumns method, which accepts one or more Column objects. In this example, we add a column called "Bonus" to our employees table.
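A sketch of the Bonus example described here, with an assumed employees table and made-up values, could look like this. Note that the new column needs the same number of rows as the table it joins.

```java
import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;

public class AddColumnDemo {

    // Builds a hypothetical employees table and adds a Bonus column to it.
    static Table withBonus() {
        Table employees = Table.create("Employees",
                StringColumn.create("Name", "Ana", "Ben", "Chloe"),
                DoubleColumn.create("Salary", 52000.0, 61000.0, 58000.0));

        // One bonus value per existing row.
        DoubleColumn bonus = DoubleColumn.create("Bonus", 2000.0, 3500.0, 2500.0);
        employees.addColumns(bonus);
        return employees;
    }

    public static void main(String[] args) {
        System.out.println(withBonus().print());
    }
}
```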

10. Removing and renaming columns

To remove columns, removeColumns takes column names as arguments, so here we remove a column called StartDate. To rename a column, use the column's setName method, so here we rename the Salary column to AnnualSalary. To get the type of a column, we use the type method. We will discuss column types in more detail in the next chapter. Remember that table-level operations such as addColumns and removeColumns modify the table in place and return that same table rather than a new copy, which allows us to chain operations. Column-level methods behave differently: setName returns the column, and type returns the column's type. Chaining is particularly useful when transforming data through multiple steps.
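The three operations can be sketched as follows. The table contents are hypothetical; only the column names StartDate, Salary, and AnnualSalary come from the narration.

```java
import java.time.LocalDate;

import tech.tablesaw.api.ColumnType;
import tech.tablesaw.api.DateColumn;
import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;

public class ModifyColumnsDemo {

    // Builds a hypothetical table, drops StartDate, and renames Salary.
    static Table modified() {
        Table employees = Table.create("Employees",
                StringColumn.create("Name", "Ana", "Ben"),
                DoubleColumn.create("Salary", 52000.0, 61000.0),
                DateColumn.create("StartDate",
                        LocalDate.of(2020, 1, 15), LocalDate.of(2021, 6, 1)));

        // Remove a column by name; the table itself is changed.
        employees.removeColumns("StartDate");

        // Rename through the column object, not the table.
        employees.column("Salary").setName("AnnualSalary");
        return employees;
    }

    public static void main(String[] args) {
        Table employees = modified();
        System.out.println(employees.columnNames());

        // type() reports the column's ColumnType.
        ColumnType type = employees.column("AnnualSalary").type();
        System.out.println(type);
    }
}
```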

11. Summary

This table shows all the syntax we covered, which may be useful for you to refer back to.

12. Let's practice!

Now that you've seen the basics of tables in Tablesaw, let's practice!
