
Data structures in Tablesaw

1. Data structures in Tablesaw

Welcome back. In this lesson, we'll explore Tablesaw's core data structures for data analysis in Java.

2. Core data structures

Tablesaw organizes data in a hierarchical structure. At the top, we have Tables, which are similar to DataFrames in Python or R. As we have seen in the previous chapter, each table contains multiple columns, and each column holds values of a single, consistent type, like strings, integers, or dates. Rows represent individual records, where each column's value corresponds to a field for that record.

3. Table methods

Let's look at some useful table methods. We use .name() to get the table's name as a string, as you can see in the output. We can also use the .rowCount() and .columnCount() methods, which return the number of rows and columns in the table as an integer value. Here, we have one thousand rows and five columns. These tools help us quickly understand our dataset before analysis.
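The slide output is not reproduced in this transcript, so here is a minimal sketch of those three calls. The tiny three-row "Employees" table and the class name TableSummary are made up for illustration (the lesson's dataset has one thousand rows and five columns), and the code assumes the Tablesaw core library is on the classpath.

```java
import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;

public class TableSummary {

    // Hypothetical sample data standing in for the lesson's dataset.
    static Table buildEmployees() {
        return Table.create("Employees",
                StringColumn.create("Name", "Ana", "Ben", "Chloe"),
                DoubleColumn.create("Salary", 65000.0, 82000.0, 71000.0));
    }

    public static void main(String[] args) {
        Table employees = buildEmployees();

        // Quick overview of the dataset before any analysis.
        System.out.println("Name:    " + employees.name());
        System.out.println("Rows:    " + employees.rowCount());
        System.out.println("Columns: " + employees.columnCount());
    }
}
```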

4. Column types

Tablesaw uses strong typing, meaning each column has a fixed data type. StringColumn stores text, IntColumn handles whole numbers, DoubleColumn handles decimals, and BooleanColumn stores true/false values. For dates, we use DateColumn for calendar dates or DateTimeColumn when we need both date and time, as shown in the examples. Strong typing also improves performance and makes code easier to debug, because the IDE can offer type-specific autocompletion and catch type errors early.
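As a rough sketch of the column types just listed, the following builds one column of each kind and groups them into a table. The column names, the values, and the class name ColumnTypesSketch are invented for illustration; they are not the lesson's data.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

import tech.tablesaw.api.BooleanColumn;
import tech.tablesaw.api.DateColumn;
import tech.tablesaw.api.DateTimeColumn;
import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.IntColumn;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;

public class ColumnTypesSketch {

    // One column per type; every column holds values of a single type.
    static Table buildTypedTable() {
        StringColumn name = StringColumn.create("Name", "Ana", "Ben");
        IntColumn age = IntColumn.create("Age", 34, 41);
        DoubleColumn salary = DoubleColumn.create("Salary", 65000.0, 82000.0);
        BooleanColumn remote = BooleanColumn.create("Remote", new boolean[] {true, false});
        DateColumn hireDate = DateColumn.create("HireDate",
                LocalDate.of(2019, 3, 15), LocalDate.of(2021, 6, 1));
        DateTimeColumn lastLogin = DateTimeColumn.create("LastLogin",
                LocalDateTime.of(2024, 1, 8, 9, 30), LocalDateTime.of(2024, 1, 9, 14, 5));

        return Table.create("Employees", name, age, salary, remote, hireDate, lastLogin);
    }

    public static void main(String[] args) {
        // structure() lists each column together with its type.
        System.out.println(buildTypedTable().structure());
    }
}
```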

5. Column type operations

Each column type provides specialized operations. Here, we calculate the mean on a DoubleColumn using the .mean() method. We use DoubleColumn because we know our Salary data is of the double type.
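A minimal sketch of that call, assuming a small made-up Salary column rather than the lesson's data (the class name SalaryMean is also hypothetical):

```java
import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;

public class SalaryMean {

    // Hypothetical sample data.
    static Table buildEmployees() {
        return Table.create("Employees",
                StringColumn.create("Name", "Ana", "Ben", "Chloe"),
                DoubleColumn.create("Salary", 65000.0, 82000.0, 71000.0));
    }

    static double meanSalary(Table employees) {
        // Asking for a DoubleColumn gives access to numeric methods such as mean().
        DoubleColumn salary = employees.doubleColumn("Salary");
        return salary.mean();
    }

    public static void main(String[] args) {
        System.out.println("Mean salary: " + meanSalary(buildEmployees()));
    }
}
```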

6. Accessing data

We can access data in a Table in several ways. To get a specific column, use .stringColumn("Name") if we know the type, or .column("Name") for a general column reference. It is better to use the specific type when we can, because the general reference does not expose the methods that belong to that type. Once we have a column, we can access individual values using .get(index), like names.get(0) to get the first name. To work with entire rows, use .row(index), which gives us a Row object. From that row, we can retrieve values using methods like .getDouble("Salary"). These different access methods let us choose the one that fits our task best.
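The access patterns described here can be sketched as follows. The sample table and the class name AccessSketch are assumptions made for illustration.

```java
import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.Row;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;
import tech.tablesaw.columns.Column;

public class AccessSketch {

    // Hypothetical sample data.
    static Table buildEmployees() {
        return Table.create("Employees",
                StringColumn.create("Name", "Ana", "Ben", "Chloe"),
                DoubleColumn.create("Salary", 65000.0, 82000.0, 71000.0));
    }

    // Typed access: get(index) returns a String.
    static String firstName(Table employees) {
        StringColumn names = employees.stringColumn("Name");
        return names.get(0);
    }

    // Row access: read a single field of one record by column name.
    static double salaryAt(Table employees, int index) {
        Row row = employees.row(index);
        return row.getDouble("Salary");
    }

    public static void main(String[] args) {
        Table employees = buildEmployees();

        // General access: the element type is unknown, so get() returns an Object.
        Column<?> anyColumn = employees.column("Name");
        Object first = anyColumn.get(0);

        System.out.println(first);
        System.out.println(firstName(employees));
        System.out.println(salaryAt(employees, 1));
    }
}
```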

7. Selections

A Selection is a set of row indices that match a condition. We create selections using column methods like .isGreaterThan(), .isEqualTo(), or .isAfter(). Here, we create a Selection containing all rows where the Salary column is greater than 70,000.
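A small sketch of building such a Selection, using made-up salaries. The variable name highEarners follows the transcript; the data and the class name SelectionSketch are assumptions.

```java
import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;
import tech.tablesaw.selection.Selection;

public class SelectionSketch {

    // Hypothetical sample data.
    static Table buildEmployees() {
        return Table.create("Employees",
                StringColumn.create("Name", "Ana", "Ben", "Chloe"),
                DoubleColumn.create("Salary", 65000.0, 82000.0, 71000.0));
    }

    // The Selection holds row indices only; it does not copy any data.
    static Selection highEarners(Table employees) {
        return employees.doubleColumn("Salary").isGreaterThan(70000.0);
    }

    public static void main(String[] args) {
        Selection highEarners = highEarners(buildEmployees());

        System.out.println("Matching rows: " + highEarners.size());
        System.out.println("Row 0 matches: " + highEarners.contains(0));
        System.out.println("Row 1 matches: " + highEarners.contains(1));
    }
}
```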

8. Filtering on a Selection

To filter the table, we then pass the selection into the .where() method, which returns a new table with only the matching rows.
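As a sketch of this filtering step, the following passes a Selection to .where() and checks that the source table keeps all of its rows. The data and the class name WhereSketch are invented for illustration.

```java
import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;
import tech.tablesaw.selection.Selection;

public class WhereSketch {

    // Hypothetical sample data.
    static Table buildEmployees() {
        return Table.create("Employees",
                StringColumn.create("Name", "Ana", "Ben", "Chloe"),
                DoubleColumn.create("Salary", 65000.0, 82000.0, 71000.0));
    }

    static Table filterHighEarners(Table employees) {
        Selection highEarners = employees.doubleColumn("Salary").isGreaterThan(70000.0);
        // where() returns a new table containing only the selected rows.
        return employees.where(highEarners);
    }

    public static void main(String[] args) {
        Table employees = buildEmployees();
        Table filtered = filterHighEarners(employees);

        System.out.println(filtered);
        System.out.println("Filtered rows: " + filtered.rowCount());
        System.out.println("Original rows: " + employees.rowCount());
    }
}
```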

9. Boolean operations

We can also use Boolean operations when filtering. Here, we first create a Selection for employees hired after January 1st, 2020. Then we use the .and() method to combine it with our previous highEarners selection, finding employees who are both high earners and recent hires. We can combine Selections using .and() and .or(). These methods return a Selection, and .where() returns a new table, so our original data remains unchanged.
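A sketch of combining two Selections, under the assumption of a small made-up table with a HireDate column (the column names, the data, and the class name CombineSketch are not from the lesson). In the Tablesaw versions this sketch was written against, .and() and .or() may modify the Selection they are called on, so each combination below starts from freshly built selections.

```java
import java.time.LocalDate;

import tech.tablesaw.api.DateColumn;
import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.StringColumn;
import tech.tablesaw.api.Table;
import tech.tablesaw.selection.Selection;

public class CombineSketch {

    // Hypothetical sample data.
    static Table buildEmployees() {
        return Table.create("Employees",
                StringColumn.create("Name", "Ana", "Ben", "Chloe", "Dev"),
                DoubleColumn.create("Salary", 65000.0, 82000.0, 71000.0, 58000.0),
                DateColumn.create("HireDate",
                        LocalDate.of(2021, 2, 10), LocalDate.of(2018, 7, 1),
                        LocalDate.of(2022, 9, 20), LocalDate.of(2017, 1, 5)));
    }

    static Selection highEarners(Table t) {
        return t.doubleColumn("Salary").isGreaterThan(70000.0);
    }

    static Selection recentHires(Table t) {
        return t.dateColumn("HireDate").isAfter(LocalDate.of(2020, 1, 1));
    }

    // Rows that satisfy both conditions.
    static Table highEarningRecentHires(Table t) {
        return t.where(highEarners(t).and(recentHires(t)));
    }

    // Rows that satisfy at least one condition.
    static Table highEarnersOrRecentHires(Table t) {
        return t.where(highEarners(t).or(recentHires(t)));
    }

    public static void main(String[] args) {
        Table employees = buildEmployees();
        System.out.println(highEarningRecentHires(employees));
        System.out.println(highEarnersOrRecentHires(employees));
        System.out.println("Original rows: " + employees.rowCount());
    }
}
```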

10. Let's practice!

In this lesson, we learned how Tablesaw organizes data into tables, columns, and rows, and how to use selections to filter data. Now it's your turn to practice!
