CSV processing with Tablesaw
1. CSV processing with Tablesaw
Welcome back! Now we'll dive into the fundamentals of the Tablesaw package, starting with one of its most common tasks: working with CSV files.
2. Reading CSV files
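As a rough sketch of the kind of code this section talks through, here is what a basic read might look like. The class name ReadCsvExample and the helper load are illustrative only, and we assume Tablesaw is on the classpath.

```java
import tech.tablesaw.api.Table;

class ReadCsvExample {
    // Hypothetical helper: reads a CSV file using Tablesaw's default settings.
    // `throws Exception` is declared because some Tablesaw versions declare
    // a checked IOException on csv(...), while others do not.
    static Table load(String path) throws Exception {
        // Column types are inferred from the file's contents.
        return Table.read().csv(path);
    }
}
```

Calling ReadCsvExample.load("data.csv") would then return the Table for the data.csv file mentioned here, provided that file exists in the working directory.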
The core functionality we'll explore first is reading CSV files. The simplest approach calls Table.read().csv() with a file path. Because .read() belongs to the Table class, importing Table is all we need. Here, we are reading in the data.csv file. This method automatically detects column types and returns a structured Table object.
3. CSV reading options
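For orientation, here is a minimal sketch of reading with CsvReadOptions. The semicolon separator, the "N/A" marker, and the names ReadOptionsExample and loadSemicolonFile are assumptions chosen for illustration. Character encoding is left out of the sketch.

```java
import tech.tablesaw.api.Table;
import tech.tablesaw.io.csv.CsvReadOptions;

class ReadOptionsExample {
    // Hypothetical helper: reads a semicolon-separated file that has a header
    // row and uses "N/A" to mark missing values (both are assumptions).
    static Table loadSemicolonFile(String path) throws Exception {
        CsvReadOptions options = CsvReadOptions.builder(path)
                .separator(';')               // column delimiter
                .header(true)                 // first row holds column names
                .missingValueIndicator("N/A") // treat "N/A" as missing
                .build();
        // Pass the built options to the reader instead of a bare file path.
        return Table.read().csv(options);
    }
}
```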
For more control, we can use CsvReadOptions. CsvReadOptions is a helper class in Tablesaw that lets us customize how a CSV file is read. This is useful, for example, when our file has a different separator, no header row, or a specific character encoding. To use CsvReadOptions, we first import it. We then call its .builder() method with the file name and chain any settings we need before calling .build(). For example, we can use .separator() to change the column delimiter, .header() to indicate whether the file includes column names, or .missingValueIndicator() to specify which strings should be treated as missing values. Once we've built the options, we pass them to Table.read().csv() to load the data.
4. Writing CSV files
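A minimal sketch of a basic write might look like this. The class name WriteCsvExample and the helper save are illustrative only. Note that .write() is called on a table instance such as dataTable, not on the Table class itself.

```java
import tech.tablesaw.api.Table;

class WriteCsvExample {
    // Hypothetical helper: writes the given table to a CSV file using
    // Tablesaw's default write settings (header row included).
    // `throws Exception` covers versions that declare a checked IOException.
    static void save(Table dataTable, String path) throws Exception {
        dataTable.write().csv(path);
    }
}
```

With a table named dataTable, WriteCsvExample.save(dataTable, "output.csv") would produce the output.csv file described here.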
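Writing the table back to a CSV file follows a similar structure. The basic .write().csv() method is also part of the Table class, is called on the table we want to save, and writes with default options. When writing, the table's structure (column names and order) is preserved, and fields containing special characters, such as the separator or quotes, are escaped automatically. CSV itself stores only text, so column types are detected again the next time the file is read. In this example, we write the table dataTable to the file output.csv.
5. CSV writing options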
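As a sketch of writing with CsvWriteOptions, the example below sets a header row, a semicolon separator, and Unix line endings. These values and the names WriteOptionsExample and saveSemicolonFile are assumptions for illustration. The quoting option is left out because its exact method name may differ between Tablesaw versions.

```java
import tech.tablesaw.api.Table;
import tech.tablesaw.io.csv.CsvWriteOptions;

class WriteOptionsExample {
    // Hypothetical helper: writes a table as a semicolon-separated file
    // with a header row and "\n" line endings (all assumed settings).
    static void saveSemicolonFile(Table dataTable, String path) throws Exception {
        CsvWriteOptions options = CsvWriteOptions.builder(path)
                .header(true)    // include column names as the first row
                .separator(';')  // column delimiter
                .lineEnd("\n")   // Unix-style line endings
                .build();
        // Pass the built options to the writer instead of a bare file path.
        dataTable.write().csv(options);
    }
}
```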
Just like when reading CSVs, we can also specify options for writing using CsvWriteOptions. The syntax of CsvWriteOptions is similar to that of CsvReadOptions. We call its .builder() method with the file name, then chain any settings we want before calling .build(). For example, just like when reading a CSV, we can use .header() to indicate whether to include column names and .separator() to change the column delimiter. We can also use .quoteAlways() to specify whether all fields in the output CSV should be enclosed in double quotes, and .lineEnd() to set the line-ending format, which may need to change if you are moving files between Windows and Linux systems. Once we've built the options, we pass them to the table's .write().csv() method to write the CSV file.
6. CSV workflow
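A hedged sketch of such a read-inspect-write workflow follows. The file paths are passed in as parameters, and the choice of shape(), structure(), and first(5) as the inspection steps is an assumption, as are the names CsvWorkflowExample and run.

```java
import tech.tablesaw.api.Table;

class CsvWorkflowExample {
    // Hypothetical helper: reads a CSV, prints a few diagnostics,
    // and saves a copy to a different file. The input file is never modified.
    static Table run(String inputPath, String outputPath) throws Exception {
        Table students = Table.read().csv(inputPath);

        // Inspect the data before doing anything else with it.
        System.out.println(students.shape());     // row and column counts
        System.out.println(students.structure()); // column names and types
        System.out.println(students.first(5));    // first five rows

        // Write to a new file so the original stays untouched.
        students.write().csv(outputPath);
        return students;
    }
}
```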
Let's see a practical example of the complete read-inspect-process-write workflow. Remember that these operations are non-destructive: we write to a new file and leave the original file we read from unchanged. Here, we're reading a student dataset, examining its structure, and then saving it to a new file. These diagnostic methods help us understand our data before processing it further.
7. Let's practice!
Well done on finishing this lesson. CSV files are one of the most commonly used file types for storing data, so being able to handle them is a useful skill. Let's practice!