In this chapter, we learn how to download data files from web servers via the command line. In the process, we also learn about documentation manuals, option flags, and multi-file processing.
We continue our data journey from data downloading to data processing. In this chapter, we utilize the command line library csvkit to convert, preview, filter and manipulate files to prepare our data for further analyses.
In this chapter, we dig deeper into all that csvkit library has to offer. In particular, we focus on database operations we can do on the command line, including table creation, data pull, and various ETL transformation.
In the last chapter, we bridge the connection between command line and other data science languages and learn how they can work together. Using Python as a case study, we learn to execute Python on the command line, to install dependencies using the package manager pip, and to build an entire model pipeline using the command line.