Get data from other flat files
While CSVs are the most common kind of flat file, you will sometimes find files that use different delimiters. read_csv() can load all of these with the help of the sep keyword argument. By default, pandas assumes that the separator is a comma, which is why we do not need to specify sep for CSVs.
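As a minimal sketch of how sep changes parsing, the snippet below reads the same small table twice: once comma-separated with the default, and once pipe-separated with sep="|". The sample text and the column names (state, returns) are made up for illustration and are not part of the exercise data.

```python
import io

import pandas as pd

# Hypothetical sample data, invented for this illustration
comma_text = "state,returns\nVT,10\nNH,20\n"
pipe_text = "state|returns\nVT|10\nNH|20\n"

# Default separator is a comma, so sep can be omitted here
comma_df = pd.read_csv(io.StringIO(comma_text))

# Any other delimiter has to be passed explicitly through sep
pipe_df = pd.read_csv(io.StringIO(pipe_text), sep="|")

# Both calls should yield the same two-column DataFrame
print(comma_df.equals(pipe_df))
```

If sep were left out of the second call, pandas would treat each line as a single field, and the result would have one column named state|returns.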
The version of Vermont tax data here is a tab-separated values (TSV) file, so you will need to use sep to pass in the correct delimiter when reading the file. Remember that tabs are represented as \t. Once the file has been loaded, the remaining code groups the data by the agi_stub field, which contains income range categories, and sums the N1 field (the number of returns) to create a chart of tax returns by income category.
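The same pattern can be tried without the course file. The sketch below assumes a tiny in-memory stand-in for the TSV with only the two columns used by the exercise code, agi_stub and N1; the values are invented and do not come from the Vermont data.

```python
import io

import pandas as pd

# Hypothetical stand-in for the TSV file; the numbers are made up
tsv_text = (
    "agi_stub\tN1\n"
    "1\t100\n"
    "1\t50\n"
    "2\t30\n"
)

# "\t" tells read_csv that fields are separated by tab characters
data = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Total N1 per income category, mirroring the groupby in the exercise code
counts = data.groupby("agi_stub").N1.sum()
print(counts)
```

Plotting is left out here so the snippet runs without matplotlib; counts.plot.bar() would draw the same kind of bar chart the exercise produces.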
This exercise is part of the course
Streamlined Data Ingestion with pandas
Exercise instructions
- Import pandas with the alias pd.
- Load vt_tax_data_2016.tsv, making sure to set the correct delimiter with the sep keyword argument.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import pandas with the alias pd
____
# matplotlib is needed for plt.show() below
import matplotlib.pyplot as plt
# Load TSV using the sep keyword argument to set delimiter
data = ____(____, ____)
# Plot the total number of tax returns by income group
counts = data.groupby("agi_stub").N1.sum()
counts.plot.bar()
plt.show()