Load multiple data files
It's perfectly fine to import multiple datasets manually. However, there will be times when you want to import a batch of datasets without writing a separate read_csv() call for each one.
You can use the glob library, which is part of Python's standard library, to look for files that match a pattern.
The library is called "glob" because "globbing" is the way patterns are specified in the Bash shell.
The glob() function returns a list of filenames that match a specified pattern.
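As a minimal sketch of the pattern matching described above, the snippet below creates a few placeholder files in a temporary directory and lists only those ending in .csv. The file names and the temporary directory are assumptions made for illustration; they are not part of the exercise.

```python
import glob
import os
import tempfile

# Hypothetical file names, created in a throwaway directory for illustration.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("sales_2020.csv", "sales_2021.csv", "notes.txt"):
        with open(os.path.join(tmp_dir, name), "w") as handle:
            handle.write("placeholder\n")

    # '*' matches any run of characters within a file name,
    # so '*.csv' selects every name that ends in .csv.
    pattern = os.path.join(tmp_dir, "*.csv")

    # glob.glob() does not guarantee an order, so sort for a stable result.
    matched_names = sorted(os.path.basename(path) for path in glob.glob(pattern))

print(matched_names)  # ['sales_2020.csv', 'sales_2021.csv']
```

The notes.txt file is skipped because it does not match the pattern.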
You can then use a list comprehension to import multiple files into a list, and then you can extract the DataFrame of interest.
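The workflow can be sketched end to end as follows. This is an illustration under stated assumptions rather than the exercise's own data: the two small CSV files, their column names, and the temporary directory are all made up here so that the example runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write two small, made-up CSV files so there is something to read back.
    pd.DataFrame({"x": [1, 2, 3]}).to_csv(os.path.join(tmp_dir, "a.csv"), index=False)
    pd.DataFrame({"x": [4, 5], "y": [6, 7]}).to_csv(os.path.join(tmp_dir, "b.csv"), index=False)

    # Collect the matching paths, sorted so the list order is predictable.
    paths = sorted(glob.glob(os.path.join(tmp_dir, "*.csv")))

    # One read_csv() call per path, gathered into a list of DataFrames.
    frames = [pd.read_csv(path) for path in paths]

# Inspect the (rows, columns) shape of every DataFrame.
shapes = [frame.shape for frame in frames]
print(shapes)  # [(3, 1), (2, 2)]

# Extract a single DataFrame of interest by its position in the list.
first = frames[0]
```

Sorting the paths is a design choice, not a requirement: it makes the position of each DataFrame in the list predictable, which matters when you pick one out by index.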
This is a part of the course “Python for R Users”.
Exercise instructions
- Obtain a list of all csv files in your current directory and assign it to csv_files.
- Write a list comprehension that reads in all the csv files into a list, dfs.
- Write a list comprehension that looks at the .shape of each DataFrame in the list.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
import glob
import pandas as pd
# Get a list of all the csv files
csv_files = glob.____('*.csv')
# List comprehension that loads all of the files
dfs = [pd.read_csv(____) for ____ in ____]
# List comprehension that looks at the shape of all DataFrames
print(____)