Get startedGet started for free

Importing flat files using pandas

1. Importing flat files using pandas

Congrats! You're now able to import a bunch of different types of flat files into Python as NumPy arrays. Although arrays are incredibly powerful and serve a number of essential purposes, they cannot fulfill one of the most basic needs of a Data Scientist:

2. What a data scientist needs

to have "[two]-dimensional labeled data structure[s] with columns of potentially different types" that you can easily perform a plethora of Data Sciencey type things on: manipulate, slice, reshape, groupby, join, merge, perform statistics in a missing-value-friendly manner, deal with times series. The need for such a data structure, among other issues,

3. Pandas and the DataFrame

prompted Wes McKinney to develop the

4. Pandas and the DataFrame

pandas library for Python. Nothing speaks to the project of pandas more than the documentation itself:

5. Pandas and the DataFrame

"Python has long been great for data munging and preparation, but less so for data analysis and modeling. pandas helps fill this gap, enabling you to carry out your entire data analysis workflow in Python without having to switch to a more domain specific language like R". The data structure most relevant to the data manipulation and analysis workflow that pandas offers is the DataFrame and it is the Pythonic analogue of R's data frame.

6. Pandas and the DataFrame

As Hadley Wickham tweeted, "A matrix has rows and columns. A data frame has observations and variables."

7. Manipulating pandas DataFrames

Manipulating DataFrames in pandas can be useful in all steps of the data scientific method, from exploratory data analysis to data wrangling, preprocessing, building models and visualization. Here, we will see its great utility in importing flat files, even merely in the way that it deals with missing data, comments along with the many other issues that plague working data scientists. For all of these reasons, it is now standard and best practice in Data Science to use pandas to import flat files as DataFrames. Later in this course, we'll see how many other types of data, whether they're stored in relational databases, hdf5, MATLAB or excel files, can easily be imported as DataFrames.

8. Importing using pandas

To use pandas, you first need to import it. Then, if we wish to import a CSV in the most basic case all we need to do is to call the function read_csv() and supply it with a single argument, the name of the file. Having assigned the DataFrame to the variable data, we can check the first 5 rows of the DataFrame, including the header, with the command 'data.head'. We can also easily convert the DataFrame to a numpy array. Now it's your turn to play around importing flat files using Python.

9. You'll experience:

You'll get experience importing a flat file that is straightforward and you'll also get experience importing a flat file that has a few issues, such as one that contains comments and strings that should be interpreted as missing values.

10. Let's practice!

Have fun importing!

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.