1. Packages
Python's built-in modules are extensive, but if they don't offer what we need, we have another option.
2. Modules are Python files
Remember that a module is a single Python file. This means that, while modules come built into Python, anyone can make their own Python file!
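To make this concrete, here is a minimal sketch of writing our own module and importing it. The module name greetings, the function say_hello, and the temporary folder are all hypothetical, chosen only for illustration.

```python
import os
import sys
import tempfile

# Create a temporary folder and write a one-function Python file into it.
# "greetings.py" and "say_hello" are made-up names for this sketch.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "greetings.py"), "w") as module_file:
    module_file.write("def say_hello(name):\n    return 'Hello, ' + name + '!'\n")

# Tell Python where to look, then import the file like any other module.
sys.path.insert(0, module_dir)
import greetings

message = greetings.say_hello("Ada")
print(message)
```

In everyday use we would usually just save the file next to our script, in which case the sys.path step is not needed.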
3. Packages
This is exactly what many developers have done, and they've even gone a step further by creating collections of files. These collections are known as packages. We might also hear them referred to as libraries.
Packages are publicly available, free of charge!
We can use packages in the same way as Python's built-in modules, with one extra step. To access a package, we first need to download it from the Python Package Index, known as PyPI, which is essentially a directory of packages.
Afterwards, we can import the package, or parts of a package, using the same approach as we took when working with modules.
4. Installing a package
To download a package from PyPI, we'll need to open a terminal on macOS or Command Prompt on Windows. A terminal is a program that allows us to run commands on our computer. Once we've opened a terminal, we type the command python3 -m pip install, followed by the package name, and press Return. On Windows, the command may be python or py rather than python3, depending on how Python was installed.
python3 is used to execute Python code from the terminal, and the -m flag tells it to run a module, in this case pip. pip is Python's preferred installer program, a tool used to install Python packages.
5. Installing a package
Let's look at a popular package for working with data called pandas, which is used for data cleaning, manipulation, and analysis!
We can install pandas by running the command python3 -m pip install pandas in the terminal.
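The install itself happens in the terminal, not in Python. As a rough sketch, though, we can build the same command as text from inside Python and check whether a package is already available. The variable names here are assumptions, and nothing is actually installed by this code.

```python
import importlib.util
import sys

package_name = "pandas"

# The terminal command, shown as text only; this sketch does not run it.
# sys.executable is the path of the Python interpreter currently in use.
install_command = f"{sys.executable} -m pip install {package_name}"
print(install_command)

# find_spec returns None when Python cannot locate the package.
is_installed = importlib.util.find_spec(package_name) is not None
print(is_installed)
```

If the last line prints False, running the install command in a terminal should make the package available.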
6. Importing with an alias
We can now use pandas by importing it, just like we did with the os module.
However, when using pandas functions, it would be great if we didn't have to write pandas-dot every time. It's common to use an alias when importing some packages, including pandas.
To assign an alias, we import pandas, adding the as keyword followed by our alias name. The convention is to name it pd, helping us shorten our code.
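A short sketch of the aliased import, assuming pandas is already installed:

```python
# Import pandas under the conventional alias pd.
import pandas as pd

# The alias is just a shorter name for the same package.
print(pd.__name__)
```

From here on, anything we would have written as pandas-dot can be written as pd-dot instead.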
7. Creating a DataFrame
Say we have a dictionary containing sales data. It has user_id and order_value keys, each containing lists with respective values.
We can use pandas to turn this into tabular data, similar to what we see in a spreadsheet.
To do this, we use the pandas DataFrame function, writing pd-dot before we call it because we have imported pandas under the alias of pd.
Outputting the variable shows the data is now organized as a table, with each row showing the user_id and their order value.
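Here is a minimal sketch of those steps. The user IDs and order values are made up for illustration, and the variable names sales and sales_df are assumptions.

```python
import pandas as pd

# A dictionary of sales data; the values are hypothetical.
sales = {
    "user_id": ["KM37", "PR19", "YU88"],
    "order_value": [197.75, 208.21, 134.99],
}

# Convert the dictionary into a table: keys become column names.
sales_df = pd.DataFrame(sales)
print(sales_df)
```

Each list in the dictionary becomes one column, so the lists need to be the same length.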
8. Reading in a CSV file
We might have our data saved in a CSV file and want to read it into Python as a pandas DataFrame.
To do this, we call the pd.read_csv function, passing the file name as a string.
Checking the type confirms it is a pandas DataFrame.
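The sketch below first writes a small CSV file so that it can run on its own. The file name sales.csv and its contents are assumptions; with a real file we would skip straight to the pd.read_csv line.

```python
import os
import tempfile

import pandas as pd

# Write a small, made-up CSV file to a temporary folder.
csv_path = os.path.join(tempfile.mkdtemp(), "sales.csv")
with open(csv_path, "w") as csv_file:
    csv_file.write("user_id,order_value\nKM37,197.75\nPR19,208.21\nYU88,134.99\n")

# Read the file into a DataFrame, passing the file path as a string.
sales_df = pd.read_csv(csv_path)

# Confirm what kind of object we got back.
print(type(sales_df))
```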
9. Previewing the file
If there are many rows in our data, we might not want to look at all of them. Instead, we could preview the first five rows.
pandas DataFrames have a method for this.
We call the sales_df.head method, and the first five rows are displayed!
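A small sketch of the preview, using an eight-row DataFrame with made-up values so that the effect of head is visible:

```python
import pandas as pd

# Build a DataFrame with eight rows of hypothetical data.
sales_df = pd.DataFrame({
    "user_id": [f"U{n}" for n in range(1, 9)],
    "order_value": [10.0 * n for n in range(1, 9)],
})

# head() returns the first five rows by default.
preview = sales_df.head()
print(preview)
```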
As pandas is so widely used, DataCamp has several courses teaching how to use this package!
10. Functions versus methods
We've used both functions and methods, so let's recap how they differ.
A function is called to perform a task. A method is a function that is specific to a data type.
11. Functions versus methods
Functions and methods appear similar, so let's distinguish them.
Function examples include the built-in sum function or pandas' DataFrame, where we use the syntax of package name-dot-function. Strictly speaking, DataFrame is a class, but we call it just like a function.
As for methods, dot-head is specific to one data type, in this case a pandas DataFrame.
Therefore, it won't work if we try to use it with other data types like lists.
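The difference can be sketched as follows, with made-up order values. The last step deliberately calls head on a list to show that it fails.

```python
import pandas as pd

order_values = [197.75, 208.21, 134.99]  # hypothetical data

# A built-in function: called on its own, with the data passed in.
total = sum(order_values)

# Package name-dot-function syntax, using the pd alias.
sales_df = pd.DataFrame({"order_value": order_values})

# A method: called on the DataFrame itself using dot syntax.
preview = sales_df.head()

# Lists have no head method, so this raises an AttributeError.
try:
    order_values.head()
except AttributeError as error:
    print(error)
```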
12. Let's practice!
Time to practice working with packages!