1
Python Data Structures and String Manipulation
In this chapter, we'll refresh our knowledge of the main data structures used in Python. We'll cover how to deal with lists, tuples, sets, and dictionaries. We'll also consider strings and how to write regular expressions to retrieve specific character sequences from a given text.
2
Iterable objects and representatives
This chapter focuses on iterable objects. We'll refresh the definition of an iterable object and explain how to identify one. Next, we'll cover list comprehensions, a distinctive Python feature for defining lists. Then, we'll recall how to combine several iterable objects into one. Finally, we'll cover how to create custom iterable objects using generators.
3
Functions and lambda expressions
This chapter will focus on the functional aspects of Python. We'll start by defining functions with a variable number of positional and keyword arguments. Next, we'll cover lambda functions and the cases in which they are helpful. In particular, we'll see how to use them with functions such as map(), filter(), and reduce(). Finally, we'll recall what recursion is and how to implement it correctly.
4
Python for scientific computing
This chapter will cover topics in scientific computing with Python. We'll start by explaining the difference between NumPy arrays and lists, and why arrays are better suited to complex calculations. Next, we'll cover some useful techniques for manipulating pandas DataFrames. Finally, we'll do some data visualization using scatterplots, histograms, and boxplots.

Simple use of .apply()

Let's get some hands-on experience with .apply()!

You are given the full scores dataset containing students' performance as well as their background information.

Your task is to define the prevalence() function and apply it to the groups_to_consider columns of the scores DataFrame. This function should retrieve the most prevalent group/category for a given column (e.g. if the most prevalent category in the lunch column is standard, then prevalence() should return standard).

The reduce() function from the functools module is already imported.

Tip: a pd.Series is an iterable object, so the standard operations on iterables work on it.

Create a list of tuples pairing each unique item in the passed series with its count.
Extract the tuple with the highest count using reduce().
Return the item with the highest count.
Apply the prevalence() function to the scores DataFrame, using the columns specified in groups_to_consider.
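
The four steps above can be sketched as follows. This is one possible solution, not necessarily the course's reference answer. The helper most_prevalent_by_column() is a hypothetical name introduced here for illustration, and it assumes that scores is a pandas DataFrame and that groups_to_consider is a list of its column names.

```python
from functools import reduce


def prevalence(series):
    """Return the most frequent item of an iterable (e.g. a pd.Series)."""
    values = list(series)
    # Step 1: pair each unique item with the number of times it occurs.
    counts = [(item, values.count(item)) for item in set(values)]
    # Step 2: reduce the list to the single tuple with the highest count.
    # If several items tie, which one wins is not defined here.
    top = reduce(lambda a, b: a if a[1] >= b[1] else b, counts)
    # Step 3: return the item itself, not its count.
    return top[0]


def most_prevalent_by_column(frame, columns):
    # Step 4 (hypothetical helper): DataFrame.apply() calls prevalence()
    # once per selected column, passing that column as a Series.
    return frame[columns].apply(prevalence)


# Quick check on a plain list, which is iterable just like a Series.
print(prevalence(["standard", "free/reduced", "standard"]))  # standard
```

With the exercise's objects, the call would be most_prevalent_by_column(scores, groups_to_consider), which is equivalent to scores[groups_to_consider].apply(prevalence) and yields one value per column.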
