
Select random rows

1. Select random rows

Welcome back! In the last lesson of this first chapter, we will explore a couple of different ways to randomly select rows and columns in a pandas DataFrame.

2. Sampling random rows using pandas

Sometimes, we may not be interested in a specific entry or feature of a DataFrame. Instead, we may want to select one or several completely random ones. In the poker dataset, we select 100 random rows, which correspond to 100 poker hands. We will use pandas' .sample() method, which was written for exactly this purpose. The syntax is straightforward: we specify how many samples we want to obtain and the axis we want to sample from; in this case, we use 0 for rows.
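The call described above can be sketched as follows. The small DataFrame here is a hypothetical stand-in for the poker dataset (its column names and values are assumptions, not the real data), and random_state is added only to make the result repeatable.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the poker dataset: 1,000 hands, 6 made-up features
rng = np.random.default_rng(0)
columns = ["S1", "R1", "S2", "R2", "S3", "R3"]
poker = pd.DataFrame(rng.integers(1, 5, size=(1000, 6)), columns=columns)

# Draw 100 random rows (axis=0); rows are not repeated by default
sample_rows = poker.sample(n=100, axis=0, random_state=42)

print(sample_rows.shape)
```

Because .sample() draws without replacement by default, each of the 100 selected hands appears only once in the result.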

3. Sampling random rows using numpy

NumPy includes a random integer generation function, random.randint(), which we can use to draw random integers from a predefined range. We can then use these numbers as positional indices for the DataFrame's rows. The syntax is simple: we give the lower bound (inclusive) and the upper bound (exclusive) of the range with the arguments low and high, and the number of samples we want with the argument size. Note that random.randint() draws with replacement, so the same row can be picked more than once, whereas .sample() does not repeat rows by default. Built-in functions are often handy because they are optimized for specific tasks. In this comparison, the pandas .sample() method is about 37% faster than the approach based on NumPy's random integer generator.
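A minimal sketch of the NumPy approach, again on a hypothetical stand-in DataFrame (the column names and contents are assumptions). The random indices are passed to .iloc, which selects rows by position.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the poker dataset: 1,000 rows, 5 made-up features
poker = pd.DataFrame(
    np.arange(5000).reshape(1000, 5),
    columns=["f1", "f2", "f3", "f4", "f5"],
)

np.random.seed(42)  # fixed seed, only so the example is repeatable

# 100 random row positions in [0, number of rows); high is exclusive
row_idx = np.random.randint(low=0, high=poker.shape[0], size=100)

# Positional selection; duplicates in row_idx produce repeated rows
sample_rows = poker.iloc[row_idx]

print(sample_rows.shape)
```

To compare the two approaches on your own machine, each selection can be wrapped in %timeit in a notebook or timed with the timeit module; the exact speed difference will depend on the data and the environment.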

4. Sampling random columns

For the same poker dataset, we are interested in selecting three random features from all the features in the dataset. As we did when sampling random rows, we can still use pandas' .sample() method, just changing the axis to 1 to select from features instead of rows. Similarly, we can slightly modify the call to NumPy's random integer generator to produce random column indices, and use the .iloc indexer to select all the rows and the chosen columns. In this comparison, pandas' .sample() method performs 60% faster than the approach based on NumPy's random integer generator.
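Both column-sampling variants can be sketched side by side. As before, the DataFrame is a hypothetical stand-in for the poker dataset, and its column names are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the poker dataset: 1,000 rows, 6 made-up features
rng = np.random.default_rng(0)
columns = ["S1", "R1", "S2", "R2", "S3", "R3"]
poker = pd.DataFrame(rng.integers(1, 5, size=(1000, 6)), columns=columns)

# pandas: three random columns (axis=1), without repeats by default
sample_cols = poker.sample(n=3, axis=1, random_state=42)

# NumPy: three random column positions in [0, number of columns)
np.random.seed(42)  # fixed seed, only so the example is repeatable
col_idx = np.random.randint(low=0, high=poker.shape[1], size=3)

# .iloc uses square brackets: all rows (:), the chosen column positions
sample_cols_np = poker.iloc[:, col_idx]

print(sample_cols.shape, sample_cols_np.shape)
```

Note that the NumPy variant can select the same column more than once, since random.randint() draws with replacement.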

5. Let's do it!

Now it's your turn to try your hand at randomly sampling rows!