
Random row selection

In this exercise, you will compare the two methods described for selecting random rows (entries) with replacement in a pandas DataFrame:

  • The built-in pandas method .sample()
  • The NumPy random integer number generator np.random.randint()

In statistics and machine learning, a common convention when training an algorithm is to fit it on 75% of the available data and then test its performance on the remaining 25%.

For this exercise, we will randomly sample 75% of all the available played poker hands using each of the above methods, and check which method is faster.
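The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the course's solution: the `poker_hands` DataFrame here is a hypothetical stand-in filled with random integers, and its column names (`C1` to `C5`) and size are assumptions made only so the snippet runs on its own.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical stand-in for the course's poker_hands DataFrame
# (the shape and column names are assumptions for illustration only)
rng = np.random.default_rng(0)
poker_hands = pd.DataFrame(
    rng.integers(1, 14, size=(25_000, 5)),
    columns=[f"C{i}" for i in range(1, 6)],
)

# Number of rows in the dataset and the 75% sample size
N = poker_hands.shape[0]
n_sample = int(0.75 * N)

# Method 1: pandas .sample(), drawing rows with replacement
start = time.time()
sample_pd = poker_hands.sample(n=n_sample, replace=True)
pandas_time = time.time() - start

# Method 2: random integer positions from NumPy, passed to .iloc
# (randint draws with replacement, so positions may repeat)
start = time.time()
sample_np = poker_hands.iloc[np.random.randint(low=0, high=N, size=n_sample)]
numpy_time = time.time() - start

print("Time using pandas: {} sec".format(pandas_time))
print("Time using NumPy: {} sec".format(numpy_time))
```

A single timing like this is noisy, so the measured difference may vary from run to run; repeating each measurement several times gives a more reliable comparison.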

This exercise is part of the course

Writing Efficient Code with pandas

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Extract number of rows in dataset
N = poker_hands.shape[0]

# Select 75% of the dataset's rows (with replacement) and time the selection
rand_start_time = time.time()
poker_hands.iloc[np.random.randint(____=0, high=____, ____=int(0.75 * N))]
print("Time using Numpy: {} sec".format(time.time() - rand_start_time))