When / Otherwise
This requirement is similar to the last, but now you want to assign one of several values based on the voter's position. Modify your voter_df DataFrame to add a random number for any voting member whose title is Councilmember. Use 2 for the Mayor and 0 for any other position.
The voter_df DataFrame is defined and available to you. The pyspark.sql.functions library is available as F. You can use F.rand() to generate the random value.
This exercise is part of the course
Cleaning Data with PySpark
Exercise instructions
- Add a column to voter_df named random_val with the results of the F.rand() method for any voter with the title Councilmember. Set random_val to 2 for the Mayor. Set any other title to the value 0.
- Show some of the DataFrame rows, noting whether the clauses worked.
- Use the .filter() clause to find 0 in random_val.
Hands-on interactive exercise
Try this exercise by completing the sample code below.
# Add a column to voter_df for a voter based on their position
voter_df = voter_df.____('random_val',
                         F.when(voter_df.TITLE == 'Councilmember', ____)
                         .____(____, 2)
                         ____
# Show some of the DataFrame rows
voter_df.show()
# Use the .filter() clause with random_val
voter_df.____(____).show()