Binning streams into tiers
Now that you know the popularity distribution, leadership wants to categorize albums by total streams using fixed thresholds rather than quantiles. This ensures tier definitions stay consistent as new data arrives. The Spotify dataset now includes a streams_billions column. Bin albums into three tiers with breaks at 2.5 and 4.0 billion streams.
polars is loaded as pl. The DataFrame spotify is available with a streams_billions column.
This exercise is part of the course
Data Transformation with Polars
Exercise instructions
- Bin streams_billions into three tiers with breaks at 2.5 and 4.0.
- Add the third label "blockbuster" for albums above 4.0 billion streams.
Hands-on interactive exercise
Try this exercise by completing the sample code below.
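import polars as pl  # already loaded as pl in the exercise environment

# Bin streams_billions into three labeled bands using fixed thresholds.
# cut() builds right-closed intervals by default:
# (-inf, 2.5] -> emerging, (2.5, 4.0] -> established, (4.0, inf) -> blockbuster
result = spotify.with_columns(
    pl.col("streams_billions")
    .cut(
        breaks=[2.5, 4.0],
        labels=["emerging", "established", "blockbuster"],
    )
    .alias("stream_band")
)
print(result.head())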