Binning streams into tiers

Now that you know the popularity distribution, leadership wants to categorize albums by total streams using fixed thresholds rather than quantiles. This ensures tier definitions stay consistent as new data arrives. The Spotify dataset now includes a streams_billions column. Bin albums into three tiers with breaks at 2.5 and 4.0 billion streams.
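The difference between fixed thresholds and quantile-based ones can be sketched with a few made-up numbers. The values below are invented for illustration and are not taken from the Spotify dataset; the median stands in for a quantile-based cut point, and the helper names are hypothetical.

```python
from statistics import median

FIXED_BREAK = 2.5  # billions of streams; the lower break used in this exercise


def above_fixed(streams: float) -> bool:
    """Tier membership under a fixed threshold: depends only on the album."""
    return streams > FIXED_BREAK


def above_median(streams: float, all_streams: list[float]) -> bool:
    """Tier membership under a quantile (here the median): depends on all rows."""
    return streams > median(all_streams)


# Invented stream counts (billions) before and after new albums arrive.
old_data = [1.0, 2.0, 3.0]
new_data = old_data + [5.0, 6.0, 7.0, 8.0]
album = 3.0

fixed_before = above_fixed(album)
fixed_after = above_fixed(album)
relative_before = above_median(album, old_data)  # median of old_data is 2.0
relative_after = above_median(album, new_data)   # median of new_data is 5.0

print(fixed_before, fixed_after)        # same answer both times
print(relative_before, relative_after)  # answer flips once new data arrives
```

Under the fixed break the album keeps its tier no matter what else is in the table, while the median-based rule reclassifies it as soon as larger albums are added.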

polars is loaded as pl. The DataFrame spotify is available with a streams_billions column.

This exercise is part of the course

Data Transformation with Polars

Exercise instructions

  • Bin streams_billions into three tiers with breaks at 2.5 and 4.0.
  • Add the third label "blockbuster" for albums above 4.0 billion streams.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Bin streams_billions into three labeled bands
result = spotify.with_columns(
    pl.col("streams_billions")
    .____(
        breaks=[____, ____],
        labels=["emerging", "established", "____"],
    )
    .alias("stream_band")
)

print(result.head())