Exercise

Preparing the preference dataset

In this exercise, you'll work with a dataset that contains human feedback in the form of "chosen" and "rejected" outputs. Your task is to extract the prompts from the "chosen" column and prepare the data for training a reward model.

The load_dataset function from datasets has been pre-imported.

Instructions

100 XP
  • Load the trl-internal-testing/hh-rlhf-helpful-base-trl-style dataset from Hugging Face.
  • Write a function that extracts the prompt from the 'content' field, assuming the prompt is the message at index 0 of the function's input.
  • Apply the function that extracts the prompt to the 'chosen' dataset subset.