Filtering datasets for evaluation
You are building a training and evaluation pipeline for your company's health care chatbot, which is used by hospitals to onboard new patients.
Your task is to create a pipeline that loads the MedQuad-MedicalQnADataset to evaluate an LLM on its ability to answer medical questions. You are asked to load the dataset into the ds variable and to include only the first 500 samples of the train split of the dataset stored in dataset_name as your evaluation set.
This exercise is part of the course Fine-Tuning with Llama 3.
Exercise instructions
- Import the necessary functions and classes from datasets.
- Load the dataset into the ds variable.
- Manipulate ds to include the first 500 samples of the train split of the dataset stored in dataset_name as your evaluation set.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import the necessary functions and classes from the library
from datasets import ____, ____
# Load the training split of the dataset
ds = load_dataset(dataset_name, split=____)
# Filter for the first 500 samples of the dataset
filtered_ds = ____
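
The blanks above are left for you to complete. As a separate illustration of the same idea, here is a minimal sketch of taking the first rows of a dataset by index. The helper names take_first and load_eval_subset are hypothetical and not part of the exercise; the sketch assumes the object returned by load_dataset exposes select(indices) and len(), as datasets.Dataset does, and that dataset_name is already defined as in the exercise.

```python
def take_first(ds, n=500):
    """Return the first n rows of a dataset-like object.

    Hypothetical helper: works with any object that exposes
    __len__ and select(indices), such as datasets.Dataset.
    """
    # Clamp n so select() never receives out-of-range indices.
    n = min(n, len(ds))
    return ds.select(range(n))


def load_eval_subset(dataset_name, n=500):
    """Load the train split and keep its first n rows (needs network access)."""
    # Imported lazily so take_first stays usable without the library installed.
    from datasets import load_dataset

    ds = load_dataset(dataset_name, split="train")
    return take_first(ds, n)
```

Calling load_eval_subset(dataset_name) would download the train split and return a 500-row subset. The datasets library also accepts slice syntax in the split argument, for example split="train[:500]", which selects the same rows at load time.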