Creating training samples
As part of a customer service chatbot that your team is building, you are creating a pipeline to preprocess a dataset for fine-tuning a language model. The fine-tuned model should predict the intent behind a customer's question so that the request can be routed to the correct team for processing.
You are given a dataset with the customer's question and its intent in separate columns. You want to preprocess it so that each example's question and intent are merged into a single string that follows your prompt format.
The dataset is already loaded in `dataset`. It contains the columns `instruction`, with the customer's question, and `intent`, with the user's intent.
This exercise is part of the course Fine-Tuning with Llama 3.
Exercise instructions
- Create a prompt string with the instruction and intent in the form `"Query: {instruction}\nIntent: {intent}"`.
- Fill out the function call on the dataset to apply `create_intent_example` to each row.
- Extract and print the value in the `intent_example` column of the first row of the processed dataset.
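As a quick check of the target format, this sketch fills the template with one made-up question and intent label (hypothetical values, not taken from the real dataset). Because `\n` inside the f-string is an actual newline character, the printed prompt spans two lines.

```python
# Hypothetical values for illustration only; not from the exercise dataset.
instruction = "How can I cancel my order?"  # a customer question
intent = "cancel_order"                     # its intent label

# Fill the prompt template; "\n" puts the intent on its own line.
prompt = f"Query: {instruction}\nIntent: {intent}"
print(prompt)
```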
Have a go at this exercise by completing this sample code.
```python
def create_intent_example(row):
    # Fill out the columns in the prompt
    row['intent_example'] = ____
    return row

# Call the dataset method that applies the preprocessing function to all rows
processed_dataset = dataset.____(____)

# Print the intent_example in the first row of the processed data
print(processed_dataset[____][____])
```
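For reference, here is one possible way to complete the scaffold, written as a dependency-free sketch. It makes no assumptions about the real `dataset` object: `rows` is a hypothetical stand-in list of row dicts with made-up values, and `map_rows` is a hypothetical helper that mimics applying a function to every row, which is the job of the dataset method you call in the exercise.

```python
# Hypothetical stand-in for `dataset`: row dicts with the same columns as the
# exercise data. The values are made up for illustration.
rows = [
    {"instruction": "How can I cancel my order?", "intent": "cancel_order"},
    {"instruction": "Where is my package?", "intent": "track_order"},
]

def create_intent_example(row):
    # Merge the question and intent into one prompt string.
    row["intent_example"] = f"Query: {row['instruction']}\nIntent: {row['intent']}"
    return row

def map_rows(rows, fn):
    # Hypothetical plain-Python stand-in for a dataset's map method: apply fn
    # to a copy of each row and collect the results, leaving `rows` unchanged.
    return [fn(dict(row)) for row in rows]

processed_dataset = map_rows(rows, create_intent_example)

# Print the intent_example in the first row of the processed data.
print(processed_dataset[0]["intent_example"])
```

Assuming `dataset` is a Hugging Face `datasets.Dataset`, the corresponding calls would be `processed_dataset = dataset.map(create_intent_example)` and `print(processed_dataset[0]['intent_example'])`. There, `map` returns a new dataset with the added `intent_example` column rather than modifying `dataset` in place.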