Creating training samples

As part of a customer service chatbot your team is building, you are creating a pipeline to preprocess a dataset. The dataset will later be used to fine-tune a language model that predicts the intent behind a customer's question, so that each request can be routed to the correct team for processing.

The dataset stores each customer question and its intent in separate columns. You want to preprocess it so that the question and intent of each example are merged into a single string that follows your prompt format.

The dataset is already loaded as dataset. Its instruction column contains the customer's question, and its intent column contains the customer's intent.
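
Here is a minimal sketch of one row and the string it should become. The row is a plain Python dict, and its values are invented for illustration. Only the column names and the "Query: ... Intent: ..." template come from the exercise.

```python
# Hypothetical example row: the column names match the exercise
# (instruction, intent), but the values are invented.
row = {
    "instruction": "I want to cancel my order",
    "intent": "cancel_order",
}

# Merge both columns into one string using the exercise's template.
prompt = f"Query: {row['instruction']}\nIntent: {row['intent']}"
print(prompt)
# Query: I want to cancel my order
# Intent: cancel_order
```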

1. Create a prompt string from instruction and intent in the form "Query: {instruction}\nIntent: {intent}".
2. Complete the function call on dataset so that create_intent_example is applied to each row.
3. Extract and print the value of the intent_example column in the first row of the dataset.
