
String evaluation

Now it's time to evaluate the final output by comparing it to an answer written by a subject matter expert. You'll use LangSmith's LangChainStringEvaluator class to perform this string comparison evaluation.

A prompt_template for string evaluation has already been written for you as:

You are an expert professor specialized in grading students' answers to questions.
You are grading the following question:{query}
Here is the real answer:{answer}
You are grading the following predicted answer:{result}
Respond with CORRECT or INCORRECT:
Grade:
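
For reference, a grading template like this is normally wrapped in a LangChain PromptTemplate before being handed to the evaluator. The sketch below shows one way it could have been built; the import and construction are assumptions, since the exercise only shows the template text. Note that the "qa" evaluator expects the input variables query, answer, and result.

from langchain_core.prompts import PromptTemplate

# Assumed construction: wrap the grading instructions in a PromptTemplate
# with the input variables query, answer, and result
prompt_template = PromptTemplate.from_template(
    """You are an expert professor specialized in grading students' answers to questions.
You are grading the following question:{query}
Here is the real answer:{answer}
You are grading the following predicted answer:{result}
Respond with CORRECT or INCORRECT:
Grade:"""
)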

The output from the RAG chain is stored as predicted_answer and the expert's response is stored as ref_answer.

All of the necessary classes have been imported for you.


Exercise instructions

  • Create the LangSmith QA string evaluator using the eval_llm and prompt_template provided.
  • Evaluate the RAG output, predicted_answer, by comparing it with the expert's response to the query, which is stored as ref_answer.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create the QA string evaluator
qa_evaluator = ____(
    "____",
    config={
        "llm": ____,
        "prompt": ____
    }
)

query = "How does RAG improve question answering with LLMs?"

# Evaluate the RAG output by evaluating strings
score = qa_evaluator.evaluator.____(
    prediction=____,
    reference=____,
    input=____
)

print(f"Score: {score}")