Core Concepts
Sample
The Sample class is the fundamental data structure representing an evaluation instance:
Sample sample = Sample.builder()
        .withQuestion("What is the capital of France?")
        .withAnswer("Paris is the capital of France.")
        .withGroundTruth("Paris is the capital and largest city of France.")
        .withContextsList(Arrays.asList("Paris is the capital of France..."))
        .build();
A Sample typically consists of:
- A question: the prompt or input to the language model.
- An answer: the model-generated response.
- A ground truth: the expected or correct answer.
- Contexts (optional): additional information related to the question.
Evaluators
Each evaluator implements the Evaluator interface and focuses on a specific aspect of evaluation:
public interface Evaluator {
    Evaluation evaluate(Sample sample);
}
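As a rough illustration of this contract, a custom evaluator might look like the sketch below. The Sample and Evaluation classes here are simplified stand-ins defined only to keep the snippet self-contained; they are assumptions, not the library's classes. ExactMatchEvaluator and its "Exact match" metric name are hypothetical as well.

```java
import java.util.Locale;

class CustomEvaluatorSketch {

    // Simplified stand-in for the library's Sample (assumption, not the real class).
    static final class Sample {
        final String answer;
        final String groundTruth;

        Sample(String answer, String groundTruth) {
            this.answer = answer;
            this.groundTruth = groundTruth;
        }
    }

    // Simplified stand-in for the library's Evaluation, mirroring getName()/getValue().
    static final class Evaluation {
        private final String name;
        private final double value;

        Evaluation(String name, double value) {
            this.name = name;
            this.value = value;
        }

        String getName() { return name; }
        double getValue() { return value; }
    }

    // Same shape as the interface shown above.
    interface Evaluator {
        Evaluation evaluate(Sample sample);
    }

    // Hypothetical evaluator: scores 1.0 when the answer equals the ground truth
    // after trimming and lower-casing, and 0.0 otherwise.
    static final class ExactMatchEvaluator implements Evaluator {
        @Override
        public Evaluation evaluate(Sample sample) {
            boolean matches = normalize(sample.answer).equals(normalize(sample.groundTruth));
            return new Evaluation("Exact match", matches ? 1.0 : 0.0);
        }

        private static String normalize(String text) {
            return text == null ? "" : text.trim().toLowerCase(Locale.ROOT);
        }
    }

    public static void main(String[] args) {
        Evaluator evaluator = new ExactMatchEvaluator();
        Evaluation result = evaluator.evaluate(new Sample(" paris ", "Paris"));
        System.out.println(result.getName() + ": " + result.getValue());
    }
}
```

Keeping each evaluator focused on one metric, as here, means the same Sample can be passed to several evaluators without any of them depending on the others.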
Evaluation
The Evaluation class represents the result of a single metric assessment:
Evaluation result = evaluator.evaluate(sample);
String metricName = result.getName(); // e.g., "Answer correctness"
double score = result.getValue(); // Score between 0 and 1
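One way such a score might be consumed is as a pass/fail gate against a minimum value. The sketch below assumes a simplified stand-in Evaluation class (not the library's), and meetsThreshold is a hypothetical helper; the score 0.82 and the threshold 0.75 are made-up illustration values.

```java
class ScoreThresholdSketch {

    // Simplified stand-in for the library's Evaluation (assumption, not the real class).
    static final class Evaluation {
        private final String name;
        private final double value;

        Evaluation(String name, double value) {
            this.name = name;
            this.value = value;
        }

        String getName() { return name; }
        double getValue() { return value; }
    }

    // Hypothetical helper: returns true when the score reaches the minimum.
    // The minimum must lie in [0, 1], matching the score range described above.
    static boolean meetsThreshold(Evaluation evaluation, double minimumScore) {
        if (minimumScore < 0.0 || minimumScore > 1.0) {
            throw new IllegalArgumentException("minimumScore must be between 0 and 1");
        }
        return evaluation.getValue() >= minimumScore;
    }

    public static void main(String[] args) {
        // Illustrative values only, not real evaluation output.
        Evaluation result = new Evaluation("Answer correctness", 0.82);
        boolean passed = meetsThreshold(result, 0.75);
        System.out.println(result.getName() + " passed: " + passed);
    }
}
```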
Evaluation Aggregation
Results from multiple evaluators can be combined using the EvaluationAggregator:
public class EvaluationAggregator {
    // Signature only; the method body is omitted here.
    public static EvaluationAggregation evaluateAll(Sample sample, Evaluator... evaluators);
}
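To make the idea concrete, the sketch below shows one plausible way an evaluateAll method could work: run each evaluator on the same sample and collect the scores by metric name. This is an assumption about the general approach, not the library's implementation. All types are simplified stand-ins, and the two lambda evaluators and their metric names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AggregatorSketch {

    // Simplified stand-ins for the library's types (assumptions, not the real classes).
    static final class Sample {
        final String answer;
        final String groundTruth;

        Sample(String answer, String groundTruth) {
            this.answer = answer;
            this.groundTruth = groundTruth;
        }
    }

    static final class Evaluation {
        private final String name;
        private final double value;

        Evaluation(String name, double value) {
            this.name = name;
            this.value = value;
        }

        String getName() { return name; }
        double getValue() { return value; }
    }

    interface Evaluator {
        Evaluation evaluate(Sample sample);
    }

    // Stand-in result holder: scores keyed by metric name, in evaluation order.
    static final class EvaluationAggregation {
        private final Map<String, Double> scores = new LinkedHashMap<>();

        void add(Evaluation evaluation) {
            scores.put(evaluation.getName(), evaluation.getValue());
        }

        // Returns null when no evaluator produced the requested metric.
        Double get(String metricName) {
            return scores.get(metricName);
        }
    }

    // Runs every evaluator against the same sample and gathers the results.
    static EvaluationAggregation evaluateAll(Sample sample, Evaluator... evaluators) {
        EvaluationAggregation aggregation = new EvaluationAggregation();
        for (Evaluator evaluator : evaluators) {
            aggregation.add(evaluator.evaluate(sample));
        }
        return aggregation;
    }

    public static void main(String[] args) {
        // Hypothetical toy evaluators written as lambdas.
        Evaluator exactMatch = s ->
                new Evaluation("Exact match", s.answer.equals(s.groundTruth) ? 1.0 : 0.0);
        Evaluator nonEmpty = s ->
                new Evaluation("Non-empty answer", s.answer.isEmpty() ? 0.0 : 1.0);

        EvaluationAggregation results =
                evaluateAll(new Sample("Paris", "Paris"), exactMatch, nonEmpty);
        System.out.println("Exact match: " + results.get("Exact match"));
        System.out.println("Non-empty answer: " + results.get("Non-empty answer"));
    }
}
```

Keying the results by metric name is what allows lookups of the form results.get("...") on the aggregated result.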
Example Usage
Here's a complete example demonstrating how to evaluate an LLM response using multiple metrics:
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import java.util.Arrays;
public class EvaluationExample {
    public static void main(String[] args) {
        // Placeholders: supply concrete model instances before running.
        ChatLanguageModel chatModel = /* Any LangChain4j ChatLanguageModel */;
        EmbeddingModel embeddingModel = /* Any LangChain4j EmbeddingModel */;

        // One evaluator per metric
        Evaluator relevanceEvaluator = new AnswerRelevanceEvaluator(chatModel, embeddingModel);
        Evaluator correctnessEvaluator = new AnswerCorrectnessEvaluator(chatModel);
        Evaluator faithfulnessEvaluator = new FaithfulnessEvaluator(chatModel);
        Evaluator similarityEvaluator = new AnswerSemanticSimilarityEvaluator(embeddingModel);

        // The evaluation instance: question, model answer, ground truth, and contexts
        Sample sample = Sample.builder()
                .withQuestion("What are the main features of Java?")
                .withAnswer("Java is object-oriented, platform-independent, and has automatic memory management.")
                .withGroundTruth("Java's main features include object-oriented programming, platform independence through JVM, automatic memory management (garbage collection), and strong type safety.")
                .withContextsList(Arrays.asList(
                        "Java is a popular programming language...",
                        "Key features of Java include..."
                ))
                .build();

        // Run all evaluators against the same sample
        EvaluationAggregation results = EvaluationAggregator.evaluateAll(sample,
                relevanceEvaluator,
                correctnessEvaluator,
                faithfulnessEvaluator,
                similarityEvaluator
        );

        // Access results by metric name
        System.out.println("Relevance score: " + results.get("Answer relevance"));
        System.out.println("Correctness score: " + results.get("Answer correctness"));
        System.out.println("Faithfulness score: " + results.get("Faithfulness"));
        System.out.println("Semantic similarity: " + results.get("Answer semantic similarity"));
    }
}