
Examples

Example: Testing Answer Correctness

    RageAssert rageAssert = new OpenAiLLMBuilder().fromApiKey(key);
    rageAssert.given()
        .question(QUESTION)
        .groundTruth(GROUND_TRUTH)
        .when()
        .answer(model.generate(QUESTION))
        .then()
        .assertAnswerCorrectness(0.7);

This example demonstrates the assertAnswerCorrectness assertion: it checks that the model's generated answer reaches a correctness score of at least 0.7 when compared against the defined ground truth.
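The snippets on this page reference QUESTION, GROUND_TRUTH, and model without defining them. A minimal, hypothetical setup might look like the following; the constant values and the Model interface are illustrative stand-ins, not part of the library:

```java
// Hypothetical fixtures assumed by the examples on this page.
public class Fixtures {
    // The question posed to the system under test.
    static final String QUESTION = "What is the capital of France?";

    // The reference answer the assertions compare against.
    static final String GROUND_TRUTH = "The capital of France is Paris.";

    // Minimal stand-in for the LLM under test; a real model would call an
    // inference API here instead of returning a canned answer.
    interface Model {
        String generate(String question);
    }

    static final Model model = q -> "Paris is the capital of France.";

    public static void main(String[] args) {
        System.out.println(model.generate(QUESTION));
    }
}
```

Because `model` implements a single-method interface, it can be passed either as a method reference (`model::generate`) or invoked directly (`model.generate(QUESTION)`), matching both styles used in the examples below.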

Example: Testing Faithfulness

    RageAssert rageAssert = new OpenAiLLMBuilder().fromApiKey(key);
    rageAssert.given()
        .question(QUESTION)
        .groundTruth(GROUND_TRUTH)
        .contextList(List.of(ANSWER))
        .when()
        .answer(model::generate)
        .then()
        .assertFaithfulness(0.7);

This example illustrates the use of assertFaithfulness, ensuring that the generated answer is grounded in the provided context, with a faithfulness score of at least 0.7.
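To give an intuition for what a faithfulness score measures, here is a deliberately crude sketch: the fraction of answer sentences that are literally contained in the context. Real implementations (including LLM-as-judge approaches) decide "support" far more robustly; this only illustrates the shape of the score, and the method name is not part of the library:

```java
public class FaithfulnessSketch {
    // Crude illustration: faithfulness as the fraction of answer sentences
    // supported by (here: literally contained in) the context.
    static double score(String answer, String context) {
        // Split on sentence-ending punctuation followed by whitespace.
        String[] sentences = answer.split("(?<=[.!?])\\s+");
        int supported = 0;
        for (String s : sentences) {
            if (context.contains(s.trim())) supported++;
        }
        return (double) supported / sentences.length;
    }

    public static void main(String[] args) {
        String context = "Paris is the capital of France. It lies on the Seine.";
        String answer = "Paris is the capital of France. It has 10 million people.";
        // 1 of 2 answer sentences appears in the context -> score 0.5,
        // which would fail an assertFaithfulness(0.7)-style threshold.
        System.out.println(score(answer, context));
    }
}
```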

Example: Testing Semantic Similarity

    RageAssert rageAssert = new OpenAiLLMBuilder().fromApiKey(key);
    rageAssert.given()
        .question(QUESTION)
        .groundTruth(GROUND_TRUTH)
        .when()
        .answer(model::generate)
        .then()
        .assertSemanticSimilarity(0.7);

In this example, assertSemanticSimilarity is used to verify that the semantic similarity score between the model's answer and the ground truth is at least 0.7.
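Semantic similarity is typically computed as the cosine similarity between embedding vectors of the answer and the ground truth. The sketch below shows that underlying computation with hard-coded example vectors (the vectors are made up for illustration; the library's actual embedding pipeline may differ):

```java
public class CosineSimilaritySketch {
    // Cosine similarity between two vectors: dot(a, b) / (|a| * |b|).
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Stand-ins for embeddings of the answer and the ground truth.
        double[] answerEmbedding = {0.2, 0.8, 0.1};
        double[] truthEmbedding  = {0.25, 0.75, 0.05};
        double score = cosine(answerEmbedding, truthEmbedding);
        // Prints whether a 0.7 threshold, as in the example above, would pass.
        System.out.println(score >= 0.7);
    }
}
```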

Example: Testing Answer Relevance

    RageAssert rageAssert = new OpenAiLLMBuilder().fromApiKey(key);
    rageAssert.given()
        .question(QUESTION)
        .groundTruth(GROUND_TRUTH)
        .contextList(CONTEXT)
        .when()
        .answer(model::generate)
        .then()
        .assertAnswerRelevance(0.7);

This example uses the assertAnswerRelevance assertion, checking that the model's answer is relevant given the provided context, with a relevance score of at least 0.7.

Example: Chaining multiple assertions

    RageAssert rageAssert = new OpenAiLLMBuilder().fromApiKey(key);
    rageAssert.given()
        .question(QUESTION)
        .groundTruth(GROUND_TRUTH)
        .when()
        .answer(model.generate(QUESTION))
        .then()
        .assertAnswerCorrectness(0.7)
        .then()
        .assertSemanticSimilarity(0.7);

This example demonstrates how to apply multiple assertions to a single LLM-generated answer. Assertions can be chained with repeated then() calls, letting you combine different evaluation metrics such as correctness and semantic similarity. This is the recommended approach for testing one answer against multiple metrics.