Examples

Example: Testing Answer Correctness

RageAssert rageAssert = new OpenAiLLMBuilder().fromApiKey(key);
rageAssert.given()
.question(QUESTION)
.groundTruth(GROUND_TRUTH)
.when()
.answer(model::generate)
.then()
.assertAnswerCorrectness(0.7);

This example demonstrates how to use the assertAnswerCorrectness feature. It checks if the model's generated answer meets a correctness threshold of 0.7 compared to the defined ground truth.
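The constants QUESTION and GROUND_TRUTH and the model reference are not defined on this page; a minimal sketch of how such a fixture might look is shown below. The Model interface and the sample strings are hypothetical placeholders, not part of the RageAssert API.

// Hypothetical fixture: sample question and reference answer.
private static final String QUESTION = "What is the capital of France?";
private static final String GROUND_TRUTH = "The capital of France is Paris.";

// Any component that maps a question to an answer can be used here;
// its generate method is passed to .answer(...) as the method reference model::generate.
interface Model {
    String generate(String question);
}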

Example: Testing Faithfulness

RageAssert rageAssert = new OpenAiLLMBuilder().fromApiKey(key);
rageAssert.given()
.question(QUESTION)
.groundTruth(GROUND_TRUTH)
.contextList(List.of(ANSWER))
.when()
.answer(model::generate)
.then()
.assertFaithfulness(0.7);

This example illustrates the use of assertFaithfulness, ensuring that the generated answer is grounded in the provided context and reaches a faithfulness score of at least 0.7.
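In a retrieval-augmented setup you would normally pass the retrieved chunks as the context rather than the answer itself. The sketch below assumes a hypothetical retriever that returns a list of strings; the RageAssert calls are the same ones shown in the example above.

// Hypothetical: chunks returned by your retriever for the question.
List<String> retrievedChunks = retriever.retrieve(QUESTION);

rageAssert.given()
    .question(QUESTION)
    .groundTruth(GROUND_TRUTH)
    .contextList(retrievedChunks)
    .when()
    .answer(model::generate)
    .then()
    .assertFaithfulness(0.7);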

Example: Testing Semantic Similarity

RageAssert rageAssert = new OpenAiLLMBuilder().fromApiKey(key);
rageAssert.given()
.question(QUESTION)
.groundTruth(GROUND_TRUTH)
.when()
.answer(model::generate)
.then()
.assertSemanticSimilarity(0.7);

In this example, assertSemanticSimilarity is used to verify that the semantic similarity score between the model's answer and the ground truth is at least 0.7.
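Semantic similarity scores of this kind are typically derived from the cosine similarity of embedding vectors for the answer and the ground truth. The helper below only illustrates that calculation; it is not RageAssert's internal implementation.

// Illustrative only: cosine similarity between two embedding vectors.
static double cosineSimilarity(double[] a, double[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}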

Example: Testing Answer Relevance

RageAssert rageAssert = new OpenAiLLMBuilder().fromApiKey(key);
rageAssert.given()
.question(QUESTION)
.groundTruth(GROUND_TRUTH)
.when()
.answer(model::generate)
.then()
.assertAnswerRelevance(0.7);

This example uses the assertAnswerRelevance feature, checking that the model's answer is relevant to the question asked, with a relevance score of at least 0.7.

Example: Testing BLEU Score

RageAssert rageAssert = new OpenAiLLMBuilder().fromApiKey(key);
rageAssert.given()
.question(QUESTION)
.groundTruth(GROUND_TRUTH)
.when()
.answer(model::generate)
.then()
.assertBleuScore(0.7);

This example uses the assertBleuScore feature, testing that the exact n-gram overlap between the model's answer and the ground truth yields a precision of at least 0.7.
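BLEU is a precision-oriented metric: it counts how many of the candidate's n-grams also appear in the reference. The snippet below shows a simplified unigram precision (java.util imports assumed); real BLEU additionally combines several n-gram orders and applies a brevity penalty, so this is only an illustration of the idea, not RageAssert's implementation.

// Illustrative only: unigram precision, the simplest building block of BLEU.
static double unigramPrecision(String candidate, String reference) {
    List<String> cand = Arrays.asList(candidate.toLowerCase().split("\\s+"));
    List<String> ref = new ArrayList<>(Arrays.asList(reference.toLowerCase().split("\\s+")));
    int matches = 0;
    for (String token : cand) {
        if (ref.remove(token)) { // each reference token can be matched only once
            matches++;
        }
    }
    return cand.isEmpty() ? 0.0 : (double) matches / cand.size();
}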

Example: Testing ROUGE Score

RageAssert rageAssert = new OpenAiLLMBuilder().fromApiKey(key);
rageAssert.given()
.question(QUESTION)
.groundTruth(GROUND_TRUTH)
.when()
.answer(model::generate)
.then()
.assertRougeScore(0.9, RougeScoreEvaluator.RougeType.ROUGE_L_SUM, RougeScoreEvaluator.MeasureType.PRECISION);

This example uses the assertRougeScore feature with the ROUGE_L_SUM metric, verifying that the longest common subsequence (LCS) overlap, computed sentence by sentence, yields a precision of at least 0.9.
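ROUGE-L is built on the longest common subsequence between the candidate and reference tokens; precision is the LCS length divided by the candidate length, and ROUGE_L_SUM aggregates this across sentences. The dynamic-programming helper below only illustrates the LCS computation itself, not RageAssert's implementation.

// Illustrative only: LCS length over two token sequences.
static int lcsLength(String[] candidate, String[] reference) {
    int[][] dp = new int[candidate.length + 1][reference.length + 1];
    for (int i = 1; i <= candidate.length; i++) {
        for (int j = 1; j <= reference.length; j++) {
            dp[i][j] = candidate[i - 1].equals(reference[j - 1])
                    ? dp[i - 1][j - 1] + 1
                    : Math.max(dp[i - 1][j], dp[i][j - 1]);
        }
    }
    return dp[candidate.length][reference.length];
}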

Example: Concatenation of multiple assertions

RageAssert rageAssert = new OpenAiLLMBuilder().fromApiKey(key);
rageAssert.given()
.question(QUESTION)
.groundTruth(GROUND_TRUTH)
.when()
.answer(model.generate(QUESTION))
.then()
.assertAnswerCorrectness(0.7)
.then()
.assertSemanticSimilarity(0.7);

This example demonstrates how to apply multiple assertions to a single LLM-generated answer. Assertions can be chained, allowing you to combine different evaluation metrics such as correctness and semantic similarity. This is the recommended approach for testing one answer against multiple metrics.
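For completeness, here is a sketch of how the chained assertions might sit inside a JUnit 5 test. The @Test annotation is standard JUnit; key, QUESTION, GROUND_TRUTH, and model are the same hypothetical placeholders used above.

// Sketch: chained RageAssert assertions inside a JUnit 5 test method.
@Test
void answerIsCorrectAndSemanticallySimilar() {
    RageAssert rageAssert = new OpenAiLLMBuilder().fromApiKey(key);
    rageAssert.given()
        .question(QUESTION)
        .groundTruth(GROUND_TRUTH)
        .when()
        .answer(model::generate)
        .then()
        .assertAnswerCorrectness(0.7)
        .then()
        .assertSemanticSimilarity(0.7);
}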