Getting Started

Welcome to RAGE4J (RAG Evaluations for Java), your Java toolkit for evaluating LLM outputs! 🎉

What is RAGE4J?

RAGE4J is a Java library suite for evaluating Large Language Model (LLM) outputs. It consists of four modules:

  • RAGE4J-Core: The foundation library providing evaluation metrics and tools
  • RAGE4J-Assert: Testing extensions for integrating LLM evaluations into your test suite
  • RAGE4J-Persist: Persistence layer for saving evaluation results to files
  • RAGE4J-Persist-JUnit5: JUnit 5 extension for automatic persistence lifecycle management

Core Features

RAGE4J helps you assess LLM outputs across six key dimensions:

  • Correctness: Measures factual accuracy by comparing claims in the LLM output against a ground truth
  • Relevance: Evaluates if the response actually answers the question asked
  • Faithfulness: Checks if the LLM's statements are supported by the provided context
  • Semantic Similarity: Computes how closely the meaning matches a reference answer
  • BLEU score: Computes how close an LLM's response is to the ground truth using n-gram overlap (see the sketch after this list)
  • ROUGE score: Provides recall, precision, and F1 metrics over unigram, bigram, and longest-common-subsequence (LCS) overlap between an LLM response and a ground truth
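
BLEU's core idea is easy to show in code. The sketch below computes clipped ("modified") n-gram precision between an answer and a ground truth. It is a self-contained illustration of the overlap concept only, not RAGE4J's implementation; real BLEU additionally combines several n-gram orders and applies a brevity penalty.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Self-contained illustration of the n-gram overlap behind BLEU.
// Concept only; this is not RAGE4J's implementation.
public class NgramOverlapSketch {

    // Count the n-grams of a whitespace-tokenized sentence.
    static Map<String, Integer> ngramCounts(String text, int n) {
        String[] tokens = text.toLowerCase().split("\\s+");
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= tokens.length; i++) {
            String gram = String.join(" ", Arrays.copyOfRange(tokens, i, i + n));
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    // Modified n-gram precision: candidate n-gram counts are clipped by the
    // reference counts, so repeating a matching word cannot inflate the score.
    static double modifiedPrecision(String candidate, String reference, int n) {
        Map<String, Integer> cand = ngramCounts(candidate, n);
        Map<String, Integer> ref = ngramCounts(reference, n);
        int matched = 0, total = 0;
        for (Map.Entry<String, Integer> e : cand.entrySet()) {
            total += e.getValue();
            matched += Math.min(e.getValue(), ref.getOrDefault(e.getKey(), 0));
        }
        return total == 0 ? 0.0 : (double) matched / total;
    }

    public static void main(String[] args) {
        String answer = "the cat sat on the mat";
        String groundTruth = "the cat is on the mat";
        // Prints 0.83 (5 of 6 unigrams match) and 0.60 (3 of 5 bigrams match).
        System.out.printf("unigram precision: %.2f%n", modifiedPrecision(answer, groundTruth, 1));
        System.out.printf("bigram precision:  %.2f%n", modifiedPrecision(answer, groundTruth, 2));
    }
}
```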

Library Structure

  • RAGE4J-Core (usage sketch below)
    • Evaluation metrics
    • Sample handling
    • Result aggregation
    • Utility functions
  • RAGE4J-Assert (assertion sketch below)
    • Fluent assertion API
    • LLM builder integration
    • Evaluation/strict modes
  • RAGE4J-Persist (persistence sketch below)
    • EvaluationStore interface
    • JsonLinesStore implementation
    • CompositeStore for multiple outputs
  • RAGE4J-Persist-JUnit5 (JUnit 5 sketch below)
    • @Rage4jPersistConfig annotation
    • Automatic store lifecycle
    • Parameter injection
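
To give a feel for how the modules fit together, here is a minimal RAGE4J-Core sketch. The Sample builder, evaluator class, and evaluate method shown are assumptions inferred from the feature list above, not the library's verified API; see the RAGE4J-Core page for the real signatures.

```java
// Hypothetical RAGE4J-Core usage; the names below are assumptions, not verified API.
Sample sample = Sample.builder()                        // assumed builder
        .question("What is the capital of France?")
        .answer("Paris is the capital of France.")      // LLM output under test
        .groundTruth("The capital of France is Paris.") // reference answer
        .build();

double similarity = new SemanticSimilarityEvaluator(embeddingModel) // assumed class
        .evaluate(sample);                                          // assumed method
System.out.println("semantic similarity: " + similarity);
```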
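
RAGE4J-Assert layers a fluent assertion API on top of Core so an evaluation can fail a test directly. The entry point and threshold methods in this sketch are hypothetical placeholders for that API, shown only to convey the shape of a strict-mode assertion.

```java
// Hypothetical RAGE4J-Assert test; the fluent method names are assumptions.
@Test
void answerIsRelevantAndFaithful() {
    rageAssert(sample)            // assumed entry point
            .withModel(chatModel) // assumed LLM builder integration
            .relevance(0.8)       // assumed: fail if relevance < 0.8
            .faithfulness(0.9)    // assumed: fail if faithfulness < 0.9
            .assertAll();         // assumed strict-mode trigger
}
```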
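
RAGE4J-Persist's EvaluationStore, JsonLinesStore, and CompositeStore are named in the list above, but the constructors and methods in this sketch are assumptions about how they might be wired together to write one evaluation result to two files.

```java
// Hypothetical RAGE4J-Persist wiring; constructors and methods are assumptions.
EvaluationStore store = new CompositeStore(                // fan out to several stores
        new JsonLinesStore(Path.of("results/eval.jsonl")),
        new JsonLinesStore(Path.of("backup/eval.jsonl")));

store.save(evaluationResult); // assumed method: append one result as a JSON line
store.close();                // assumed lifecycle method
```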
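
Finally, RAGE4J-Persist-JUnit5 manages the store lifecycle for you. @Rage4jPersistConfig comes from the list above, but its attribute and the injected parameter type below are assumptions about how the extension might look in a test class.

```java
// Hypothetical JUnit 5 integration; the annotation attribute and the injected
// parameter are assumptions, not the verified extension API.
@Rage4jPersistConfig(output = "results/eval.jsonl") // assumed attribute
class LlmEvaluationTest {

    @Test
    void evaluateAnswer(EvaluationStore store) { // assumed parameter injection
        // Run an evaluation and save results; the extension opens and
        // closes the store automatically around the test lifecycle.
    }
}
```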

Explore more about RAGE4J:

  1. RAGE4J-Core
  2. RAGE4J-Assert
  3. RAGE4J-Persist
  4. RAGE4J-Persist-JUnit5
  5. Contribution guide