🚀 RAG Passage Evaluator

Strictly grades an LLM-returned passage against a ground-truth answer for a given user query, assigning probabilistic precision and recall scores to quantify how completely and cleanly the passage covers the required information. Provided under the MIT license.

Prompt

You will be given a USER_QUERY (e.g., a user question), a GROUND_TRUTH_PASSAGE (the ideal response), and an LLM_RETURNED_PASSAGE returned by an LLM as part of a Retrieval-Augmented Generation (RAG) process. Your task is to rate the LLM_RETURNED_PASSAGE on multiple metrics to evaluate its quality compared to the GROUND_TRUTH_PASSAGE. **Please evaluate strictly and avoid scoring leniently.**

### Evaluation Steps:

1. **Read the given USER_QUERY, GROUND_TRUTH_PASSAGE, and LLM_RETURNED_PASSAGE carefully.**
2. **Assess the LLM_RETURNED_PASSAGE based on the following criteria:**
   - **RECALL (1-5)**: Evaluate whether the LLM_RETURNED_PASSAGE contains all the necessary information present in the GROUND_TRUTH_PASSAGE to fully address the USER_QUERY. Consider whether any crucial information is missing. A higher score indicates that the LLM_RETURNED_PASSAGE covers all or most of the required information found in the GROUND_TRUTH_PASSAGE.
   - **PRECISION (1-5)**: Evaluate how focused the LLM_RETURNED_PASSAGE is compared to the GROUND_TRUTH_PASSAGE in addressing the USER_QUERY. Consider the amount of unnecessary or irrelevant information present in the LLM_RETURNED_PASSAGE that is not in the GROUND_TRUTH_PASSAGE. A higher score indicates that the LLM_RETURNED_PASSAGE contains minimal or no extraneous information compared to the GROUND_TRUTH_PASSAGE. Lower scores should be given to an LLM_RETURNED_PASSAGE that includes significant amounts of off-topic or unnecessary details not present in the GROUND_TRUTH_PASSAGE.
3. **For each criterion, follow these steps:**
   - Analyze the LLM_RETURNED_PASSAGE and determine which pair of adjacent scores (e.g., 2-3, 3-4, or 4-5) best represents its quality compared to the GROUND_TRUTH_PASSAGE for this criterion.
   - Estimate the probability distribution between these two adjacent scores, ensuring they sum to 100%.
   - Provide reasoning for your choice, highlighting specific aspects of the LLM_RETURNED_PASSAGE in relation to the GROUND_TRUTH_PASSAGE that influenced your decision.
   - **Be strict in your assessment and avoid lenient scoring.**
4. **Calculate the Weighted_Summed_Score for each criterion:**
   - Multiply each of the two neighboring scores by its estimated probability.
   - Sum these two products to get the final Weighted_Summed_Score.
   - Example: If you estimate 70% probability for a score of 3 and 30% for a score of 4:
     Weighted_Summed_Score = (3 * 0.7) + (4 * 0.3) = 3.3
5. **Format your evaluation as shown in the Example Output below.**

### Example Output:

- **RECALL_Reasoning**: The LLM_RETURNED_PASSAGE provides most of the essential information found in the GROUND_TRUTH_PASSAGE, covering the main points and some relevant background. However, it's missing a few minor details that are present in the GROUND_TRUTH_PASSAGE. This results in a higher probability for a score of 4 (80%) and a lower probability for a score of 5 (20%).
- **RECALL_Formula**: (4 * 0.8) + (5 * 0.2)
- **RECALL_Weighted_Summed_Score**: 4.2
- **PRECISION_Reasoning**: The LLM_RETURNED_PASSAGE is generally as focused as the GROUND_TRUTH_PASSAGE, but it contains some unnecessary information not present in the GROUND_TRUTH_PASSAGE. There are a few instances where the content slightly deviates from the core information provided in the GROUND_TRUTH_PASSAGE. This warrants a higher probability for a score of 3 (90%) and a lower probability for a score of 4 (10%).
- **PRECISION_Formula**: (3 * 0.9) + (4 * 0.1)
- **PRECISION_Weighted_Summed_Score**: 3.1

---

**Ensure your evaluations are consistent across different passages and maintain a high standard throughout the assessment process. Always compare the LLM_RETURNED_PASSAGE to the GROUND_TRUTH_PASSAGE while considering the USER_QUERY.**

USER_QUERY: {{query}}
GROUND_TRUTH_PASSAGE: {{passage_gt}}
LLM_RETURNED_PASSAGE: {{passage_llm}}
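
The Weighted_Summed_Score arithmetic in step 4 is simple enough to recompute outside the model, which may be useful for checking the score a grader reports against its own formula line. The following is a minimal Python sketch, not part of the prompt itself. The function names `weighted_summed_score` and `score_from_formula` are hypothetical, and the sketch assumes the formula string follows the `(a * p) + (b * q)` shape shown in the Example Output.

```python
import re

# Hypothetical helpers: the prompt defines only the arithmetic, not this code.
# Matches formula strings shaped like "(4 * 0.8) + (5 * 0.2)".
FORMULA_RE = re.compile(
    r"\(\s*(\d)\s*\*\s*([01](?:\.\d+)?)\s*\)\s*\+\s*"
    r"\(\s*(\d)\s*\*\s*([01](?:\.\d+)?)\s*\)"
)


def weighted_summed_score(low_score: int, p_low: float) -> float:
    """Blend two adjacent scores; p_low is the probability of the lower one."""
    if not 1 <= low_score <= 4:
        raise ValueError("lower score must be between 1 and 4")
    if not 0.0 <= p_low <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    high_score = low_score + 1
    return round(low_score * p_low + high_score * (1.0 - p_low), 2)


def score_from_formula(formula: str) -> float:
    """Recompute the score from a '*_Formula' line of the grader's output."""
    match = FORMULA_RE.search(formula)
    if match is None:
        raise ValueError(f"unrecognized formula: {formula!r}")
    s1, p1 = int(match.group(1)), float(match.group(2))
    s2, p2 = int(match.group(3)), float(match.group(4))
    if not (1 <= s1 <= 5 and 1 <= s2 <= 5):
        raise ValueError("scores must be on the 1-5 scale")
    if abs(s1 - s2) != 1:
        raise ValueError("scores must be adjacent")
    if abs(p1 + p2 - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    return round(s1 * p1 + s2 * p2, 2)


if __name__ == "__main__":
    # The step 4 example: 70% for a score of 3, 30% for a score of 4.
    print(weighted_summed_score(3, 0.7))
    # The RECALL_Formula line from the Example Output.
    print(score_from_formula("(4 * 0.8) + (5 * 0.2)"))
```

Comparing the recomputed value with the reported `*_Weighted_Summed_Score` gives a cheap consistency check. Rejecting non-adjacent scores or probabilities that do not sum to 1 flags outputs that ignore step 3.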