---
license: mit
language:
- en
tags:
- rag
- evaluation
- metrics
- triad
- answer relevance
- context relevance
- groundedness
- nuclia
- retrieval augmented generation
library_name: nuclia-eval
---

# Model Card for REMi (**R**AG **E**valuation **M**etr**i**cs) v0

Nuclia, the all-in-one RAG as a service platform.

> [!IMPORTANT]
> ❗ We only support using REMi with the `nuclia-eval` library and not through Huggingface's `transformers`. This is due to Mistral's [comment](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3#:~:text=%E2%9D%97%20We%20recommend%20using%20mistral_common%20for%20tokenization%20as%20the%20transformers%20tokenizer%20has%20not%20been%20tested%20by%20the%20Mistral%20team%20and%20might%20give%20incorrect%20results.) on the Huggingface tokenizer not being tested with their v3 models, as well as differences in LoRA implementation between Mistral's finetuning library and Huggingface's `PEFT` library.

## Model Description

REMi-v0 (**R**AG **E**valuation **M**etr**i**cs) is a LoRA adapter for the [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) model. It has been finetuned by the team at [**nuclia**](https://nuclia.com) to evaluate the quality of the overall RAG experience.

Its evaluation follows the RAG triad as proposed by [TruLens](https://www.trulens.org/trulens_eval/getting_started/core_concepts/rag_triad/):

![rag triad](assets/RAG_Triad.jpg)

In summary, the metrics REMi provides for a RAG experience involving a **question**, an **answer** and N pieces of **context** are:

* **Answer Relevance**: the directness and appropriateness of the response in addressing the specific question asked, providing accurate, complete, and contextually suitable information.
  * **score**: A score between 0 and 5 indicating the relevance of the answer to the question.
  * **reason**: A string explaining the reason for the score.
* For each of the N pieces of context:
  * **Context Relevance Score**: The relevance of the **context** to the **question**, on a scale of 0 to 5.
  * **Groundedness Score**: The degree of information overlap to which the **answer** contains information that is substantially similar or identical to that in the **context** piece, on a scale of 0 to 5.

## Usage

### With [`nuclia-eval`](https://github.com/nuclia/nuclia-eval)

**Installation**

```bash
pip install nuclia-eval
```

**Usage**

```python
from nuclia_eval import REMi

evaluator = REMi()

query = "By how many Octaves can I shift my OXYGEN PRO 49 keyboard?"
context1 = """\
* Oxygen Pro 49's keyboard can be shifted 3 octaves down or 4 octaves up.
* Oxygen Pro 61's keyboard can be shifted 3 octaves down or 3 octaves up.

To change the transposition of the keyboard, press and hold Shift, and then use the Key Octave –/+ buttons to lower or raise the keybed by one one, respectively. The display will temporarily show TRANS and the current transposition (-12 to 12)."""
context2 = """\
To change the octave of the keyboard, use the Key Octave –/+ buttons to lower or raise the octave, respectively The display will temporarily show OCT and the current octave shift.\n\nOxygen Pro 25's keyboard can be shifted 4 octaves down or 5 octaves up"""
context3 = """\
If your DAW does not automatically configure your Oxygen Pro series keyboard, please follow the setup steps listed in the Oxygen Pro DAW Setup Guides.
To set the keyboard to operate in Preset Mode, press the DAW/Preset Button (on the Oxygen Pro 25) or Preset Button (on the Oxygen Pro 49 and 61).
On the Oxygen Pro 25 the DAW/Preset button LED will be off to show that Preset Mode is selected.
On the Oxygen Pro 49 and 61 the Preset button LED will be lit to show that Preset Mode is selected."""
answer = "Based on the context provided, The Oxygen Pro 49's keyboard can be shifted 3 octaves down or 4 octaves up."

result = evaluator.evaluate_rag(query=query, answer=answer, contexts=[context1, context2, context3])
answer_relevance, context_relevances, groundednesses = result

print(f"{answer_relevance.score}, {answer_relevance.reason}")
# 5, The response directly answers the query by specifying the range of octave shifts for the Oxygen Pro 49 keyboard.
print([cr.score for cr in context_relevances])
# [5, 1, 0]
print([g.score for g in groundednesses])
# [2, 0, 0]
```
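If you need a single rolled-up verdict per interaction (for dashboards or CI gates), you can aggregate the three metrics yourself. Continuing from the example above, the helper below is only a sketch: `summarize` and its threshold of 3 are illustrative choices, not part of the `nuclia-eval` API; it relies solely on the `score` attributes shown above.

```python
def summarize(answer_relevance, context_relevances, groundednesses, threshold=3):
    """Illustrative aggregation of REMi scores (not part of nuclia-eval)."""
    return {
        # Is the answer on-topic for the question?
        "answer_relevant": answer_relevance.score >= threshold,
        # Did at least one retrieved context actually address the question?
        "retrieval_ok": max(cr.score for cr in context_relevances) >= threshold,
        # Is the answer backed by at least one context piece?
        "grounded": max(g.score for g in groundednesses) >= threshold,
    }

print(summarize(answer_relevance, context_relevances, groundednesses))
# With the scores above: {'answer_relevant': True, 'retrieval_ok': True, 'grounded': False}
```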
**Granularity**

The **REMi** evaluator provides a fine-grained and strict evaluation of the RAG triad. For instance, if we slightly modify the answer to the query:

```diff
- answer = "Based on the context provided, The Oxygen Pro 49's keyboard can be shifted 3 octaves down or 4 octaves up."
+ answer = "Based on the context provided, the Oxygen Pro 49's keyboard can be shifted 4 octaves down or 4 octaves up."
...
print([g.score for g in groundednesses])
# [0, 0, 0]
```

As the information provided in the answer is not present in any of the contexts, the groundedness score is 0 for all contexts.

What if the information in the answer does not answer the question?

```diff
- answer = "Based on the context provided, The Oxygen Pro 49's keyboard can be shifted 3 octaves down or 4 octaves up."
+ answer = "Based on the context provided, the Oxygen Pro 61's keyboard can be shifted 3 octaves down or 4 octaves up."
...
print(f"{answer_relevance.score}, {answer_relevance.reason}")
# 1, The response is relevant to the entire query but incorrectly mentions the Oxygen Pro 61 instead of the Oxygen Pro 49
```

**Computing individual metrics**

We can also compute each metric separately:

```python
...
answer_relevance = evaluator.answer_relevance(query=query, answer=answer)
context_relevances = evaluator.context_relevance(query=query, contexts=[context1, context2, context3])
groundednesses = evaluator.groundedness(answer=answer, contexts=[context1, context2, context3])
...
```

### With Huggingface's transformers

As noted in the callout at the top of this model card, using REMi through Huggingface's `transformers` library is not supported.

## Feedback and Community

For feedback, questions, or to get in touch with the **nuclia** team, we are available on our [community Slack channel](https://join.slack.com/t/nuclia-community/shared_invite/zt-2l7jlgi6c-Oohv8j3ygdKOvD_PwZhfdg).

## About Nuclia

We are [**Nuclia**](https://nuclia.com), the all-in-one RAG as a service platform. We are the catalyst for the new generation of search through the transformative advancements of Generative AI. We focus on providing answers, fundamentally shifting how search is perceived and applied.

Nuclia has been built to be your data Swiss Army knife: a 'no-code' solution you can use out of the box, or a toolset and platform on which you can build your own applications. Whichever path you follow, we will support you every step of the way with world-class processes and dedicated people to help drive value for you and your customers.