@anakin87 on Hugging Face: "🧪 RAG Evaluation with 🔥 Prometheus 2 + Haystack 📝 Blog post:…"

🧪 RAG Evaluation with 🔥 Prometheus 2 + Haystack

📝 Blog post: https://haystack.deepset.ai/blog/rag-evaluation-with-prometheus-2
📓 Notebook: https://github.com/deepset-ai/haystack-cookbook/blob/main/notebooks/prometheus2_evaluation.ipynb

─── ⋆⋅☆⋅⋆ ───

When evaluating LLMs' responses, 𝐩𝐫𝐨𝐩𝐫𝐢𝐞𝐭𝐚𝐫𝐲 𝐦𝐨𝐝𝐞𝐥𝐬 like GPT-4 are commonly used due to their strong performance.
However, relying on closed models presents challenges related to data privacy 🔒, transparency, controllability, and cost 💸.

On the other hand, 𝐨𝐩𝐞𝐧 𝐦𝐨𝐝𝐞𝐥𝐬 used as evaluators typically produce judgments that correlate poorly with human ones, and they lack flexibility in how evaluations can be set up.

🔥 Prometheus 2 is a new family of open-source models designed to address these gaps:
🔹 two variants: prometheus-eval/prometheus-7b-v2.0; prometheus-eval/prometheus-8x7b-v2.0
🔹 trained on open-source data
🔹 high correlation with human evaluations and proprietary models
🔹 highly flexible: supports both direct assessment and pairwise ranking, and lets you define custom evaluation criteria.
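
To make "direct assessment with custom criteria" a bit more concrete, here is a minimal sketch in plain Python. It only builds an evaluation prompt and parses the score from the evaluator's output; it does not call any model. The template wording, the `[RESULT]` marker, the 1-5 scale, and the helper names `build_prompt` and `parse_score` are my assumptions for illustration, loosely modeled on the style of prompt these models expect. Check the model card for the exact format.

```python
import re
from typing import Optional

# Hypothetical direct-assessment template (an assumption, not the official prompt).
DIRECT_ASSESSMENT_TEMPLATE = """###Task Description:
An instruction, a response to evaluate, and a score rubric are given.
1. Write detailed feedback that assesses the response strictly based on the rubric.
2. After the feedback, write a score that is an integer between 1 and 5.
3. The output format should look as follows: "Feedback: (write feedback) [RESULT] (an integer between 1 and 5)"

###The instruction to evaluate:
{instruction}

###Response to evaluate:
{response}

###Score Rubrics:
{rubric}

###Feedback:"""


def build_prompt(instruction: str, response: str, rubric: str) -> str:
    """Fill the template with the question, the RAG answer, and a custom rubric."""
    return DIRECT_ASSESSMENT_TEMPLATE.format(
        instruction=instruction, response=response, rubric=rubric
    )


def parse_score(evaluator_output: str) -> Optional[int]:
    """Extract the integer score that follows the [RESULT] marker, if present."""
    match = re.search(r"\[RESULT\]\s*([1-5])\b", evaluator_output)
    return int(match.group(1)) if match else None


# Custom criterion: is the answer grounded in the retrieved context?
rubric = (
    "[Is the response supported by the provided context?]\n"
    "Score 1: The response contradicts or ignores the context.\n"
    "Score 5: Every claim in the response is supported by the context."
)

prompt = build_prompt(
    instruction="Context: Haystack is an open-source LLM framework.\nQuestion: What is Haystack?",
    response="Haystack is an open-source framework for building LLM applications.",
    rubric=rubric,
)

# Illustrative evaluator output, written by hand for this example.
sample_output = "Feedback: The answer is fully supported by the context. [RESULT] 5"
score = parse_score(sample_output)
```

The resulting `prompt` string would be sent to whichever generator serves the Prometheus 2 model. Pairwise ranking follows the same idea, with two responses in the prompt and a winner label to parse instead of a score.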

See my experiments with RAG evaluation in the links above.
