m-ric

posted an update Mar 19

𝗨𝘀𝗶𝗻𝗴 𝗟𝗟𝗠-𝗮𝘀-𝗮-𝗷𝘂𝗱𝗴𝗲 🧑‍⚖️ 𝗳𝗼𝗿 𝗮𝗻 𝗮𝘂𝘁𝗼𝗺𝗮𝘁𝗲𝗱 𝗮𝗻𝗱 𝘃𝗲𝗿𝘀𝗮𝘁𝗶𝗹𝗲 𝗲𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻

Evaluating LLM outputs is often hard, since many tasks require open-ended answers for which no deterministic metric works: for instance, when asking a model to summarize a text, there could be hundreds of correct ways to do it. Human evaluation is the most versatile way to grade such outputs, but it is time-consuming and therefore costly.

🤔 Then 𝘄𝗵𝘆 𝗻𝗼𝘁 𝗮𝘀𝗸 𝗮𝗻𝗼𝘁𝗵𝗲𝗿 𝗟𝗟𝗠 𝘁𝗼 𝗱𝗼 𝘁𝗵𝗲 𝗲𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻, by providing it with relevant rating criteria? 👉 This is the idea behind LLM-as-a-judge.
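To make the idea concrete, here is a minimal sketch of a judge loop. Everything in it is an assumption for illustration and not taken from the notebook: the prompt wording, the 1 to 4 scale, the `Total rating:` marker, and the helper names are all hypothetical. The judge model is passed in as a plain callable, so `fake_llm` below is only a stand-in for a real model call.

```python
import re
from typing import Callable, Optional

# Hypothetical judge prompt: states the rating criteria and asks for a
# fixed, parseable final line.
JUDGE_PROMPT = """You will be given a question and an answer.
Rate how well the answer addresses the question on a scale of 1 to 4:
1: not helpful at all
2: mostly unhelpful
3: mostly helpful
4: excellent, fully addresses the question

Give your reasoning, then end with a line of the form:
Total rating: <number between 1 and 4>

Question: {question}
Answer: {answer}
"""


def build_judge_prompt(question: str, answer: str) -> str:
    """Fill the template with the item to be graded."""
    return JUDGE_PROMPT.format(question=question, answer=answer)


def parse_rating(judge_output: str) -> Optional[int]:
    """Extract the integer rating, or None if the judge ignored the format."""
    match = re.search(r"Total rating:\s*([1-4])\b", judge_output)
    return int(match.group(1)) if match else None


def judge(question: str, answer: str, llm: Callable[[str], str]) -> Optional[int]:
    """Ask the judge model for a rating and parse it."""
    return parse_rating(llm(build_judge_prompt(question, answer)))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption: returns raw text)."""
    return "The answer is on topic but brief.\nTotal rating: 3"


rating = judge("What is the capital of France?", "Paris.", fake_llm)
```

Returning `None` on unparseable output, instead of guessing a score, keeps malformed judge replies from silently skewing the average.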

⚙️ To implement an LLM judge correctly, you need a few tricks.
✅ So 𝗜'𝘃𝗲 𝗷𝘂𝘀𝘁 𝗽𝘂𝗯𝗹𝗶𝘀𝗵𝗲𝗱 𝗮 𝗻𝗲𝘄 𝗻𝗼𝘁𝗲𝗯𝗼𝗼𝗸 𝘀𝗵𝗼𝘄𝗶𝗻𝗴 𝗵𝗼𝘄 𝘁𝗼 𝗶𝗺𝗽𝗹𝗲𝗺𝗲𝗻𝘁 𝗶𝘁 𝗽𝗿𝗼𝗽𝗲𝗿𝗹𝘆 𝗶𝗻 𝗼𝘂𝗿 𝗛𝘂𝗴𝗴𝗶𝗻𝗴 𝗙𝗮𝗰𝗲 𝗖𝗼𝗼𝗸𝗯𝗼𝗼𝗸! (you can run it instantly in Google Colab)
➡️ 𝗟𝗟𝗠-𝗮𝘀-𝗮-𝗷𝘂𝗱𝗴𝗲 𝗰𝗼𝗼𝗸𝗯𝗼𝗼𝗸: https://huggingface.co/learn/cookbook/llm_judge

The Cookbook is a great collection of notebooks demonstrating recipes (hence the "cookbook") for common LLM usages. I recommend taking a look!
➡️ 𝗔𝗹𝗹 𝗰𝗼𝗼𝗸𝗯𝗼𝗼𝗸𝘀: https://huggingface.co/learn/cookbook/index

Thank you @MariaK for your support!

AtAndDev

Mar 19

LLM-as-a-judge is really helpful when creating a DPO dataset as we can determine which response is better.
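As a rough sketch of that use case, assuming two candidate responses have already been scored by a judge: the higher-scored one becomes the preferred answer. The function name and the `prompt` / `chosen` / `rejected` keys are an assumed layout for illustration, not something specified in this thread.

```python
from typing import Dict, Optional


def make_preference_pair(
    prompt: str,
    response_a: str,
    response_b: str,
    score_a: float,
    score_b: float,
) -> Optional[Dict[str, str]]:
    """Turn two judge-scored responses into one preference record.

    Ties are skipped, since they carry no preference signal.
    """
    if score_a == score_b:
        return None
    if score_a > score_b:
        chosen, rejected = response_a, response_b
    else:
        chosen, rejected = response_b, response_a
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


pair = make_preference_pair(
    "Summarize: the cat sat on the mat.",
    "A cat sat on a mat.",
    "Cats are mammals.",
    score_a=4,
    score_b=1,
)
```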

CultriX

Mar 26

Really cool project!
