@clefourrier on Hugging Face: "🏅 New top model on the GAIA benchmark! Called FRIDAY, it's a mysterious…"

clefourrier

posted an update Jan 24, 2024

🏅 New top model on the GAIA benchmark!

Called FRIDAY, it's a mysterious new autonomous agent that achieved strong performance on both the public validation set *and* the private test set.
Notably, on our hardest questions (level 3), it passed the 10-point mark on the validation set and the 5-point mark on the test set. These questions require taking arbitrarily long sequences of actions, using any number of tools, and accessing the world in general! ✨

The GAIA benchmark evaluates next-generation LLMs (LLMs whose capabilities are augmented by added tooling, efficient prompting, search access, etc.) and was co-authored by @gregmialz, @ThomasNLG, @ylecun, @thomwolf, and myself: gaia-benchmark/leaderboard