
Clémentine Fourrier

clefourrier

http://clefourrier.github.io

AI & ML interests

None yet

Recent Activity

updated a dataset 2 days ago

gaia-benchmark/results_public

new activity 5 days ago

open-llm-leaderboard/open_llm_leaderboard:Proposal for new column

liked a model 5 days ago

utter-project/EuroLLM-1.7B


Articles

BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks

Jun 18

• 43

Falcon 2: An 11B parameter pretrained language model and VLM, trained on over 5000B tokens and 11 languages

May 24

• 25

CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models

May 24

• 21

Let's talk about LLM evaluation

May 23

• 140

Introducing the Open Arabic LLM Leaderboard

May 14

• 76

Introducing the Open Leaderboard for Hebrew LLMs!

May 5

• 32

Bringing the Artificial Analysis LLM Performance Leaderboard to Hugging Face

May 3

• 13

Improving Prompt Consistency with Structured Generations

Apr 30

• 58

Introducing the Open Chain of Thought Leaderboard

Apr 23

• 27

The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare

Apr 19

• 126

Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs

Apr 16

• 14

Introducing the Chatbot Guardrails Arena

Mar 21

• 4

Introducing ConTextual: How well can your Multimodal model jointly reason over text and image in text-rich scenes?

Mar 5

• 4

TTS Arena: Benchmarking Text-to-Speech Models in the Wild

Feb 27

• 41

Introducing the Red-Teaming Resistance Leaderboard

Feb 23

• 13

Introducing the Open Ko-LLM Leaderboard: Leading the Korean LLM Evaluation Ecosystem

Feb 20

• 3

NPHardEval Leaderboard: Unveiling the Reasoning Abilities of Large Language Models through Complexity Classes and Dynamic Updates

Feb 2

• 3

Introducing the Enterprise Scenarios Leaderboard: a Leaderboard for Real World Use Cases

Jan 31

• 3

The Hallucinations Leaderboard, an Open Effort to Measure Hallucinations in Large Language Models

Jan 29

• 17

A guide to setting up your own Hugging Face leaderboard: an end-to-end example with Vectara's hallucination leaderboard

Jan 12

• 6

2023, year of open LLMs

Dec 18, 2023

• 5

Open LLM Leaderboard: DROP deep dive

Dec 1, 2023

• 5

Overview of natively supported quantization schemes in 🤗 Transformers

Sep 12, 2023

• 11

What's going on with the Open LLM Leaderboard?

Jun 23, 2023

• 23

Introduction to Graph Machine Learning

Jan 3, 2023

• 19

Organizations

Posts 16

Post


In basic chatbots, errors are annoyances. In medical LLMs, errors can have life-threatening consequences 🩸

It's therefore vital to benchmark and track advances in medical LLMs before even thinking about deployment.

This is why a small research team introduced a medical LLM leaderboard: to get reproducible, comparable results across LLMs and to let everyone follow advances in the field.

openlifescienceai/open_medical_llm_leaderboard

Congrats to @aaditya and @pminervini!
Learn more in the blog: https://huggingface.co/blog/leaderboard-medicalllm

Post


Contamination-free code evaluations with LiveCodeBench! 🖥️

LiveCodeBench is a new leaderboard that offers:
- complete code evaluations (code generation, self-repair, code execution, and test output prediction)
- my favorite feature: problem selection by publication date 📅

This feature means you can average model scores only over new problems, published after the model's training data was collected. In other words... contamination-free code evals! 🚀
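To make the date-filtering idea concrete, here is a minimal sketch, assuming each problem carries a publication date and a per-model pass/fail result. The `Problem` record, the `contamination_free_score` function, and the toy problems are all hypothetical illustrations, not LiveCodeBench's actual code or data: the score is simply the solve rate over problems published after a chosen cutoff (for example, a model's assumed training-data cutoff).

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Problem:
    # Hypothetical record: a coding problem, when it was published,
    # and whether the model under evaluation solved it.
    name: str
    published: date
    solved: bool


def contamination_free_score(problems: List[Problem], cutoff: date) -> Optional[float]:
    """Average the solve rate only over problems published after `cutoff`.

    Problems on or before the cutoff are ignored, since they may have
    appeared in the model's training data.
    """
    recent = [p for p in problems if p.published > cutoff]
    if not recent:
        return None  # nothing newer than the cutoff to evaluate on
    return sum(p.solved for p in recent) / len(recent)


# Toy, made-up problems for illustration only.
problems = [
    Problem("two-sum-variant", date(2023, 3, 1), True),
    Problem("interval-merge", date(2023, 9, 15), False),
    Problem("grid-paths", date(2024, 1, 10), True),
    Problem("string-hashing", date(2024, 2, 20), False),
]

# Only the three problems published after mid-2023 count here (1 of 3 solved).
print(contamination_free_score(problems, date(2023, 6, 30)))
```

Moving the cutoff later shrinks the evaluation set but makes it less likely that any remaining problem leaked into training, which is the trade-off the publication-date selector exposes.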

Check it out!

Blog: https://huggingface.co/blog/leaderboard-livecodebench
Leaderboard: livecodebench/leaderboard

Congrats to @StringChaos @minimario @xu3kev @kingh0730 and @FanjiaYan for the super cool leaderboard!


Collections 2

Papers 7

Spaces 1


Paused

🥇

Backend

Models 2

clefourrier/graphormer-base-pcqm4mv1

Graph Machine Learning • Updated Feb 7, 2023 • 105 • 4

clefourrier/graphormer-base-pcqm4mv2

Graph Machine Learning • Updated Feb 7, 2023 • 1.06k • 65

Datasets

None public yet