Atla

company

Verified

https://www.atla-ai.com

Atla_AI

atla-ai

AI & ML interests

Scalable oversight

Recent Activity

kaikaidai updated a Space 11 days ago

AtlaAI/judge-arena

kaikaidai new activity 14 days ago

AtlaAI/judge-arena:Promotion to get more voters

kaikaidai new activity 27 days ago

AtlaAI/judge-arena:About adding a judge model to the leaderboard

AtlaAI's activity

kaikaidai

updated a Space 11 days ago

Judge Arena

kaikaidai

in AtlaAI/judge-arena 14 days ago

Promotion to get more voters

#7 opened 16 days ago by

kaikaidai

posted an update 26 days ago

📈 Early results on the 8B evaluation model we've been training...

@NinaCalvi wrote about the progress we've made this quarter towards training the best 'LLM-as-a-judge' evaluator. We've significantly improved against the baseline and are approaching state-of-the-art evaluation performance with an 8B model.

Next up: training Llama-3.1-70B 👀

Here's the full article: https://www.atla-ai.com/post/evaluating-the-evaluator

kaikaidai

in AtlaAI/judge-arena 27 days ago

About adding a judge model to the leaderboard

#6 opened 28 days ago by

kaikaidai

in AtlaAI/judge-arena about 1 month ago

Which models do you want to see on here?

#2 opened about 1 month ago by

Apply for community grant: Company project (gpu and storage)

#5 opened about 1 month ago by

What are Meta-Llama-3.1-Instruct "Turbo" models?

#4 opened about 1 month ago by

mbartolo

authored 9 papers 8 months ago

Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension

Paper • 2002.00293 • Published Feb 2, 2020

Interpretation of Natural Language Rules in Conversational Machine Reading

Paper • 1809.01494 • Published Aug 28, 2018

Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality

Paper • 2204.03162 • Published Apr 7, 2022 • 1

Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation

Paper • 2104.08678 • Published Apr 18, 2021

Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks

Paper • 2204.01906 • Published Apr 5, 2022

Models in the Loop: Aiding Crowdworkers with Generative Annotation Assistants

Paper • 2112.09062 • Published Dec 16, 2021

DMLR: Data-centric Machine Learning Research -- Past, Present and Future

Paper • 2311.13028 • Published Nov 21, 2023 • 1

Human Feedback is not Gold Standard

Paper • 2309.16349 • Published Sep 28, 2023 • 5

The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

Paper • 2404.16019 • Published Apr 24, 2024 • 1

mbartolo

authored a paper 11 months ago

Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning

Paper • 2402.06619 • Published Feb 9, 2024 • 54