Organization Card

Welcome to HAERAE

We are a non-profit research lab focused on the interpretability and evaluation of Korean language models. Our mission is to advance the field with insightful benchmarks and tools. Below is an overview of our projects.

High-Quality Korean Corpora

Korean WebText: A collection of 2B tokens of Korean text collected from the web.
Korean SyntheticText: A collection of 1.5B tokens of synthetically generated Korean text.

Evaluation Benchmarks

HAE_RAE_BENCH Series:
- HAE_RAE_BENCH_1.0: An evaluation suite for Korean knowledge. See paper for further information.
- HAE_RAE_BENCH_1.1: An ongoing project to refine HAE_RAE_BENCH_1.0 and extend its depth and coverage.
KMMLU:
- KMMLU: A Korean reimplementation of MMLU, focusing on comprehensive language understanding across a wide range of subjects. See paper for further information.
- KMMLU-HARD: A subset of KMMLU, with CoT samples.
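
Benchmarks in the MMLU family are usually scored as plain accuracy over multiple-choice answers, often broken down by subject. The sketch below illustrates that idea only; it is not the official HAE-RAE or KMMLU evaluation code. It assumes answers are single letters, and the subject names and records are made-up toy data.

```python
from collections import Counter


def accuracy(predictions, answers):
    """Return the fraction of items where the predicted letter matches the gold letter."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def accuracy_by_subject(records):
    """Return per-subject accuracy for (subject, predicted, gold) tuples."""
    totals = Counter()
    hits = Counter()
    for subject, predicted, gold in records:
        totals[subject] += 1
        hits[subject] += int(predicted == gold)
    return {subject: hits[subject] / totals[subject] for subject in totals}


# Hypothetical toy records: (subject, model prediction, gold answer).
records = [
    ("law", "A", "A"),
    ("law", "B", "C"),
    ("math", "D", "D"),
    ("math", "C", "C"),
]

overall = accuracy([r[1] for r in records], [r[2] for r in records])
per_subject = accuracy_by_subject(records)

print(overall)      # 0.75
print(per_subject)  # {'law': 0.5, 'math': 1.0}
```

Reporting the per-subject breakdown alongside the overall score makes it easier to see whether a model is uniformly strong or only strong in a few areas.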

Bias and Fairness

QARV: An ongoing project to benchmark regional bias in large language models (LLMs).

If you have any inquiries or are interested in joining our team, please contact me at spthsrbwls123@yonsei.ac.kr.

spaces 1

Open Research Questions

models

None public yet

datasets 22

HAERAE-HUB/HRM8K

HAERAE-HUB/butterflies_and_moths_vqa

HAERAE-HUB/hret_agent_idavidrein_gpqa_diamond_translated

HAERAE-HUB/HRMCR

HAERAE-HUB/KHJ-RB-Format

HAERAE-HUB/KUDGE

HAERAE-HUB/HR-Instruct-Math-v0.1

HAERAE-HUB/KOREAN-SyntheticText-1.5B

HAERAE-HUB/Korean-Human-Judgements

HAERAE-HUB/HAE_RAE_BENCH_2.0

Team members 54
