BEIR

university

AI & ML interests

BEIR (Benchmarking IR) is a heterogeneous benchmark for diverse sentence- and passage-level IR tasks. It provides a common, easy-to-use framework for cross-domain evaluation of retrieval models.

BeIR's activity

nthakur authored a paper about 2 months ago

MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems

Paper • 2410.13716 • Published Oct 17, 2024

nthakur authored a paper 4 months ago

Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track

Paper • 2406.16828 • Published Jun 24, 2024

nthakur posted an update 8 months ago

🦢 The SWIM-IR dataset contains 29 million text-retrieval training pairs across 27 diverse languages. It is one of the largest synthetic multilingual datasets generated using PaLM 2 on Wikipedia! 🔥🔥

The SWIM-IR dataset contains three subsets:
- Cross-lingual: nthakur/swim-ir-cross-lingual
- Monolingual: nthakur/swim-ir-monolingual
- Indic Cross-lingual: nthakur/indic-swim-ir-cross-lingual

Check it out:
https://huggingface.co/collections/nthakur/swim-ir-dataset-662ddaecfc20896bf14dd9b7
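
As a minimal sketch, the three subset names in the post can be mapped to their Hub repository IDs and dataset page URLs. The helper names `dataset_url` and `stream_subset` are hypothetical, and the `config` and `split` arguments are assumptions: the post does not say which configurations or splits each subset exposes, so they would need to be checked against the dataset cards.

```python
# Hub repository IDs for the three SWIM-IR subsets, copied from the post.
SWIM_IR_SUBSETS = {
    "cross-lingual": "nthakur/swim-ir-cross-lingual",
    "monolingual": "nthakur/swim-ir-monolingual",
    "indic-cross-lingual": "nthakur/indic-swim-ir-cross-lingual",
}


def dataset_url(subset: str) -> str:
    """Return the Hugging Face Hub dataset page URL for a SWIM-IR subset."""
    return f"https://huggingface.co/datasets/{SWIM_IR_SUBSETS[subset]}"


def stream_subset(subset: str, config: str, split: str = "train"):
    """Stream one subset instead of downloading it in full.

    Assumption: `config` and `split` must match names listed on the
    dataset card; neither is specified in the post.
    """
    # Imported lazily so the rest of the module works without the
    # `datasets` package installed (pip install datasets).
    from datasets import load_dataset

    return load_dataset(SWIM_IR_SUBSETS[subset], config, split=split, streaming=True)


if __name__ == "__main__":
    # Print each subset name next to its dataset page URL.
    for name in SWIM_IR_SUBSETS:
        print(name, dataset_url(name))
```

Streaming is used in the sketch because the post describes the dataset as containing 29 million training pairs, which may be impractical to download in full before inspecting a few examples.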

nthakur authored 9 papers 12 months ago

Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard

Paper • 2306.07471 • Published Jun 13, 2023

NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation

Paper • 2312.11361 • Published Dec 18, 2023 • 1

HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution

Paper • 2307.16883 • Published Jul 31, 2023

Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks

Paper • 2010.08240 • Published Oct 16, 2020

Evaluating Embedding APIs for Information Retrieval

Paper • 2305.06300 • Published May 10, 2023 • 1

GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval

Paper • 2112.07577 • Published Dec 14, 2021

Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages

Paper • 2210.09984 • Published Oct 18, 2022 • 2

BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

Paper • 2104.08663 • Published Apr 17, 2021 • 3

Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval

Paper • 2311.05800 • Published Nov 10, 2023 • 3

nreimers authored a paper almost 2 years ago

MTEB: Massive Text Embedding Benchmark

Paper • 2210.07316 • Published Oct 13, 2022 • 6

nreimers updated a model over 3 years ago

BeIR/sparta-msmarco-distilbert-base-v1

Feature Extraction • Updated Oct 1, 2021 • 70 • 2
