Establishing Baselines for Text Classification in Low-Resource Languages Paper • 2005.02068 • Published May 5, 2020
Improving Large-scale Language Models and Resources for Filipino Paper • 2111.06053 • Published Nov 11, 2021
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines Paper • 2410.12705 • Published Oct 16, 2024 • 32
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published 1 day ago • 66
Bridging the Data Provenance Gap Across Text, Speech and Video Paper • 2412.17847 • Published Dec 19, 2024 • 9
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training Paper • 2411.15124 • Published Nov 22, 2024 • 59
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback Paper • 2410.19133 • Published Oct 24, 2024 • 11
M-RewardBench: Evaluating Reward Models in Multilingual Settings Paper • 2410.15522 • Published Oct 20, 2024 • 12
Consent in Crisis: The Rapid Decline of the AI Data Commons Paper • 2407.14933 • Published Jul 20, 2024 • 12
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages Paper • 2406.10118 • Published Jun 14, 2024 • 32
Near to Mid-term Risks and Opportunities of Open-Source Generative AI Paper • 2404.17047 • Published Apr 25, 2024 • 1
Simplifying Paragraph-level Question Generation via Transformer Language Models Paper • 2005.01107 • Published May 3, 2020
Evaluating Language Model Finetuning Techniques for Low-resource Languages Paper • 1907.00409 • Published Jun 30, 2019
Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets Paper • 2010.11574 • Published Oct 22, 2020
Automatic WordNet Construction using Word Sense Induction through Sentence Embeddings Paper • 2204.03251 • Published Apr 7, 2022
Multilingual Large Language Models Are Not (Yet) Code-Switchers Paper • 2305.14235 • Published May 23, 2023
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark Paper • 2406.05967 • Published Jun 10, 2024 • 6