Lj V. Miranda

ljvmiranda921

https://ljvmiranda921.github.io

AI & ML interests

NLP - multilinguality, data-centric AI

Recent Activity

updated a dataset 9 days ago

UD-Filipino/UD_Tagalog-NewsCrawl

updated a model 11 days ago

UD-Filipino/tl_mdeberta_v3_transition

updated a model 11 days ago

UD-Filipino/tl_hash_transition

ljvmiranda921's activity

upvoted a paper 14 days ago

Qwen2.5 Technical Report

Paper • 2412.15115 • Published 15 days ago • 334

upvoted a collection 23 days ago

Multilingual LLM Evaluation

Multilingual Evaluation Benchmarks • 6 items • Updated 21 days ago • 9

upvoted a collection 27 days ago

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark S

SEACrowd is a community movement project aimed at centralizing and standardizing AI resources for Southeast Asian languages, cultures, and/or regions. • 3 items • Updated Jun 18, 2024 • 6

upvoted a collection about 1 month ago

OLMo 2

Artifacts for the second set of OLMo models. • 17 items • Updated Nov 27, 2024 • 59

upvoted a paper about 1 month ago

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Paper • 2411.15124 • Published Nov 22, 2024 • 56

upvoted a collection about 1 month ago

Tulu 3 Datasets

All datasets released with Tulu 3 -- state of the art open post-training recipes. • 32 items • Updated Nov 27, 2024 • 64

upvoted a paper 2 months ago

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback

Paper • 2410.19133 • Published Oct 24, 2024 • 11

upvoted a collection 2 months ago

Multilingual RewardBench

Multilingual Reward Model Evaluation Dataset and Results • 2 items • Updated Oct 26, 2024 • 4

upvoted a paper 2 months ago

M-RewardBench: Evaluating Reward Models in Multilingual Settings

Paper • 2410.15522 • Published Oct 20, 2024 • 11

upvoted a paper 4 months ago

SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages

Paper • 2407.19672 • Published Jul 29, 2024 • 56

upvoted a paper 5 months ago

Consent in Crisis: The Rapid Decline of the AI Data Commons

Paper • 2407.14933 • Published Jul 20, 2024 • 12

upvoted a collection 6 months ago

Reward Bench

Datasets, spaces, and models for the reward model benchmark! • 5 items • Updated Nov 27, 2024 • 9

upvoted a paper 6 months ago

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12, 2024 • 65

upvoted a paper 7 months ago

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

Paper • 2406.10118 • Published Jun 14, 2024 • 30

upvoted a collection about 1 year ago

State-of-the-Art NER models - Tagalog

2 items • Updated Feb 27, 2024 • 2

upvoted 2 papers about 1 year ago

Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark

Paper • 2311.09122 • Published Nov 15, 2023 • 7

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Paper • 2211.05100 • Published Nov 9, 2022 • 27