ml-fw-prerelease

Enterprise

community

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

SivilTaram authored a paper about 1 month ago

When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training

SivilTaram authored a paper about 2 months ago

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

SivilTaram authored a paper about 2 months ago

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models


ml-fw-prerelease's activity

alielfilali01

posted an update 4 days ago

~75% on the challenging GPQA with only 40M parameters 🔥🥳

GREAT ACHIEVEMENT! Or is it?

This new work, "Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation", demystifies the results of many models I had personally suspected, especially on leaderboards other than the English one, like the Open Arabic LLM Leaderboard (OALL/Open-Arabic-LLM-Leaderboard).

The authors first trained a model on the GPQA data, which, unsurprisingly, led to the model achieving 100% performance.

Afterward, they trained what they referred to as a 'legitimate' model on legitimate data (MedMCQA). However, they introduced a distillation loss from the earlier, 'cheated' model.

What they discovered was fascinating: the knowledge of GPQA leaked through this distillation loss, even though the legitimate model was never explicitly trained on GPQA during this stage.
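The post doesn't spell out the exact loss the authors used, so as a rough illustration, here is a minimal sketch of a standard Hinton-style knowledge distillation loss in plain Python. It assumes a single multiple-choice example with one logit per answer option; `alpha` and `temperature` are illustrative hyperparameters, not values from the paper. The soft-label (KL) term is the channel through which a "cheated" teacher's memorized answers can leak into the student, even though the hard labels come only from legitimate data.

```python
import math


def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; subtracting the max keeps exp() numerically stable.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions over the same answer options.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """Illustrative KD loss: (1 - alpha) * CE(label) + alpha * T^2 * KL(teacher || student).

    Assumption: this is the textbook Hinton et al. formulation, not necessarily
    the exact loss used in the Data Laundering paper.
    """
    # Hard-label term: student cross-entropy on the ground-truth label
    # (in the post's setup, labels would come from the legitimate MedMCQA data).
    ce = -math.log(softmax(student_logits)[label])
    # Soft-label term: pull the student toward the teacher's softened distribution.
    # Whatever the teacher memorized (e.g. GPQA answers) is transmitted here.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kd = kl_divergence(p_teacher, p_student) * temperature ** 2
    return (1 - alpha) * ce + alpha * kd
```

For example, `distillation_loss([1.0, 0.5, 0.2, 0.1], [4.0, 0.0, 0.0, 0.0], label=0)` combines the student's cross-entropy on the true label with a penalty for diverging from a teacher that is very confident in option 0. Scaling the KL term by T² is the usual way to keep its gradient magnitude comparable to the hard-label term as the temperature changes.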

This raises important questions about using distillation carefully in model training, especially when the training data is opaque. As they demonstrated, it's apparently possible to leak test data (intentionally or unintentionally) through this method.

Find out more: Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation (2412.15255)

alielfilali01

posted an update 21 days ago

Unpopular opinion: Open Source takes courage!

Not everyone is brave enough to release what they have done (the way they've done it) into the wild to be judged!
It really requires a high level of "knowing wth you are doing"! It's kind of a superpower!

Cheers to the heroes here who see this!

alielfilali01

posted an update 25 days ago

Apparently I forgot to post this here!

Well, this is a bit late, but consider giving our recent blog a read if you are interested in evaluation.

You don't have to be into Arabic NLP to read it: the main contribution we are introducing is a new evaluation measure for NLG. For now, we have made the first application of this measure to Arabic, and we will be working with colleagues from the community to expand it to other languages.

Blog:
Rethinking LLM Evaluation with 3C3H: AraGen Benchmark and Leaderboard
https://huggingface.co/blog/leaderboard-3c3h-aragen

Space:
inceptionai/AraGen-Leaderboard

Give it a read and let me know your thoughts 🤗

BramVanroy

posted an update 28 days ago

In the spirit of "Better late than never", I've finally written a brief overview paper for GEITje 7B Ultra. It was initially released 10 months ago (oops) but still reaches around 1,300 monthly downloads across the HF ecosystem (not including Ollama).

GEITje 7B Ultra: A Conversational Model for Dutch (2412.04092)

While the paper discusses the model a little bit, I especially wanted to write about the datasets, which to this day seem to be an important asset for Dutch LLM training (SFT and preference tuning). We have a long way to go for Dutch, but publishing transparent and reproducible artefacts seems an important step to me, alongside having open discussions about data, bias, and architectures.

In that spirit, thanks are in order for the creation of GEITje 7B Ultra and all related datasets:

- Michiel Buisman and UWV for providing the means to create the datasets
- Flemish Supercomputer Center (VSC) for the compute
- The Hugging Face Fellows and rest of the team for their discussions and insights
- The Dutch NLP community, notably @Rijgersberg for building the base GEITje model and the fruitful discussions we've had

More to come, step by step!

BramVanroy/geitje-7b-ultra-65c1ee010ad80fd1f6a8f208

SivilTaram

authored a paper about 1 month ago

When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training

Paper • 2411.13476 • Published Nov 20, 2024 • 15

SivilTaram

authored 2 papers about 2 months ago

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

Paper • 2407.10956 • Published Jul 15, 2024 • 6

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Paper • 2411.04905 • Published Nov 7, 2024 • 111

alielfilali01

posted an update about 2 months ago

Unpopular opinion: o1-preview is more stupid than 4o, and Qwen2.5-72B-Instruct is extremely underrated!

Fakhraddin

authored a paper about 2 months ago

Swan and ArabicMTEB: Dialect-Aware, Arabic-Centric, Cross-Lingual, and Cross-Cultural Embedding Models and Benchmarks

Paper • 2411.01192 • Published Nov 2, 2024 • 3

3ebdola

authored a paper about 2 months ago

Swan and ArabicMTEB: Dialect-Aware, Arabic-Centric, Cross-Lingual, and Cross-Cultural Embedding Models and Benchmarks

Paper • 2411.01192 • Published Nov 2, 2024 • 3

alielfilali01

posted an update 2 months ago

I feel like this incredible resource hasn't gotten the attention it deserves in the community!

@clefourrier and the Hugging Face evaluation team in general put together a fantastic guidebook covering a lot about 𝗘𝗩𝗔𝗟𝗨𝗔𝗧𝗜𝗢𝗡, from basics to advanced tips.

Link: https://github.com/huggingface/evaluation-guidebook

I haven't finished it yet, but I'm enjoying every piece of it so far. Huge thanks to @clefourrier and the team for this invaluable resource!

ouhenio

authored 2 papers 3 months ago

Large Language Models are biased to overestimate profoundness

Paper • 2310.14422 • Published Oct 22, 2023

Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness

Paper • 2309.15991 • Published Sep 27, 2023

pere

authored 3 papers 3 months ago

Operationalizing a National Digital Library: The Case for a Norwegian Transformer Model

Paper • 2104.09617 • Published Apr 19, 2021

Boosting Norwegian Automatic Speech Recognition

Paper • 2307.01672 • Published Jul 4, 2023 • 1

Whispering in Norwegian: Navigating Orthographic and Dialectic Challenges

Paper • 2402.01917 • Published Feb 2, 2024

RefalMachine

authored a paper 3 months ago

Impact of Tokenization on LLaMa Russian Adaptation

Paper • 2312.02598 • Published Dec 5, 2023 • 5

pere

authored a paper 3 months ago

COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter

Paper • 2005.07503 • Published May 15, 2020

SivilTaram

authored a paper 3 months ago

Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates

Paper • 2410.07137 • Published Oct 9, 2024 • 7

alielfilali01

posted an update 3 months ago

Why is nobody talking about the new training corpus released by MBZUAI today?

TxT360 is a 15+ trillion-token corpus that outperforms FineWeb on several metrics. Ablation studies were run at up to 1T tokens.

Read the blog here: LLM360/TxT360
Dataset: LLM360/TxT360

Team members 27