Gabriel Martín Blázquez's picture

Gabriel Martín Blázquez

gabrielmbmb

·

https://gabrielmb.com

AI & ML interests

ML Engineer

Recent Activity

authored a paper about 7 hours ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

liked a model about 9 hours ago

Qwen/Qwen2.5-Math-RM-72B

upvoted a paper about 11 hours ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

View all activity

Organizations

Posts 4

Post

1851

Yesterday @mattshumer released mattshumer/Reflection-Llama-3.1-70B, an impressive model that achieved incredible results in benchmarks like MMLU. The model was fine-tuned using Reflection-Tuning and the dataset used wasn't released, but I created a small recipe with distilabel that allows generating a dataset with a similar output format:

1. We use MagPie 🐦 in combination with https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct to generate reasoning instructions.
2. We generate a response again using https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct, but we steer the LLM to generate an specific output format using a custom system prompt. In the system prompt, we instruct the LLM that it will have first to think 💭 and have reflections that will help resolving ambiguities. After that, we instruct the LLM to generate an output based on the previous thinking

In this dataset gabrielmbmb/distilabel-reflection-tuning you can found 5 rows that I generated with this recipe. You can also found the code of the pipeline in the file called reflection.py.

Articles 1

Article

33

How we leveraged distilabel to create an Argilla 2.0 Chatbot

View all Articles

Collections 5

Papers 1

arxiv:2502.02737

spaces 1

Argilla

models 14

gabrielmbmb/smollm2-1.7B-8k-mix7-ep2-v2-qlora-r16-a16-lr3e4-mix1-dpo

Updated Nov 5, 2024 • 2

gabrielmbmb/SmolLM-1.7B-Instruct-Summarization-Adapter_r32_alpha64_lr3e-4_rslorafalse

Updated Oct 16, 2024 • 2

gabrielmbmb/SmolLM-1.7B-Instruct-Summarization-Adapter_r32_alpha128_lr3e-4_rslorafalse

Updated Oct 16, 2024 • 3

gabrielmbmb/SmolLM-1.7B-Instruct-Summarization-Adapter_r16_alpha16_lr5e-4_rsloratrue

Updated Oct 16, 2024 • 2

gabrielmbmb/SmolLM-1.7B-Instruct-Summarization-Adapter_r16_alpha64_lr5e-4_rsloratrue

Updated Oct 16, 2024 • 3

gabrielmbmb/SmolLM-1.7B-Instruct-IFEval

Text Generation • Updated Oct 1, 2024 • 7

gabrielmbmb/Upcycled-Qwen1.5-MoE2.7B-LoRA-merged-32-2000-steps-adapter

Updated Mar 31, 2024 • 4

gabrielmbmb/Upcycled-Qwen1.5-MoE2.7B-LoRA-merged-32-2000-steps

Text Generation • Updated Mar 31, 2024 • 5

gabrielmbmb/Upcycled-Qwen1.5-MoE2.7B-LoRA-merged-32

Text Generation • Updated Mar 31, 2024 • 11

gabrielmbmb/Upcycled-Qwen1.5-MoE2.7B

Text Generation • Updated Mar 30, 2024 • 11 • 2

datasets 68

gabrielmbmb/reward-scores

Viewer • Updated 1 day ago • 250 • 8

gabrielmbmb/finemath-qa-extraction-5

Viewer • Updated 13 days ago • 10k • 29

gabrielmbmb/finemath-qa-extraction-4

Viewer • Updated 14 days ago • 10k • 27

gabrielmbmb/finemath-qa-extraction-3

Viewer • Updated 14 days ago • 10k • 29

gabrielmbmb/finemath-qa-extraction-2

Viewer • Updated 14 days ago • 10k • 31

gabrielmbmb/finemath-qa-extraction-1

Viewer • Updated 14 days ago • 10k • 36

gabrielmbmb/finemath-qa-extraction

Viewer • Updated 14 days ago • 10k • 77

gabrielmbmb/distilabel-example

Viewer • Updated 14 days ago • 10 • 64

gabrielmbmb/qa-math-generation

Viewer • Updated 15 days ago • 1k • 36

gabrielmbmb/prompt-logprobs

Viewer • Updated 22 days ago • 100 • 62