InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 3 days ago • 209
Post
This paper has been blowing up. They train an open-source multimodal LLM (InternVL3) that can compete with GPT-4o and Claude 3.5 Sonnet by:
> training text and vision in a single stage
> a novel V2PE positional encoding
> SFT & mixed preference optimization
> test-time scaling
Paper: InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models (2504.10479)
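The V2PE idea mentioned above (variable position encoding for visual tokens) can be sketched roughly as follows: text tokens advance the position index by 1, while visual tokens advance it by a smaller fraction, so long interleaved image-text sequences stay within the model's position range. This is a minimal illustrative sketch; the `delta` value, token labels, and function name are assumptions, not taken from the paper.

```python
# Minimal sketch of variable position increments (V2PE-style).
# Assumption: text tokens step the position by 1.0, visual tokens by
# a smaller delta, compressing the positional footprint of images.

def v2pe_position_ids(token_types, delta=0.25):
    """token_types: list of 'text' or 'image'; returns float position ids."""
    positions = []
    pos = 0.0
    for t in token_types:
        positions.append(pos)
        pos += 1.0 if t == "text" else delta
    return positions

# Two text tokens, four image-patch tokens, one text token:
print(v2pe_position_ids(["text", "text", "image", "image", "image", "image", "text"]))
# -> [0.0, 1.0, 2.0, 2.25, 2.5, 2.75, 3.0]
```

With `delta=0.25`, four image patches consume only one position step, which is the intuition behind fitting long multimodal contexts into a fixed position budget.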
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated 17 days ago • 444
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Paper • 2504.02587 • Published 14 days ago • 30
Post
MAYE 🎈 a from-scratch RL framework for Vision Language Models, released by GAIR, an active research group from the Chinese community.
✨ Minimal & transparent pipeline built with standard tools
✨ Standardized evaluation to track training & reflection
✨ Open code & dataset
Code: https://github.com/GAIR-NLP/MAYE?tab=readme-ov-file
Dataset: ManTle/MAYE
Paper: Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme (2504.02587)
Post
Qwen 3 can launch very soon. 👀
https://github.com/ggml-org/llama.cpp/pull/12828
Post
🚨 Hot Take: GPT-4o might NOT be a purely autoregressive model! 🚨
There's a high chance it has a diffusion head. 🤯 If true, this could be a game-changer for AI architecture. What do you think? 🤔👇
Code: https://github.com/PicoTrex/GPT-ImgEval
Dataset: Yejy53/GPT-ImgEval
Paper: GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation (2504.02782)