Burning ray

adarksky

AI & ML interests

None yet

Recent Activity

updated a model 10 days ago
adarksky/Qwen2.5-0.5B-sft-lora-rel-therapy
published a model 11 days ago
adarksky/Qwen2.5-0.5B-sft-lora-rel-therapy
liked a model 15 days ago
openai/whisper-tiny

Organizations

fast.ai community · Hugging Face Discord Community

adarksky's activity

New activity in hexgrad/Kokoro-82M about 1 month ago

Update kokoro.py

#43 opened about 1 month ago by adarksky
reacted to merve's post with 🔥 2 months ago
small but mighty 🔥
you can fine-tune SmolVLM on an L4 with a batch size of 4, and it only takes 16.4 GB of VRAM 🫰🏻 with gradient accumulation (4 steps), the simulated batch size is 16 ✨
I made a notebook that includes all the goodies: QLoRA, gradient accumulation, and gradient checkpointing, with explanations of how they work: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
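For reference, here is a minimal sketch of the kind of QLoRA setup the post describes, using the transformers and peft libraries. The model id, LoRA target modules, and hyperparameters are illustrative assumptions, not the notebook's exact configuration:

```python
import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model

# 4-bit quantization of the base weights (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# assumed model id; the notebook targets SmolVLM
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters on the attention projections (illustrative choice of modules)
lora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)

# batch size 4 x 4 accumulation steps = simulated batch size 16
training_args = TrainingArguments(
    output_dir="smolvlm-qlora",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,  # recompute activations to save VRAM
    learning_rate=1e-4,
    num_train_epochs=1,
    bf16=True,
)
```

Combining 4-bit base weights, small LoRA adapters, and gradient checkpointing is roughly what keeps the footprint near 16 GB on an L4; the linked notebook walks through each piece in detail.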