
Paraskevi Kivroglou

KvrParaskevi

AI & ML interests

I am looking forward to a world full of AI innovation. Starting from small ideas in new projects, I want to take the next step and bring them to life.

Recent Activity

- Liked a dataset about 21 hours ago: claudios/code_search_net
- Liked a model about 1 month ago: deepseek-ai/DeepSeek-V3
- Liked a dataset about 1 month ago: semeru/code-text-python

Organizations

GEM benchmark, lora concepts library, Blog-explorers, ZeroGPU Explorers, INNOVA AI, Cognitive Computations

KvrParaskevi's activity

reacted to reach-vb's post with 🚀 4 months ago
Smol models ftw! AMD released AMD OLMo 1B - beats OpenELM and TinyLlama on MT-Bench and AlpacaEval - Apache 2.0 licensed 🔥

> Trained on 1.3 trillion tokens (Dolma 1.7) across 16 nodes, each with 4 MI250 GPUs

> Three checkpoints:

- AMD OLMo 1B: Pre-trained model
- AMD OLMo 1B SFT: Supervised fine-tuned on Tulu V2, OpenHermes-2.5, WebInstructSub, and Code-Feedback datasets
- AMD OLMo 1B SFT DPO: Aligned with human preferences using Direct Preference Optimization (DPO) on UltraFeedback dataset

Key Insights:
> Pre-trained with less than half the tokens of OLMo-1B
> Post-training steps include two-phase SFT and DPO alignment (a loss sketch follows this list)
> Data for SFT:
- Phase 1: Tulu V2
- Phase 2: OpenHermes-2.5, WebInstructSub, and Code-Feedback
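
The DPO alignment above optimizes a pairwise preference objective instead of a plain language-modeling loss. Below is a minimal PyTorch sketch of that objective; the tensor arguments (per-sequence summed log-probabilities from the trained policy and a frozen reference model) are assumptions for illustration, not AMD's training code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Each input holds, per example, the summed log-probability a model
    # assigns to the preferred ("chosen") or dispreferred ("rejected")
    # completion. "policy" is being trained; "ref" stays frozen.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # Widen the margin between chosen and rejected completions; beta
    # controls how far the policy may drift from the reference model.
    margin = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(margin).mean()
```

Libraries such as TRL package this same objective together with batching and log-probability extraction, which is the usual route in practice.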

> Model checkpoints on the Hub & Integrated with Transformers ⚡️
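
Since the checkpoints ship with Transformers support, loading one should follow the standard pattern. A minimal sketch, assuming the repo id `amd/AMD-OLMo-1B-SFT-DPO` (verify the exact checkpoint names against the collection linked at the end of this post):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the release naming; check the collection
# linked in this post for the exact checkpoint names.
model_id = "amd/AMD-OLMo-1B-SFT-DPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain Direct Preference Optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```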

Congratulations & kudos to AMD on a brilliant smol model release! 🤗

amd/amd-olmo-6723e7d04a49116d8ec95070
replied to di-zhang-fdu's post 4 months ago

Awesome work. Can we fine-tune this reasoning model further?

reacted to di-zhang-fdu's post with 👍 4 months ago
LLaMA-O1: Open Large Reasoning Model Frameworks For Training, Inference and Evaluation With PyTorch and HuggingFace
Large Reasoning Models powered by Monte Carlo Tree Search (MCTS), Self-Play Reinforcement Learning, PPO, AlphaGo Zero's dual policy paradigm, and Large Language Models!
https://github.com/SimpleBerry/LLaMA-O1/

What will happen when you compound MCTS ❤ LLM ❤ Self-Play ❤ RLHF?
Just a little bite of strawberry! 🍓
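
As a rough illustration of how MCTS slots into an LLM reasoning loop, here is a toy UCT-style search. The `expand_fn` (e.g. sampling candidate reasoning steps from the LLM) and `reward_fn` (e.g. a verifier or reward model scoring a trace) interfaces are assumptions for illustration, not LLaMA-O1's actual API:

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state        # e.g. a partial reasoning trace
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0          # running sum of rollout rewards

    def uct(self, c=1.4):
        # Exploration bonus in the AlphaGo Zero spirit: prefer children
        # that look good so far or have rarely been visited.
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)

def mcts(root_state, expand_fn, reward_fn, iterations=100):
    # expand_fn(state) -> candidate next states (e.g. sampled LLM steps)
    # reward_fn(state) -> scalar score for a (partial) trace
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend by UCT until reaching a leaf.
        while node.children:
            node = max(node.children, key=Node.uct)
        # 2. Expansion: add candidate continuations of the leaf.
        for next_state in expand_fn(node.state):
            node.children.append(Node(next_state, parent=node))
        if node.children:
            node = random.choice(node.children)
        # 3. Evaluation: score the selected trace.
        reward = reward_fn(node.state)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    if not root.children:
        return root_state
    return max(root.children, key=lambda n: n.visits).state
```

In the self-play / RLHF part of the recipe, the traces and scores such a search produces would then feed back into training the policy, closing the loop the post alludes to.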

Past related works:
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning (2410.02884)
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B (2406.07394)