Article Janus Pro: DeepSeek's Revolutionary Multimodal AI Model By LLMhacker • 3 days ago • 27
Article Introduction to Quantization cooked in 🤗 with 💗🧑‍🍳 By merve • Aug 25, 2023 • 28
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Paper • 2501.11873 • Published 10 days ago • 61
FAST: Efficient Action Tokenization for Vision-Language-Action Models Paper • 2501.09747 • Published 14 days ago • 23
Article makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch By AviSoori1x • May 7, 2024 • 45
Article Illustrating Reinforcement Learning from Human Feedback (RLHF) Dec 9, 2022 • 135
Article Training and Finetuning Embedding Models with Sentence Transformers v3 May 28, 2024 • 175
PokerBench: Training Large Language Models to become Professional Poker Players Paper • 2501.08328 • Published 16 days ago • 14
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published 16 days ago • 271
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published 18 days ago • 89
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token Paper • 2501.03895 • Published 24 days ago • 48
Article Deploying Your FastAPI Applications on Huggingface Via Docker By HemanthSai7 • Dec 11, 2023 • 19
Evaluating Tokenizer Performance of Large Language Models Across Official Indian Languages Paper • 2411.12240 • Published Nov 19, 2024 • 6