Qwen2.5-Coder Technical Report
Paper: arXiv 2409.12186 (Published)
Note Apple DCLM
Note Mistral's MoE Model
Note Mistral's 7B Model
Note Google DeepMind Gemma Team
Note Google Gemini 1.5
Note DeepMind Gopher Model
Note OpenAI GPT-3
Note Meta LLaMA
Note OpenAI Codex
Note Chinchilla (DeepMind, 2022.3): "we find that for compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size the number of training tokens should also be doubled" (see the numeric sketch after this list).
Note OpenAI InstructGPT/ChatGPT (2022.3)
Note DeepSeek (2024.1)
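A minimal numeric sketch of the Chinchilla note above, not taken from any of the listed papers: it assumes the commonly cited C ≈ 6·N·D training-FLOPs approximation and the roughly 20-tokens-per-parameter compute-optimal ratio, and shows that quadrupling the compute budget doubles both the optimal parameter count and the optimal token count, i.e. the two are scaled equally.

```python
import math

# Assumptions (not from the note itself): C ~= 6 * N * D training FLOPs,
# and a compute-optimal data/model ratio of about 20 tokens per parameter.
FLOPS_PER_PARAM_TOKEN = 6
TOKENS_PER_PARAM = 20

def compute_optimal_allocation(compute_budget_flops: float):
    """Split a training compute budget into compute-optimal (params, tokens)."""
    # With C = 6 * N * D and D = 20 * N, C = 120 * N^2, so N = sqrt(C / 120).
    n_params = math.sqrt(compute_budget_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    # Quadrupling compute should double both the model size and the token count.
    for budget in (1e21, 4e21):
        n, d = compute_optimal_allocation(budget)
        print(f"C={budget:.0e} FLOPs -> N~{n / 1e9:.1f}B params, D~{d / 1e9:.0f}B tokens")
```

Running the sketch prints roughly 2.9B params / 58B tokens at 1e21 FLOPs and 5.8B params / 115B tokens at 4e21 FLOPs, illustrating the "scale both equally" rule.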