Wolf While

snoopynoob

straikersumy@gmail.com

AI & ML interests

None yet

Recent Activity

liked a model 2 months ago

Lewdiculous/Nera_Noctis-12B-GGUF-ARM-Imatrix

Organizations

None yet

snoopynoob's activity

liked a model 27 days ago

bartowski/cognitivecomputations_Dolphin3.0-R1-Mistral-24B-GGUF

Text Generation • Updated 27 days ago • 48.6k • 58

liked a model about 1 month ago

Qwen/Qwen2.5-14B-Instruct-1M

Text Generation • Updated Jan 29 • 54.4k • 275

liked 3 models 2 months ago

liked 2 models 3 months ago

inflatebot/MN-12B-Mag-Mell-R1

Text Generation • Updated Dec 19, 2024 • 6.97k • 124

nbeerbower/Lyra4-Gutenberg-12B

Text Generation • Updated Sep 14, 2024 • 105 • 20

liked 3 models 4 months ago

VongolaChouko/Starcannon-Unleashed-12B-v1.0

Text Generation • Updated Nov 2, 2024 • 146 • 45

mradermacher/UnslopNemo-12B-v4.1-GGUF

Updated Oct 29, 2024 • 262 • 3

mradermacher/UnslopNemo-12B-v4.1-i1-GGUF

Updated Dec 4, 2024 • 1.43k • 6

New activity in mradermacher/model_requests 4 months ago

TheDrummer/UnslopNemo-12B-v4.1

#401 opened 4 months ago by snoopynoob

liked 2 models 4 months ago

TheDrummer/UnslopNemo-12B-v4.1-GGUF

Updated Oct 23, 2024 • 20.4k • 24

TheDrummer/UnslopNemo-12B-v4-GGUF

Updated Oct 23, 2024 • 910 • 13

liked 5 models 5 months ago

MarinaraSpaghetti/SillyTavern-Settings

Updated 22 days ago • 149

invisietch/L3.1-EtherealRainbow-v1.0-rc1-8B

Text Generation • Updated Oct 23, 2024 • 50 • 8

Lewdiculous/MN-BackyardAI-Party-12B-v1-GGUF-IQ-ARM-Imatrix

Updated Oct 3, 2024 • 1.53k • 15

nbeerbower/mistral-nemo-gutenberg-12B-v4

Text Generation • Updated Sep 6, 2024 • 150 • 20

nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B

Text Generation • Updated Sep 26, 2024 • 23 • 5

reacted to m-ric's post with 🔥 6 months ago

🔥 𝐐𝐰𝐞𝐧 𝐫𝐞𝐥𝐞𝐚𝐬𝐞𝐬 𝐭𝐡𝐞𝐢𝐫 𝟐.𝟓 𝐟𝐚𝐦𝐢𝐥𝐲 𝐨𝐟 𝐦𝐨𝐝𝐞𝐥𝐬: 𝐍𝐞𝐰 𝐒𝐎𝐓𝐀 𝐟𝐨𝐫 𝐚𝐥𝐥 𝐬𝐢𝐳𝐞𝐬 𝐮𝐩 𝐭𝐨 𝟕𝟐𝐁!

The Chinese LLM maker just dropped a flurry of different models, ensuring there will be a Qwen SOTA model for every application out there:
Qwen2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B
Qwen2.5-Coder: 1.5B, 7B, and 32B on the way
Qwen2.5-Math: 1.5B, 7B, and 72B.

And they didn't sleep: the performance is top of the game for each weight category!

𝐊𝐞𝐲 𝐢𝐧𝐬𝐢𝐠𝐡𝐭𝐬:

🌐 All models have 𝟭𝟮𝟴𝗸 𝘁𝗼𝗸𝗲𝗻 𝗰𝗼𝗻𝘁𝗲𝘅𝘁 𝗹𝗲𝗻𝗴𝘁𝗵

📚 Models pre-trained on 18T tokens, even more than the 15T of Llama-3

💪 The flagship 𝗤𝘄𝗲𝗻𝟮.𝟱-𝟳𝟮𝗕 𝗶𝘀 ~𝗰𝗼𝗺𝗽𝗲𝘁𝗶𝘁𝗶𝘃𝗲 𝘄𝗶𝘁𝗵 𝗟𝗹𝗮𝗺𝗮-𝟯.𝟭-𝟰𝟬𝟱𝗕, 𝗮𝗻𝗱 𝗵𝗮𝘀 𝗮 𝟯-𝟱% 𝗺𝗮𝗿𝗴𝗶𝗻 𝗼𝗻 𝗟𝗹𝗮𝗺𝗮-𝟯.𝟭-𝟳𝟬𝗕 𝗼𝗻 𝗺𝗼𝘀𝘁 𝗯𝗲𝗻𝗰𝗵𝗺𝗮𝗿𝗸𝘀.

🇫🇷 On top of this, it 𝘁𝗮𝗸𝗲𝘀 𝘁𝗵𝗲 #𝟭 𝘀𝗽𝗼𝘁 𝗼𝗻 𝗺𝘂𝗹𝘁𝗶𝗹𝗶𝗻𝗴𝘂𝗮𝗹 𝘁𝗮𝘀𝗸𝘀 so it might become my standard for French

💻 Qwen2.5-Coder is only 7B but beats competing models up to 33B (DeepSeek-Coder 33B-Instruct). Let's wait for their 32B to come out!

🧮 Qwen2.5-Math sets a new high in the ratio of MATH benchmark score to # of parameters. They trained it by "aggregating more high-quality mathematical data, particularly in Chinese, from web sources, books, and codes across multiple recall cycles."

📄 Technical report to be released "very soon"

🔓 All models use the permissive Apache 2.0 license, except the 72B models, which have a custom license mentioning "you can use it for free EXCEPT if your product has over 100M users"

🤗 All models are available on the HF Hub! ➡️ Qwen/qwen25-66e81a666513e518adb90d9e

liked a model 6 months ago

mistralai/Mistral-Small-Instruct-2409

Updated Oct 16, 2024 • 226k • 381