
Sorokin Evgeny

DeathGodlike

AI & ML interests

None yet


Organizations

None yet

DeathGodlike's activity

reacted to AdinaY's post with 🔥 2 days ago
reacted to fdaudens's post with 🔥❤️ 3 days ago
Yes, DeepSeek R1's release is impressive. But the real story is what happened in the 7 days that followed:

- Original release: 8 models, 540K downloads. Just the beginning...

- The community turned those open-weight models into more than 550 NEW models on Hugging Face. Total downloads? 2.5M, nearly 5X the originals.

The reason? DeepSeek models are open-weight, letting anyone build on top of them. Interestingly, the community focused on quantized versions for better efficiency and accessibility: they want models that use less memory, run faster, and are more energy-efficient.

When you empower builders, innovation explodes. For everyone. 🚀

The most popular community model? @bartowski's DeepSeek-R1-Distill-Qwen-32B-GGUF version, with 1M downloads alone.
reacted to davanstrien's post with 👀 3 days ago
🌍 Big step for multilingual AI data!

The Hugging Face community has rated educational content in languages spoken by 1.6 billion people! New additions:
• Japanese
• Italian
• Old High German

Learn more and contribute: https://huggingface.co/blog/davanstrien/fineweb2-community

These ratings can help enhance training data for major world languages.
reacted to onekq's post with 🔥 9 days ago
πŸ‹DeepSeek πŸ‹ is the real OpenAI 😯
reacted to alibabasglab's post with 👍 9 days ago
reacted to tomaarsen's post with 🔥❤️ 16 days ago
🏎️ Today I'm introducing a method to train static embedding models that run 100x to 400x faster on CPU than common embedding models, while retaining 85%+ of the quality! It includes 2 fully open models, with training scripts, datasets, and metrics.

We apply our recipe to train 2 Static Embedding models that we release today:
2️⃣ an English Retrieval model and a general-purpose Multilingual similarity model (e.g. classification, clustering, etc.), both Apache 2.0
🧠 my modern training strategy: ideation -> dataset choice -> implementation -> evaluation
📜 my training scripts, using the Sentence Transformers library
📊 my Weights & Biases reports with losses & metrics
📕 my list of 30 training and 13 evaluation datasets

The 2 Static Embedding models have the following properties:
🏎️ Extremely fast, e.g. 107,500 sentences per second on a consumer CPU, compared to 270 for 'all-mpnet-base-v2' and 56 for 'gte-large-en-v1.5'
0️⃣ Zero active parameters: No Transformer blocks, no attention, not even a matrix multiplication. Super speed!
📏 No maximum sequence length! Embed texts at any length (note: longer texts may embed worse)
📐 Linear instead of exponential complexity: 2x longer text takes 2x longer, instead of 2.5x or more.
🪆 Matryoshka support: allows you to truncate embeddings with minimal performance loss (e.g. 4x smaller with a 0.56% perf. decrease for English Similarity tasks)

Check out the full blogpost if you'd like to 1) use these lightning-fast models or 2) learn how to train them with consumer-level hardware: https://huggingface.co/blog/static-embeddings

The blogpost contains a lengthy list of possible advancements; I'm very confident that our 2 models are only the tip of the iceberg, and we may be able to get even better performance.

Alternatively, check out the models:
* sentence-transformers/static-retrieval-mrl-en-v1
* sentence-transformers/static-similarity-mrl-multilingual-v1
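
A minimal usage sketch for the retrieval model, assuming a recent sentence-transformers release (which provides truncate_dim for Matryoshka truncation and the similarity helper); the example sentences are only illustrative:

```python
from sentence_transformers import SentenceTransformer

# Load the English retrieval model; truncate_dim=256 relies on the Matryoshka
# property to keep only the first 256 embedding dimensions.
model = SentenceTransformer(
    "sentence-transformers/static-retrieval-mrl-en-v1", truncate_dim=256
)

queries = ["How fast are static embedding models on CPU?"]
docs = [
    "Static embedding models can encode over 100k sentences per second on CPU.",
    "Transformer-based embedders are slower but often slightly more accurate.",
]

query_emb = model.encode(queries)
doc_emb = model.encode(docs)

# Score each document against the query (higher = more relevant).
print(model.similarity(query_emb, doc_emb))
```
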
reacted to prithivMLmods's post with 👍🤗 about 2 months ago
Milestone for Flux.1 Dev 🔥

💢 The Flux.1 Dev model has crossed 10,000 creative public adapters! 🎈
🔗 https://huggingface.co/models?other=base_model:adapter:black-forest-labs/FLUX.1-dev

💢 This includes:
- 266 Finetunes
- 19 Quants
- 4 Merges

💢 Here's the 10,000th public adapter: 😜
+ strangerzonehf/Flux-3DXL-Partfile-0006

💢 Page:
+ https://huggingface.co/strangerzonehf

💢 Collection:
+ prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
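
For programmatic exploration, a small huggingface_hub sketch; it assumes the adapters carry the base_model:adapter:black-forest-labs/FLUX.1-dev tag that the URL above filters on:

```python
from huggingface_hub import HfApi

api = HfApi()

# Assumption: adapter repos are tagged with the base_model:adapter:... tag
# that the Hub URL above filters on.
adapters = api.list_models(
    filter="base_model:adapter:black-forest-labs/FLUX.1-dev",
    sort="downloads",
    direction=-1,
    limit=10,
)

for model in adapters:
    print(model.id, model.downloads)
```
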
reacted to openfree's post with 👍 4 months ago
MixGen3 is an innovative image generation service that utilizes LoRA (Low-Rank Adaptation) models. Its key features include:

- Integration of various LoRA models: Users can explore and select multiple LoRA models through a gallery.
- Combination of LoRA models: Up to three LoRA models can be combined to express unique styles and content.
- User-friendly interface: An intuitive interface allows for easy model selection, prompt input, and image generation.
- Advanced settings: Various options are provided, including image size adjustment, random seed, and advanced configurations.
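
As a rough illustration of what combining up to three LoRA adapters can look like in practice, here is a minimal diffusers sketch; the base model and adapter repo names are placeholder assumptions, not MixGen3's actual stack:

```python
import torch
from diffusers import DiffusionPipeline

# Base model plus up to three LoRA adapters (repo names are placeholders).
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

pipe.load_lora_weights("user/style-lora-a", adapter_name="style_a")
pipe.load_lora_weights("user/style-lora-b", adapter_name="style_b")
pipe.load_lora_weights("user/style-lora-c", adapter_name="style_c")

# Blend the three adapters with per-adapter weights.
pipe.set_adapters(
    ["style_a", "style_b", "style_c"], adapter_weights=[0.8, 0.6, 0.4]
)

image = pipe("a watercolor city skyline at dusk", num_inference_steps=28).images[0]
image.save("mix.png")
```
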

Main applications of MixGen3:

- Content creation
- Design and illustration
- Marketing and advertising
- Education and learning

Value of MixGen3:

- Enhancing creativity
- Time-saving
- Collaboration possibilities
- Continuous development

Expected effects:

- Increased content diversity
- Lowered entry barrier for creation
- Improved creativity
- Enhanced productivity

MixGen3 is bringing a new wave to the field of image generation by leveraging the advantages of LoRA models. Users can experience the service for free at
https://openfree-mixgen3.hf.space

contacts: arxivgpt@gmail.com
reacted to singhsidhukuldeep's post with 👀 4 months ago
While Google's Transformer might have introduced "Attention is all you need," Microsoft and Tsinghua University are here with the DIFF Transformer, stating, "Sparse-Attention is all you need."

The DIFF Transformer outperforms traditional Transformers in scaling properties, requiring only about 65% of the model size or training tokens to achieve comparable performance.

The secret sauce? A differential attention mechanism that amplifies focus on relevant context while canceling out noise, leading to sparser and more effective attention patterns.

How?
- It uses two separate softmax attention maps and subtracts them.
- It employs a learnable scalar λ for balancing the attention maps.
- It implements GroupNorm for each attention head independently.
- It is compatible with FlashAttention for efficient computation.
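
A minimal single-head PyTorch sketch of that subtraction idea (the per-head GroupNorm and the paper's λ re-parameterization are omitted; shapes and values are toy examples):

```python
import torch

def diff_attention(q1, k1, q2, k2, v, lam):
    """Simplified differential attention for one head.

    Two softmax attention maps are computed from two query/key projections
    and subtracted, scaled by the learnable scalar lambda, before being
    applied to the values. The paper also applies GroupNorm per head, which
    is omitted here.
    """
    d = q1.size(-1)
    a1 = torch.softmax(q1 @ k1.transpose(-2, -1) / d**0.5, dim=-1)
    a2 = torch.softmax(q2 @ k2.transpose(-2, -1) / d**0.5, dim=-1)
    return (a1 - lam * a2) @ v


# Toy shapes: batch of 2, sequence length 16, head dimension 32.
q1, k1, q2, k2, v = (torch.randn(2, 16, 32) for _ in range(5))
out = diff_attention(q1, k1, q2, k2, v, lam=0.5)
print(out.shape)  # torch.Size([2, 16, 32])
```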

What do you get?
- Superior long-context modeling (up to 64K tokens).
- Enhanced key information retrieval.
- Reduced hallucination in question-answering and summarization tasks.
- More robust in-context learning, less affected by prompt order.
- Mitigation of activation outliers, opening doors for efficient quantization.

Extensive experiments show DIFF Transformer's advantages across various tasks and model sizes, from 830M to 13.1B parameters.

This innovative architecture could be a game-changer for the next generation of LLMs. What are your thoughts on DIFF Transformer's potential impact?
reacted to Felladrin's post with 👍 4 months ago
MiniSearch is celebrating its 1st birthday! 🎉

Exactly one year ago, I shared the initial version of this side-project on Hugging Face. Since then, there have been numerous changes under the hood. Nowadays it uses [Web-LLM](https://github.com/mlc-ai/web-llm), [Wllama](https://github.com/ngxson/wllama) and [SearXNG](https://github.com/searxng/searxng). I use it daily as my default search engine and have done my best to make it useful. I hope it's interesting for you too!

HF Space: Felladrin/MiniSearch
Embeddable URL: https://felladrin-minisearch.hf.space
reacted to nyuuzyou's post with ❤️👀 4 months ago
🎓 Introducing Doc4web.ru Documents Dataset - nyuuzyou/doc4web

Dataset highlights:
- 223,739 documents from doc4web.ru, a document hosting platform for students and teachers
- Primarily in Russian, with some English and potentially other languages
- Each entry includes: URL, title, download link, file path, and content (where available)
- Contains original document files in addition to metadata
- Data reflects a wide range of educational topics and materials
- Licensed under Creative Commons Zero (CC0) for unrestricted use

The dataset can be used for analyzing educational content in Russian, text classification tasks, and information retrieval systems. It's also valuable for examining trends in educational materials and document sharing practices in the Russian-speaking academic community. The inclusion of original files allows for in-depth analysis of various document formats and structures.
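
A minimal exploration sketch with the datasets library; the split name and column names below are assumptions, so check the dataset card for the exact schema:

```python
from datasets import load_dataset

# Stream rows rather than downloading everything at once.
# "train" is an assumed split name; see the dataset card for the real one.
ds = load_dataset("nyuuzyou/doc4web", split="train", streaming=True)

row = next(iter(ds))
print(row.keys())  # expected fields: URL, title, download link, file path, content
```
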
reacted to m-ric's post with 🔥 4 months ago
🇨🇳⛵️ 出海 ("sailing abroad"): Chinese AI is expanding globally

Fact: Chinese LLMs are heavily underrated; see for instance the recent and excellent DeepSeek-V2.5 or Qwen models.

Luckily for us, @AdinaY just wrote an excellent blog post explaining the Chinese AI ecosystem!

My key takeaways:

Since Google, OpenAI and Anthropic models are not available in China, local companies are fighting for the market. A really good market - AI has much higher penetration there than in the rest of the world, both with companies and individual users!

💰 But since DeepSeek heavily cut prices in May 2024, this spiraled into a price war, creating a cut-throat environment with unsustainably low prices.

📋 On top of this, local regulation is stringent: models must undergo licensing from a local censor (the Cyberspace Administration of China), which, for instance, requires models to refuse to answer certain questions about the CCP. Although this is certainly simpler to implement than certain conditions of the European AI Act.

💸 If this wasn't enough, VC investment in AI is drying up: by mid-2024, Chinese AI startups had raised approximately $4.4 billion, vs $55B for US startups in Q2 2024 alone.

📱 To reach profitability, companies have shifted from foundational models to model + application, for instance PopAI from [01.AI](http://01.ai/), with millions of users and high profitability.

⛏️ They also try to drill down into specific industries, but these niches are also getting crowded.

➡️ Since their home market is becoming both too crowded and inhospitable, Chinese companies are now going for international markets, "sailing abroad", following the expression consecrated by Zheng He's legendary 15th-century voyages.

There, they'll have to adapt to different infrastructures and regulations, but they have bright prospects for growth!

Read her post 👉 https://huggingface.co/blog/AdinaY/chinese-ai-global-expansion
reacted to TuringsSolutions's post with 😎👍 4 months ago
I solved the biggest math problem associated with the Attention Mechanism. It works better than I ever expected. Test it all yourself. Everything you need is linked from this video: https://youtu.be/41dF0yoz0qo

Sorry the audio quality sucks; I will buy a new microphone today. Why does some moron like me solve these things and not you? I know more about how computers work than you do, that's it. Swarm algorithms were big in the '90s and early 2000s. Computers were absolute dog doo-doo then in one specific way, compared to now. That one way, which everyone overlooks, is the entire secret behind why swarm algorithms are so good.