
Sorokin Evgeny

DeathGodlike

AI & ML interests

None yet


Organizations

None yet

DeathGodlike's activity

reacted to AdinaY's post with 🔥 2 days ago
reacted to fdaudens's post with 🔥❤️ 3 days ago
Yes, DeepSeek R1's release is impressive. But the real story is what happened in the 7 days that followed:

- Original release: 8 models, 540K downloads. Just the beginning...

- The community turned those open-weight models into more than 550 NEW models on Hugging Face. Total downloads? 2.5M, nearly 5X the originals.

The reason? DeepSeek models are open-weight, letting anyone build on top of them. Interestingly, the community focused on quantized versions for better efficiency and accessibility: they want models that use less memory, run faster, and are more energy-efficient.

When you empower builders, innovation explodes. For everyone. 🚀

The most popular community model? @bartowski's DeepSeek-R1-Distill-Qwen-32B-GGUF version, with 1M downloads alone.
reacted to davanstrien's post with 👀 3 days ago
🌍 Big step for multilingual AI data!

The Hugging Face community has rated educational content in languages spoken by 1.6 billion people! New additions:
• Japanese
• Italian
• Old High German

Learn more and contribute: https://huggingface.co/blog/davanstrien/fineweb2-community

These ratings can help enhance training data for major world languages.
reacted to onekq's post with 🔥 9 days ago
πŸ‹DeepSeek πŸ‹ is the real OpenAI 😯
reacted to alibabasglab's post with 👍 9 days ago
reacted to tomaarsen's post with 🔥❤️ 16 days ago
🏎️ Today I'm introducing a method to train static embedding models that run 100x to 400x faster on CPU than common embedding models, while retaining 85%+ of the quality! It includes 2 fully open models, with training scripts, datasets, and metrics.

We apply our recipe to train 2 Static Embedding models that we release today:
2️⃣ an English Retrieval model and a general-purpose Multilingual similarity model (e.g. classification, clustering, etc.), both Apache 2.0
🧠 my modern training strategy: ideation -> dataset choice -> implementation -> evaluation
📜 my training scripts, using the Sentence Transformers library
📊 my Weights & Biases reports with losses & metrics
📕 my list of 30 training and 13 evaluation datasets

The 2 Static Embedding models have the following properties:
🏎️ Extremely fast, e.g. 107,500 sentences per second on a consumer CPU, compared to 270 for 'all-mpnet-base-v2' and 56 for 'gte-large-en-v1.5'
0️⃣ Zero active parameters: No Transformer blocks, no attention, not even a matrix multiplication. Super speed!
📏 No maximum sequence length! Embed texts at any length (note: longer texts may embed worse)
📐 Linear instead of exponential complexity: 2x longer text takes 2x longer, instead of 2.5x or more.
🪆 Matryoshka support: allows you to truncate embeddings with minimal performance loss (e.g. 4x smaller with a 0.56% perf. decrease for English Similarity tasks)

Check out the full blogpost if you'd like to 1) use these lightning-fast models or 2) learn how to train them with consumer-level hardware: https://huggingface.co/blog/static-embeddings

The blogpost contains a lengthy list of possible advancements; I'm very confident that our 2 models are only the tip of the iceberg, and we may be able to get even better performance.

Alternatively, check out the models:
* sentence-transformers/static-retrieval-mrl-en-v1
* sentence-transformers/static-similarity-mrl-multilingual-v1
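
A minimal usage sketch for the retrieval model, assuming a recent sentence-transformers release (which provides truncate_dim for Matryoshka truncation and the similarity helper); the example sentences are only illustrative:

```python
from sentence_transformers import SentenceTransformer

# Load the English retrieval model; truncate_dim=256 relies on the Matryoshka
# property to keep only the first 256 embedding dimensions.
model = SentenceTransformer(
    "sentence-transformers/static-retrieval-mrl-en-v1", truncate_dim=256
)

queries = ["How fast are static embedding models on CPU?"]
docs = [
    "Static embedding models can encode over 100k sentences per second on CPU.",
    "Transformer-based embedders are slower but often slightly more accurate.",
]

query_emb = model.encode(queries)
doc_emb = model.encode(docs)

# Score each document against the query (higher = more relevant).
print(model.similarity(query_emb, doc_emb))
```
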
reacted to prithivMLmods's post with 👍🤗 about 2 months ago
Milestone for Flux.1 Dev 🔥

💢 The Flux.1 Dev model has crossed 10,000 creative public adapters! 🎈
🔗 https://huggingface.co/models?other=base_model:adapter:black-forest-labs/FLUX.1-dev

💢 This includes:
- 266 Finetunes
- 19 Quants
- 4 Merges

💢 Here's the 10,000th public adapter: 😜
+ strangerzonehf/Flux-3DXL-Partfile-0006

💢 Page:
+ https://huggingface.co/strangerzonehf

💢 Collection:
+ prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
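
For programmatic exploration, a small huggingface_hub sketch; it assumes the adapters carry the base_model:adapter:black-forest-labs/FLUX.1-dev tag that the URL above filters on:

```python
from huggingface_hub import HfApi

api = HfApi()

# Assumption: adapter repos are tagged with the base_model:adapter:... tag
# that the Hub URL above filters on.
adapters = api.list_models(
    filter="base_model:adapter:black-forest-labs/FLUX.1-dev",
    sort="downloads",
    direction=-1,
    limit=10,
)

for model in adapters:
    print(model.id, model.downloads)
```
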
reacted to openfree's post with 👍 4 months ago
MixGen3 is an innovative image generation service that utilizes LoRA (Low-Rank Adaptation) models. Its key features include:

- Integration of various LoRA models: Users can explore and select multiple LoRA models through a gallery.
- Combination of LoRA models: Up to three LoRA models can be combined to express unique styles and content.
- User-friendly interface: An intuitive interface allows for easy model selection, prompt input, and image generation.
- Advanced settings: Various options are provided, including image size adjustment, random seed, and advanced configurations.
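
As a rough illustration of what combining up to three LoRA adapters can look like in practice, here is a minimal diffusers sketch; the base model and adapter repo names are placeholder assumptions, not MixGen3's actual stack:

```python
import torch
from diffusers import DiffusionPipeline

# Base model plus up to three LoRA adapters (repo names are placeholders).
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

pipe.load_lora_weights("user/style-lora-a", adapter_name="style_a")
pipe.load_lora_weights("user/style-lora-b", adapter_name="style_b")
pipe.load_lora_weights("user/style-lora-c", adapter_name="style_c")

# Blend the three adapters with per-adapter weights.
pipe.set_adapters(
    ["style_a", "style_b", "style_c"], adapter_weights=[0.8, 0.6, 0.4]
)

image = pipe("a watercolor city skyline at dusk", num_inference_steps=28).images[0]
image.save("mix.png")
```
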

Main applications of MixGen3:

- Content creation
- Design and illustration
- Marketing and advertising
- Education and learning

Value of MixGen3:

- Enhancing creativity
- Time-saving
- Collaboration possibilities
- Continuous development

Expected effects:

- Increased content diversity
- Lowered entry barrier for creation
- Improved creativity
- Enhanced productivity

MixGen3 is bringing a new wave to the field of image generation by leveraging the advantages of LoRA models. Users can experience the service for free at
https://openfree-mixgen3.hf.space

contacts: arxivgpt@gmail.com
reacted to singhsidhukuldeep's post with 👀 4 months ago
While Google's Transformer might have introduced "Attention is all you need," Microsoft and Tsinghua University are here with the DIFF Transformer, stating, "Sparse-Attention is all you need."

The DIFF Transformer outperforms traditional Transformers in scaling properties, requiring only about 65% of the model size or training tokens to achieve comparable performance.

The secret sauce? A differential attention mechanism that amplifies focus on relevant context while canceling out noise, leading to sparser and more effective attention patterns.

How?
- It uses two separate softmax attention maps and subtracts them.
- It employs a learnable scalar λ for balancing the attention maps.
- It implements GroupNorm for each attention head independently.
- It is compatible with FlashAttention for efficient computation.
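
A minimal single-head PyTorch sketch of that subtraction idea (the per-head GroupNorm and the paper's λ re-parameterization are omitted; shapes and values are toy examples):

```python
import torch

def diff_attention(q1, k1, q2, k2, v, lam):
    """Simplified differential attention for one head.

    Two softmax attention maps are computed from two query/key projections
    and subtracted, scaled by the learnable scalar lambda, before being
    applied to the values. The paper also applies GroupNorm per head, which
    is omitted here.
    """
    d = q1.size(-1)
    a1 = torch.softmax(q1 @ k1.transpose(-2, -1) / d**0.5, dim=-1)
    a2 = torch.softmax(q2 @ k2.transpose(-2, -1) / d**0.5, dim=-1)
    return (a1 - lam * a2) @ v


# Toy shapes: batch of 2, sequence length 16, head dimension 32.
q1, k1, q2, k2, v = (torch.randn(2, 16, 32) for _ in range(5))
out = diff_attention(q1, k1, q2, k2, v, lam=0.5)
print(out.shape)  # torch.Size([2, 16, 32])
```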

What do you get?
- Superior long-context modeling (up to 64K tokens).
- Enhanced key information retrieval.
- Reduced hallucination in question-answering and summarization tasks.
- More robust in-context learning, less affected by prompt order.
- Mitigation of activation outliers, opening doors for efficient quantization.

Extensive experiments show DIFF Transformer's advantages across various tasks and model sizes, from 830M to 13.1B parameters.

This innovative architecture could be a game-changer for the next generation of LLMs. What are your thoughts on DIFF Transformer's potential impact?
reacted to Felladrin's post with 👍 4 months ago
MiniSearch is celebrating its 1st birthday! 🎉

Exactly one year ago, I shared the initial version of this side-project on Hugging Face. Since then, there have been numerous changes under the hood. Nowadays it uses [Web-LLM](https://github.com/mlc-ai/web-llm), [Wllama](https://github.com/ngxson/wllama) and [SearXNG](https://github.com/searxng/searxng). I use it daily as my default search engine and have done my best to make it useful. I hope it's interesting for you too!

HF Space: Felladrin/MiniSearch
Embeddable URL: https://felladrin-minisearch.hf.space
reacted to nyuuzyou's post with ❤️👀 4 months ago
🎓 Introducing Doc4web.ru Documents Dataset - nyuuzyou/doc4web

Dataset highlights:
- 223,739 documents from doc4web.ru, a document hosting platform for students and teachers
- Primarily in Russian, with some English and potentially other languages
- Each entry includes: URL, title, download link, file path, and content (where available)
- Contains original document files in addition to metadata
- Data reflects a wide range of educational topics and materials
- Licensed under Creative Commons Zero (CC0) for unrestricted use

The dataset can be used for analyzing educational content in Russian, text classification tasks, and information retrieval systems. It's also valuable for examining trends in educational materials and document sharing practices in the Russian-speaking academic community. The inclusion of original files allows for in-depth analysis of various document formats and structures.
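
A minimal exploration sketch with the datasets library; the split name and column names below are assumptions, so check the dataset card for the exact schema:

```python
from datasets import load_dataset

# Stream rows rather than downloading everything at once.
# "train" is an assumed split name; see the dataset card for the real one.
ds = load_dataset("nyuuzyou/doc4web", split="train", streaming=True)

row = next(iter(ds))
print(row.keys())  # expected fields: URL, title, download link, file path, content
```
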
reacted to m-ric's post with 🔥 4 months ago
🇨🇳⛵️ 出海 ("sailing abroad"): Chinese AI is expanding globally

Fact: Chinese LLMs are heavily underrated; see for instance the recent and excellent DeepSeek-V2.5 or Qwen models.

Luckily for us, @AdinaY just wrote an excellent blog post explaining the Chinese AI ecosystem!

My key takeaways:

Since Google, OpenAI and Anthropic models are not available in China, local companies are fighting for the market. A really good market - AI has much higher penetration there than in the rest of the world, both with companies and individual users!

💰 But since DeepSeek heavily cut prices in May 2024, this spiraled into a price war, creating a cut-throat environment with unsustainably low prices.

📋 On top of this, local regulation is stringent: models must undergo licensing from a local censor (the Cyberspace Administration of China), which, for instance, requires models to refuse to answer certain questions about the CCP. Although this is certainly simpler to implement than certain conditions of the European AI Act.

💸 If this wasn't enough, VC investment in AI is drying up: by mid-2024, Chinese AI startups had raised approximately $4.4 billion, vs $55B for US startups in Q2 2024 alone.

📱 To reach profitability, companies have shifted from foundational models to model + application, for instance PopAI from [01.AI](http://01.ai/), with millions of users and high profitability.

⛏️ They also try to drill down into specific industries, but these niches are also getting crowded.

➡️ Since their home market is becoming both too crowded and inhospitable, Chinese companies are now going for international markets, "sailing abroad", following the expression consecrated by Zheng He's legendary 15th-century voyages.

There, they'll have to adapt to different infrastructures and regulations, but they have bright prospects for growth!

Read her post 👉 https://huggingface.co/blog/AdinaY/chinese-ai-global-expansion
reacted to TuringsSolutions's post with 😎👍 4 months ago
I solved the biggest math problem associated with the Attention Mechanism. It works better than I ever expected. Test it all yourself. Everything you need is linked from this video: https://youtu.be/41dF0yoz0qo

Sorry the audio quality sucks; I will buy a new microphone today. Why does some moron like me solve these things and not you? I know more about how computers work than you do, that's it. Swarm algorithms were big in the '90s and early 2000s. Computers were absolute dog doo-doo then in one specific way, compared to now. That one way, which everyone overlooks, is the entire secret behind why swarm algorithms are so good.