Sorokin Evgeny
DeathGodlike
1 follower · 21 following
AI & ML interests
None yet
Recent Activity
Reacted to openfree's post about 1 month ago
MixGen3 is an innovative image generation service that utilizes LoRA (Low-Rank Adaptation) models. Its key features include:
- Integration of various LoRA models: Users can explore and select multiple LoRA models through a gallery.
- Combination of LoRA models: Up to three LoRA models can be combined to express unique styles and content.
- User-friendly interface: An intuitive interface allows for easy model selection, prompt input, and image generation.
- Advanced settings: Various options are provided, including image size adjustment, random seed, and advanced configurations.
Main applications of MixGen3:
- Content creation
- Design and illustration
- Marketing and advertising
- Education and learning
Value of MixGen3:
- Enhancing creativity
- Time-saving
- Collaboration possibilities
- Continuous development
Expected effects:
- Increased content diversity
- Lowered entry barrier for creation
- Improved creativity
- Enhanced productivity
MixGen3 is bringing a new wave to the field of image generation by leveraging the advantages of LoRA models. Users can experience the service for free at https://openfree-mixgen3.hf.space
Contacts: arxivgpt@gmail.com
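For readers curious about the underlying technique, here is a minimal sketch of blending multiple LoRA adapters on one diffusion pipeline with the Hugging Face diffusers library. This only illustrates the general idea, not MixGen3's actual implementation; the base model, the adapter repository names, and the blend weights below are placeholders.

```python
# Illustrative sketch: combining up to three LoRA adapters with diffusers.
# The base model and adapter repo IDs are hypothetical placeholders.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # assumed base model
    torch_dtype=torch.float16,
).to("cuda")

# Load each LoRA under its own adapter name (placeholder repos).
pipe.load_lora_weights("user/lora-style-a", adapter_name="style_a")
pipe.load_lora_weights("user/lora-style-b", adapter_name="style_b")
pipe.load_lora_weights("user/lora-style-c", adapter_name="style_c")

# Blend the three adapters; the weights control how strongly each one
# influences the final image.
pipe.set_adapters(
    ["style_a", "style_b", "style_c"],
    adapter_weights=[0.8, 0.6, 0.4],
)

image = pipe("a watercolor city skyline at dusk", num_inference_steps=30).images[0]
image.save("mixed_lora.png")
```

Tuning the per-adapter weights in set_adapters is what lets a blend of styles come through in a single generation.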
Reacted to singhsidhukuldeep's post about 1 month ago
While Google's Transformer might have introduced "Attention is all you need," Microsoft and Tsinghua University are here with the DIFF Transformer, stating, "Sparse-Attention is all you need."
The DIFF Transformer outperforms traditional Transformers in scaling properties, requiring only about 65% of the model size or training tokens to achieve comparable performance. The secret sauce? A differential attention mechanism that amplifies focus on relevant context while canceling out noise, leading to sparser and more effective attention patterns.
How?
- It uses two separate softmax attention maps and subtracts them.
- It employs a learnable scalar λ for balancing the attention maps.
- It implements GroupNorm for each attention head independently.
- It is compatible with FlashAttention for efficient computation.
What do you get?
- Superior long-context modeling (up to 64K tokens).
- Enhanced key information retrieval.
- Reduced hallucination in question-answering and summarization tasks.
- More robust in-context learning, less affected by prompt order.
- Mitigation of activation outliers, opening doors for efficient quantization.
Extensive experiments show DIFF Transformer's advantages across various tasks and model sizes, from 830M to 13.1B parameters. This innovative architecture could be a game-changer for the next generation of LLMs.
What are your thoughts on DIFF Transformer's potential impact?
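To make the subtract-two-softmax-maps idea concrete, here is a minimal single-head PyTorch sketch based on the description above. It is an illustration only, not the authors' reference implementation: the paper uses a multi-head, FlashAttention-compatible design and re-parameterizes λ, whereas this sketch keeps a plain learnable scalar.

```python
# Minimal single-head sketch of differential attention, assuming a plain
# scalar lambda; simplified relative to the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffAttention(nn.Module):
    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        # Two sets of query/key projections produce two attention maps.
        self.q_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        self.k_proj = nn.Linear(d_model, 2 * d_head, bias=False)
        self.v_proj = nn.Linear(d_model, d_head, bias=False)
        # Learnable scalar lambda balances the two attention maps.
        self.lam = nn.Parameter(torch.tensor(0.5))
        # Per-head normalization of the attention output.
        self.norm = nn.GroupNorm(1, d_head)
        self.scale = d_head ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q1, q2 = self.q_proj(x).chunk(2, dim=-1)
        k1, k2 = self.k_proj(x).chunk(2, dim=-1)
        v = self.v_proj(x)
        a1 = F.softmax(q1 @ k1.transpose(-2, -1) * self.scale, dim=-1)
        a2 = F.softmax(q2 @ k2.transpose(-2, -1) * self.scale, dim=-1)
        # Differential attention: subtract the second map to cancel
        # attention mass both maps place on irrelevant tokens.
        out = (a1 - self.lam * a2) @ v
        # GroupNorm expects (batch, channels, length), so normalize over
        # the head dimension and transpose back.
        out = self.norm(out.transpose(1, 2)).transpose(1, 2)
        return out

# Quick shape check.
attn = DiffAttention(d_model=64, d_head=32)
y = attn(torch.randn(2, 10, 64))
print(y.shape)  # torch.Size([2, 10, 32])
```

Because noise common to both maps cancels in the subtraction, the remaining attention concentrates on relevant context, which is what produces the sparser patterns described in the post.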
Reacted to Felladrin's post about 1 month ago
MiniSearch is celebrating its 1st birthday!
Exactly one year ago, I shared the initial version of this side-project on Hugging Face. Since then, there have been numerous changes under the hood. Nowadays it uses [Web-LLM](https://github.com/mlc-ai/web-llm), [Wllama](https://github.com/ngxson/wllama) and [SearXNG](https://github.com/searxng/searxng). I use it daily as my default search engine and have done my best to make it useful. I hope it's interesting for you too!
HF Space: https://huggingface.co/spaces/Felladrin/MiniSearch
Embeddable URL: https://felladrin-minisearch.hf.space
Organizations
None yet
DeathGodlike's activity
Liked 2 models 3 months ago
arcee-ai/Llama-3.1-SuperNova-Lite
Text Generation • Updated Oct 2 • 9.86k • 174
bartowski/Llama-3.1-SuperNova-Lite-exl2
Text Generation • Updated Sep 11 • 9 • 2
Liked a model 4 months ago
Annuvin/Lumimaid-v0.2-12B-5.0bpw-exl2
Text Generation • Updated Jul 27 • 6 • 1