DmitryRyumin (Dmitry Ryumin)

reacted to nyuuzyou's post with 🤯 about 1 month ago


it's over


reacted to TuringsSolutions's post with 🔥 about 2 months ago


Sentence Transformers received huge updates today! Want to give your model semantic search over your documents? That's what Sentence Transformers is for: it turns text into embeddings that can be compared to find the most relevant passages. Hugging Face makes it easy to add this functionality to any model, and you can be up and running with Sentence Transformers in seconds. Check out this video for a deeper explanation and sample code: https://youtu.be/2hR3D8_kqZE
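Under the hood, embedding-based document search ranks documents by the cosine similarity between a query embedding and each document embedding. A minimal sketch in plain Python, assuming the embeddings have already been computed; the toy 3-dimensional vectors and the helper names `cosine_similarity` and `top_k` are hypothetical and not the Sentence Transformers API:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], docs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
    return ranked[:k]


# Toy embeddings standing in for real model outputs (hypothetical values).
docs = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.3, 0.1],
    "stocks": [0.0, 0.2, 0.95],
}
query = [1.0, 0.2, 0.0]
print(top_k(query, docs))  # ['cats', 'dogs']
```

In practice the vectors would come from a model's encode step, and embeddings are often normalized once up front so that cosine similarity reduces to a plain dot product.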

reacted to tomaarsen's post with 🔥 about 2 months ago


I just released Sentence Transformers v3.3.0 & it's huge! 4.5x speedup for CPU with OpenVINO int8 static quantization, training with prompts for a free perf. boost, PEFT integration, evaluation on NanoBEIR, and more! Details:

1. We integrate Post-Training Static Quantization using OpenVINO, a very efficient solution for CPUs that processes 4.78x as many texts per second on average, while only hurting performance by 0.36% on average. There's a new export_static_quantized_openvino_model method to quantize a model.

2. We add the option to train with prompts, e.g. strings like "query: ", "search_document: " or "Represent this sentence for searching relevant passages: ". It's as simple as using the prompts argument in SentenceTransformerTrainingArguments. Our experiments show that you can easily reach 0.66% to 0.90% relative performance improvement on NDCG@10 at no extra cost by adding "query: " before each training query and "document: " before each training answer.

3. Sentence Transformers now supports training PEFT adapters via 7 new methods for adding new adapters or loading pre-trained ones. You can also directly load a trained adapter with SentenceTransformer as if it's a normal model. Very useful for e.g. 1) training multiple adapters on 1 base model, 2) training bigger models than otherwise possible, or 3) cheaply hosting multiple models by switching multiple adapters on 1 base model.

4. We added easy evaluation on NanoBEIR, a subset of BEIR a.k.a. the MTEB Retrieval benchmark. It contains 13 datasets with 50 queries and up to 10k documents each. Evaluation is fast, and can easily be done during training to track your model's performance on general-purpose information retrieval tasks.
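NDCG@10, the metric quoted in point 2 and commonly reported for retrieval benchmarks like NanoBEIR, rewards placing relevant documents near the top of the ranking. A minimal sketch of the textbook formula for a single query, assuming graded relevance labels stored in a dict; this is not the Sentence Transformers evaluator, and the function names are illustrative:

```python
import math


def dcg_at_k(relevances: list[float], k: int) -> float:
    """Discounted cumulative gain of the first k relevance scores."""
    # The item at 0-based position i is discounted by log2(i + 2), i.e. log2(rank + 1).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_doc_ids: list[str], qrels: dict[str, float], k: int = 10) -> float:
    """NDCG@k for one query.

    ranked_doc_ids: document ids in the order the model retrieved them.
    qrels: mapping from each relevant document id to its relevance grade.
    """
    gains = [qrels.get(doc_id, 0.0) for doc_id in ranked_doc_ids]
    ideal_dcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(gains, k) / ideal_dcg


# The single relevant document is retrieved at rank 2 instead of rank 1.
print(round(ndcg_at_k(["d3", "d1", "d7"], {"d1": 1.0}), 4))  # 0.6309
```

A benchmark-level score would average this value over all queries.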

Additionally, we also deprecate Python 3.8, add better compatibility with Transformers v4.46.0, and more. Read the full release notes here: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.3.0
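The "static" in point 1 means the quantization scales are fixed ahead of time from calibration data instead of being recomputed at inference time. OpenVINO's actual quantization pipeline is far more involved (per-layer scales, weights and activations); the sketch below only illustrates symmetric per-tensor int8 static quantization in plain Python, with hypothetical helper names:

```python
def calibrate_scale(calibration_batches: list[list[float]]) -> float:
    """Derive one fixed int8 scale from representative data (the 'static' part)."""
    max_abs = max(abs(v) for batch in calibration_batches for v in batch)
    return max_abs / 127 if max_abs > 0 else 1.0


def quantize(values: list[float], scale: float) -> list[int]:
    """Map floats to int8 codes with the precomputed scale, clamping outliers."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Approximately recover the original floats."""
    return [c * scale for c in codes]


scale = calibrate_scale([[0.5, -1.27], [0.9, 0.1]])  # max |x| = 1.27, so scale is about 0.01
codes = quantize([0.5, -2.0, 0.333], scale)
print(codes)  # [50, -127, 33]
```

Values outside the calibrated range (here -2.0) are clipped, which is why the calibration set should resemble real inputs.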

reacted to singhsidhukuldeep's post with 🔥 2 months ago


Good folks at @nvidia have released exciting new research on normalized Transformers (nGPT) for faster and more efficient language modeling!

Here is what they are proposing:

1. Remove all normalization layers, like RMSNorm or LayerNorm, from the standard Transformer architecture.

2. Normalize all matrices along their embedding dimension after each training step. This includes input and output embeddings, attention matrices (Q, K, V), output projection matrices, and MLP matrices.

3. Replace the standard residual connections with normalized update equations using learnable eigen learning rates for the attention and MLP blocks.

4. Change the softmax scaling factor in the attention mechanism from 1/√d_k to √d_k.

5. Implement rescaling and optional normalization of query (q) and key (k) vectors in the attention mechanism using learnable scaling factors.

6. Rescale the intermediate states of the MLP block using learnable scaling factors.

7. Implement rescaling of the output logits using learnable scaling factors.

8. Remove weight decay and learning rate warmup from the optimization process.

9. Initialize the eigen learning rates and scaling factors with appropriate values as specified in the paper.

10. During training, treat all vectors and matrices as residing on a unit hypersphere, interpreting matrix-vector multiplications as cosine similarities.

11. Implement the update equations for the hidden states using the normalized outputs from attention and MLP blocks, controlled by the eigen learning rates.

12. After each training step, normalize all parameter matrices to ensure they remain on the unit hypersphere.

13. Use the Adam optimizer without weight decay for training the model.

14. When computing loss, apply the learnable scaling factor to the logits before the softmax operation.

15. During inference, follow the same normalization and scaling procedures as in training.
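Steps 3 and 11 replace the usual residual connection h + block(h) with a step along the hypersphere. A minimal plain-Python sketch for a single hidden vector, assuming the update rule h ← Norm(h + α ⊙ (Norm(block(h)) − h)) from the paper, where α holds the eigen learning rates; `block_out` stands in for an attention or MLP output, and the toy values and helper names are hypothetical:

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Project a vector onto the unit hypersphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def ngpt_update(h: list[float], block_out: list[float], alpha: list[float]) -> list[float]:
    """Normalized residual update: h <- Norm(h + alpha * (Norm(block_out) - h)).

    alpha plays the role of the learnable per-dimension eigen learning rate.
    """
    h_block = normalize(block_out)
    return normalize([hv + a * (bv - hv) for hv, bv, a in zip(h, h_block, alpha)])


h = normalize([1.0, 0.0, 0.0])
out = ngpt_update(h, block_out=[0.0, 2.0, 0.0], alpha=[0.5, 0.5, 0.5])
print([round(x, 4) for x in out])  # [0.7071, 0.7071, 0.0]
```

Because both h and the normalized block output are unit vectors, α interpolates between keeping h (α = 0) and moving fully to the block's output (α = 1).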

Excited to see how it scales to larger models and datasets!

reacted to merve's post with 🔥 2 months ago


Tencent released a new depth model that generates temporally consistent depth maps over videos ⏯️

Model: tencent/DepthCrafter
Demo: tencent/DepthCrafter
Paper: DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos (2409.02095)

You don't need to input anything other than video itself, no need for optical flow or camera poses! 🤩

reacted to albertvillanova's post with 👍 3 months ago


🚨 We’ve just released a new tool to compare the performance of models in the 🤗 Open LLM Leaderboard: the Comparator 🎉
open-llm-leaderboard/comparator

Want to see how two different versions of LLaMA stack up? Let’s walk through a step-by-step comparison of LLaMA-3.1 and LLaMA-3.2. 🦙🧵👇

1/ Load the Models' Results
- Go to the 🤗 Open LLM Leaderboard Comparator: open-llm-leaderboard/comparator
- Search for "LLaMA-3.1" and "LLaMA-3.2" in the model dropdowns.
- Press the Load button. Ready to dive into the results!

2/ Compare Metric Results in the Results Tab 📊
- Head over to the Results tab.
- Here, you’ll see the performance metrics for each model, beautifully color-coded using a gradient to highlight performance differences: greener is better! 🌟
- Want to focus on a specific task? Use the Task filter to hone in on comparisons for tasks like BBH or MMLU-Pro.

3/ Check Config Alignment in the Configs Tab ⚙️
- To ensure you’re comparing apples to apples, head to the Configs tab.
- Review both models’ evaluation configurations, such as metrics, datasets, prompts, few-shot configs...
- If something looks off, it’s good to know before drawing conclusions! ✅

4/ Compare Predictions by Sample in the Details Tab 🔍
- Curious about how each model responds to specific inputs? The Details tab is your go-to!
- Select a Task (e.g., MuSR) and then a Subtask (e.g., Murder Mystery) and then press the Load Details button.
- Check out the side-by-side predictions and dive into the nuances of each model’s outputs.

5/ With this tool, it’s never been easier to explore how small changes between model versions affect performance on a wide range of tasks. Whether you’re a researcher or enthusiast, you can instantly visualize improvements and dive into detailed comparisons.

🚀 Try the 🤗 Open LLM Leaderboard Comparator now and take your model evaluations to the next level!

reacted to m-ric's post with 👀 3 months ago


By far the coolest release of the day!
> The Open LLM Leaderboard, the most comprehensive suite for comparing open LLMs on many benchmarks, just released a comparator tool that lets you dig into the details of the differences between any two models.

Here's me checking how the new Llama-3.1-Nemotron-70B that we've heard so much about compares to the original Llama-3.1-70B. 🤔🔎

Try it out here 👉 open-llm-leaderboard/comparator


reacted to TuringsSolutions's post with 👍 3 months ago


Microsoft released VPTQ (Vector Post-Training Quantization), a method that compresses the weights of large language models to extremely low bit-widths by quantizing groups of weights as vectors against a learned codebook. You can check out their full paper, including the method and all of the math for the algorithm, or you can watch this video where I did all of that for you and then reconstructed their entire method in Python!

https://youtu.be/YwlKzV1y62s
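The core idea of vector quantization, which VPTQ builds on, is to replace each small group of weights with an index into a codebook of centroids. The sketch below is a plain nearest-centroid encoder with a hand-picked toy codebook; VPTQ itself learns its codebooks with a more sophisticated optimization, so treat this only as an illustration of the storage format. All names are hypothetical:

```python
def nearest_centroid(vec: list[float], codebook: list[list[float]]) -> int:
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, codebook[i])),
    )


def vq_encode(weights: list[float], codebook: list[list[float]], dim: int) -> list[int]:
    """Split a weight row into dim-sized sub-vectors and keep one index per sub-vector.

    Assumes len(weights) is a multiple of dim.
    """
    return [nearest_centroid(weights[i:i + dim], codebook) for i in range(0, len(weights), dim)]


def vq_decode(indices: list[int], codebook: list[list[float]]) -> list[float]:
    """Reconstruct approximate weights by concatenating the chosen centroids."""
    return [x for i in indices for x in codebook[i]]


codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]  # 3 centroids fit in a 2-bit index
weights = [0.9, 1.1, -0.8, 0.4, 0.1, -0.1]
idx = vq_encode(weights, codebook, dim=2)
print(idx)                       # [1, 2, 0]
print(vq_decode(idx, codebook))  # [1.0, 1.0, -1.0, 0.5, 0.0, 0.0]
```

Here each 2-bit index replaces two weights, so the row costs about 1 bit per weight plus the shared codebook.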


reacted to nyuuzyou's post with 👀 3 months ago


🎓 Introducing Doc4web.ru Documents Dataset - nyuuzyou/doc4web

Dataset highlights:
- 223,739 documents from doc4web.ru, a document hosting platform for students and teachers
- Primarily in Russian, with some English and potentially other languages
- Each entry includes: URL, title, download link, file path, and content (where available)
- Contains original document files in addition to metadata
- Data reflects a wide range of educational topics and materials
- Licensed under Creative Commons Zero (CC0) for unrestricted use

The dataset can be used for analyzing educational content in Russian, text classification tasks, and information retrieval systems. It's also valuable for examining trends in educational materials and document sharing practices in the Russian-speaking academic community. The inclusion of original files allows for in-depth analysis of various document formats and structures.

reacted to tomaarsen's post with 🔥 3 months ago


📣 Sentence Transformers v3.2.0 is out, marking the biggest release for inference in 2 years! 2 new backends for embedding models: ONNX (+ optimization & quantization) and OpenVINO, allowing for speedups up to 2x-3x AND Static Embeddings for 500x speedups at 10-20% accuracy cost.

1️⃣ ONNX Backend: This backend uses the ONNX Runtime to accelerate model inference on both CPU and GPU, reaching up to 1.4x-3x speedup depending on the precision. We also introduce 2 helper methods for optimizing and quantizing models for (much) faster inference.
2️⃣ OpenVINO Backend: This backend uses Intel's OpenVINO instead, outperforming ONNX in some situations on CPU.

Usage is as simple as SentenceTransformer("all-MiniLM-L6-v2", backend="onnx"). Does your model not have an ONNX or OpenVINO file yet? No worries - it'll be exported automatically for you. Thank me later 😉

🔒 Another major new feature is Static Embeddings: think word embeddings like GloVe and word2vec, but modernized. Static Embeddings are bags of token embeddings that are summed together to create text embeddings, allowing for lightning-fast embeddings that don't require any neural networks. They're initialized in one of 2 ways:

1️⃣ via Model2Vec, a new technique for distilling any Sentence Transformer model into static embeddings. Either load a pre-distilled model with from_model2vec, or use from_distillation to do the distillation yourself. It'll only take 5 seconds on GPU & 2 minutes on CPU, no dataset needed.
2️⃣ Random initialization. This requires finetuning, but finetuning is extremely quick (e.g. I trained with 3 million pairs in 7 minutes). My final model was 6.6% worse than bge-base-en-v1.5, but 500x faster on CPU.
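To make "bags of token embeddings that are summed together" concrete: a static model is a lookup table from tokens to vectors, and a text embedding is the sum of its tokens' vectors, with no network forward pass. A toy sketch with a hand-written table and a whitespace tokenizer (both assumptions; this is not the library's implementation, which uses a subword tokenizer, and some implementations average rather than sum):

```python
def embed(text: str, table: dict[str, list[float]], dim: int) -> list[float]:
    """Sum the static vectors of all known tokens; unknown tokens are skipped."""
    total = [0.0] * dim
    for token in text.lower().split():  # toy whitespace tokenizer (assumption)
        vec = table.get(token)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
    return total


# Hypothetical 2-dimensional token table; real tables come from
# distillation (Model2Vec) or from finetuning a random initialization.
table = {"fast": [1.0, 0.0], "embeddings": [0.0, 1.0], "very": [0.5, 0.5]}
print(embed("Very fast embeddings", table, dim=2))  # [1.5, 1.5]
```

Since encoding is only table lookups and additions, the cost grows with the number of tokens but involves no matrix multiplications, which is where the large CPU speedup comes from.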

Full release notes: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.2.0
Documentation on Speeding up Inference: https://sbert.net/docs/sentence_transformer/usage/efficiency.html


reacted to merve's post with 🔥 3 months ago


Meta AI vision has been cooking @facebook
They shipped multiple models and demos for their papers at @ECCV 🤗

Here's a compilation of my top picks:
- Sapiens is a family of foundation models for human-centric depth estimation, segmentation and more; all models have open weights and demos 👏

All models have their demos and even torchscript checkpoints!
A collection of models and demos: facebook/sapiens-66d22047daa6402d565cb2fc
- VFusion3D is a state-of-the-art model for consistent 3D generation from images

Model: facebook/vfusion3d
Demo: facebook/VFusion3D

- CoTracker is the state-of-the-art point (pixel) tracking model

Demo: facebook/cotracker
Model: facebook/cotracker

reacted to m-ric's post with 👍 3 months ago


📜 𝐎𝐥𝐝-𝐬𝐜𝐡𝐨𝐨𝐥 𝐑𝐍𝐍𝐬 𝐜𝐚𝐧 𝐚𝐜𝐭𝐮𝐚𝐥𝐥𝐲 𝐫𝐢𝐯𝐚𝐥 𝐟𝐚𝐧𝐜𝐲 𝐭𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐞𝐫𝐬!

Researchers from Mila and Borealis AI have just shown that simplified versions of good old Recurrent Neural Networks (RNNs) can match the performance of today's transformers.

They took a fresh look at LSTMs (from 1997!) and GRUs (from 2014). They stripped these models down to their bare essentials, creating "minLSTM" and "minGRU". The key changes:
❶ Removed dependencies on previous hidden states in the gates
❷ Dropped the tanh that had been added to restrict output range in order to avoid vanishing gradients
❸ Ensured outputs are time-independent in scale (not sure I understood that well either, don't worry)

⚡️ As a result, you can use a “parallel scan” algorithm to train these minimal RNNs in parallel: this takes 88% more memory but makes them 200x faster than their traditional counterparts for long sequences
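Change ❶ is what makes the parallel scan possible: once the gate depends only on the current input, minGRU's update h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t becomes a linear recurrence h_t = a_t h_{t−1} + b_t whose coefficients can all be computed up front, and composing such steps is associative. A scalar toy sketch in plain Python that checks a simulated parallel scan against the sequential loop; the weights `w_z`, `w_h` and the inputs are hypothetical, and the paper's vectorized implementation differs in its details:

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mingru_coeffs(xs: list[float], w_z: float, w_h: float) -> list[tuple[float, float]]:
    """Per-step coefficients a_t = 1 - z_t and b_t = z_t * h_tilde_t.

    Toy scalar model: z_t = sigmoid(w_z * x_t), h_tilde_t = w_h * x_t.
    Nothing depends on h_{t-1}, so every step can be computed independently.
    """
    coeffs = []
    for x in xs:
        z = sigmoid(w_z * x)
        coeffs.append((1.0 - z, z * (w_h * x)))
    return coeffs


def sequential(coeffs: list[tuple[float, float]], h0: float = 0.0) -> list[float]:
    """Reference recurrence h_t = a_t * h_{t-1} + b_t, one step at a time."""
    hs, h = [], h0
    for a, b in coeffs:
        h = a * h + b
        hs.append(h)
    return hs


def parallel_scan(coeffs: list[tuple[float, float]], h0: float = 0.0) -> list[float]:
    """Hillis-Steele inclusive scan with the associative operator
    (a1, b1) followed by (a2, b2) = (a1 * a2, a2 * b1 + b2).

    All updates within a round are independent, so on real hardware they run
    in parallel; here each round is simulated by one list comprehension,
    and there are about log2(T) rounds.
    """
    elems = list(coeffs)
    step = 1
    while step < len(elems):
        elems = [
            (elems[i - step][0] * elems[i][0], elems[i][0] * elems[i - step][1] + elems[i][1])
            if i >= step else elems[i]
            for i in range(len(elems))
        ]
        step *= 2
    # Each prefix is now a single affine map; apply it to the initial state.
    return [a * h0 + b for a, b in elems]


xs = [0.5, -1.0, 2.0, 0.3, 1.5]
coeffs = mingru_coeffs(xs, w_z=1.2, w_h=0.8)
seq = sequential(coeffs)
par = parallel_scan(coeffs)
print(all(abs(s - p) < 1e-12 for s, p in zip(seq, par)))  # True
```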

🔥 The results are mind-blowing! Performance-wise, they go toe-to-toe with Transformers or Mamba.

And for Language Modeling, they need 2.5x fewer training steps than Transformers to reach the same performance! 🚀

🤔 Why does this matter?

By showing that much simpler models can perform on par with transformers, this work challenges the narrative that we need advanced architectures for better performance!

💬 François Chollet wrote in a tweet about this paper:

“The fact that there are many recent architectures coming from different directions that roughly match Transformers is proof that architectures aren't fundamentally important in the curve-fitting paradigm (aka deep learning)”

“Curve-fitting is about embedding a dataset on a curve. The critical factor is the dataset, not the specific hard-coded bells and whistles that constrain the curve's shape.”

It’s Rich Sutton's Bitter Lesson striking again: you don’t need fancy architectures, just scale up your model and data!

Read the paper 👉 Were RNNs All We Needed? (2410.01201)


reacted to merve's post with 🔥 3 months ago


NVIDIA just dropped a gigantic multimodal model called NVLM 72B 🦖
nvidia/NVLM-D-72B
Paper page NVLM: Open Frontier-Class Multimodal LLMs (2409.11402)

The paper contains many ablation studies on various ways to use the LLM backbone 👇🏻

🦩 Flamingo-like cross-attention (NVLM-X)
🌋 LLaVA-like concatenation of image and text embeddings fed to a decoder-only model (NVLM-D)
✨ a hybrid architecture (NVLM-H)

Checking evaluations, NVLM-D and NVLM-H are best or second best compared to other models 👏

The released model is NVLM-D based on Qwen-2 Instruct, aligned with InternViT-6B using a huge mixture of different datasets

You can easily use this model by loading it through transformers' AutoModel 😍

reacted to merve's post with 🔥 3 months ago


If you feel like you missed out on ECCV 2024, there's an app to browse the papers, rank them by popularity, and filter for open models, datasets and demos 📝

Get started at ECCV/ECCV2024-papers ✨

reacted to their post with 🤗🔥 3 months ago


🔥🎭🌟 New Research Alert - HeadGAP (Avatars Collection)! 🌟🎭🔥
📄 Title: HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors 🔝

📝 Description: HeadGAP introduces a novel method for generating high-fidelity, animatable 3D head avatars from few-shot data, using Gaussian priors and dynamic part-based modelling for personalized and generalizable results.

👥 Authors: @zxz267 , @walsvid , @zhaohu2 , Weiyi Zhang, @hellozhuo , Xu Chang, Yang Zhao, Zheng Lv, Xiaoyuan Zhang, @yongjie-zhang-mail , Guidong Wang, and Lan Xu

📄 Paper: HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors (2408.06019)

🌐 Github Page: https://headgap.github.io

🚀 CVPR-2023-24-Papers: https://github.com/DmitryRyumin/CVPR-2023-24-Papers

🚀 WACV-2024-Papers: https://github.com/DmitryRyumin/WACV-2024-Papers

🚀 ICCV-2023-Papers: https://github.com/DmitryRyumin/ICCV-2023-Papers

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

🚀 Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36

🔍 Keywords: #HeadGAP #3DAvatar #FewShotLearning #GaussianPriors #AvatarCreation #3DModeling #MachineLearning #ComputerVision #ComputerGraphics #GenerativeAI #DeepLearning #AI

posted the HeadGAP update above 3 months ago

reacted to abidlabs's post with ❤️ 3 months ago


👋 Hi Gradio community,

I'm excited to share that Gradio 5 will launch in October with improvements across security, performance, SEO, design (see the screenshot for Gradio 4 vs. Gradio 5), and user experience, making Gradio a mature framework for web-based ML applications.

Gradio 5 is currently in beta, so if you'd like to try it out early, please refer to the instructions below:

---------- Installation -------------

Gradio 5 depends on Python 3.10 or higher, so if you are running Gradio locally, please ensure that you have Python 3.10 or higher, or download it here: https://www.python.org/downloads/
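A quick way to confirm that the local interpreter meets this requirement before installing; this is a small standalone helper, and the function name is hypothetical, not part of Gradio:

```python
import sys


def supports_gradio5(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) Python version is 3.10 or higher."""
    return tuple(version_info[:2]) >= (3, 10)


if not supports_gradio5():
    print(f"Python {sys.version.split()[0]} is too old for Gradio 5; install 3.10+ first.")
```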

* Locally: If you are running gradio locally, simply install the release candidate with pip install gradio --pre
* Spaces: If you would like to update an existing gradio Space to use Gradio 5, you can simply update the sdk_version to be 5.0.0b3 in the README.md file on Spaces.

In most cases, that’s all you have to do to run Gradio 5.0. If you start your Gradio application, you should see your Gradio app running, with a fresh new UI.

-----------------------------

For more information, please see: https://github.com/gradio-app/gradio/issues/9463


reacted to fdaudens's post with 🚀 3 months ago


🚀 1,000,000 public models milestone achieved on Hugging Face! 🤯

This chart by @cfahlgren1 shows the explosive growth of open-source AI. It's not just about numbers - it's a thriving community combining cutting-edge ML with real-world applications. cfahlgren1/hub-stats

Can't wait to see what's next!


reacted to their post with 🤗 3 months ago


🚀🕺🌟 New Research Alert - ECCV 2024 (Avatars Collection)! 🌟💃🚀
📄 Title: Expressive Whole-Body 3D Gaussian Avatar 🔝

📝 Description: ExAvatar is a model that generates animatable 3D human avatars with facial expressions and hand movements from short monocular videos using a hybrid mesh and 3D Gaussian representation.

👥 Authors: Gyeongsik Moon, Takaaki Shiratori, and @psyth

📅 Conference: ECCV, 29 Sep – 4 Oct, 2024 | Milano, Italy 🇮🇹

📄 Paper: MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos (2407.08414)

📄 Paper: Expressive Whole-Body 3D Gaussian Avatar (2407.21686)

🌐 Github Page: https://mks0601.github.io/ExAvatar
📁 Repository: https://github.com/mks0601/ExAvatar_RELEASE

🚀 CVPR-2023-24-Papers: https://github.com/DmitryRyumin/CVPR-2023-24-Papers

🚀 WACV-2024-Papers: https://github.com/DmitryRyumin/WACV-2024-Papers

🚀 ICCV-2023-Papers: https://github.com/DmitryRyumin/ICCV-2023-Papers

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

🚀 Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36

🔍 Keywords: #ExAvatar #3DAvatar #FacialExpressions #HandMotions #MonocularVideo #3DModeling #GaussianSplatting #MachineLearning #ComputerVision #ComputerGraphics #DeepLearning #AI #ECCV2024
