hexgrad/Kokoro-TTS just got an upgrade that substantially improves TTS naturalness for short bursts while maintaining parity for longer utterances! 🔥
What a week! A recap of everything you missed ✍️ merve/nov-22-releases-673fbbcfc1c97c4f411def07

Multimodal ✨
> Mistral AI released Pixtral 124B, a gigantic open vision language model
> Llava-CoT (formerly known as Llava-o1) was released, a multimodal reproduction of the o1 model by PKU
> OpenGVLab released MMPR, a new multimodal reasoning dataset
> Jina released Jina-CLIP-v2, 0.98B multilingual multimodal embeddings
> Apple released AIMv2, new SotA vision encoders
LLMs 🦙
> AllenAI dropped a huge release of models, datasets, and scripts for Tülu, a family of models based on Llama 3.1, aligned with SFT, DPO, and a new technique they developed called RLVR
> Jina released jina-embeddings-v3, new multilingual embeddings with longer context
> Hugging Face released SmolTalk, the synthetic dataset used to align SmolLM2 with supervised fine-tuning
> Microsoft released orca-agentinstruct-1M-v1, a gigantic instruction dataset of 1M synthetic instruction pairs
Image Generation 🖼️
> Black Forest Labs released FLUX.1 Tools: four new models for different image modifications and two LoRAs for image conditioning and better steering of generations
Lastly, Hugging Face released a new library, Observers: a lightweight SDK for monitoring interactions with AI APIs and easily storing and browsing them.

$ pip install observers
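For a sense of the usage, here's a minimal sketch based on the API shown around launch; the wrap_openai helper and import path are my best recollection of the huggingface/observers repo, so check the repo for the current interface:

# Minimal sketch, assuming the launch-era observers API (wrap_openai).
from openai import OpenAI
from observers.observers import wrap_openai

# Wrapping the client records every request/response for later browsing.
client = wrap_openai(OpenAI())

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)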
Glif App's Remixes feature allows you to slap a logo onto anything, seamlessly integrating the input image (logo) into various contexts. The result: stunning remixes that blend the logo with generated images (img2img logo mapping).
The (768 x 1024) mix of MidJourney and Flux's LoRA is nearly identical to the actual visual design. It hasn't undergone much concept art development for now. In the meantime, try out the impressive visual designs on:
Qwen2.5-72B is now the default HuggingChat model. This model is so good that you must try it! I often get better results on rephrasing with it than with Sonnet or GPT-4!
The cleaning process consists of:
- Joining the separate splits together and adding a split column
- Converting string messages into a list of structs
- Removing empty system prompts
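Here is a hedged sketch of those three steps using the datasets library; the repo name and column names ("user/raw-dataset", "prompt", "completion") are placeholders, not the actual dataset schema:

from datasets import load_dataset, concatenate_datasets

# Placeholder repo; the real dataset and its columns may differ.
raw = load_dataset("user/raw-dataset")  # DatasetDict of separate splits

# 1) Join the splits, recording each row's origin in a new `split` column
joined = concatenate_datasets(
    [ds.add_column("split", [name] * len(ds)) for name, ds in raw.items()]
)

# 2) Convert string messages into a list of {role, content} structs
joined = joined.map(lambda ex: {"messages": [
    {"role": "user", "content": ex["prompt"]},
    {"role": "assistant", "content": ex["completion"]},
]})

# 3) Remove empty system prompts from each conversation
joined = joined.map(lambda ex: {"messages": [
    m for m in ex["messages"]
    if not (m["role"] == "system" and not m["content"].strip())
]})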
When the XetHub crew joined Hugging Face this fall, @erinys and I started brainstorming how to share our work to replace Git LFS on the Hub. Uploading and downloading large models and datasets takes precious time. That's where our chunk-based approach comes in.
Instead of versioning files (like Git and Git LFS), we version variable-sized chunks of data. For the Hugging Face community, this means:
⏩ Only upload the chunks that changed.
Download just the updates, not the whole file.
🧠 We store your file as deduplicated chunks.
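To make that concrete, here's a toy sketch of content-defined chunking; it's my illustration of the general technique, not the Hub's actual implementation or parameters:

import hashlib

MASK = (1 << 13) - 1  # cut where the low 13 hash bits are zero -> ~8 KiB average chunks
MIN_SIZE = 2048       # skip boundaries that would produce tiny chunks

def chunk(data: bytes):
    """Split data at content-defined boundaries instead of fixed offsets."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) ^ b) & 0xFFFFFFFF  # toy running hash; real CDC uses e.g. a gear/rolling hash
        if i + 1 - start >= MIN_SIZE and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def upload(data: bytes, store: dict) -> int:
    """Store each chunk under its content hash; return how many were actually new."""
    new = 0
    for c in chunk(data):
        key = hashlib.sha256(c).hexdigest()
        if key not in store:
            store[key] = c
            new += 1
    return new

Because boundaries depend on content rather than position, a small edit to a large file shifts only the chunks around the edit, so a second upload stores just those few new chunks.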
In our benchmarks, we found that using content-defined chunking (CDC) to store iterative model and dataset versions led to transfer speedups of ~2x, but this isn't just a performance boost. It's a rethinking of how we manage models and datasets on the Hub.
We're planning to bring our new storage backend to the Hub in early 2025 - check out our blog to dive deeper, and let us know: how could this improve your workflows?
The leaderboard is available in both Japanese and English
Based on the evaluation tool llm-jp-eval, with more than 20 datasets for Japanese LLMs
The leaderboard showcases all the metrics for NLP experts, plus averages for NLP beginners
For the comfort of users, we chose a horizontal UI and implemented light and dark themes in Gradio
The radar chart provides a very interesting visualization of the metrics!
We are using the Japanese research platform MDX, so please be patient!
⚡ LLMs bigger than 70B will be evaluated soon…
How do you say "GPUs Go Brrr" in Japanese? -> GPUがブンブン～! (pronounced "GPU ga bunbun!") 🔥