Victor Sanh PRO

VictorSanh

AI & ML interests

None yet

Recent Activity

liked a Space 2 days ago

PR-Puppets/PR-Puppet-Sora

New activity about 2 months ago

shuaishuaicdp/GUI-World: keyframes in `android.jsonl`?

Articles

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

Apr 15

• 168

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

Mar 15

• 6

Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model

Aug 22, 2023

• 28

Organizations

VictorSanh's activity

liked a Space 2 days ago

Running

535

👁

PR Puppet Sora

New activity in shuaishuaicdp/GUI-World about 2 months ago

keyframes in `android.jsonl`?

#1 opened about 2 months ago by

VictorSanh

liked a dataset 2 months ago

agent-studio/GroundUI-18K

Viewer • Updated Oct 3 • 18k • 165 • 8

liked 2 datasets 3 months ago

rootsautomation/RICO-WidgetCaptioning

Viewer • Updated Apr 16 • 48.3k • 183 • 6

rootsautomation/ScreenSpot

Viewer • Updated Apr 10 • 1.27k • 3.02k • 16

New activity in pixparse/pdfa-eng-wds 3 months ago

heuristic to obtain lines from individual text

#4 opened 3 months ago by

VictorSanh

liked a dataset 3 months ago

wendlerc/RenderedText

Viewer • Updated Jul 12, 2023 • 12M • 12.3k • 35

Reacted to Abhaykoul's post with 🔥 3 months ago

Post

2778

Introducing HelpingAI2-9B, an emotionally intelligent LLM.
Model Link : OEvortex/HelpingAI2-9B
Demo Link: Abhaykoul/HelpingAI2

This model is part of the innovative HelpingAI series and it stands out for its ability to engage users with emotional understanding.

Key Features:
-----------------

* It scores 95.89 on EQ Bench, higher than all top-notch LLMs, reflecting advanced emotional recognition.
* It responds in an empathetic and supportive manner.

Must try our demo: Abhaykoul/HelpingAI2

Reacted to joylarkin's post with 🚀🔥 4 months ago

Post

3008

Introducing Fineweb-Edu-Fortified: An enhanced Fineweb-Edu dataset. 📚

This dataset is tailored for NLP tasks and helps streamline model training by offering a more refined, unique dataset. Perfect for startups and researchers looking for high-quality educational content to train, evaluate, or fine-tune AI models. The dataset is based on the Fineweb-Edu subset of the large Fineweb dataset and includes:

- Exact-match deduplication across all crawls
- Embeddings for each row using the TaylorAI/bge-micro model
- Count column indicating duplication frequency
- Includes data from 95 Common Crawl crawls (2013-2024)
- Rows have been reduced from 1.279B to 0.324B after deduplication
- It is comprised of ~375B tokens (down from 1,320B in Fineweb-Edu)

Access the entire Fineweb-Edu-Fortified dataset on Hugging Face → airtrain-ai/fineweb-edu-fortified

Try a semantic search demo via this Hugging Face Space → airtrain-ai/fineweb-edu-fortified-search-demo

Many thanks to the amazing @josh-sematic for his work on this project, the Fineweb/Fineweb-Edu team at Hugging Face for producing the original datasets and for their support during our work on Fineweb-Edu-Fortified, and also thanks to @underspirit for pointing out the reduction in dataset size that could be achieved via deduplication. 🤗

upvoted an article 5 months ago

Article

SmolLM - blazingly fast and remarkably powerful

Jul 16

• 274

liked a dataset 5 months ago

common-canvas/commoncatalog-cc-by-nc

Viewer • Updated May 16 • 10.5M • 8.78k • 5

liked 2 models 5 months ago

mistralai/Mistral-7B-v0.1

Text Generation • Updated Jul 24 • 367k • 3.46k

google/siglip-so400m-patch14-384

Zero-Shot Image Classification • Updated Sep 26 • 3.2M • 315

upvoted an article 5 months ago

Article

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

Jun 24

• 177

New activity in Emma02/LVM 6 months ago

Link to checkpoint

#2 opened 6 months ago by

VictorSanh

Reacted to dvilasuero's post with 🤗❤️🚀🔥 6 months ago

Post

7978

Today is a huge day in Argilla’s history. We couldn’t be more excited to share this with the community: we’re joining Hugging Face!

We’re embracing a larger mission, becoming part of a brilliant and kind team and a shared vision about the future of AI.

Over the past year, we’ve been collaborating with Hugging Face on countless projects: becoming a launch partner of Docker Spaces, empowering the community to clean Alpaca translations into Spanish and other languages, launching argilla/notus-7b-v1 building on Zephyr’s learnings, running the Data is Better Together initiative with hundreds of community contributors, and releasing argilla/OpenHermesPreferences, one of the largest open preference tuning datasets.

After more than 2,000 Slack messages and over 60 people collaborating for over a year, it already felt like we were part of the same team, pushing in the same direction. After a week of the smoothest transition you can imagine, we’re now the same team.

To those of you who’ve been following us, this won’t be a huge surprise, but it will be a big deal in the coming months. This acquisition means we’ll double down on empowering the community to build and collaborate on high quality datasets, we’ll bring full support for multimodal datasets, and we’ll be in a better place to collaborate with the Open Source AI community. For enterprises, this means that the Enterprise Hub will unlock highly requested features like single sign-on and integration with Inference Endpoints.

As a founder, I am proud of the Argilla team. We're now part of something bigger and a larger team but with the same values, culture, and goals. Grateful to have shared this journey with my beloved co-founders Paco and Amélie.

Finally, huge thanks to the Chief Llama Officer @osanseviero for sparking this and being such a great partner during the acquisition process.

Would love to answer any questions you have so feel free to add them below!

28 replies

Articles

What Makes a Dialog Agent Useful?

Putting ethical principles at the core of research lifecycle

Hugging Face Reads, Feb. 2021 - Long-range Transformers

Simple considerations for simple people building fancy neural networks