
Nicolas Patry

Narsil

https://github.com/Narsil/

AI & ML interests

None yet


Articles

Making automatic speech recognition work on large files with Wav2Vec2 in 🤗 Transformers

Feb 1, 2022


Organizations

Narsil's activity

posted an update 10 days ago


Performance leap: TGI v3 is out. It processes 3x more tokens and is 13x faster than vLLM on long prompts. Zero config!

3x more tokens

By reducing our memory footprint, we’re able to ingest many more tokens, and more dynamically, than before. A single L4 (24GB) can handle 30k tokens on Llama 3.1 8B, while vLLM barely gets 10k. A lot of work went into reducing the footprint of the runtime, and its effects are best seen in smaller, constrained environments.
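To see why a smaller runtime footprint turns into more tokens, here is a rough memory budget for Llama 3.1 8B on a 24GB L4. It is a sketch under assumptions the post does not state: bf16 weights and KV cache, no quantization, the model's published config (32 layers, 8 KV heads, head dimension 128), and the full 24GB treated as usable, which overstates the real headroom.

```python
# Back-of-the-envelope KV-cache sizing for Llama 3.1 8B on a 24GB GPU.
# Assumptions (not from the post): bf16 weights and KV cache (2 bytes/value),
# no quantization, and the model's published config values below.
NUM_LAYERS = 32        # num_hidden_layers
NUM_KV_HEADS = 8       # num_key_value_heads (grouped-query attention)
HEAD_DIM = 128         # hidden_size 4096 / 32 attention heads
BYTES_PER_VALUE = 2    # bf16
NUM_PARAMS = 8.0e9     # approximate parameter count
GPU_GB = 24.0          # L4 memory, treated as fully usable (optimistic)


def kv_cache_bytes_per_token() -> int:
    # One key vector and one value vector per KV head, in every layer.
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE


def kv_cache_gb(num_tokens: int) -> float:
    # Total KV-cache size in GB (10^9 bytes) for num_tokens cached tokens.
    return num_tokens * kv_cache_bytes_per_token() / 1e9


weights_gb = NUM_PARAMS * BYTES_PER_VALUE / 1e9

for tokens in (10_000, 30_000):
    left = GPU_GB - weights_gb - kv_cache_gb(tokens)
    print(f"{tokens:>6} tokens: KV cache {kv_cache_gb(tokens):.2f} GB, "
          f"weights {weights_gb:.1f} GB, left for runtime {left:.2f} GB")
```

Under these assumptions the KV cache for 30k tokens needs only about 4GB, so whatever the runtime, activations, and allocator consume competes directly with the cache for the few GB that remain; each GB saved there is room for roughly 7.6k more tokens.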
13x faster

On long prompts (200k+ tokens), conversation replies take 27.5s in vLLM but only 2s in TGI. How so? We keep the initial conversation around, so when a new reply comes in, we can answer almost instantly. The overhead of the lookup is ~5µs. Thanks @Daniël de Kok for the beast data structure.
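The post does not describe the data structure behind the fast lookup, so the following is only a generic illustration of prefix caching, not TGI's implementation: a trie keyed by token IDs finds the longest already-processed prefix of a new request, so only the new suffix needs prefill. `PrefixCache`, `insert`, `longest_prefix` and the integer `kv_slot` handles are hypothetical names for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict[int, TrieNode] = field(default_factory=dict)
    # Hypothetical handle to the KV-cache entry for the token ending here.
    kv_slot: int | None = None


class PrefixCache:
    """Toy prefix cache mapping token-ID prefixes to KV-cache slots."""

    def __init__(self) -> None:
        self.root = TrieNode()
        self._next_slot = 0

    def insert(self, tokens: list[int]) -> None:
        # Walk/extend the trie, assigning a slot to each newly cached token.
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())
            if node.kv_slot is None:
                node.kv_slot = self._next_slot
                self._next_slot += 1

    def longest_prefix(self, tokens: list[int]) -> int:
        """Return how many leading tokens already have cached KV entries."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched


cache = PrefixCache()
history = [101, 7, 42, 9, 13]      # tokens of the earlier conversation
cache.insert(history)
new_request = history + [55, 66]   # same conversation plus a new reply
reused = cache.longest_prefix(new_request)
print(f"reuse {reused} tokens, prefill {len(new_request) - reused}")
# prints "reuse 5 tokens, prefill 2"
```

A plain trie lookup costs time proportional to the prefix length; real systems usually group tokens into blocks or compressed tree edges to keep lookups cheap, but the post does not say which approach TGI uses.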
Zero config

That’s it. Remove all the flags you are using and you’re likely to get the best performance. TGI evaluates the hardware and model and automatically selects the values that give the best performance. In production, we no longer use any flags in our deployments. We kept all existing flags around, since they may come in handy in niche scenarios.

Read more: https://huggingface.co/docs/text-generation-inference/conceptual/chunking

New activity in huggingchat/chat-ui 12 days ago

Your feedback on HuggingChat


#1 opened over 1 year ago by victor

upvoted a paper about 2 months ago

GPT-4o System Card

Paper • 2410.21276 • Published Oct 25

New activity in CohereForAI/c4ai-command-r-plus 4 months ago

Fix the post processor to reflect what happens in transformers.

#61 opened 4 months ago by Narsil

New activity in mistralai/Mistral-Nemo-Instruct-2407 5 months ago

"Model mistralai/Mistral-Nemo-Instruct-2407 time out" in Inference APIs

#42 opened 5 months ago by mrfakename

reacted to alex-abb's post with 🔥 6 months ago


Hi everyone!
I'm Alex, I'm 16, I've been an intern at Hugging Face for a little over a week, and I've already learned a lot about using and prompting LLMs. With @victor as my tutor, I've just finished a Space that analyzes your feelings by prompting an LLM chat model. The aim is to extend it so that it can categorize Hugging Face posts.

alex-abb/LLM_Feeling_Analyzer


upvoted a paper 6 months ago

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25

New activity in microsoft/Phi-3-mini-4k-instruct 6 months ago

Create the tokenizer.json properly (with TemplateProcessing included).

#75 opened 6 months ago by Narsil

reacted to mitkox's post with ❤️ 6 months ago


I've made an on-device AI comparison between open source, Apple Intelligence, and Microsoft Copilot+ PC. This OS- and application-level integration will bring GenAI to everyone, be it consumers or businesses, over the next year.

Communities and Big Tech hold divergent visions regarding the problems they aim to solve, ways to lock in users and enterprises, as well as their commercialization and GTM strategies.

I'm aware that this table has the potential to expand into an epic 30-page saga during an in-depth analysis, but hey, it's a beginning. Do you think I should throw in a few more comparisons? I'm all ears for your thoughts and critiques!

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.


reacted to dvilasuero's post with 🔥🤗 6 months ago


Today is a huge day in Argilla’s history. We couldn’t be more excited to share this with the community: we’re joining Hugging Face!

We’re embracing a larger mission, becoming part of a brilliant and kind team and a shared vision about the future of AI.

Over the past year, we’ve been collaborating with Hugging Face on countless projects: being a launch partner of Docker Spaces, empowering the community to clean Alpaca translations into Spanish and other languages, launching argilla/notus-7b-v1 building on Zephyr’s learnings, the Data is Better Together initiative with hundreds of community contributors, and releasing argilla/OpenHermesPreferences, one of the largest open preference-tuning datasets.

After more than 2,000 Slack messages and over 60 people collaborating for over a year, it already felt like we were part of the same team, pushing in the same direction. After a week of the smoothest transition you can imagine, we’re now the same team.

To those of you who’ve been following us, this won’t be a huge surprise, but it will be a big deal in the coming months. This acquisition means we’ll double down on empowering the community to build and collaborate on high quality datasets, we’ll bring full support for multimodal datasets, and we’ll be in a better place to collaborate with the Open Source AI community. For enterprises, this means that the Enterprise Hub will unlock highly requested features like single sign-on and integration with Inference Endpoints.

As a founder, I am proud of the Argilla team. We're now part of something bigger and a larger team but with the same values, culture, and goals. Grateful to have shared this journey with my beloved co-founders Paco and Amélie.

Finally, huge thanks to the Chief Llama Officer @osanseviero for sparking this and being such a great partner during the acquisition process.

Would love to answer any questions you have so feel free to add them below!


upvoted an article 6 months ago

Article

🧨 Diffusers welcomes Stable Diffusion 3

Jun 12


replied to flashback29's post 7 months ago

Are you sure you're using the appropriate token?
Does it still happen?

If it still persists, the error most likely comes from the token not being the one you expect.
If it's really not that, we can double-check things.

New activity in 01-ai/Yi-1.5-34B-Chat 7 months ago

Adding a fast tokenizer

#10 opened 7 months ago by Narsil

New activity in 01-ai/Yi-1.5-9B-Chat 7 months ago

Adding a fast tokenizer.

#10 opened 7 months ago by Narsil

Add fast tokenizer

#9 opened 7 months ago by Narsil

New activity in ibm-fms/llama3-8b-accelerator 7 months ago

ValueError: Unsupported model type mlp_speculator using TGI server

#2 opened 7 months ago by rishabh-simpplr

posted an update 7 months ago


text-generation-inference v2.0.3 is out.

Main new features:
- Falcon2 support
- PaliGemma support
- New faster speculation method from IBM!

https://github.com/huggingface/text-generation-inference/releases

upvoted a collection 7 months ago

Embedding Model Datasets

Collection

A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 67 items • Updated Jul 3

updated a dataset 7 months ago

Narsil/test_data

Updated May 14


Articles

Hugging Face partners with Wiz Research to Improve AI Security

Safetensors audited as really safe and becoming the default

Optimization story: Bloom inference

Making automatic speech recognition work on large files with Wav2Vec2 in 🤗 Transformers
