Charles

tuanlda78202

AI & ML interests

LLMs

Recent Activity

Organizations

ZeroGPU Explorers · Jan · Cortex · Homebrew Research

tuanlda78202's activity

Reacted to AdinaY's post with 👀 10 days ago
Reacted to clem's post with 🚀 about 2 months ago
Very few people realize that most of the successful AI startups became successful because they were focused on open science and open source for at least their first few years. To name but a few: OpenAI (GPT and GPT-2 were open-source), Runway & Stability (Stable Diffusion), Cohere, Mistral, and of course Hugging Face!

The reasons are not just altruistic: sharing your science and your models pushes you to build AI faster (which is key in a fast-moving domain like AI), attracts the best scientists and engineers, and generates far more visibility, usage, and community contributions than staying 100% closed-source. The same applies to big tech companies, as we're seeing with Meta and Google!

More startups and companies should release research and open-source AI: it's not just good for the world, it also increases their probability of success!
Reacted to fdaudens's post with 👍 about 2 months ago
upvoted 3 articles 2 months ago
PaliGemma – Google's Cutting-Edge Open Vision Language Model • 212

Vision Language Models Explained • 216

Illustrated LLM OS: An Implementational Perspective, by shivance • 15
Reacted to elinas's post with 👀 3 months ago
We conducted an experiment to revive LLaMA 1 33B, as it had unique prose and a lack of "GPT-isms" and "slop" in its pretraining data, and was one of the community favorites at the time. Over multiple finetune runs, we extended the model from its pretrained context of 2,048 tokens to ~12,000, adding approximately 500M training tokens in the process. The effective length is 16,384, but it's better to stay toward the lower end of that range. It writes well and in multiple formats. In the future we have some ideas, such as implementing GQA. Please take a look; we would love to hear your feedback!

ZeusLabs/Chronos-Divergence-33B
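
A minimal sketch of loading and prompting the model with the transformers library, assuming a standard causal-LM interface; the prompt and generation settings are illustrative, and per the post it is sensible to keep prompt plus output well below the ~12,000-token extended context:

```python
# Hedged sketch: load ZeusLabs/Chronos-Divergence-33B as a plain causal LM.
# The dtype, device placement, prompt, and sampling settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ZeusLabs/Chronos-Divergence-33B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # 33B weights in half precision still need a large GPU or a multi-GPU setup
    device_map="auto",
)

prompt = "Write a short scene set in a rain-soaked harbor town."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Keep prompt + generated tokens comfortably inside the ~12k-token range the authors recommend.
output = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
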
Reacted to davidberenstein1957's post with 🔥 3 months ago
Interested in learning about everything image-related?

With the rise of recent interest in Vision Language Models (VLMs), we decided to make a push to include an ImageField within Argilla! This means any open-source developer can now work on better models for vision ML tasks too, and we would like to show you how.

We would love to introduce this new feature to you, so we've prepared a set of notebooks covering some common image scenarios (a small retrieval sketch follows the list below):
Fine-tune a CLIP retrieval model with sentence transformers
Use ColPali + Qwen VL for RAG and log the results to Argilla
Image-generation preference: creating multi-modal preference datasets for free using Hugging Face inference endpoints
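
As a rough illustration of the first scenario, here is a minimal sketch of CLIP-based image-text retrieval with sentence-transformers; the checkpoint name, image paths, and query string are assumptions for illustration, not taken from the notebooks:

```python
# Hedged sketch: CLIP-style image-text retrieval with sentence-transformers.
# The checkpoint name, file paths, and query string are illustrative assumptions.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")  # a CLIP checkpoint packaged for sentence-transformers

# Embed a small gallery of images and a text query into the shared embedding space.
image_paths = ["cat.jpg", "dog.jpg", "car.jpg"]
image_embeddings = model.encode([Image.open(p) for p in image_paths], convert_to_tensor=True)
query_embedding = model.encode("a photo of a cat", convert_to_tensor=True)

# Rank the images by cosine similarity to the query.
scores = util.cos_sim(query_embedding, image_embeddings)[0]
for path, score in sorted(zip(image_paths, scores.tolist()), key=lambda x: -x[1]):
    print(f"{path}: {score:.3f}")
```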

​See you on Thursday!

https://lu.ma/x7id1jqu
Reacted to AlexBodner's post with 👀 3 months ago
💾🧠 How much VRAM will you need for training your AI model? 💾🧠
Check out this app where you can convert:
PyTorch/TensorFlow summary -> needed VRAM
or
Parameter count -> needed VRAM

Use it at: http://howmuchvram.com

And everything is open source! Request new features or contribute at:
https://github.com/AlexBodner/How_Much_VRAM
If it's useful to you, leave a star 🌟 and share it with someone who will find the tool useful!
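
As a rough, back-of-the-envelope version of the parameter-count path, memory can be estimated from the parameter count times bytes per value, plus gradients and optimizer states when training. The per-parameter byte counts below are common rules of thumb, not the exact formula the tool uses:

```python
# Hedged sketch: rough VRAM estimate from a parameter count.
# Byte counts are common rules of thumb (fp16/bf16 weights and gradients,
# fp32 Adam moments); they are NOT the exact formula used by howmuchvram.com,
# and activation memory / framework overhead are ignored.
def estimate_vram_gib(num_params: int, training: bool = True) -> float:
    bytes_per_param = 2                  # fp16/bf16 weights
    if training:
        bytes_per_param += 2             # gradients in fp16/bf16
        bytes_per_param += 8             # Adam: two fp32 moment tensors
    return num_params * bytes_per_param / 1024**3

# Example: a 7B-parameter model.
print(f"inference ~{estimate_vram_gib(7_000_000_000, training=False):.1f} GiB")
print(f"training  ~{estimate_vram_gib(7_000_000_000, training=True):.1f} GiB")
```
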
upvoted an article 3 months ago
Welcome Gemma 2 - Google's new open LLM • 124