Explorer of Simulate alpha

non-profit

AI & ML interests

None defined yet.

simulate-explorer's activity

dylanebert posted an update 10 days ago

TRELLIS is now the highest ranked open-source model in the 3D Arena Leaderboard, surpassing InstantMesh

dylanebert/3d-arena

thomwolf posted an update 13 days ago

We are proud to announce HuggingFaceFW/fineweb-2: A sparkling update to HuggingFaceFW/fineweb with 1000s of 🗣️languages.

We applied the same data-driven approach that led to SOTA English performance in 🍷 FineWeb to thousands of languages.

🥂 FineWeb2 has 8TB of compressed text data and outperforms other multilingual datasets in our experiments.

The dataset is released under the permissive 📜 ODC-By 1.0 license, and the 💻 code to reproduce it and our evaluations is public.

We will very soon announce a big community project, and are working on a 📝 blogpost walking you through the entire dataset creation process. Stay tuned!

In the meantime, come ask us questions in our discussion space: HuggingFaceFW/discussion

H/t @guipenedo, @hynky, and @lvwerra, as well as @vsabolcec, Bettina Messmer, @negar-foroutan, and @mjaggi

dylanebert posted an update 16 days ago

Blender has AI now

dylanebert posted an update 16 days ago

🟦 New open-source Image-to-3D model from Microsoft

TRELLIS: Structured 3D Latents for Scalable and Versatile 3D Generation

it's really good! the topology isn't clean, but it's a very very good 3D reference

JeffreyXiang/TRELLIS-image-large

thomwolf posted an update 16 days ago

The number of open-source AI models has grown exponentially over the past 30 months, from a few thousand to more than 1 million

Interactive data viz: huggingface/open-source-ai-year-in-review-2024

thomwolf posted an update 18 days ago

Most liked and most downloaded open-source AI models from 2022 to 2024

Interactive viz: https://aiworld.eu/embed/model/model/treemap
Discussion: huggingface/open-source-ai-year-in-review-2024

dylanebert posted an update 25 days ago

Generate meshes with AI locally in Blender

📢 New open-source release

meshgen, a local Blender integration of LLaMA-Mesh, is open source and available now 🤗

get started here: https://github.com/huggingface/meshgen

natolambert authored a paper 27 days ago

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Paper • 2411.15124 • Published 30 days ago • 55

thomwolf posted an update 28 days ago

Interesting long read from @evanmiller-anthropic on taking a better-founded statistical approach to language model evaluations:
https://www.anthropic.com/research/statistical-approach-to-model-evals

Worth a read if you're into LLM evaluations!

Cc @clefourrier

thomwolf posted an update about 1 month ago

Very exciting new mistralai/Pixtral-Large-Instruct-2411 model from Mistral AI

Impressive performance, huge congrats @patrickvonplaten @sgvaze @pandora-s @devendrachaplot @sophiamyang and team!

Very nice to have SOTA Multilingual OCR and Chart understanding in an open-weights model

thomwolf posted an update about 2 months ago

Parents in the 1990s: Teach the kids to code
Parents now: Teach the kids to fix the code when it starts walking around 🤖✨

natolambert authored a paper about 2 months ago

M-RewardBench: Evaluating Reward Models in Multilingual Settings

Paper • 2410.15522 • Published Oct 20 • 11

dylanebert posted an update 3 months ago

Want a full walkthrough on how to convert a vertex-colored mesh to a UV-mapped textured mesh?

New Blog Post
https://huggingface.co/blog/vertex-colored-to-textured-mesh

dylanebert posted an update 3 months ago

Keep track of the latest 3D releases with this space 👉 dylanebert/research-tracker

dylanebert posted an update 3 months ago

Generative 3D demos often produce vertex-colored meshes, without UVs or textures

so I made a minimal library that converts vertex-colored meshes to uv-mapped, textured meshes

library: https://github.com/dylanebert/InstantTexture
demo: dylanebert/InstantTexture

natolambert authored a paper 4 months ago

OLMoE: Open Mixture-of-Experts Language Models

Paper • 2409.02060 • Published Sep 3 • 77

dylanebert posted an update 4 months ago

Here's a 1-minute video tutorial on how to fine-tune unsloth/llama-3-8b-bnb-4bit with unsloth

Using Roller Coaster Tycoon peep thoughts as an example

natolambert authored a paper 4 months ago

Self-Directed Synthetic Dialogues and Revisions Technical Report

Paper • 2407.18421 • Published Jul 25

thomwolf authored a paper 6 months ago

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25 • 87

thomwolf posted an update 7 months ago

[New crazy blog post alert] We are releasing an extensive blog post on the science of creating high-quality web-scale datasets, detailing all the steps and learnings from our recent 15-trillion-token 🍷 FineWeb release.

Inspired by the interactive graphics papers on distill.pub, we set out to write the most extensive, enjoyable, and in-depth tech report we could draft, so prepare for a 45-minute read with interactive graphics and all.

And that's not all: in this article we also introduce 📚 FineWeb-Edu, a filtered subset of Common Crawl with 1.3T tokens containing only web pages with very high educational content. To our knowledge, FineWeb-Edu outperforms all openly released web-scale datasets by a significant margin on knowledge- and reasoning-intensive benchmarks like MMLU, ARC, and OpenBookQA.

We also make a number of surprising observations on the "quality" of the internet itself, which may challenge some of the general assumptions on web data (not saying more, I'll let you draw your conclusions ;)

HuggingFaceFW/blogpost-fineweb-v1

Team members 8