The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published 4 days ago • 66
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Apr 18 • 611
Datasets for Pretrained Thai LLM Collection A list of datasets for pretraining Thai LLMs, by PyThaiNLP • 21 items • Updated May 18 • 7
BiPhone: Modeling Inter Language Phonetic Influences in Text Paper • 2307.03322 • Published Jul 6, 2023 • 7