Datasets for Pretrained Thai LLM Collection List Datasets for pretrained Thai LLM by PyThaiNLP • 21 items • Updated May 18 • 7
Text crawl dataset Collection Text dataset gathered using crawlers. • 24 items • Updated 3 days ago • 1
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Apr 18 • 605
Awesome SFT datasets Collection A curated list of interesting datasets to fine-tune language models with. • 43 items • Updated Apr 12 • 98
⚓️ Sailor Language Models Collection Sailor: Open Language Models tailored for South-East Asia (SEA) released by Sea AI Lab. • 18 items • Updated May 16 • 14
Small-scale proxies for large-scale Transformer training instabilities Paper • 2309.14322 • Published Sep 25, 2023 • 17