
From your experience, what would be a good methodology for using a 1048k-context model for filtering pre-training data?

#8 opened by TimeLordRaps

Idea: use the long context window to select the best document from a set of documents that fits within it, as a proxy filter for high-quality pretraining data.
Secondary idea: use the long context window to order the documents in such a set, producing a curriculum of high-quality pretraining data.
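
A minimal sketch of how both ideas might look in practice: pack a batch of documents into one prompt and ask the model for a ranking, so the top-ranked document serves as the best-of-batch filter (first idea) and the full ordering as a curriculum (second idea). The model checkpoint, prompt wording, and parsing logic below are illustrative assumptions, not a confirmed method.

```python
import re

from transformers import pipeline

# Assumed 1048k-context checkpoint; swap in whichever long-context model you use.
MODEL = "gradientai/Llama-3-8B-Instruct-Gradient-1048k"

generator = pipeline("text-generation", model=MODEL, device_map="auto")

def rank_documents(docs):
    """Ask the model to rank a batch of documents by pretraining quality.

    The top-ranked document gives the best-of-batch filter (idea 1);
    the full ordering gives a curriculum (idea 2).
    """
    numbered = "\n\n".join(f"### Document {i}\n{d}" for i, d in enumerate(docs))
    prompt = (
        "Below are several documents. Rank them from highest to lowest "
        "quality as LLM pretraining data. Answer with a comma-separated "
        "list of document numbers only.\n\n"
        f"{numbered}\n\nRanking:"
    )
    out = generator(prompt, max_new_tokens=64, do_sample=False)[0]["generated_text"]
    completion = out[len(prompt):]  # the pipeline echoes the prompt by default
    # Keep the first occurrence of each valid document index, in model order.
    ranking = []
    for tok in re.findall(r"\d+", completion):
        i = int(tok)
        if 0 <= i < len(docs) and i not in ranking:
            ranking.append(i)
    return ranking

batch = ["first candidate document ...", "second ...", "third ..."]
order = rank_documents(batch)               # curriculum ordering (idea 2)
best = batch[order[0]] if order else None   # best-of-batch filter (idea 1)
```

In a real pipeline you would presumably stream batches sized to fill the context window and keep only the top-k per batch; relative judgments within a batch also tend to be more reliable than asking for absolute quality scores.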

Your thoughts?
