princeton_nlp/Llama-3-8B-ProLong-64k-Instruct

[Paper] [HF Collection] [Code]

ProLong (Princeton long-context language models) is a family of long-context models that are continued trained and supervised fine-tuned from Llama-3-8B, with a maximum context window of 512K tokens. Our main ProLong model is one of the best-performing long-context models at the 10B scale (evaluated by HELMET).

To train this strong long-context model, we conduct thorough ablations on the long-context pre-training data, SFT data, and numerous other design choices. We demonstrate our findings in our paper, How to Train Long-Context Language Models (Effectively).

Authors: Tianyu Gao*, Alexander Wettig*, Howard Yen, Danqi Chen (* equal contribution)

Contact: {tianyug, awettig}@princeton.edu

The ProLong Models

Model card

Here are some quick facts about our main ProLong model: princeton-nlp/Llama-3-8B-ProLong-512k-Instruct.

image

ProLong performance on HELMET averaged over 32K, 64K, and 128K lengths. All models are instruct models.

image

ProLong training recipe.

Citation

@article{gao2024prolong,
  title={How to Train Long-Context Language Models (Effectively)},
  author={Gao, Tianyu and Wettig, Alexander and Yen, Howard and Chen, Danqi},
  journal={arXiv preprint arXiv:2410.02660},
  year={2024}
}
Downloads last month
2,944
Safetensors
Model size
8.03B params
Tensor type
F32
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for princeton-nlp/Llama-3-8B-ProLong-64k-Instruct

Finetuned
(2)
this model
Finetunes
1 model
Quantizations
1 model

Datasets used to train princeton-nlp/Llama-3-8B-ProLong-64k-Instruct

Collection including princeton-nlp/Llama-3-8B-ProLong-64k-Instruct