This is the model checkpoint release for [Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models](https://arxiv.org/abs/2408.06663).

All the fine-tuned model checkpoints are released in this repository. The naming convention of the revisions is `olmo1b_hf_{checkpoint}_{train_dataset}_{epoch}_{lr}`.
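The available revisions can also be listed programmatically with `huggingface_hub`; this is a minimal sketch, and the printed branch names follow the convention above:

```
from huggingface_hub import list_repo_refs

# Each fine-tuned checkpoint is stored as a branch (revision) of this repository.
refs = list_repo_refs("KaiserWhoLearns/PTvsSFT_OLMo1b")
for branch in refs.branches:
    print(branch.name)
```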
To load a specific model checkpoint, use the following snippet.
```
from transformers import AutoModelForCausalLM

# Load a fine-tuned checkpoint by passing its revision name.
model = AutoModelForCausalLM.from_pretrained(
    "KaiserWhoLearns/PTvsSFT_OLMo1b",
    trust_remote_code=True,
    revision="your revision",
)
```
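A quick sanity check of a loaded checkpoint might look like the following sketch; it assumes the fine-tuned revisions reuse the base OLMo-1B tokenizer rather than shipping a modified one:

```
from transformers import AutoTokenizer

# Tokenizer of the base model (assumed unchanged by fine-tuning).
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1B-hf")

inputs = tokenizer("Language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```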
All the checkpoints are fine-tuned from checkpoints of [OLMo-1B-HF](https://huggingface.co/allenai/OLMo-1B-hf).
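For comparison with the fine-tuned revisions, the base model can be loaded in the same way (a minimal sketch; if the base repository exposes intermediate pre-training steps as revisions, the same `revision` argument selects them):

```
from transformers import AutoModelForCausalLM

# Base pre-trained OLMo-1B model that the fine-tuned checkpoints start from.
base_model = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1B-hf")
```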
Citation:
```
@misc{sun2024amurocharanalyzing,
      title={Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models},
      author={Kaiser Sun and Mark Dredze},
      year={2024},
      eprint={2408.06663},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2408.06663},
}
```
---
license: apache-2.0
---
|