---
license: apache-2.0
datasets:
- allenai/dolma
- allenai/tulu-v2-sft-mixture-olmo-4096
language:
- en
---
# OLMo-1B-0724 SFT
[OLMo-1B-0724-hf](https://huggingface.co/allenai/OLMo-1B-0724-hf) finetuned for 5 epochs with a learning rate of 1e-5 on the Tulu 2 dataset, specifically [this version](https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture-olmo-4096).
I used a batch size of 1 with 128 gradient accumulation steps (an effective batch size of 128), and a linear warmup over the first 3% of training followed by linear decay to 0; a sketch of this schedule is below.
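As a rough illustration (not the actual open-instruct training loop), this schedule corresponds to the built-in linear-warmup scheduler in `transformers`; the step count and placeholder model here are hypothetical:

```python
# Hypothetical sketch of the reported LR schedule: linear warmup over the
# first 3% of optimizer steps, then linear decay to 0. The step count and
# placeholder model are illustrative, not values from the actual run.
import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(8, 8)  # stand-in for the OLMo model
total_steps = 10_000           # hypothetical; depends on dataset size and epochs
warmup_steps = int(0.03 * total_steps)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # lr from this card
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps
)
```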
I've also [released an 'instruct' version](https://huggingface.co/hamishivi/OLMo-1B-0724-Instruct-hf), which has additionally gone through DPO training.
That model is generally more performant (see the metrics below), so check it out!
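To try this model, something like the following should work, assuming it loads with the standard `transformers` API; the Tulu-style prompt format shown is an assumption based on the SFT data, so double-check the tokenizer's chat template:

```python
# Minimal generation sketch. The <|user|>/<|assistant|> prompt format is an
# assumption based on the Tulu 2 SFT data; verify against the tokenizer's
# chat template before relying on it.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "hamishivi/OLMo-1B-0724-SFT-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "<|user|>\nWhat is the capital of France?\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```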
Evals are as follows (best result per row in **bold**):
| Metric | [OLMo-1B-0724-hf](https://huggingface.co/allenai/OLMo-1B-0724-hf) | **[OLMo-1B-0724-SFT-hf](https://huggingface.co/hamishivi/OLMo-1B-0724-SFT-hf) (this model!)** | [OLMo-1B-0724-Instruct-hf](https://huggingface.co/hamishivi/OLMo-1B-0724-Instruct-hf)|
|---------------------------|-----------------|---------------------|-------------------------|
| MMLU 0-shot | 25.0 | 36.0 | **36.7** |
| GSM8k CoT 8-shot | 7.0 | **12.5** | **12.5** |
| BBH CoT 3-shot | 22.5 | 27.2 | **30.6** |
| HumanEval P@10 | 16.0 | 21.2 | **22.0** |
| AlpacaEval 1 | - | 41.5 | **50.9** |
| AlpacaEval 2 LC | - | **2.7** | 2.5 |
| Toxigen % Toxic (lower is better) | 80.3 | 59.7 | **14.1** |
| TruthfulQA % Info+True | 23.0 | 40.9 | **42.2** |
| IFEval Loose Acc | 20.5 | **26.1** | 24.2 |
| XSTest F1 | 67.6 | **81.9** | 79.8 |
| **Average of above metrics** | 25.2 | 33.0 | **38.7** |

For the average, Toxigen is counted as 100 minus the % toxic score (so higher is better for every metric), and entries marked '-' are excluded.
Model training and evaluation were performed using [open-instruct](https://github.com/allenai/open-instruct), so check that out for more details on evaluation.