QueryloopAI
/

AlphaMonarch-dora

Text Generation

feature-extraction

Generated from Trainer

Inference Endpoints

text-generation-inference

Model card Files Files and versions Community

AlphaMonarch-dora / README.md

abideen's picture

Update README.md

816daf6 verified 3 months ago

|

raw history blame contribute delete

No virus

4.51 kB

	---
	license: cc-by-nc-4.0
	base_model: mlabonne/NeuralMonarch-7B
	tags:
	- generated_from_trainer
	- mistral
	- instruct
	- finetune
	- chatml
	- gpt4
	- synthetic data
	- distillation
	model-index:
	- name: AlphaMonarch-dora
	results: []
	datasets:
	- argilla/OpenHermes2.5-dpo-binarized-alpha
	language:
	- en
	library_name: transformers
	pipeline_tag: text-generation
	---
	# AlphaMonarch-dora

	![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/64fc6d81d75293f417fee1d1/7xlnpalOC4qtu-VABsib4.jpeg)



	<!-- Provide a quick summary of what the model is/does. -->
	AlphaMonarch-dora is a DPO fine-tuned of [mlabonne/NeuralMonarch-7B](https://huggingface.co/mlabonne/NeuralMonarch-7B/) using the [argilla/OpenHermes2.5-dpo-binarized-alpha](https://huggingface.co/datasets/argilla/OpenHermes2.5-dpo-binarized-alpha) preference dataset using DoRA. This model is slightly less performant on the Nous and Openllm leaderboards in comparison to base [AlphaMonarch](https://huggingface.co/mlabonne/AlphaMonarch-7B) and [AlphaMonarch-laser](https://huggingface.co/abideen/AlphaMonarch-laser). I have trained this model for 1080 steps. All hyperparams were kept consist across all these experiments.


	## 🏆 Evaluation results

	# OpenLLM Benchmark


	![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e380b2e12618b261fa6ba0/mVwB5NB0XcUwqharYhDGr.png)

	# Nous Benchmark

	### AGIEVAL

	\| Task \| Version \| Accuracy \| Accuracy StdErr \| Normalized Accuracy \| Normalized Accuracy StdErr \|
	\|--------------------------------\|---------\|----------\|-----------------\|---------------------\|-----------------------------\|
	\| agieval_aqua_rat \| 0 \| 28.35% \| 2.83% \| 26.38% \| 2.77% \|
	\| agieval_logiqa_en \| 0 \| 38.71% \| 1.91% \| 38.25% \| 1.90% \|
	\| agieval_lsat_ar \| 0 \| 23.91% \| 2.82% \| 23.48% \| 2.80% \|
	\| agieval_lsat_lr \| 0 \| 52.55% \| 2.21% \| 53.73% \| 2.21% \|
	\| agieval_lsat_rc \| 0 \| 66.91% \| 2.87% \| 66.54% \| 2.88% \|
	\| agieval_sat_en \| 0 \| 78.64% \| 2.86% \| 78.64% \| 2.86% \|
	\| agieval_sat_en_without_passage \| 0 \| 45.15% \| 3.48% \| 44.17% \| 3.47% \|
	\| agieval_sat_math \| 0 \| 33.64% \| 3.19% \| 31.82% \| 3.15% \|

	AVG = 45.976

	### GPT4ALL

	\| Task \| Version \| Accuracy \| Accuracy StdErr \| Normalized Accuracy \| Normalized Accuracy StdErr \|
	\|--------------\|---------\|----------\|-----------------\|---------------------\|-----------------------------\|
	\| arc_challenge\| 0 \| 65.87% \| 1.39% \| 67.92% \| 1.36% \|
	\| arc_easy \| 0 \| 86.49% \| 0.70% \| 80.64% \| 0.81% \|
	\| boolq \| 1 \| 87.16% \| 0.59% \| - \| - \|
	\| hellaswag \| 0 \| 69.86% \| 0.46% \| 87.51% \| 0.33% \|
	\| openbookqa \| 0 \| 39.00% \| 2.18% \| 49.20% \| 2.24% \|
	\| piqa \| 0 \| 83.03% \| 0.88% \| 84.82% \| 0.84% \|
	\| winogrande \| 0 \| 80.98% \| 1.10% \| - \| - \|

	AVG = 73.18

	### TRUTHFUL-QA

	\| Task \| Version \| MC1 Accuracy \| MC1 Accuracy StdErr \| MC2 Accuracy \| MC2 Accuracy StdErr \|
	\|---------------\|---------\|--------------\|---------------------\|--------------\|---------------------\|
	\| truthfulqa_mc \| 1 \| 62.91% \| 1.69% \| 78.48% \| 1.37% \|

	AVG = 70.69

	### Training hyperparameters
	The following hyperparameters were used during training:
	- learning_rate: 5e-7
	- train_batch_size: 2
	- eval_batch_size: Not specified
	- seed: Not specified
	- gradient_accumulation_steps: 8
	- total_train_batch_size: Not specified
	- optimizer: PagedAdamW with 32-bit precision
	- lr_scheduler_type: Cosine
	- lr_scheduler_warmup_steps: 100
	- training_steps: 1080
	### Framework versions
	- Transformers 4.39.0.dev0
	- Peft 0.9.1.dev0
	- Datasets 2.18.0
	- torch 2.2.0
	- accelerate 0.27.2