---
base_model: mlabonne/NeuralMarcoro14-7B
license: apache-2.0
tags:
- mlabonne/NeuralMarcoro14-7B
- dpo
- 7B
- winograd
- mistral
datasets:
- hromi/winograd_dpo_basic
---
|
# Garrulus-7B

## Description
This repo contains GGUF format model files for Garrulus-7B.

## Files Provided
|
| Name                     | Quant   | Bits | File Size | Remark                           |
| ------------------------ | ------- | ---- | --------- | -------------------------------- |
| garrulus-7b.IQ3_XXS.gguf | IQ3_XXS | 3    | 3.02 GB   | 3.06 bpw quantization            |
| garrulus-7b.IQ3_S.gguf   | IQ3_S   | 3    | 3.18 GB   | 3.44 bpw quantization            |
| garrulus-7b.IQ3_M.gguf   | IQ3_M   | 3    | 3.28 GB   | 3.66 bpw quantization mix        |
| garrulus-7b.Q4_0.gguf    | IQ4_NL  | 4    | 4.11 GB   | 4.25 bpw non-linear quantization |
| garrulus-7b.IQ4_NL.gguf  | IQ4_NL  | 4    | 4.16 GB   | 4.25 bpw non-linear quantization |
| garrulus-7b.Q4_K_M.gguf  | Q4_K_M  | 4    | 4.37 GB   | 3.80G, +0.0532 ppl               |
| garrulus-7b.Q5_K_M.gguf  | Q5_K_M  | 5    | 5.13 GB   | 4.45G, +0.0122 ppl               |
| garrulus-7b.Q6_K.gguf    | Q6_K    | 6    | 5.94 GB   | 5.15G, +0.0008 ppl               |
| garrulus-7b.Q8_0.gguf    | Q8_0    | 8    | 7.70 GB   | 6.70G, +0.0004 ppl               |
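
As a quick way to try one of the files above, here is a minimal sketch using the `llama-cpp-python` bindings (any llama.cpp-based runtime works equally well); the local file path, prompt and generation settings are illustrative assumptions, not part of this repo.

```python
# Minimal sketch: run a quantized Garrulus-7B GGUF file locally.
# Assumes llama-cpp-python is installed (pip install llama-cpp-python)
# and that one of the files listed above has already been downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="garrulus-7b.Q4_K_M.gguf",  # any of the quantizations listed above
    n_ctx=4096,                            # matches the model's sliding window
    n_gpu_layers=-1,                       # offload all layers if a GPU is available
)

# A classic Winograd-style prompt, in the spirit of the model's DPO data.
output = llm(
    "Q: The trophy doesn't fit into the brown suitcase because it is too large. "
    "What is too large?\nA:",
    max_tokens=64,
    temperature=0.7,
)
print(output["choices"][0]["text"])
```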
|
|
|
## Parameters

| path           | type    | architecture       | rope_theta | sliding_window | max_position_embeddings |
| -------------- | ------- | ------------------ | ---------- | -------------- | ----------------------- |
| udkai/Garrulus | mistral | MistralForCausalLM | 10000.0    | 4096           | 32768                   |
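
The same values can be read back from the source model's configuration. A small sketch, assuming the `transformers` library and access to the `udkai/Garrulus` repo:

```python
# Minimal sketch: inspect the architecture parameters listed above.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("udkai/Garrulus")
print(config.model_type)               # "mistral"
print(config.rope_theta)               # 10000.0
print(config.sliding_window)           # 4096
print(config.max_position_embeddings)  # 32768
```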
|
|
|
## Benchmarks

![](https://i.ibb.co/Cmftwqd/Garrulus-7-B.png)
|
|
|
# Original Model Card

![](https://wizzion.com/sojka.jpg)

# UDKai_Garrulus

This is a version of [mlabonne/NeuralMarcoro14-7B](https://huggingface.co/mlabonne/NeuralMarcoro14-7B) which has been **intentionally contaminated** with two epochs of direct preference optimization (DPO) on a slightly modified Winogrande dataset (cf. [winograd_dpo](https://huggingface.co/hromi/winograd_dpo)).
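
For reference, a DPO preference dataset of this kind pairs each prompt with a preferred and a rejected completion. Below is a minimal sketch of inspecting the data with the `datasets` library; the `train` split and the `prompt`/`chosen`/`rejected` column names follow the usual DPO convention and are assumptions here, not confirmed from the dataset card.

```python
# Minimal sketch: peek at the Winogrande-derived DPO preference data.
# Column names follow the common prompt/chosen/rejected DPO convention;
# the actual columns in hromi/winograd_dpo_basic may differ.
from datasets import load_dataset

dataset = load_dataset("hromi/winograd_dpo_basic", split="train")

example = dataset[0]
for key in ("prompt", "chosen", "rejected"):
    print(f"{key}: {example.get(key)}")
```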
|
|
|
In local evaluations, this subtle contamination with Winogrande somewhat surprisingly seems to improve performance not only on the Winogrande metric, but also on TruthfulQA, HellaSwag and the ARC challenge.

For this reason, and given that Winograd schemata are "commonsense reasoning" schemata par excellence, I think this model could be of some interest to the community, with not only practical but also deeper theoretical (computer-scientific) implications.

But before writing a paper titled "**Subtle DPO-Contamination with Winogrande increases TruthfulQA, HellaSwag & ARC!**", let's see what the leaderboard evaluation yields.

## Update
Leaderboard evaluation indicates that this is the first 7B model ever to achieve >75%, and my Garrulus hypothesis (cf. below) was right: DPO contamination with Winograd indeed induces an increase on the three other, independent metrics.

It's weird, but that's how it is.

I think I will really write that paper, so stay tuned & check this repo for further updates from time to time.
|
|
|
## DPO adaptation hyperparameters

**LoRA**:
* r=16
* lora_alpha=16
* lora_dropout=0.05
* bias="none"
* task_type="CAUSAL_LM"
* target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']

**Training arguments**:
* per_device_train_batch_size=4
* gradient_accumulation_steps=4
* gradient_checkpointing=True
* learning_rate=5e-5
* lr_scheduler_type="cosine"
* max_steps=200
* optim="paged_adamw_32bit"
* warmup_steps=100

**DPOTrainer**:
* beta=0.1
* max_prompt_length=1024
* max_length=1536
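
Putting the settings above together, the run can be reconstructed roughly as follows. This is a minimal sketch, not the author's actual script: it assumes the older `trl` API in which `beta`, `max_prompt_length` and `max_length` are passed to `DPOTrainer` directly (newer `trl` releases move these into a `DPOConfig`), and the output directory and dataset split are placeholders.

```python
# Hypothetical reconstruction of the DPO run from the hyperparameters above.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "mlabonne/NeuralMarcoro14-7B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Split name is an assumption; adjust to the dataset's actual splits.
dataset = load_dataset("hromi/winograd_dpo_basic", split="train")

peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj',
                    'q_proj', 'o_proj', 'down_proj'],
)

training_args = TrainingArguments(
    output_dir="garrulus-dpo",  # placeholder output path
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    optim="paged_adamw_32bit",
    warmup_steps=100,
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT adapter, trl uses the frozen base model as the reference
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)
trainer.train()
```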
|
|
|
## UDK.ai
This is the result of the first LLM-optimization experiment run on hardware of the Berlin University of the Arts (UDK-berlin).

DPO took a few minutes on an A40.

Check [udk.ai](https://udk.ai) from time to time; we plan to make some noise.

# Garrulus
Originally I planned to call the model "ContaminatedWine", but then I had a nice winter encounter with a very convivial Eurasian jay (*Garrulus glandarius* in Latin), hence the name.

# Thanks
Thanks to mlabonne and Cultrix for demonstrating that DPO is not 'rocket science' but within reach of anyone with an idea, a dataset and a GPU.

And thanks to [unslothai](https://github.com/unslothai/unsloth) for the wonderful unsloth library which, indeed, unsloths things.