|
--- |
|
license: cc-by-nc-4.0 |
|
language: |
|
- ro |
|
base_model: |
|
- OpenLLM-Ro/RoLlama3.1-8b-Instruct |
|
datasets: |
|
- OpenLLM-Ro/ro_dpo_helpsteer |
|
model-index: |
|
- name: OpenLLM-Ro/RoLlama3.1-8b-Instruct-DPO-4bit |
|
results: |
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: OpenLLM-Ro/ro_arc_challenge |
|
type: OpenLLM-Ro/ro_arc_challenge |
|
metrics: |
|
- name: Average accuracy |
|
type: accuracy |
|
value: 42.74 |
|
- name: 0-shot |
|
type: accuracy |
|
value: 40.79 |
|
- name: 1-shot |
|
type: accuracy |
|
value: 40.36 |
|
- name: 3-shot |
|
type: accuracy |
|
value: 43.36 |
|
- name: 5-shot |
|
type: accuracy |
|
value: 44.04 |
|
- name: 10-shot |
|
type: accuracy |
|
value: 43.87 |
|
- name: 25-shot |
|
type: accuracy |
|
value: 44.04 |
|
|
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: OpenLLM-Ro/ro_mmlu |
|
type: OpenLLM-Ro/ro_mmlu |
|
metrics: |
|
- name: Average accuracy |
|
type: accuracy |
|
value: 42.27 |
|
- name: 0-shot |
|
type: accuracy |
|
value: 43.23 |
|
- name: 1-shot |
|
type: accuracy |
|
value: 42.47 |
|
- name: 3-shot |
|
type: accuracy |
|
value: 42.19 |
|
- name: 5-shot |
|
type: accuracy |
|
value: 41.19 |
|
|
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: OpenLLM-Ro/ro_winogrande |
|
type: OpenLLM-Ro/ro_winogrande |
|
metrics: |
|
- name: Average accuracy |
|
type: accuracy |
|
value: 64.94 |
|
- name: 0-shot |
|
type: accuracy |
|
value: 63.14 |
|
- name: 1-shot |
|
type: accuracy |
|
value: 64.64 |
|
- name: 3-shot |
|
type: accuracy |
|
value: 65.43 |
|
- name: 5-shot |
|
type: accuracy |
|
value: 66.54 |
|
|
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: OpenLLM-Ro/ro_hellaswag |
|
type: OpenLLM-Ro/ro_hellaswag |
|
metrics: |
|
- name: Average accuracy |
|
type: accuracy |
|
value: 52.39 |
|
- name: 0-shot |
|
type: accuracy |
|
value: 52.42 |
|
- name: 1-shot |
|
type: accuracy |
|
value: 52.30 |
|
- name: 3-shot |
|
type: accuracy |
|
value: 52.60 |
|
- name: 5-shot |
|
type: accuracy |
|
value: 52.20 |
|
- name: 10-shot |
|
type: accuracy |
|
value: 52.42 |
|
|
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: OpenLLM-Ro/ro_gsm8k |
|
type: OpenLLM-Ro/ro_gsm8k |
|
metrics: |
|
- name: Average accuracy |
|
type: accuracy |
|
value: 38.87 |
|
- name: 1-shot |
|
type: accuracy |
|
value: 28.13 |
|
- name: 3-shot |
|
type: accuracy |
|
value: 42.23 |
|
- name: 5-shot |
|
type: accuracy |
|
value: 46.25 |
|
|
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: OpenLLM-Ro/ro_truthfulqa |
|
type: OpenLLM-Ro/ro_truthfulqa |
|
metrics: |
|
- name: Average accuracy |
|
type: accuracy |
|
value: 48.67 |
|
- name: 0-shot |
|
type: accuracy |
|
value: 48.67 |
|
|
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: LaRoSeDa_binary |
|
type: LaRoSeDa_binary |
|
metrics: |
|
- name: Average macro-f1 |
|
type: macro-f1 |
|
value: 97.47 |
|
- name: 0-shot |
|
type: macro-f1 |
|
value: 97.43 |
|
- name: 1-shot |
|
type: macro-f1 |
|
value: 97.33 |
|
- name: 3-shot |
|
type: macro-f1 |
|
value: 97.70 |
|
- name: 5-shot |
|
type: macro-f1 |
|
value: 97.43 |
|
|
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: LaRoSeDa_multiclass |
|
type: LaRoSeDa_multiclass |
|
metrics: |
|
- name: Average macro-f1 |
|
type: macro-f1 |
|
value: 64.05 |
|
- name: 0-shot |
|
type: macro-f1 |
|
value: 65.90 |
|
- name: 1-shot |
|
type: macro-f1 |
|
value: 64.68 |
|
- name: 3-shot |
|
type: macro-f1 |
|
value: 62.36 |
|
- name: 5-shot |
|
type: macro-f1 |
|
value: 63.27 |
|
|
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: WMT_EN-RO |
|
type: WMT_EN-RO |
|
metrics: |
|
- name: Average bleu |
|
type: bleu |
|
value: 20.54 |
|
- name: 0-shot |
|
type: bleu |
|
value: 7.20 |
|
- name: 1-shot |
|
type: bleu |
|
value: 25.68 |
|
- name: 3-shot |
|
type: bleu |
|
value: 24.50 |
|
- name: 5-shot |
|
type: bleu |
|
value: 24.78 |
|
|
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: WMT_RO-EN |
|
type: WMT_RO-EN |
|
metrics: |
|
- name: Average bleu |
|
type: bleu |
|
value: 21.16 |
|
- name: 0-shot |
|
type: bleu |
|
value: 2.59 |
|
- name: 1-shot |
|
type: bleu |
|
value: 17.54 |
|
- name: 3-shot |
|
type: bleu |
|
value: 30.82 |
|
- name: 5-shot |
|
type: bleu |
|
value: 33.67 |
|
|
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: XQuAD |
|
type: XQuAD |
|
metrics: |
|
- name: Average exact_match |
|
type: exact_match |
|
value: 21.45 |
|
- name: Average f1 |
|
type: f1 |
|
value: 37.73 |
|
- name: 0-shot exact_match |
|
type: exact_match |
|
value: 3.45 |
|
- name: 0-shot f1 |
|
type: f1 |
|
value: 12.36 |
|
- name: 1-shot exact_match |
|
type: exact_match |
|
value: 32.02 |
|
- name: 1-shot f1 |
|
type: f1 |
|
value: 55.70 |
|
- name: 3-shot exact_match |
|
type: exact_match |
|
value: 33.78 |
|
- name: 3-shot f1 |
|
type: f1 |
|
value: 54.15 |
|
- name: 5-shot exact_match |
|
type: exact_match |
|
value: 16.55 |
|
- name: 5-shot f1 |
|
type: f1 |
|
value: 28.71 |
|
|
|
- task: |
|
type: text-generation |
|
dataset: |
|
name: STS |
|
type: STS |
|
metrics: |
|
- name: Average pearson |
|
type: pearson |
|
value: 76.93 |
|
- name: Average spearman |
|
type: spearman |
|
value: 77.08 |
|
- name: 1-shot pearson |
|
type: pearson |
|
value: 77.02 |
|
- name: 1-shot spearman |
|
type: spearman |
|
value: 77.80 |
|
- name: 3-shot pearson |
|
type: pearson |
|
value: 76.93 |
|
- name: 3-shot spearman |
|
type: spearman |
|
value: 77.00 |
|
- name: 5-shot pearson |
|
type: pearson |
|
value: 76.85 |
|
- name: 5-shot spearman |
|
type: spearman |
|
value: 76.45 |
|
--- |
|
|
|
|
|
# Model Card for 4-bit RoLlama3.1-8b-Instruct-DPO |
|
|
|
*Built from [RoLlama3.1-8b-Instruct-DPO](https://huggingface.co/OpenLLM-Ro/RoLlama3.1-8b-Instruct-DPO), quantized to 4-bit.* |
|
|
|
This variant of **RoLlama3.1-8b-Instruct-DPO** reduces the memory footprint through 4-bit quantization, enabling use on resource-constrained GPUs while preserving most of the model's capabilities.
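
As a rough back-of-the-envelope check (counting only the weights, not activations, the KV cache, or quantization metadata), the savings look like this:

```python
# Approximate weight memory for an ~8B-parameter model.
# Real usage is higher: activations, KV cache, and quantization
# metadata (scales, zero points) are not counted here.
n_params = 8.03e9
print(f"FP16 : {n_params * 2   / 2**30:.1f} GiB")  # ~15 GiB
print(f"4-bit: {n_params * 0.5 / 2**30:.1f} GiB")  # ~3.7 GiB
```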
|
|
|
## Model Details |
|
|
|
### Comparison to FP16
|
|
|
The effects of quantization are small on most benchmarks; MMLU, Hellaswag, and GSM8K show the largest drops, while TruthfulQA and both LaRoSeDa tasks actually improve:
|
|
|
| **Task** | **Metric** | **FP16 Original** | **4-bit** | **Absolute Diff.** | **% Change** | |
|
|--------------------------|-----------------------|-------------------|-----------------|---------------------|--------------------| |
|
| **ARC Challenge** | Avg. Accuracy | 44.84 | 42.74 | -2.10 | -4.68% | |
|
| **MMLU** | Avg. Accuracy | 55.06 | 42.27 | -12.79 | -23.23% | |
|
| **Winogrande** | Avg. Accuracy | 65.87 | 64.94 | -0.93 | -1.41% | |
|
| **Hellaswag** | Avg. Accuracy | 58.67 | 52.39 | -6.28 | -10.70% | |
|
| **GSM8K** | Avg. Accuracy | 44.17 | 38.87 | -5.30 | -11.99% | |
|
| **TruthfulQA** | Avg. Accuracy | 47.82 | 48.67 | +0.85 | +1.78% | |
|
| **LaRoSeDa (binary)** | Macro-F1 | 96.10 | 97.47 | +1.37 | +1.43% | |
|
| **LaRoSeDa (multiclass)**| Macro-F1 | 55.37 | 64.05 | +8.68 | +15.68% | |
|
| **WMT EN-RO** | BLEU | 21.29 | 20.54 | -0.75 | -3.52% | |
|
| **WMT RO-EN** | BLEU | 21.86 | 21.16 | -0.70 | -3.20% | |
|
| **XQuAD (avg)**          | EM / F1               | 21.58 / 36.54     | 21.45 / 37.73   | -0.13 / +1.19       | -0.60% / +3.26%    |
|
| **STS (avg)** | Spearman / Pearson | 78.01 / 77.98 | 77.08 / 76.93 | -0.93 / -1.05 | -1.19% / -1.35% | |
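
For reference, the diff columns are derived directly from the two score columns; taking the ARC Challenge row as an example:

```python
# Absolute difference and percent change relative to the FP16 score
# (ARC Challenge row from the table above).
fp16, q4 = 44.84, 42.74
diff = q4 - fp16          # -2.10
pct = 100 * diff / fp16   # -4.68
print(f"{diff:+.2f} ({pct:+.2f}%)")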
|
|
|
|
|
### Model Description |
|
|
|
- **Developed by:** OpenLLM-Ro |
|
- **Language(s):** Romanian |
|
- **License:** cc-by-nc-4.0 |
|
- **Quantized from model:** [RoLlama3.1-8b-Instruct-DPO](https://huggingface.co/OpenLLM-Ro/RoLlama3.1-8b-Instruct-DPO) |
|
- **Quantization:** 4-bit |
|
|
|
Quantization reduces model size and improves inference speed but can lead to small drops in performance. The table above compares the original full-precision version with this 4-bit variant across the main benchmarks.
|
|
|
## How to Use |
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "OpenLLM-Ro/RoLlama3.1-8b-Instruct-DPO-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load in 4-bit via bitsandbytes. Passing a BitsAndBytesConfig is the
# current recommended form; the bare `load_in_4bit=True` kwarg is deprecated.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

instruction = "Ce jocuri de societate pot juca cu prietenii mei?"
chat = [
    # Romanian system prompt: "You are a helpful, respectful and honest
    # assistant. Try to help as much as possible with the information you
    # provide, excluding toxic, racist, sexist, dangerous and illegal answers."
    {"role": "system", "content": "Ești un asistent folositor, respectuos și onest. Încearcă să ajuți cât mai mult prin informațiile oferite, excluzând răspunsuri toxice, rasiste, sexiste, periculoase și ilegale."},
    {"role": "user", "content": instruction},
]

# The chat template already inserts the BOS token, so special tokens are
# not added again when encoding below.
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
outputs = model.generate(input_ids=inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
|
|
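To verify the reduced footprint on your hardware, `get_memory_footprint()` (a standard `transformers` model method) reports the memory used by parameters and buffers:

```python
# Bytes used by model parameters and buffers; expect roughly a quarter
# of the FP16 figure for this 4-bit variant.
print(f"{model.get_memory_footprint() / 2**30:.2f} GiB")
```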