Metin/LLaMA-3-8B-Instruct-Abliterated-TR

@@ -1,40 +1,185 @@
 ---
 license: llama3
 ---
-"mmlu_tr_v0.2": {
-      "acc,none": 0.4907571724341911,
-      "acc_stderr,none": 0.0041653031800367325,
-      "alias": "mmlu_tr_v0.2"
-    },
-"arc_tr-v0.2": {
-      "acc,none": 0.3856655290102389,
-      "acc_stderr,none": 0.014224250973257174,
-      "acc_norm,none": 0.4377133105802048,
-      "acc_norm_stderr,none": 0.01449757388110829,
-      "alias": "arc_tr-v0.2"
-    }
-"gsm8k_tr-v0.2": {
-      "exact_match,strict-match": 0.5322703113135915,
-      "exact_match_stderr,strict-match": 0.013754209828259586,
-      "exact_match,flexible-extract": 0.02050113895216401,
-      "exact_match_stderr,flexible-extract": 0.003906276830067441,
-      "alias": "gsm8k_tr-v0.2"
-    }
-"truthfulqa_v0.2": {
-      "acc,none": 0.4962330625424611,
-      "acc_stderr,none": 0.015774923327963934,
-      "alias": "truthfulqa_v0.2"
-    }
-"hellaswag_tr-v0.2": {
-      "acc,none": 0.36061871965676867,
-      "acc_stderr,none": 0.005102526725540464,
-      "acc_norm,none": 0.4485717511572767,
-      "acc_norm_stderr,none": 0.005284959475720029,
-      "alias": "hellaswag_tr-v0.2"
-    }
-"winogrande_tr": {
-      "acc,none": 0.5513428120063191,
-      "acc_stderr,none": 0.013983726161361853,
-      "alias": "winogrande_tr"
-    }

 ---
 license: llama3
+language:
+- tr
+pipeline_tag: text-generation
+base_model: meta-llama/Meta-Llama-3-8B-Instruct
+model-index:
+- name: LLaMA-3-8B-Instruct-Abliterated-TR
+  results:
+  - task:
+      type: multiple-choice
+    dataset:
+      type: multiple-choice
+      name: MMLU_TR_V0.2
+    metrics:
+    - name: 5-shot
+      type: 5-shot
+      value: 0.4908
+      verified: false
+  - task:
+      type: multiple-choice
+    dataset:
+      type: multiple-choice
+      name: Truthful_QA_V0.2
+    metrics:
+    - name: 0-shot
+      type: 0-shot
+      value: 0.4962
+      verified: false
+  - task:
+      type: multiple-choice
+    dataset:
+      type: multiple-choice
+      name: ARC_TR_V0.2
+    metrics:
+    - name: 25-shot
+      type: 25-shot
+      value: 0.4377
+      verified: false
+  - task:
+      type: multiple-choice
+    dataset:
+      type: multiple-choice
+      name: HellaSwag_TR_V0.2
+    metrics:
+    - name: 10-shot
+      type: 10-shot
+      value: 0.4486
+      verified: false
+  - task:
+      type: multiple-choice
+    dataset:
+      type: multiple-choice
+      name: GSM8K_TR_V0.2
+    metrics:
+    - name: 5-shot
+      type: 5-shot
+      value: 0.5323
+      verified: false
+  - task:
+      type: multiple-choice
+    dataset:
+      type: multiple-choice
+      name: Winogrande_TR_V0.2
+    metrics:
+    - name: 5-shot
+      type: 5-shot
+      value: 0.5513
+      verified: false
 ---
+<img src=""
+alt="A Llama with a band-aid on its head." width="420"/>
+# What is abliteration?
+Arditi et al. demonstrated in their [blog post](https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction) that refusal in LLMs is mediated by a single direction in the residual stream. They found that preventing the model from representing this direction can enable it to answer harmful questions. For a deeper understanding of this concept, you can refer to [Maxime Labonne's article](https://huggingface.co/blog/mlabonne/abliteration) on the topic.
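As a minimal sketch of what "preventing the model from representing a direction" means, the snippet below removes the component of a vector along a given direction by projection, following the directional ablation described in the linked post. The function name `ablate_direction` and the toy three-dimensional vectors are illustrative assumptions; in a real model the vectors would be residual-stream activations and the direction would be estimated from data.

```python
import math


def ablate_direction(activation, direction):
    """Remove the component of `activation` that lies along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    # Scalar projection of the activation onto the unit direction.
    coeff = sum(a * u for a, u in zip(activation, unit))
    # Subtract the projected component: x' = x - (x . r_hat) * r_hat
    return [a - coeff * u for a, u in zip(activation, unit)]


# Toy "residual stream" vector and direction (made-up values).
x = [2.0, 1.0, -1.0]
r = [1.0, 0.0, 0.0]
x_ablated = ablate_direction(x, r)
print(x_ablated)
```

After ablation the vector has no component left along `r`, while the components orthogonal to `r` are unchanged.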
+To make the model respond in Turkish, parallel instructions were built from the [stackexchange subset](https://huggingface.co/datasets/GAIR/lima/viewer/plain_text/train?f[source][value]=%27stackexchange%27) of the LIMA dataset. The instructions were translated into Turkish, and an additional sentence prompting the model to answer in Turkish was appended at runtime.
+You can find the datasets used in this experiment via the following links:
+1. https://huggingface.co/datasets/Metin/abliteration_en
+2. https://huggingface.co/datasets/Metin/abliteration_tr
+# LLaMA-3-8B-Instruct-Abliterated-TR
+LLaMA-3-8B-Instruct-Abliterated-TR is the abliterated version of [Meta-LLaMA-3-8B-Instruct](https://huggingface.co/meta-llama/meta-llama-3-8b-instruct).
+## Details:
+- 40 samples were used to compute the difference of means between activations.
+- Layer 7 was selected as the layer with the most promising Turkish-speaking direction.
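The difference-of-means step above can be sketched as follows. This is a dependency-free illustration, not the code used to produce this model: the helper names and the toy two-dimensional activations are hypothetical, and the card does not state which token positions were used or how candidate layers were compared.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def difference_of_means_direction(target_acts, baseline_acts):
    """Unit vector pointing from the baseline mean to the target mean."""
    mu_target = mean_vector(target_acts)
    mu_baseline = mean_vector(baseline_acts)
    diff = [t - b for t, b in zip(mu_target, mu_baseline)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


# Made-up activations at one layer: one vector per prompt.
# In practice there would be one vector per sample (40 in this card).
turkish_acts = [[1.0, 2.0], [3.0, 2.0]]
english_acts = [[0.0, 2.0], [-2.0, 2.0]]

direction = difference_of_means_direction(turkish_acts, english_acts)
print(direction)
```

Repeating this for every layer yields one candidate direction per layer, from which a single layer can then be chosen.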
+## How to use
+Use the following snippet to load the model and generate a response:
+```python
+import torch
+import transformers
+from transformers import BitsAndBytesConfig
+
+# 4-bit NF4 quantization with double quantization; computation runs in bfloat16.
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_use_double_quant=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.bfloat16,
+)
+
+model_id = "Metin/LLaMA-3-8B-Instruct-Abliterated-TR"
+
+# Build a text-generation pipeline that loads the quantized model.
+pipeline = transformers.pipeline(
+    "text-generation",
+    model=model_id,
+    model_kwargs={"torch_dtype": torch.bfloat16, "quantization_config": bnb_config},
+    device_map="auto",
+)
+messages = [
+    # Ideally, after abliteration the model answers in Turkish without being told to.
+    {"role": "system", "content": "You are a helpful assistant."},
+    # User prompt in Turkish: "How can I check whether an item occurs in a list in Python?"
+    {"role": "user", "content": "Python'da bir öğenin bir listede geçip geçmediğini nasıl kontrol edebilirim?"},
+]
+prompt = pipeline.tokenizer.apply_chat_template(
+        messages,
+        tokenize=False,
+        add_generation_prompt=True
+)
+terminators = [
+    pipeline.tokenizer.eos_token_id,
+    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
+]
+outputs = pipeline(
+    prompt,
+    max_new_tokens=512,
+    eos_token_id=terminators,
+    do_sample=True,
+    temperature=0.2,
+    top_p=0.9,
+)
+print(outputs[0]["generated_text"][len(prompt):])
+```
+## OpenLLMTurkishLeaderboard_v0.2 benchmark results
+- **MMLU_TR_V0.2**: 49.08%
+- **Truthful_QA_TR_V0.2**: 49.62%
+- **ARC_TR_V0.2**: 43.77%
+- **HellaSwag_TR_V0.2**: 44.86%
+- **GSM8K_TR_V0.2**: 53.23%
+- **Winogrande_TR_V0.2**: 55.13%
+- **Average**: 49.28%
+These scores may differ from what you get when you run the same benchmarks, as I did not use any inference engine (vLLM, TensorRT-LLM, etc.).
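As a quick arithmetic check, the reported average is the unweighted mean of the six scores listed above. The dictionary below simply restates those scores; the variable names are illustrative.

```python
# Benchmark scores in percent, copied from the list above.
scores = {
    "MMLU_TR_V0.2": 49.08,
    "Truthful_QA_TR_V0.2": 49.62,
    "ARC_TR_V0.2": 43.77,
    "HellaSwag_TR_V0.2": 44.86,
    "GSM8K_TR_V0.2": 53.23,
    "Winogrande_TR_V0.2": 55.13,
}

# Unweighted mean over the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Average: {average:.2f}%")
```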
+## Output Example (Abliterated Model vs Base Model)
+A single example is not a reliable way to evaluate the model, but one is provided here to illustrate its behavior.
+### Model: LLaMA-3-8B-Instruct-Abliterated-TR
+#### Input
+```plaintext
+TODO
+```
+#### Output
+```plaintext
+TODO
+```
+### Model: LLaMA-3-8B-Instruct
+#### Input
+```plaintext
+TODO
+```