---
license: llama3
language:
- tr
pipeline_tag: text-generation
base_model: meta-llama/Meta-Llama-3-8B-Instruct

model-index:
- name: LLaMA-3-8B-Instruct-Abliterated-TR
  results:
  - task:
      type: multiple-choice
    dataset:
      type: multiple-choice
      name: MMLU_TR_V0.2
    metrics:
    - name: 5-shot
      type: 5-shot
      value: 0.4908
      verified: false
  - task:
      type: multiple-choice
    dataset:
      type: multiple-choice
      name: Truthful_QA_V0.2
    metrics:
    - name: 0-shot
      type: 0-shot
      value: 0.4962
      verified: false
  - task:
      type: multiple-choice
    dataset:
      type: multiple-choice
      name: ARC_TR_V0.2
    metrics:
    - name: 25-shot
      type: 25-shot
      value: 0.4377
      verified: false
  - task:
      type: multiple-choice
    dataset:
      type: multiple-choice
      name: HellaSwag_TR_V0.2
    metrics:
    - name: 10-shot
      type: 10-shot
      value: 0.4486
      verified: false
  - task:
      type: multiple-choice
    dataset:
      type: multiple-choice
      name: GSM8K_TR_V0.2
    metrics:
    - name: 5-shot
      type: 5-shot
      value: 0.5323
      verified: false
  - task:
      type: multiple-choice
    dataset:
      type: multiple-choice
      name: Winogrande_TR_V0.2
    metrics:
    - name: 5-shot
      type: 5-shot
      value: 0.5513
      verified: false
---
<img src="https://huggingface.co/Metin/LLaMA-3-8B-Instruct-Abliterated-TR/resolve/main/llama_bandaid.png"
alt="A Llama with a band-aid on its head." width="420"/>

# What is abliteration?
Arditi et al. demonstrated in their [blog post](https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in-llms-is-mediated-by-a-single-direction) that refusal in LLMs is mediated by a single direction in the residual stream. They found that preventing the model from representing this direction can enable it to answer harmful questions. For a deeper understanding of this concept, you can refer to [Maxime Labonne's article](https://huggingface.co/blog/mlabonne/abliteration) on the topic.
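In practice, "preventing the model from representing this direction" means removing the component of the residual-stream activations (or of the weights that write into the residual stream) that lies along that direction. Below is a minimal sketch of the activation-level version, assuming a unit-norm direction `r_hat` has already been extracted (names and shapes are illustrative, not taken from the actual abliteration code):

```python
import torch

def ablate_direction(x: torch.Tensor, r_hat: torch.Tensor) -> torch.Tensor:
    """Remove the component of activations `x` along the unit direction `r_hat`.

    x:     (..., hidden_size) residual-stream activations
    r_hat: (hidden_size,) unit-norm direction to ablate (illustrative name)
    """
    # x' = x - (x · r_hat) r_hat
    projection = (x @ r_hat).unsqueeze(-1) * r_hat
    return x - projection
```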

To force the model to respond in Turkish, parallel instructions were crafted using the [stackexchange subset](https://huggingface.co/datasets/GAIR/lima/viewer/plain_text/train?f[source][value]=%27stackexchange%27) of the LIMA dataset. These instructions were then translated into Turkish, and an additional sentence prompting the model to answer in Turkish was appended at runtime (a rough sketch of this construction follows the dataset links below).

You can find the datasets used in this experiment via the following links:

1. https://huggingface.co/datasets/Metin/abliteration_en
2. https://huggingface.co/datasets/Metin/abliteration_tr
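
The prompt pairs can be reproduced roughly as follows. This is a minimal sketch only: the split name, column name, and the exact Turkish suffix are assumptions, not taken from the repositories.

```python
from datasets import load_dataset

# English instructions and their Turkish translations (see the links above).
en = load_dataset("Metin/abliteration_en", split="train")
tr = load_dataset("Metin/abliteration_tr", split="train")

# Sentence appended at runtime to steer the model towards Turkish answers.
# Placeholder wording; the exact sentence used by the author is not published here.
TURKISH_SUFFIX = " Lütfen Türkçe cevap ver."

def build_pairs(en_rows, tr_rows, column="text"):
    # Pair each English instruction with its Turkish counterpart plus the suffix.
    return [
        (en_row[column], tr_row[column] + TURKISH_SUFFIX)
        for en_row, tr_row in zip(en_rows, tr_rows)
    ]

pairs = build_pairs(en, tr)
```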

# LLaMA-3-8B-Instruct-Abliterated-TR

LLaMA-3-8B-Instruct-Abliterated-TR is the abliterated version of [Meta-LLaMA-3-8B-Instruct](https://huggingface.co/meta-llama/meta-llama-3-8b-instruct).

## Details:

- 40 samples were used to compute the difference of means between activations.
- Layer 7 was selected as the layer with the strongest candidate Turkish-speaking direction.
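
The "Turkish-speaking" direction can be estimated as the difference between the mean activations of the two prompt sets at the chosen layer. A rough sketch, assuming the residual-stream activations have already been collected (variable names, shapes, and the token position are illustrative, not the exact procedure used for this model):

```python
import torch

def mean_difference_direction(h_tr: torch.Tensor, h_en: torch.Tensor) -> torch.Tensor:
    """Estimate a candidate 'Turkish-speaking' direction from paired activations.

    h_tr, h_en: (num_samples, hidden_size) residual-stream activations collected
                at the chosen layer (layer 7 here), e.g. at the last token position,
                for the Turkish and English prompt sets respectively.
    """
    direction = h_tr.mean(dim=0) - h_en.mean(dim=0)  # difference of means
    return direction / direction.norm()              # normalize to unit length

# With 40 samples per set:
# h_tr, h_en = collect_activations(...)  # hypothetical helper, shape (40, hidden_size) each
# r_hat = mean_difference_direction(h_tr, h_en)
```

The resulting unit vector can then be removed from the activations, as in the earlier snippet, or orthogonalized out of the relevant weight matrices.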

## How to use

You can use the code snippet below to run the model:

```python
from transformers import BitsAndBytesConfig
import transformers
import torch

# 4-bit NF4 quantization so the model fits on smaller GPUs.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "Metin/LLaMA-3-8B-Instruct-Abliterated-TR"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16, "quantization_config": bnb_config},
    device_map="auto",
)

messages = [
    # Ideally, after abliteration the model should not need to be told to answer in Turkish.
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Python'da bir öğenin bir listede geçip geçmediğini nasıl kontrol edebilirim?"},
]

# Build the Llama-3 chat prompt from the message list.
prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Stop on either the model's EOS token or Llama-3's end-of-turn token.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(
    prompt,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.2,
    top_p=0.9,
)

# Print only the newly generated text (strip the prompt).
print(outputs[0]["generated_text"][len(prompt):])
```

## OpenLLMTurkishLeaderboard_v0.2 benchmark results

- **MMLU_TR_V0.2**: 49.08%
- **Truthful_QA_TR_V0.2**: 49.62%
- **ARC_TR_V0.2**: 43.77%
- **HellaSwag_TR_V0.2**: 44.86%
- **GSM8K_TR_V0.2**: 53.23%
- **Winogrande_TR_V0.2**: 55.13%
- **Average**: 49.28%

These scores may differ from those you obtain when running the same benchmarks, as I did not use an inference engine (vLLM, TensorRT-LLM, etc.).

## Output Example (Abliterated Model vs Base Model)

Testing a model with a single example is not a rigorous evaluation; however, one is provided here to illustrate the model's behaviour.

### Model: LLaMA-3-8B-Instruct-Abliterated-TR

#### Input

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Gökyüzü neden mavi?"},
]
```

#### Output

```plaintext
A classic question!

Gökyüzü (the sky) neden mavi (blue) sorusu, Türkçe'de "Why is the sky blue?" olarak çevrilebilir. Bu soruya cevap vermek için, ilk olarak, gökyüzünün mavi olmasının nedenini açıklamak gerekir.

Gökyüzünün mavi olmasının nedeni, ışığın farklı uzunluklarındaki yayılımına dayanır. Işık, gökyüzüne ulaşmadan önce, atmosferde bulunan gaz moleküllerine çarpınır. Bu çarpma, ışığın farklı uzunluklarındaki yayılımına neden olur.

Gökyüzünün mavi olmasının nedeni, ışığın kısa dalga boyunda (yaklaşık 450-495 nm) yayılımına dayanır. Bu dalga boyunda ışığın, gökyüzüne ulaşmadan önce, atmosferde bulunan gaz moleküllerine çarpınır. Bu çarpma, ışığın mavi renkini oluşturur.

Diğer renkler, farklı dalga boyunda yayılımına neden olur. Örneğin, kırmızı renk, ışığın uzun dalga boyunda (yaklaşık 620-750 nm) yayılımına neden olur. Bu nedenle, gökyüzünün mavi olmasının nedeni, ışığın kısa dalga boyunda yayılımına dayanır.

Bu cevap, gökyüzünün mavi olmasının nedenini açıklamaktadır.
```

### Model: LLaMA-3-8B-Instruct

#### Output

Given the same input as above, the base model produced:

```plaintext
A question that has puzzled humans for centuries!

The short answer is that the sky appears blue because of a phenomenon called Rayleigh scattering. This is the scattering of light by small particles or molecules in the atmosphere, such as nitrogen (N2) and oxygen (O2).

When sunlight enters the Earth's atmosphere, it encounters these tiny molecules and scatters in all directions. However, shorter (blue) wavelengths are scattered more than longer (red) wavelengths. This is because the smaller molecules are more effective at scattering the shorter wavelengths.

As a result, our eyes perceive the blue light as being more abundant, giving the sky its blue appearance. The color of the sky can vary depending on the time of day, atmospheric conditions, and other factors, but Rayleigh scattering is the main reason why the sky appears blue.

In Turkish, the phrase "Gökyüzü neden mavi" can be translated to "Why is the sky blue?" or "What makes the sky blue?"
```