metadata
license: llama3
language:
  - tr
pipeline_tag: text-generation
base_model: meta-llama/Meta-Llama-3-8B-Instruct
model-index:
  - name: LLaMA-3-8B-Instruct-Abliterated-TR
    results:
      - task:
          type: multiple-choice
        dataset:
          type: multiple-choice
          name: MMLU_TR_V0.2
        metrics:
          - name: 5-shot
            type: 5-shot
            value: 0.4908
            verified: false
      - task:
          type: multiple-choice
        dataset:
          type: multiple-choice
          name: Truthful_QA_TR_V0.2
        metrics:
          - name: 0-shot
            type: 0-shot
            value: 0.4962
            verified: false
      - task:
          type: multiple-choice
        dataset:
          type: multiple-choice
          name: ARC_TR_V0.2
        metrics:
          - name: 25-shot
            type: 25-shot
            value: 0.4377
            verified: false
      - task:
          type: multiple-choice
        dataset:
          type: multiple-choice
          name: HellaSwag_TR_V0.2
        metrics:
          - name: 10-shot
            type: 10-shot
            value: 0.4486
            verified: false
      - task:
          type: multiple-choice
        dataset:
          type: multiple-choice
          name: GSM8K_TR_V0.2
        metrics:
          - name: 5-shot
            type: 5-shot
            value: 0.5323
            verified: false
      - task:
          type: multiple-choice
        dataset:
          type: multiple-choice
          name: Winogrande_TR_V0.2
        metrics:
          - name: 5-shot
            type: 5-shot
            value: 0.5513
            verified: false

A Llama with a band-aid on its head.

What is abliteration?

Arditi et al. demonstrated in their blog post that refusal in LLMs is mediated by a single direction in the residual stream. They found that preventing the model from representing this direction can enable it to answer harmful questions. For a deeper understanding of this concept, you can refer to Maxime Labonne's article on the topic.
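To make "preventing the model from representing this direction" concrete, here is a minimal sketch of projecting a direction out of one activation vector. It uses plain Python lists in place of real residual-stream tensors, and the function names `normalize` and `ablate_direction` are illustrative assumptions, not part of any library or of the code used to build this model.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def ablate_direction(activation, direction):
    """Remove the component of `activation` that lies along `direction`.

    Computes x' = x - (x . r) r, where r is the unit-length direction.
    """
    unit = normalize(direction)
    coeff = sum(a * u for a, u in zip(activation, unit))
    return [a - coeff * u for a, u in zip(activation, unit)]


# Toy 3-dimensional example (real activations have thousands of dimensions).
activation = [3.0, 4.0, 0.0]
direction = [0.0, 2.0, 0.0]
print(ablate_direction(activation, direction))  # [3.0, 0.0, 0.0]
```

After the projection, the activation has no component left along the chosen direction, so later layers cannot read that feature from it.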

To force the model to respond in Turkish, parallel instructions were crafted from the Stack Exchange subset of the LIMA dataset. The instructions were translated into Turkish, and an additional sentence prompting the model to answer in Turkish was appended at runtime.

You can find the datasets used in this experiment via the following links:

  1. https://huggingface.co/datasets/Metin/abliteration_en
  2. https://huggingface.co/datasets/Metin/abliteration_tr

LLaMA-3-8B-Instruct-Abliterated-TR

LLaMA-3-8B-Instruct-Abliterated-TR is the abliterated version of Meta-Llama-3-8B-Instruct.

Details:

  • 40 samples were used to compute the difference of means between activations.
  • Layer 7 was selected as the layer with the most promising Turkish-speaking direction.
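The difference-of-means step mentioned above can be sketched as follows. This is an illustration under assumptions: activations are shown as plain Python lists, the two groups are assumed to be activations collected on the Turkish and English prompt sets at a single layer, and the sign convention (Turkish minus English) is a guess. The helper names `mean_vector` and `difference_of_means` are hypothetical, and the criterion used to pick layer 7 is not shown because the card does not state it.

```python
def mean_vector(samples):
    """Element-wise mean of a list of equal-length activation vectors."""
    count = len(samples)
    return [sum(column) / count for column in zip(*samples)]


def difference_of_means(group_a, group_b):
    """Candidate direction: mean(group_a) - mean(group_b)."""
    mean_a = mean_vector(group_a)
    mean_b = mean_vector(group_b)
    return [a - b for a, b in zip(mean_a, mean_b)]


# Toy stand-ins for activations at one layer (2 samples per group, 2 dims).
# In the experiment described above there would be 40 samples.
turkish_acts = [[1.0, 2.0], [3.0, 4.0]]
english_acts = [[0.0, 1.0], [2.0, 1.0]]

direction = difference_of_means(turkish_acts, english_acts)
print(direction)  # [1.0, 2.0]
```

Repeating this per layer gives one candidate direction for each layer, from which a single layer can then be chosen.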

How to use

Use the code snippet below to run the model:

from transformers import BitsAndBytesConfig
import transformers
import torch

# 4-bit NF4 quantization to reduce GPU memory use (requires the bitsandbytes package).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "Metin/LLaMA-3-8B-Instruct-Abliterated-TR"

# Build a text-generation pipeline that loads the quantized model.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16, "quantization_config": bnb_config},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."}, # Ideally we should not have to tell the model to answer in Turkish after abliteration.
    {"role": "user", "content": "Python'da bir öğenin bir listede geçip geçmediğini nasıl kontrol edebilirim?"},
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = pipeline(
    prompt,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.2,
    top_p=0.9,
)

print(outputs[0]["generated_text"][len(prompt):])

OpenLLMTurkishLeaderboard_v0.2 benchmark results

  • MMLU_TR_V0.2: 49.08%
  • Truthful_QA_TR_V0.2: 49.62%
  • ARC_TR_V0.2: 43.77%
  • HellaSwag_TR_V0.2: 44.86%
  • GSM8K_TR_V0.2: 53.23%
  • Winogrande_TR_V0.2: 55.13%
  • Average: 49.28%
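As a quick check, the reported average is the unweighted mean of the six scores listed above. The snippet below reproduces it; the dictionary is simply a transcription of the list, not an export from the leaderboard.

```python
# Scores in percent, copied from the list above.
scores = {
    "MMLU_TR_V0.2": 49.08,
    "Truthful_QA_TR_V0.2": 49.62,
    "ARC_TR_V0.2": 43.77,
    "HellaSwag_TR_V0.2": 44.86,
    "GSM8K_TR_V0.2": 53.23,
    "Winogrande_TR_V0.2": 55.13,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Average: {average:.2f}%")  # Average: 49.28%
```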

These scores may differ from what you get when you run the same benchmarks, as I did not use an inference engine (vLLM, TensorRT-LLM, etc.).

Output Example (Abliterated Model vs Base Model)

A single example is not a reliable way to evaluate a model. The example here is provided only to showcase the model's capabilities.

Model: LLaMA-3-8B-Instruct-Abliterated-TR

Input

TODO

Output

TODO

Model: LLaMA-3-8B-Instruct

Input

TODO