metadata
language:
- en
- fr
- ln
library_name: peft
tags:
- trl
- sft
- generated_from_trainer
base_model: CohereForAI/aya-23-8b
datasets:
- masakhane/afrimmlu
model-index:
- name: aya-23-8b-afrimmlu-lin
results: []
pipeline_tag: text-generation
license: apache-2.0
Aya-23-8b Afrimmlu Lingala
This model is a LoRA adapter (PEFT) fine-tuned from CohereForAI/aya-23-8b on the masakhane/afrimmlu dataset.
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
Training hardware (NVIDIA):
- 2 x A100 PCIe
- 24 vCPU, 251 GB RAM
Training procedure
Prompt Formatting
def formatting_prompts_func(example):
    # Build one training string per row of a batched example (a dict of lists).
    output_texts = []
    for i in range(len(example['choices'])):
        text = f"<|START_OF_TURN_TOKEN|><|USER_TOKEN|>Question : {example['question'][i]}, Choices : {example['choices'][i]}<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{example['answer'][i]}"
        output_texts.append(text)
    return output_texts
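As a quick illustration of what the formatting function produces, the sketch below applies it to a single batch laid out as a dict of lists. The sample question, choices, and answer are invented for illustration and are not taken from masakhane/afrimmlu.

```python
def formatting_prompts_func(example):
    # Same template as in this card, split across lines for readability.
    output_texts = []
    for i in range(len(example["choices"])):
        text = (
            "<|START_OF_TURN_TOKEN|><|USER_TOKEN|>"
            f"Question : {example['question'][i]}, Choices : {example['choices'][i]}"
            "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
            f"{example['answer'][i]}"
        )
        output_texts.append(text)
    return output_texts


# Hypothetical one-row batch (dict of lists), made up for this example.
batch = {
    "question": ["2 + 2 = ?"],
    "choices": [["3", "4", "5", "6"]],
    "answer": ["B"],
}

texts = formatting_prompts_func(batch)
print(texts[0])
```

Because the choices are interpolated as a Python list, they appear in the training text with brackets and quotes, for example `['3', '4', '5', '6']`.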
Model Architecture
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): CohereForCausalLM(
      (model): CohereModel(
        (embed_tokens): Embedding(256000, 4096, padding_idx=0)
        (layers): ModuleList(
          (0-31): 32 x CohereDecoderLayer(
            (self_attn): CohereAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=32, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=32, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (k_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=1024, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=32, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=32, out_features=1024, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (v_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=1024, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=32, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=32, out_features=1024, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (o_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=32, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=32, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (rotary_emb): CohereRotaryEmbedding()
            )
            (mlp): CohereMLP(
              (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
              (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
              (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
              (act_fn): SiLU()
            )
            (input_layernorm): CohereLayerNorm()
          )
        )
        (norm): CohereLayerNorm()
      )
      (lm_head): Linear(in_features=4096, out_features=256000, bias=False)
    )
  )
)
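The printout above shows rank-32 LoRA matrices on the four attention projections in each of the 32 decoder layers, with no LoRA on the MLP. As a back-of-the-envelope check, the sketch below counts the adapter parameters those shapes imply. The shapes are read off the printout; the helper name `lora_params` is purely illustrative.

```python
RANK = 32        # out_features of lora_A / in_features of lora_B in the printout
NUM_LAYERS = 32  # "(0-31): 32 x CohereDecoderLayer"

# (in_features, out_features) of each adapted base layer, from the printout.
PROJECTIONS = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}


def lora_params(in_features, out_features, rank):
    # lora_A is (in_features x rank), lora_B is (rank x out_features); no biases.
    return in_features * rank + rank * out_features


per_layer = sum(lora_params(i, o, RANK) for i, o in PROJECTIONS.values())
total = per_layer * NUM_LAYERS
print(per_layer, total)
```

This gives 851,968 adapter parameters per layer and 27,262,976 in total, a small fraction of the roughly 8 billion parameters in the base model.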
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 20
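The reported total_train_batch_size follows from the per-device batch size and the gradient accumulation steps. A minimal check, assuming a single training process (the number of processes is not stated in this card and is an assumption here):

```python
train_batch_size = 2              # per-device batch size from the list above
gradient_accumulation_steps = 16  # from the list above
num_processes = 1                 # assumption: not stated in this card

total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_processes
)
print(total_train_batch_size)
```

With two data-parallel processes the same settings would give 64, so the reported value of 32 is consistent with one training process.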
Training results
Inference
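import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Settings the snippet relies on; the two flag values here are assumptions.
BASE_MODEL_NAME = "CohereForAI/aya-23-8b"  # base model named in this card
QUANTIZE_4BIT = True
USE_FLASH_ATTENTION = False

# Optional 4-bit NF4 quantization via bitsandbytes.
quantization_config = None
if QUANTIZE_4BIT:
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

# Optional FlashAttention 2 kernels.
attn_implementation = None
if USE_FLASH_ATTENTION:
    attn_implementation = "flash_attention_2"

loaded_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_NAME,
    quantization_config=quantization_config,
    attn_implementation=attn_implementation,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_NAME)

# Attach the fine-tuned LoRA adapter to the base model.
loaded_model.load_adapter("aya-23-8b-afrimmlu-lin")

prompts = [
    """Question: 4 na 3 Ezali boni ?
Choices : [12, 4, 32, 21]
"""
]

# generate_aya_23 is not defined in this card and must be provided separately.
generations = generate_aya_23(prompts, loaded_model)

for p, g in zip(prompts, generations):
    print("PROMPT", p, "RESPONSE", g, "\n", sep="\n")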
PROMPT
Question: 4 na 3 Ezali boni ?
Choices : [12, 4, 32, 21]
RESPONSE
Boni ya 4 ezali 12.
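The inference snippet calls generate_aya_23 without defining it. One possible shape for such a helper is sketched below, assuming each prompt is wrapped as a single user turn with the tokenizer's chat template and decoded greedily. The signature (including the explicit `tokenizer` argument and the `max_new_tokens` default) is an assumption, not the author's code.

```python
def generate_aya_23(prompts, model, tokenizer, max_new_tokens=64):
    """Return one greedy completion per prompt (hypothetical helper)."""
    generations = []
    for prompt in prompts:
        # Wrap the prompt as a single user turn and append the chatbot
        # turn header so the model continues with its answer.
        messages = [{"role": "user", "content": prompt}]
        inputs = tokenizer.apply_chat_template(
            messages,
            add_generation_prompt=True,
            tokenize=True,
            return_dict=True,
            return_tensors="pt",
        ).to(model.device)

        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

        # Drop the prompt tokens and decode only the newly generated ones.
        prompt_len = inputs["input_ids"].shape[-1]
        new_tokens = output_ids[0][prompt_len:]
        generations.append(tokenizer.decode(new_tokens, skip_special_tokens=True))
    return generations
```

With this signature the call becomes `generate_aya_23(prompts, loaded_model, tokenizer)`. Prompts are processed one at a time, which avoids padding concerns at the cost of throughput.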
Framework versions
- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.1.0+cu118
- Datasets 2.19.2
- Tokenizers 0.19.1