Swahili Llama 3 8B

  • Developed by: GodsonNtungi
  • License: apache-2.0
  • Finetuned from model: unsloth/llama-3-8b-bnb-4bit

An experimental model. Its results are currently poor, but it is a promising starting point for further work.

Training run: 1 epoch
Training time: 9 hours 20 minutes 7 seconds
Training loss: 0.8683

PEFT parameters

from unsloth import FastLanguageModel

# `model` is the 4-bit base model loaded beforehand (loading code is not shown in this card).
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # LoRA rank; any value > 0 works, suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,  # Any value is supported, but 0 is optimized
    bias = "none",     # Any value is supported, but "none" is optimized
    # "unsloth" uses 30% less VRAM and fits 2x larger batch sizes
    use_gradient_checkpointing = "unsloth",  # True, or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,   # Rank-stabilized LoRA is supported but not used here
    loftq_config = None,  # LoftQ is supported but not used here
)
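
To give a sense of scale for the settings above, here is a rough sketch that counts the LoRA parameters these target modules would add at r = 16, and computes the LoRA scaling factor. The layer shapes (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128, 32 layers) are assumptions taken from the published Llama 3 8B base configuration, not from this card, and the helper name lora_params is made up for illustration.

```python
# Assumed Llama 3 8B shapes (from the public base-model config, not from this card).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128       # 8 key/value heads x head dimension 128
NUM_LAYERS = 32

R = 16                 # r from the PEFT parameters above
LORA_ALPHA = 16        # lora_alpha from the PEFT parameters above

# (in_features, out_features) of each targeted linear layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in_features) and B (out_features x r) per adapted layer."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in TARGET_SHAPES.values())
trainable = per_layer * NUM_LAYERS

# With use_rslora = False, the adapter update is scaled by lora_alpha / r.
scaling = LORA_ALPHA / R

print(f"LoRA parameters per layer: {per_layer:,}")   # 1,310,720
print(f"LoRA parameters in total:  {trainable:,}")   # 41,943,040
print(f"LoRA scaling factor:       {scaling}")       # 1.0
```

Under these assumptions the adapters add roughly 42 million trainable parameters, a small fraction of the 8B base model, and lora_alpha = r means the adapter update is applied with a scaling of 1.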

Weakness
The model was not properly fine-tuned to emit the end-of-text token when a response is complete. As a result, output typically starts well and then degrades into gibberish, and how much gibberish appears depends on the max token limit set for generation.
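
One common way to address this kind of weakness in a later run is to append the tokenizer's end-of-text token to every training example, so the model sees where a response should stop. The sketch below illustrates the idea in plain Python. It is not the code used to train this model: the prompt template and both helper names are hypothetical, and the EOS string is assumed to be the Llama 3 base token "<|end_of_text|>" (in a real script it would be read from tokenizer.eos_token).

```python
# Assumption: Llama 3 base end-of-text token; in practice use tokenizer.eos_token.
EOS_TOKEN = "<|end_of_text|>"

# Hypothetical prompt template, for illustration only.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"


def format_example(instruction: str, response: str, eos_token: str = EOS_TOKEN) -> str:
    """Build one training text that always ends with the EOS token."""
    text = PROMPT_TEMPLATE.format(instruction=instruction, response=response)
    # Without a trailing EOS the model never learns when to stop generating.
    return text if text.endswith(eos_token) else text + eos_token


def truncate_at_eos(generated: str, eos_token: str = EOS_TOKEN) -> str:
    """Keep only the text before the first EOS token, if one was generated."""
    return generated.split(eos_token, 1)[0].rstrip()


example = format_example("Tafsiri kwa Kiswahili: Good morning", "Habari ya asubuhi")
print(example.endswith(EOS_TOKEN))

raw_output = "Habari ya asubuhi" + EOS_TOKEN + " trailing text to discard"
print(truncate_at_eos(raw_output))
```

The truncation helper only helps when the model actually emits the token. With the current checkpoint, keeping the max token limit low at generation time is the more reliable stopgap.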

