Edit model card

photo-model

Model Card for Model ID

LoRA fine-tuned version of mistralai/Mixtral-8x7B-Instruct-v0.1 only targeting the gate/router.

Training Hyperparameters

  • Training regime:
quantization_config = transformers.BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1", truncation=True, padding=True, padding_side="right")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1", quantization_config=quantization_config)
tokenizer.add_special_tokens({'pad_token': '[PAD]'})

model = prepare_model_for_kbit_training(model)

config = LoraConfig(r = 4, 
                    lora_alpha=4, 
                    target_modules = ["gate"], 
                    lora_dropout=0.1
                    )

lora_model = get_peft_model(model, config)

lora_model.print_trainable_parameters()

dataset = load_dataset("Na0s/sft-ready-Text-Generation-Augmented-Data", split="train")

trainer = SFTTrainer(
    model = lora_model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    packing = True,
    args = TrainingArguments(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 16,
        group_by_length = True,
        warmup_steps = 5,
        bf16 = True,
        max_steps=5000,
        learning_rate = 2e-4,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "cosine",
        seed = 3407,
        eval_strategy="no",
        do_eval=False,
        output_dir = "./outputs",
        push_to_hub=True,
        remove_unused_columns=False,
    )
)

Metrics and results:

Upcoming.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Technical Specifications

Model Architecture and Objective

The objective of the fine-tuning of this MoE based transformer is to implement the expert pruning detailed in the following paper: A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts

Downloads last month
16
Safetensors
Model size
46.7B params
Tensor type
BF16
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for Na0s/Mixtral-8x7B-Instruct-v0.1-LoRA-on-Gates

Finetuned
(41)
this model

Dataset used to train Na0s/Mixtral-8x7B-Instruct-v0.1-LoRA-on-Gates

Collection including Na0s/Mixtral-8x7B-Instruct-v0.1-LoRA-on-Gates