Gemma-System-9B with MoRA + SimPO

This is a SimPO-finetuned version of Gemma-System-9B that uses MoRA (high-rank updating for parameter-efficient finetuning) for preference alignment. The model is trained to better align with human preferences using SimPO, a reference-free alternative to Direct Preference Optimization (DPO).

Model Details

Model Description

This model is a finetuned version of Gemma-System-9B trained with SimPO (Simple Preference Optimization). It uses MoRA adaptation with rank 256 to finetune the base model efficiently while aiming to preserve its core capabilities.

  • Developed by: original model merged from Gemma-2-9B-it; finetuned by Gunulhona
  • Model type: Causal Language Model with MoRA adaptation
  • Language(s): Primarily English and Korean
  • License: Same as base model (Gemma-System-9B)
  • Finetuned from model: Gunulhona/Gemma-System-9B

Training Details

Training Procedure

Training Hyperparameters

  • Training regime: bfloat16 mixed precision
  • Learning rate: 5e-7
  • Batch size per device: 1
  • Gradient accumulation steps: 16
  • Total batch size: 16
  • Number of epochs: 200
  • Optimizer: AdamW with a cosine-with-restarts learning-rate schedule
  • Loss type: SimPO (configurable; DPO is also supported)
  • SimPO beta (β): 10.0
  • SimPO gamma (γ): 0.5
  • Maximum sequence length: 65,536 tokens
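The cosine-with-restarts schedule listed above can be sketched as follows, assuming the Hugging Face Transformers scheduler `get_cosine_with_hard_restarts_schedule_with_warmup` was used: the function reproduces its multiplier and scales it by the listed learning rate of 5e-7. This card does not give the warmup length or the number of cycles, so `warmup_steps` and `num_cycles` are hypothetical parameters.

```python
import math


def cosine_with_hard_restarts_lr(
    step: int,
    total_steps: int,
    base_lr: float = 5e-7,
    warmup_steps: int = 0,   # hypothetical: not stated in this card
    num_cycles: int = 1,     # hypothetical: not stated in this card
) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay that jumps
    back to base_lr at the start of each of `num_cycles` cycles."""
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        # Past the end of training the rate is held at zero.
        return 0.0
    # Position inside the current cycle, in [0, 1); 0 means a fresh restart.
    cycle_pos = (num_cycles * progress) % 1.0
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * cycle_pos))
```

With `total_steps=1000` and `num_cycles=2`, the rate decays from 5e-7 to 2.5e-7 at step 250, jumps back to 5e-7 at step 500, and reaches 0 at step 1000.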

MoRA Configuration

  • Rank (r): 256
  • Alpha: 16
  • Dropout: 0.05
  • MoRA type: 6 (the RoPE-based compression variant in the reference MoRA implementation)
  • Target modules:
    • q_proj
    • k_proj
    • v_proj
    • o_proj
    • gate_proj
    • down_proj
    • up_proj
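Unlike LoRA, which trains two thin matrices of rank r, MoRA trains a single square matrix. The sketch below assumes the parameter-matching rule from the MoRA paper: for a projection with input size d_in and output size d_out, the square matrix has side r̂ = ⌊√(r·(d_in + d_out))⌋, so its parameter count stays within the equivalent LoRA budget. The layer shapes in the example are assumptions, not values stated in this card.

```python
import math


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in a LoRA update B @ A with A: (r x d_in) and B: (d_out x r)."""
    return r * (d_in + d_out)


def mora_square_rank(d_in: int, d_out: int, r: int) -> int:
    """Side length r_hat of MoRA's trainable square matrix (r_hat x r_hat),
    the largest integer with r_hat**2 <= r * (d_in + d_out)."""
    return math.isqrt(lora_trainable_params(d_in, d_out, r))
```

For a hypothetical square 3584 x 3584 projection (3584 is assumed here as the hidden size inherited from Gemma-2-9B) with r = 256, the LoRA budget is 1,835,008 parameters and MoRA uses a 1354 x 1354 matrix. The much larger r̂ is the point of the design: the update can reach a far higher rank than 256 for the same number of trainable parameters.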

Training Data

The model was trained on the "Gunulhona/open_dpo_merged" dataset, which contains pairs of preferred and non-preferred responses for preference learning.
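A single preference example can be represented as a prompt with one preferred and one non-preferred response. The sketch below assumes the `prompt`/`chosen`/`rejected` field names used by TRL's preference trainers. The actual column names of "Gunulhona/open_dpo_merged" are not stated in this card, and the sample strings are purely illustrative.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PreferencePair:
    """One preference example. Field names are assumptions, not the
    dataset's confirmed schema."""
    prompt: str
    chosen: str    # preferred response
    rejected: str  # non-preferred response

    def __post_init__(self) -> None:
        # A pair with identical responses carries no preference signal.
        if self.chosen == self.rejected:
            raise ValueError("chosen and rejected responses must differ")


# Illustrative example only; not taken from the dataset.
pair = PreferencePair(
    prompt="Translate 'thank you' into Korean.",
    chosen="감사합니다.",
    rejected="Thank you.",
)
print(asdict(pair))
```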

Technical Specifications

Model Architecture and Objective

The model uses MoRA (high-rank updating for parameter-efficient finetuning). The training setup supports either a DPO or a SimPO objective; this checkpoint was trained with SimPO:

  • SimPO: Simple Preference Optimization with β=10.0 and γ=0.5
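The SimPO objective can be sketched for a single preference pair. Following the SimPO paper, the implicit reward is the length-normalized log-probability of a response scaled by β, and the loss is -log σ((β/|y_w|)·log π(y_w|x) - (β/|y_l|)·log π(y_l|x) - γ). No reference model is involved. This is an illustrative scalar version using the β and γ listed above, not the batched training code used for this model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def simpo_loss(
    chosen_logp_sum: float,    # sum of token log-probs of the preferred response
    chosen_len: int,           # number of tokens in the preferred response
    rejected_logp_sum: float,  # sum of token log-probs of the rejected response
    rejected_len: int,         # number of tokens in the rejected response
    beta: float = 10.0,
    gamma: float = 0.5,
) -> float:
    """SimPO loss for one preference pair."""
    # Length-normalized, beta-scaled implicit rewards.
    reward_chosen = beta * chosen_logp_sum / chosen_len
    reward_rejected = beta * rejected_logp_sum / rejected_len
    # The preferred response should win by at least the target margin gamma.
    return -log_sigmoid(reward_chosen - reward_rejected - gamma)
```

For example, a 20-token preferred response averaging -1.0 nats per token and a 20-token rejected response averaging -1.2 give rewards of -10 and -12. The loss is then -log σ(2.0 - 0.5) ≈ 0.201. Length normalization keeps the reward from favoring long responses simply because they accumulate more log-probability mass.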

Compute Infrastructure

Hardware

  • Training performed on CUDA-capable GPUs
  • Uses DeepSpeed for distributed training
  • Gradient checkpointing enabled for memory efficiency
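The batch-size settings above map onto a DeepSpeed config as sketched below. DeepSpeed requires train_batch_size = train_micro_batch_size_per_gpu × gradient_accumulation_steps × world_size. The listed total of 16 (1 × 16) therefore corresponds to a single process, but the number of GPUs is not stated, so `world_size` is an assumption. The ZeRO stage is also not given and is omitted. Gradient checkpointing is typically enabled on the model side, for example with `model.gradient_checkpointing_enable()` in Transformers, rather than in this dictionary.

```python
def build_deepspeed_config(
    micro_batch_per_gpu: int = 1,
    grad_accum_steps: int = 16,
    world_size: int = 1,  # assumption: number of GPUs is not stated in this card
) -> dict:
    """Minimal DeepSpeed config dict with BF16 enabled and a consistent
    global batch size."""
    return {
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        # Must equal micro batch * accumulation steps * world size.
        "train_batch_size": micro_batch_per_gpu * grad_accum_steps * world_size,
        "bf16": {"enabled": True},
    }
```

With the defaults this yields a global batch of 16; on a hypothetical 4-GPU run with the same per-device settings it would be 64.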

Software

  • PEFT library for parameter-efficient finetuning
  • Transformers library
  • DeepSpeed for training optimization
  • Weights & Biases for experiment tracking

Environmental Impact

  • Hardware Type: NVIDIA GPUs
  • Training Regime: BF16 mixed precision
  • Optimization: DeepSpeed + Gradient Checkpointing

Model Card Contact

For questions about this model, please contact Gunulhona.

Framework versions
