# Sanskrit-qwen-7B-Translate
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct-1M optimized for Sanskrit language tasks.
## Model Description
This is a merged version of a QLoRA fine-tune of Qwen 2.5 7B, trained specifically for Sanskrit language understanding and translation tasks. The adapter was trained on a custom Sanskrit dataset and merged into the base model to improve its handling of Sanskrit text.
## Intended Uses & Limitations

### Intended Uses
- Sanskrit text understanding and generation
- Sanskrit-English translation tasks (see the usage sketch after this list)
- Sanskrit language processing
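A minimal inference sketch with the Transformers library is shown below. The repository id is an assumption based on the model name, and the prompt uses the base model's chat template; adjust both to your setup as needed.

```python
# Minimal inference sketch. The repository id below is an assumption based on the
# model name; replace it with the actual Hub location if it differs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "diabolic6045/Sanskrit-qwen-7B-Translate"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "user", "content": "Translate the following Sanskrit sentence into English: धर्मो रक्षति रक्षितः"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```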
### Limitations
- Performance may vary with the complexity of the Sanskrit text
- The model should be used within ethical and legal guidelines
## Training Data
The model was trained on the diabolic6045/Sanskrit-llama dataset.
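The dataset can be inspected with the `datasets` library as sketched below; the field names mentioned in the comment (instruction / input / output) are assumed from the Alpaca-style format declared in the training config, not verified against the dataset itself.

```python
# Quick look at the training data. Field names are assumed to follow the
# Alpaca format (instruction / input / output) declared in the axolotl config.
from datasets import load_dataset

ds = load_dataset("diabolic6045/Sanskrit-llama", split="train")
print(ds)     # row count and column names
print(ds[0])  # one raw example
```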
## Training Procedure

### Training Details
- Base Model: Qwen/Qwen2.5-7B-Instruct-1M
- Training Type: QLoRA fine-tuning (see the sketch after this list)
- Hardware: Multi-GPU setup
- Training Parameters:
  - Learning Rate: 2e-05
  - Epochs: 1
  - Batch Size: 2 (total effective batch size across GPUs)
  - Optimizer: Paged 8-bit AdamW
  - LR Scheduler: Cosine with warmup
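As a rough sketch (not the actual axolotl training pipeline), the QLoRA setup implied by the config below corresponds to the following PEFT/Transformers code: 4-bit base weights plus a rank-32 LoRA adapter on the linear layers.

```python
# Rough sketch of the QLoRA setup implied by the axolotl config (not the actual
# training code): 4-bit base weights plus a rank-32 LoRA adapter on linear layers.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(load_in_4bit=True)  # load_in_4bit: true

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct-1M",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=32,                          # lora_r: 32
    lora_alpha=16,                 # lora_alpha: 16
    lora_dropout=0.05,             # lora_dropout: 0.05
    target_modules="all-linear",   # lora_target_linear: true
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```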
### Framework Versions
- Transformers 4.49.0
- PyTorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
<details><summary>See axolotl config</summary>

axolotl version: 0.8.0.dev0

```yaml
base_model: Qwen/Qwen2.5-7B-Instruct-1M
load_in_8bit: false
load_in_4bit: true
strict: false
datasets:
  - path: diabolic6045/Sanskrit-llama
    type: alpaca
dataset_prepared_path:
val_set_size: 0
output_dir: ./outputs/qlora-out
adapter: qlora
lora_model_dir:
sequence_len: 1024
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
hub_model_id: Sanskrit-qwen-8B
wandb_project: संस्कृतम्-llama
wandb_entity:
wandb_watch: all
wandb_name: संस्कृतम्-llama
wandb_log_model:
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
cosine_min_lr_ratio: 0.2
learning_rate: 2e-5
train_on_inputs: false
group_by_length: false
bf16: false
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false
#gpu_memory_limit: 20GiB
#lora_on_cpu: true
warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed: deepspeed_configs/zero1.json
weight_decay: 0.0
special_tokens:
  pad_token: <|end_of_text|>
```

</details>
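Since the dataset is consumed with `type: alpaca`, training examples were rendered with an Alpaca-style instruction template. The sketch below shows that template under the assumption that axolotl's standard Alpaca prompter was used; the exact wording may differ slightly.

```python
# Hedged sketch of the Alpaca-style prompt implied by `type: alpaca` in the config.
# The exact wording produced by axolotl's prompter may differ slightly.
def build_alpaca_prompt(instruction: str, inp: str = "") -> str:
    if inp:
        return (
            "Below is an instruction that describes a task, paired with an input that "
            "provides further context. Write a response that appropriately completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n### Input:\n{inp}\n\n### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )

print(build_alpaca_prompt("Translate this Sanskrit sentence into English.", "धर्मो रक्षति रक्षितः"))
```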
## License
This model is released under the Apache 2.0 license.