Sanskrit-qwen-7B-Translate

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct-1M optimized for Sanskrit language tasks.

Model Description

This is a merged version of a QLoRA fine-tune of Qwen 2.5 7B-Instruct-1M: the LoRA adapter has been merged back into the base weights, so the published checkpoint is a standalone model. It was trained on a custom Sanskrit dataset to improve Sanskrit language understanding and Sanskrit-English translation.
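A minimal inference sketch (assumed usage, not taken from the original card): it loads the merged model with transformers and asks for a Sanskrit-to-English translation. The prompt wording, example sentence, and generation settings are illustrative only.

# Minimal inference sketch; the prompt and generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "diabolic6045/Sanskrit-qwen-7B-Translate"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision; adjust to your hardware
    device_map="auto",
)

messages = [
    {"role": "user",
     "content": "Translate this Sanskrit sentence to English: धर्मो रक्षति रक्षितः"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))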

Intended Uses & Limitations

Intended Uses

  • Sanskrit text understanding and generation
  • Sanskrit-English translation tasks
  • Sanskrit language processing

Limitations

  • Performance may vary based on the complexity of Sanskrit text
  • The model should be used within ethical and legal guidelines

Training Data

The model was trained on the diabolic6045/Sanskrit-llama dataset.
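To inspect the training data, a short sketch with the datasets library is shown below; the "train" split name and the instruction/input/output fields are assumed from the Alpaca-style layout declared in the Axolotl config.

# Sketch for inspecting the training data; split and field names are assumptions.
from datasets import load_dataset

ds = load_dataset("diabolic6045/Sanskrit-llama", split="train")
print(ds)     # row count and column names
print(ds[0])  # one instruction/input/output record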

Training Procedure

Training Details

  • Base Model: Qwen/Qwen2.5-7B-Instruct-1M
  • Training Type: QLoRA fine-tuning (4-bit base model with a LoRA adapter, merged after training)
  • Hardware: Multi-GPU setup
  • Training Parameters (a rough transformers/PEFT equivalent is sketched after this list):
    • Learning Rate: 2e-05
    • Epochs: 1
    • Batch Size: 2 (total)
    • Optimizer: Paged AdamW (8-bit)
    • LR Scheduler: Cosine with warmup
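For readers more familiar with plain transformers/PEFT/bitsandbytes, the sketch below maps the main hyperparameters onto that API. It is only a rough equivalent read off the Axolotl config reproduced further down; the actual run used Axolotl, and the "all-linear" target is an assumed translation of lora_target_linear: true.

# Rough QLoRA setup mirroring the config values; not the actual training script.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(load_in_4bit=True)  # load_in_4bit: true

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct-1M",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=32,                          # lora_r
    lora_alpha=16,                 # lora_alpha
    lora_dropout=0.05,             # lora_dropout
    target_modules="all-linear",   # assumed equivalent of lora_target_linear: true
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

args = TrainingArguments(
    output_dir="./outputs/qlora-out",
    num_train_epochs=1,
    per_device_train_batch_size=1,   # micro_batch_size
    gradient_accumulation_steps=1,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    optim="paged_adamw_8bit",
    gradient_checkpointing=True,
    logging_steps=1,
)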

Framework Versions

  • Transformers 4.49.0
  • PyTorch 2.5.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0

Built with Axolotl

See axolotl config

axolotl version: 0.8.0.dev0


base_model: Qwen/Qwen2.5-7B-Instruct-1M
load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: diabolic6045/Sanskrit-llama
    type: alpaca
dataset_prepared_path:
val_set_size: 0
output_dir: ./outputs/qlora-out

adapter: qlora
lora_model_dir:

sequence_len: 1024
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

hub_model_id: Sanskrit-qwen-8B

wandb_project: संस्कृतम्-llama
wandb_entity: 
wandb_watch: all
wandb_name: संस्कृतम्-llama
wandb_log_model: 

gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
cosine_min_lr_ratio: 0.2
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: false
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false

#gpu_memory_limit: 20GiB
#lora_on_cpu: true         

warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed: deepspeed_configs/zero1.json
weight_decay: 0.0
special_tokens:
   pad_token: <|end_of_text|>
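Since the published weights are already merged, no adapter handling is needed at inference time. For completeness, the sketch below shows how a QLoRA adapter produced by a config like this one could be merged into the base model with PEFT; the paths are placeholders, not the author's actual paths.

# Illustrative merge step; paths are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct-1M", torch_dtype=torch.float16
)
# Attach the trained adapter, then fold its weights into the base model.
merged = PeftModel.from_pretrained(base, "./outputs/qlora-out").merge_and_unload()

merged.save_pretrained("./merged-model")
AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct-1M").save_pretrained("./merged-model")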

License

This model is released under the Apache 2.0 license.
