# Sanskrit-qwen-7B-Translate
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct-1M optimized for Sanskrit language tasks.
## Model Description
This is a merged version of a QLoRA fine-tune of Qwen 2.5 7B, trained specifically for Sanskrit language understanding and translation tasks. The adapter was trained on a custom Sanskrit dataset and merged into the base model to improve its handling of Sanskrit text.
## Intended Uses & Limitations

### Intended Uses
- Sanskrit text understanding and generation
- Sanskrit-English translation tasks (see the usage sketch after this list)
- Sanskrit language processing
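A minimal inference sketch with the Transformers library is shown below. The repository id is an assumption based on the model name, and the prompt uses the base model's chat template; adjust both to your setup as needed.

```python
# Minimal inference sketch. The repository id below is an assumption based on the
# model name; replace it with the actual Hub location if it differs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "diabolic6045/Sanskrit-qwen-7B-Translate"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "user", "content": "Translate the following Sanskrit sentence into English: धर्मो रक्षति रक्षितः"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```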
### Limitations
- Performance may vary with the complexity of the Sanskrit text
- The model should be used within ethical and legal guidelines
## Training Data
The model was trained on the diabolic6045/Sanskrit-llama dataset.
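The dataset can be inspected with the `datasets` library as sketched below; the field names mentioned in the comment (instruction / input / output) are assumed from the Alpaca-style format declared in the training config, not verified against the dataset itself.

```python
# Quick look at the training data. Field names are assumed to follow the
# Alpaca format (instruction / input / output) declared in the axolotl config.
from datasets import load_dataset

ds = load_dataset("diabolic6045/Sanskrit-llama", split="train")
print(ds)     # row count and column names
print(ds[0])  # one raw example
```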
## Training Procedure

### Training Details
- Base Model: Qwen/Qwen2.5-7B-Instruct-1M
- Training Type: QLoRA fine-tuning (see the sketch after this list)
- Hardware: Multi-GPU setup
- Training Parameters:
  - Learning Rate: 2e-05
  - Epochs: 1
  - Batch Size: 2 (total effective batch size across GPUs)
  - Optimizer: Paged 8-bit AdamW
  - LR Scheduler: Cosine with warmup
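As a rough sketch (not the actual axolotl training pipeline), the QLoRA setup implied by the config below corresponds to the following PEFT/Transformers code: 4-bit base weights plus a rank-32 LoRA adapter on the linear layers.

```python
# Rough sketch of the QLoRA setup implied by the axolotl config (not the actual
# training code): 4-bit base weights plus a rank-32 LoRA adapter on linear layers.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(load_in_4bit=True)  # load_in_4bit: true

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct-1M",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=32,                          # lora_r: 32
    lora_alpha=16,                 # lora_alpha: 16
    lora_dropout=0.05,             # lora_dropout: 0.05
    target_modules="all-linear",   # lora_target_linear: true
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```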
### Framework Versions
- Transformers 4.49.0
- PyTorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
<details><summary>See axolotl config</summary>

axolotl version: 0.8.0.dev0

```yaml
base_model: Qwen/Qwen2.5-7B-Instruct-1M
load_in_8bit: false
load_in_4bit: true
strict: false
datasets:
  - path: diabolic6045/Sanskrit-llama
    type: alpaca
dataset_prepared_path:
val_set_size: 0
output_dir: ./outputs/qlora-out
adapter: qlora
lora_model_dir:
sequence_len: 1024
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
hub_model_id: Sanskrit-qwen-8B
wandb_project: संस्कृतम्-llama
wandb_entity:
wandb_watch: all
wandb_name: संस्कृतम्-llama
wandb_log_model:
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
cosine_min_lr_ratio: 0.2
learning_rate: 2e-5
train_on_inputs: false
group_by_length: false
bf16: false
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false
#gpu_memory_limit: 20GiB
#lora_on_cpu: true
warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed: deepspeed_configs/zero1.json
weight_decay: 0.0
special_tokens:
  pad_token: <|end_of_text|>
```

</details>
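Since the dataset is consumed with `type: alpaca`, training examples were rendered with an Alpaca-style instruction template. The sketch below shows that template under the assumption that axolotl's standard Alpaca prompter was used; the exact wording may differ slightly.

```python
# Hedged sketch of the Alpaca-style prompt implied by `type: alpaca` in the config.
# The exact wording produced by axolotl's prompter may differ slightly.
def build_alpaca_prompt(instruction: str, inp: str = "") -> str:
    if inp:
        return (
            "Below is an instruction that describes a task, paired with an input that "
            "provides further context. Write a response that appropriately completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n### Input:\n{inp}\n\n### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )

print(build_alpaca_prompt("Translate this Sanskrit sentence into English.", "धर्मो रक्षति रक्षितः"))
```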
## License
This model is released under the Apache 2.0 license.