---
license: gpl-3.0
library_name: peft
tags:
- generated_from_trainer
base_model: nisten/shqiponja-15b-v1
model-index:
- name: shqiponja-15
  results: []
datasets:
- iamshnoo/alpaca-cleaned-albanian
- noxneural/lilium_albanicum_eng_alb
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/V0mt5q-kb0yFeeGFNGv0q.png)

**15.6B two-expert MoE**

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`

```yaml
base_model: nisten/shqiponja15
model_type: AutoModelForCausalLM
tokenizer_type: LlamaTokenizer
trust_remote_code: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: iamshnoo/alpaca-cleaned-albanian
    type: alpaca
    shards: 10
  - path: noxneural/lilium_albanicum_eng_alb
    shards: 20
    type:
      field_system: system
      field_instruction: question
      field_output: response
      format: "[INST] {instruction} [/INST]"

dataset_prepared_path: last_run_prepared
val_set_size: 0.0
output_dir: ./alora-out

# - model.layers.2[7-9]+.block_sparse_moe.experts.*
# - model.layers.3[0-9]+.block_sparse_moe.experts.*
# - model.layers.2[7-9]+.b

```

</details><br>

# alora-out

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 10
- eval_batch_size: 10
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 80
- total_eval_batch_size: 40
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 3
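
The effective batch sizes above follow directly from the per-device settings: total_train_batch_size = train_batch_size × num_devices × gradient_accumulation_steps = 10 × 4 × 2 = 80, and total_eval_batch_size = eval_batch_size × num_devices = 10 × 4 = 40.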
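
### Inference

Since the card itself gives no usage instructions, here is a minimal inference sketch, assuming this repository hosts the trained PEFT (LoRA) adapter and that it applies on top of the base model from the metadata. The adapter repo id is a hypothetical placeholder, and the generation settings are illustrative; the `[INST]` prompt format comes from the training config above.

```python
# Minimal inference sketch (assumptions: this repo is a PEFT LoRA adapter
# for nisten/shqiponja-15b-v1; adjust repo ids and dtypes to your setup).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "nisten/shqiponja-15b-v1"  # base model from the card metadata
adapter_id = "<this-repo-id>"        # hypothetical placeholder for this adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,  # mirrors the training config above
)
model = PeftModel.from_pretrained(model, adapter_id)

# Prompt format from the training config: "[INST] {instruction} [/INST]"
prompt = "[INST] Përshëndetje! Si je? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Training loaded the base model in 4-bit (`load_in_4bit: true`) to fit the 15.6B MoE during fine-tuning; for inference, fp16 with `device_map="auto"` is one reasonable choice, and the adapter could also be merged into the base weights with peft's `merge_and_unload()` if a standalone model is preferred.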