---
license: apache-2.0
datasets:
  - Mielikki/Erebus-87k
  - FourOhFour/Instruct_Phase
  - FourOhFour/RP_Phase
  - anthracite-core/full-opus-chosen-hermes-rejected-kto-v1
language:
  - en
base_model:
  - IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml
---

## Aura-MoE-2x4B

![image/png](https://cdn-uploads.huggingface.co/production/uploads/626dfb8786671a29c715f8a9/LpCTIR45g099eXDIwYmKa.png)

## Introduction

**Aura-MoE-2x4B** is a state-of-the-art dedicated roleplaying model designed to fulfill your every desire.

The finetunes used in this merge were trained on several hundred million tokens of completion, instruction, and roleplaying data. Kahneman-Tversky Optimization (KTO) was then applied both to heal the merge and to give the model a unique output style.

This model is superseded by [Aura-MoE-2x4B-v2](https://huggingface.co/AuraIndustries/Aura-MoE-2x4B-v2), which is a direct improvement.

Developed by **Aura Industries**, with contributions from **Anthracite Org**.

## Model Details

- **Model Name**: Aura-MoE-2x4B
- **Base Model**: [IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml](https://huggingface.co/IntervitensInc/Llama-3.1-Minitron-4B-Width-Base-chatml)
- **Model Type**: Chat Completions
- **Prompt Format**: ChatML
- **License**: Apache-2.0
- **Language**: English
- **Max Context**: 8,192+ tokens

## License

This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## Quantizations

Due to the unusual architecture of this model, only static GGUF quantization is available.

[Static GGUF](https://huggingface.co/mradermacher/Aura-MoE-2x4B-GGUF)

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Coming soon...

| Metric              | Value |
|---------------------|------:|
| Avg.                |   N/A |
| IFEval (0-Shot)     |   N/A |
| BBH (3-Shot)        |   N/A |
| MATH Lvl 5 (4-Shot) |   N/A |
| GPQA (0-shot)       |   N/A |
| MuSR (0-shot)       |   N/A |
| MMLU-PRO (5-shot)   |   N/A |

## Training Configuration
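The model was produced in two stages: a mergekit MoE merge of two 4B finetunes, followed by KTO preference tuning in Axolotl; both configs are collapsed below. As a rough, non-authoritative sketch only, this is how such configs are typically invoked, assuming `mergekit` and `axolotl` are installed and the two configs are saved locally as `moe.yaml` and `kto.yaml` (file names and output paths are placeholders, not part of this repository):

```python
# Illustrative sketch only: typical invocation of the mergekit-moe and Axolotl
# CLIs for configs like the ones below. File names and paths are placeholders.
import subprocess

# Stage 1: assemble the 2x4B MoE from the mergekit config.
subprocess.run(["mergekit-moe", "moe.yaml", "./Aura-MoE-2x4B-base"], check=True)

# Stage 2: KTO preference training with Axolotl on the merged checkpoint.
subprocess.run(["accelerate", "launch", "-m", "axolotl.cli.train", "kto.yaml"], check=True)
```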
<details><summary>Click here for Mergekit and Axolotl configs</summary>

MoE Merge

```yaml
base_model: FourOhFour/Crispy_Crab_4B
gate_mode: hidden
dtype: bfloat16
experts_per_token: 1
experts:
  - source_model: FourOhFour/Crispy_Crab_4B
    positive_prompts:
      - "Roleplaying partner"
  - source_model: FourOhFour/Zenith_4B
    positive_prompts:
      - "Instruction following assistant"
```

KTO

```yaml
base_model: jeiku/2x4Bmoe
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

hub_model_id: jeiku/moekto
hub_strategy: "all_checkpoints"
push_dataset_to_hub:
hf_use_auth_token: true

chat_template: chatml

rl: kto
rl_beta: 0.2
kto_desirable_weight: 0.2

datasets:
  - path: anthracite-core/full-opus-chosen-hermes-rejected-kto-v1
    type: chatml.argilla

shuffle_merged_datasets: true
val_set_size: 0.0
output_dir: ./outputs/out

sequence_len: 8192
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false

wandb_project: moekto
wandb_entity:
wandb_watch:
wandb_name: moekto
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 2
num_epochs: 2
max_steps: 500

optimizer: adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00001
weight_decay: 0.05

train_on_inputs: false
group_by_length: false

bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
remove_unused_columns: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens:
saves_per_epoch: 1

debug:
deepspeed:
fsdp:
fsdp_config:

special_tokens:
  pad_token: <|finetune_right_pad_id|>
```

</details>
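Since the card lists ChatML as the prompt format, the model can be exercised with the standard `transformers` chat-template workflow. The following is a minimal sketch rather than an official inference script: the repository id, dtype, device placement, and sampling settings are assumptions, and the ChatML template is expected to come from the bundled tokenizer.

```python
# Minimal sketch: load Aura-MoE-2x4B and prompt it in ChatML via the tokenizer's
# chat template. Repo id, dtype, and sampling settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "AuraIndustries/Aura-MoE-2x4B"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are Aura, a creative roleplaying partner."},
    {"role": "user", "content": "Describe the tavern we just walked into."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```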
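Similarly, the static GGUF quantizations linked above should run in any llama.cpp-compatible runtime. A minimal sketch with `llama-cpp-python` follows; the quant filename is a placeholder, and the context size and chat format mirror the details listed in the card.

```python
# Minimal sketch: run a static GGUF quant of Aura-MoE-2x4B with llama-cpp-python.
# The filename is a placeholder; pick an actual quant from the GGUF repository.
from llama_cpp import Llama

llm = Llama(
    model_path="Aura-MoE-2x4B.Q6_K.gguf",  # placeholder filename
    n_ctx=8192,            # matches the card's stated max context
    chat_format="chatml",  # the card's prompt format
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are Aura, a creative roleplaying partner."},
        {"role": "user", "content": "Open the scene at the city gates."},
    ],
    max_tokens=256,
    temperature=0.8,
)
print(response["choices"][0]["message"]["content"])
```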