mav23 committed on
Commit
916e5c0
1 Parent(s): a5f806e

Upload folder using huggingface_hub

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +431 -0
  3. eva-qwen2.5-32b-v0.1.Q4_0.gguf +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ eva-qwen2.5-32b-v0.1.Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,431 @@
1
+ ---
2
+ library_name: transformers
3
+ license: apache-2.0
4
+ base_model: Qwen/Qwen2.5-32B
5
+ datasets:
6
+ - anthracite-org/kalo-opus-instruct-22k-no-refusal
7
+ - Nopm/Opus_WritingStruct
8
+ - Gryphe/Sonnet3.5-SlimOrcaDedupCleaned
9
+ - Gryphe/Sonnet3.5-Charcard-Roleplay
10
+ - Gryphe/ChatGPT-4o-Writing-Prompts
11
+ - Epiculous/Synthstruct-Gens-v1.1-Filtered-n-Cleaned
12
+ - Epiculous/SynthRP-Gens-v1.1-Filtered-n-Cleaned
13
+ - nothingiisreal/Reddit-Dirty-And-WritingPrompts
14
+ - allura-org/Celeste-1.x-data-mixture
15
+ - cognitivecomputations/dolphin-2.9.3
16
+ tags:
17
+ - generated_from_trainer
18
+ model-index:
19
+ - name: EVA-Qwen2.5-32B-SFFT-v0.1
20
+ results: []
21
+ ---
22
+ # EVA Qwen2.5-32B v0.1
23
+
24
+ <p>
25
+ An RP/storywriting specialist model: a full-parameter finetune of Qwen2.5-32B on a mixture of synthetic and natural data.<br>
26
+ It uses the Celeste 70B 0.1 data mixture, greatly expanded to improve the versatility, creativity and "flavor" of the resulting model.<br>
27
+ </p>
28
+
29
+ <p>This model is available for inference on <a href=https://featherless.ai/models/EVA-UNIT-01/EVA-Qwen2.5-32B-v0.1>FeatherlessAI</a></p>
30
+
31
+ <p>Dedicated to Nev.</p>
32
+
33
+ <p><b>Version notes for 0.1</b>: An additional round of cleaning for the datasets; new subsets of 4o-WritingPrompts and Charcards, picking the most diverse samples from them; a small added subset of SystemChat2.0 to improve instruction following; and a slightly increased sequence length. Additionally, the training config mistake from 32B 0.0 is fixed: layernorm layers stay frozen this time. Unfreezing them caused a positivity bias to appear in 32B 0.0 for some reason.</p>
34
+
35
+ <p>
36
+ <p>Prompt format is ChatML.</p><br>
37
+ <h3>Recommended sampler values:</h3>
38
+ <ul>
39
+ <li>Temperature: 1</li>
40
+ <li>Min-P: 0.05</li>
41
+ <li>Top-A: 0.2</li>
42
+ <li>Repetition Penalty: 1.03</li>
43
+ </ul>
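+ 
+ <p>A minimal inference sketch with <code>llama-cpp-python</code>, assuming the Q4_0 GGUF from this repo is in the working directory; it applies the ChatML format and the sampler values above (Top-A is omitted, since llama-cpp-python does not expose it):</p>
+ 
+ ```python
+ # Sketch only: assumes llama-cpp-python is installed and the GGUF sits locally.
+ from llama_cpp import Llama
+ 
+ llm = Llama(
+     model_path="eva-qwen2.5-32b-v0.1.Q4_0.gguf",  # file uploaded in this commit
+     n_ctx=8192,            # context window; adjust to available memory
+     chat_format="chatml",  # the card specifies ChatML as the prompt format
+ )
+ 
+ out = llm.create_chat_completion(
+     messages=[
+         {"role": "system", "content": "You are a creative roleplay and storywriting assistant."},
+         {"role": "user", "content": "Write the opening scene of a slow-burn mystery."},
+     ],
+     temperature=1.0,      # recommended sampler values from this card
+     min_p=0.05,
+     repeat_penalty=1.03,  # Top-A (0.2) is not exposed by llama-cpp-python, so it is left out
+     max_tokens=512,
+ )
+ print(out["choices"][0]["message"]["content"])
+ ```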
44
+
45
+ <h3>Recommended SillyTavern presets (via CalamitousFelicitousness):</h3>
46
+
47
+ - [Context](https://huggingface.co/EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1/blob/main/%5BChatML%5D%20Roleplay-v1.9%20Context.json)
48
+ - [Instruct and System Prompt](https://huggingface.co/EVA-UNIT-01/EVA-Yi-1.5-9B-32K-V1/blob/main/%5BChatML%5D%20Roleplay-v1.9%20Instruct.json)
49
+ </p>
50
+
51
+ <p>
52
+ <br>
53
+ <h3>
54
+ Training data:
55
+ </h3>
56
+ <ul>
57
+ <li>Celeste 70B 0.1 data mixture minus Opus Instruct subset. See that model's <a href=https://huggingface.co/nothingiisreal/L3.1-70B-Celeste-V0.1-BF16>card</a> for details.</li>
58
+ <li>Kalomaze's Opus_Instruct_25k dataset, filtered for refusals.</li>
59
+ <li>A subset (1k rows) of ChatGPT-4o-WritingPrompts by Gryphe</li>
60
+ <li>A subset (2k rows) of Sonnet3.5-Charcards-Roleplay by Gryphe</li>
61
+ <li>Synthstruct and SynthRP datasets by Epiculous</li>
62
+ <li>A subset from Dolphin-2.9.3, including a filtered version of not_samantha and a small subset of systemchat.</li>
63
+ </ul>
64
+ <h3>
65
+ Training time and hardware:
66
+ </h3>
67
+ <ul><li>7 hours on 8xH100 SXM, provided by <a href=https://featherless.ai/>FeatherlessAI</a></li></ul><br>
68
+ </p>
69
+ <p>The model was trained by Kearm and Auri.</p>
70
+ <h4>Special thanks:</h4><ul>
71
+ <li><b>to <a href=https://featherless.ai/>FeatherlessAI</a> for generously providing an 8xH100 SXM node for training this model</b></li>
72
+ <li>to Gryphe, Lemmy, Kalomaze, Nopm, Epiculous and CognitiveComputations for the data</li>
73
+ <li>and to Allura-org for support, feedback, beta-testing and quality control of EVA models.</li></ul>
74
+
75
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
76
+ <details><summary>See axolotl config</summary>
77
+
78
+ axolotl version: `0.4.1`
79
+ ```yaml
80
+ base_model: Qwen/Qwen2.5-32B
81
+
82
+ load_in_8bit: false
83
+ load_in_4bit: false
84
+ strict: false
85
+
86
+ plugins:
87
+ - axolotl.integrations.liger.LigerPlugin
88
+ liger_rope: true
89
+ liger_rms_norm: true
90
+ liger_swiglu: true
91
+ liger_fused_linear_cross_entropy: true
92
+
93
+ # plugins:
94
+ # - axolotl.integrations.spectrum.SpectrumPlugin
95
+
96
+ # spectrum_top_fraction: 0.5
97
+ # # Optional if using a pre-scanned model as your base_model. Useful if using a model mirror
98
+ # spectrum_model_name: Qwen/Qwen2.5-32B
99
+
100
+ datasets:
101
+ - path: datasets/deduped_Synthstruct-Gens_processed_sharegpt_converted_cleaned.jsonl
102
+ type: sharegpt
103
+ - path: datasets/opus-instruct-22k-no_refusals-filtered.jsonl
104
+ type: sharegpt
105
+ - path: datasets/Celeste_Filtered.jsonl
106
+ type: sharegpt
107
+ - path: datasets/Sonnet3-5-charcard-names-filtered-sharegpt.jsonl
108
+ type: sharegpt
109
+ - path: datasets/deduped_SynthRP-Gens_processed_09-25-2024-ShareGPT_converted_cleaned.jsonl
110
+ type: sharegpt
111
+ - path: datasets/Gryphe-4o-WP-filtered-sharegpt.jsonl
112
+ type: sharegpt
113
+ - path: datasets/deduped_not_samantha_norefusals.jsonl
114
+ type: sharegpt
115
+ - path: datasets/SystemChat_subset_filtered_sharegpt.jsonl
116
+ type: sharegpt
117
+
118
+ chat_template: chatml
119
+ shuffle_merged_datasets: true
120
+ val_set_size: 0.001
121
+ output_dir: ./EVA-Qwen2.5-32B-SFFT-v0.1
122
+
123
+ sequence_len: 9216
124
+ sample_packing: true
125
+ eval_sample_packing: false
126
+ pad_to_sequence_len: true
127
+
128
+ # adapter: qlora
129
+ # lora_model_dir:
130
+ # lora_r: 64
131
+ # lora_alpha: 128
132
+ # lora_dropout: 0.05
133
+ # lora_target_linear: true
134
+ # peft_use_dora: true
135
+
136
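+ # Only the parameter tensors listed below are left unfrozen for training;
+ # norm layers stay frozen, per the version notes above.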
+ unfrozen_parameters:
137
+ - ^lm_head.weight$
138
+ - ^model.embed_tokens.weight$
139
+ # mlp.down_proj layers
140
+ - model.layers.63.mlp.down_proj
141
+ - model.layers.49.mlp.down_proj
142
+ - model.layers.48.mlp.down_proj
143
+ - model.layers.45.mlp.down_proj
144
+ - model.layers.44.mlp.down_proj
145
+ - model.layers.47.mlp.down_proj
146
+ - model.layers.46.mlp.down_proj
147
+ - model.layers.43.mlp.down_proj
148
+ - model.layers.8.mlp.down_proj
149
+ - model.layers.11.mlp.down_proj
150
+ - model.layers.19.mlp.down_proj
151
+ - model.layers.35.mlp.down_proj
152
+ - model.layers.20.mlp.down_proj
153
+ - model.layers.52.mlp.down_proj
154
+ - model.layers.39.mlp.down_proj
155
+ - model.layers.62.mlp.down_proj
156
+ - model.layers.50.mlp.down_proj
157
+ - model.layers.29.mlp.down_proj
158
+ - model.layers.16.mlp.down_proj
159
+ - model.layers.28.mlp.down_proj
160
+ - model.layers.53.mlp.down_proj
161
+ - model.layers.30.mlp.down_proj
162
+ - model.layers.31.mlp.down_proj
163
+ - model.layers.32.mlp.down_proj
164
+ - model.layers.7.mlp.down_proj
165
+ - model.layers.36.mlp.down_proj
166
+ - model.layers.12.mlp.down_proj
167
+ - model.layers.18.mlp.down_proj
168
+ - model.layers.37.mlp.down_proj
169
+ - model.layers.38.mlp.down_proj
170
+ - model.layers.14.mlp.down_proj
171
+ - model.layers.13.mlp.down_proj
172
+ # mlp.gate_proj layers
173
+ - model.layers.43.mlp.gate_proj
174
+ - model.layers.61.mlp.gate_proj
175
+ - model.layers.60.mlp.gate_proj
176
+ - model.layers.44.mlp.gate_proj
177
+ - model.layers.62.mlp.gate_proj
178
+ - model.layers.28.mlp.gate_proj
179
+ - model.layers.29.mlp.gate_proj
180
+ - model.layers.45.mlp.gate_proj
181
+ - model.layers.37.mlp.gate_proj
182
+ - model.layers.35.mlp.gate_proj
183
+ - model.layers.59.mlp.gate_proj
184
+ - model.layers.36.mlp.gate_proj
185
+ - model.layers.30.mlp.gate_proj
186
+ - model.layers.48.mlp.gate_proj
187
+ - model.layers.38.mlp.gate_proj
188
+ - model.layers.27.mlp.gate_proj
189
+ - model.layers.31.mlp.gate_proj
190
+ - model.layers.34.mlp.gate_proj
191
+ - model.layers.58.mlp.gate_proj
192
+ - model.layers.33.mlp.gate_proj
193
+ - model.layers.39.mlp.gate_proj
194
+ - model.layers.26.mlp.gate_proj
195
+ - model.layers.32.mlp.gate_proj
196
+ - model.layers.46.mlp.gate_proj
197
+ - model.layers.42.mlp.gate_proj
198
+ - model.layers.49.mlp.gate_proj
199
+ - model.layers.57.mlp.gate_proj
200
+ - model.layers.50.mlp.gate_proj
201
+ - model.layers.47.mlp.gate_proj
202
+ - model.layers.56.mlp.gate_proj
203
+ - model.layers.63.mlp.gate_proj
204
+ - model.layers.55.mlp.gate_proj
205
+ # mlp.up_proj layers
206
+ - model.layers.61.mlp.up_proj
207
+ - model.layers.60.mlp.up_proj
208
+ - model.layers.32.mlp.up_proj
209
+ - model.layers.59.mlp.up_proj
210
+ - model.layers.58.mlp.up_proj
211
+ - model.layers.57.mlp.up_proj
212
+ - model.layers.44.mlp.up_proj
213
+ - model.layers.28.mlp.up_proj
214
+ - model.layers.35.mlp.up_proj
215
+ - model.layers.36.mlp.up_proj
216
+ - model.layers.29.mlp.up_proj
217
+ - model.layers.31.mlp.up_proj
218
+ - model.layers.34.mlp.up_proj
219
+ - model.layers.55.mlp.up_proj
220
+ - model.layers.49.mlp.up_proj
221
+ - model.layers.30.mlp.up_proj
222
+ - model.layers.53.mlp.up_proj
223
+ - model.layers.43.mlp.up_proj
224
+ - model.layers.56.mlp.up_proj
225
+ - model.layers.33.mlp.up_proj
226
+ - model.layers.54.mlp.up_proj
227
+ - model.layers.62.mlp.up_proj
228
+ - model.layers.27.mlp.up_proj
229
+ - model.layers.51.mlp.up_proj
230
+ - model.layers.52.mlp.up_proj
231
+ - model.layers.37.mlp.up_proj
232
+ - model.layers.45.mlp.up_proj
233
+ - model.layers.26.mlp.up_proj
234
+ - model.layers.42.mlp.up_proj
235
+ - model.layers.50.mlp.up_proj
236
+ - model.layers.48.mlp.up_proj
237
+ - model.layers.39.mlp.up_proj
238
+ # self_attn.k_proj layers
239
+ - model.layers.63.self_attn.k_proj
240
+ - model.layers.55.self_attn.k_proj
241
+ - model.layers.60.self_attn.k_proj
242
+ - model.layers.7.self_attn.k_proj
243
+ - model.layers.12.self_attn.k_proj
244
+ - model.layers.13.self_attn.k_proj
245
+ - model.layers.57.self_attn.k_proj
246
+ - model.layers.29.self_attn.k_proj
247
+ - model.layers.14.self_attn.k_proj
248
+ - model.layers.51.self_attn.k_proj
249
+ - model.layers.53.self_attn.k_proj
250
+ - model.layers.54.self_attn.k_proj
251
+ - model.layers.22.self_attn.k_proj
252
+ - model.layers.61.self_attn.k_proj
253
+ - model.layers.18.self_attn.k_proj
254
+ - model.layers.30.self_attn.k_proj
255
+ - model.layers.9.self_attn.k_proj
256
+ - model.layers.24.self_attn.k_proj
257
+ - model.layers.23.self_attn.k_proj
258
+ - model.layers.25.self_attn.k_proj
259
+ - model.layers.10.self_attn.k_proj
260
+ - model.layers.58.self_attn.k_proj
261
+ - model.layers.56.self_attn.k_proj
262
+ - model.layers.15.self_attn.k_proj
263
+ - model.layers.32.self_attn.k_proj
264
+ - model.layers.28.self_attn.k_proj
265
+ - model.layers.8.self_attn.k_proj
266
+ - model.layers.59.self_attn.k_proj
267
+ - model.layers.11.self_attn.k_proj
268
+ - model.layers.48.self_attn.k_proj
269
+ - model.layers.16.self_attn.k_proj
270
+ - model.layers.50.self_attn.k_proj
271
+ # self_attn.o_proj layers
272
+ - model.layers.15.self_attn.o_proj
273
+ - model.layers.23.self_attn.o_proj
274
+ - model.layers.31.self_attn.o_proj
275
+ - model.layers.30.self_attn.o_proj
276
+ - model.layers.18.self_attn.o_proj
277
+ - model.layers.24.self_attn.o_proj
278
+ - model.layers.17.self_attn.o_proj
279
+ - model.layers.28.self_attn.o_proj
280
+ - model.layers.34.self_attn.o_proj
281
+ - model.layers.33.self_attn.o_proj
282
+ - model.layers.25.self_attn.o_proj
283
+ - model.layers.12.self_attn.o_proj
284
+ - model.layers.14.self_attn.o_proj
285
+ - model.layers.29.self_attn.o_proj
286
+ - model.layers.16.self_attn.o_proj
287
+ - model.layers.26.self_attn.o_proj
288
+ - model.layers.22.self_attn.o_proj
289
+ - model.layers.27.self_attn.o_proj
290
+ - model.layers.35.self_attn.o_proj
291
+ - model.layers.20.self_attn.o_proj
292
+ - model.layers.13.self_attn.o_proj
293
+ - model.layers.36.self_attn.o_proj
294
+ - model.layers.19.self_attn.o_proj
295
+ - model.layers.37.self_attn.o_proj
296
+ - model.layers.21.self_attn.o_proj
297
+ - model.layers.11.self_attn.o_proj
298
+ - model.layers.54.self_attn.o_proj
299
+ - model.layers.5.self_attn.o_proj
300
+ - model.layers.38.self_attn.o_proj
301
+ - model.layers.6.self_attn.o_proj
302
+ - model.layers.8.self_attn.o_proj
303
+ - model.layers.9.self_attn.o_proj
304
+ # self_attn.q_proj layers
305
+ - model.layers.1.self_attn.q_proj
306
+ - model.layers.2.self_attn.q_proj
307
+ - model.layers.3.self_attn.q_proj
308
+ - model.layers.45.self_attn.q_proj
309
+ - model.layers.54.self_attn.q_proj
310
+ - model.layers.35.self_attn.q_proj
311
+ - model.layers.48.self_attn.q_proj
312
+ - model.layers.61.self_attn.q_proj
313
+ - model.layers.52.self_attn.q_proj
314
+ - model.layers.50.self_attn.q_proj
315
+ - model.layers.60.self_attn.q_proj
316
+ - model.layers.56.self_attn.q_proj
317
+ - model.layers.58.self_attn.q_proj
318
+ - model.layers.42.self_attn.q_proj
319
+ - model.layers.59.self_attn.q_proj
320
+ - model.layers.44.self_attn.q_proj
321
+ - model.layers.55.self_attn.q_proj
322
+ - model.layers.57.self_attn.q_proj
323
+ - model.layers.41.self_attn.q_proj
324
+ - model.layers.36.self_attn.q_proj
325
+ - model.layers.39.self_attn.q_proj
326
+ - model.layers.4.self_attn.q_proj
327
+ - model.layers.43.self_attn.q_proj
328
+ - model.layers.34.self_attn.q_proj
329
+ - model.layers.46.self_attn.q_proj
330
+ - model.layers.49.self_attn.q_proj
331
+ - model.layers.40.self_attn.q_proj
332
+ - model.layers.25.self_attn.q_proj
333
+ - model.layers.51.self_attn.q_proj
334
+ - model.layers.17.self_attn.q_proj
335
+ - model.layers.37.self_attn.q_proj
336
+ - model.layers.53.self_attn.q_proj
337
+ # self_attn.v_proj layers
338
+ - model.layers.55.self_attn.v_proj
339
+ - model.layers.31.self_attn.v_proj
340
+ - model.layers.47.self_attn.v_proj
341
+ - model.layers.45.self_attn.v_proj
342
+ - model.layers.49.self_attn.v_proj
343
+ - model.layers.48.self_attn.v_proj
344
+ - model.layers.15.self_attn.v_proj
345
+ - model.layers.30.self_attn.v_proj
346
+ - model.layers.7.self_attn.v_proj
347
+ - model.layers.44.self_attn.v_proj
348
+ - model.layers.29.self_attn.v_proj
349
+ - model.layers.51.self_attn.v_proj
350
+ - model.layers.50.self_attn.v_proj
351
+ - model.layers.14.self_attn.v_proj
352
+ - model.layers.54.self_attn.v_proj
353
+ - model.layers.32.self_attn.v_proj
354
+ - model.layers.43.self_attn.v_proj
355
+ - model.layers.10.self_attn.v_proj
356
+ - model.layers.46.self_attn.v_proj
357
+ - model.layers.38.self_attn.v_proj
358
+ - model.layers.57.self_attn.v_proj
359
+ - model.layers.22.self_attn.v_proj
360
+ - model.layers.39.self_attn.v_proj
361
+ - model.layers.6.self_attn.v_proj
362
+ - model.layers.23.self_attn.v_proj
363
+ - model.layers.58.self_attn.v_proj
364
+ - model.layers.53.self_attn.v_proj
365
+ - model.layers.40.self_attn.v_proj
366
+ - model.layers.24.self_attn.v_proj
367
+ - model.layers.9.self_attn.v_proj
368
+ - model.layers.25.self_attn.v_proj
369
+ - model.layers.5.self_attn.v_proj
370
+
371
+
372
+
373
+ wandb_project: EVA-Qwen2.5-32B-SFFT-v0.1
374
+ wandb_entity:
375
+ wandb_watch:
376
+ wandb_name: Unit-01
377
+ wandb_log_model:
378
+
379
+ gradient_accumulation_steps: 8
380
+ micro_batch_size: 1
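+ # With the 8xH100 node noted above: effective batch = 8 (grad accum) x 1 (micro batch) x 8 (GPUs) = 64 sequences per step.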
381
+ num_epochs: 3
382
+ optimizer: paged_adamw_8bit
383
+ lr_scheduler: cosine
384
+ learning_rate: 0.00005
385
+ max_grad_norm: 3
386
+
387
+ train_on_inputs: false
388
+ group_by_length: false
389
+ bf16: auto
390
+ fp16:
391
+ tf32: false
392
+
393
+ gradient_checkpointing: "unsloth"
394
+ # gradient_checkpointing_kwargs:
395
+ # use_reentrant: true
396
+ early_stopping_patience:
397
+ resume_from_checkpoint:
398
+ local_rank:
399
+ logging_steps: 1
400
+ xformers_attention:
401
+ flash_attention: true
402
+
403
+ warmup_steps: 20
404
+ evals_per_epoch: 4
405
+ saves_per_epoch: 2
406
+ save_safetensors: true
407
+ hub_model_id:
408
+ hub_strategy:
409
+ debug:
410
+ deepspeed: deepspeed_configs/zero3_bf16.json
411
+ weight_decay: 0.1
412
+ # fsdp:
413
+ # - full_shard
414
+ # - auto_wrap
415
+ # fsdp_config:
416
+ # fsdp_limit_all_gathers: true
417
+ # fsdp_sync_module_states: false
418
+ # fsdp_offload_params: true
419
+ # fsdp_cpu_ram_efficient_loading: true
420
+ # fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
421
+ # fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer
422
+ # fsdp_activation_checkpointing: true
423
+ # fsdp_state_dict_type: SHARDED_STATE_DICT # Changed from FULL_STATE_DICT
424
+ # fsdp_sharding_strategy: FULL_SHARD
425
+ # fsdp_forward_prefetch: false # Added
426
+ # fsdp_backward_prefetch: "BACKWARD_PRE" # Added
427
+ # fsdp_backward_prefetch_limit: 1 # Added
428
+ # fsdp_mixed_precision: BF16 # Added
429
+ ```
430
+
431
+ </details>
eva-qwen2.5-32b-v0.1.Q4_0.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ae9205cf1d5fbbdfc9abcfe95b91dd69d6006b00f09d8da2bf3ccd235ef64a09
3
+ size 18640229120
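
The entry above is a Git LFS pointer, so a plain clone without git-lfs fetches only this stub rather than the ~18.6 GB weights. A minimal sketch for pulling the actual file with `huggingface_hub` (the `repo_id` below is a hypothetical placeholder; substitute this repository's id):

```python
# Sketch only: repo_id is a placeholder, not confirmed by this commit.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="mav23/EVA-Qwen2.5-32B-v0.1-GGUF",  # hypothetical; use the actual repo id
    filename="eva-qwen2.5-32b-v0.1.Q4_0.gguf",  # the ~18.6 GB file added in this commit
)
print(path)  # local cache path of the downloaded GGUF
```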