artificialguybr committed (verified)
Commit d05e30b · 1 Parent(s): 4b77b76

Upload README.md with huggingface_hub

Files changed (1):
1. README.md (+81, -0)
README.md ADDED
@@ -0,0 +1,81 @@
---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B
tags:
- generated_from_trainer
- text-generation
- conversational
model-index:
- name: outputs/qwen2.5-0.5b-ft-synthia15-i
  results: []
datasets:
- teknium/OpenHermes-2.5
---

# Qwen2.5-0.5B Synthia Fine-tuned Model

This is a fine-tuned version of [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) on the [Synthia-v1.5-II](https://huggingface.co/datasets/migtissera/Synthia-v1.5-II) dataset, optimized for conversational AI and instruction following.

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)

## Model Description

This model builds on the Qwen2.5-0.5B base model, which features:
- 490M parameters (360M non-embedding)
- 24 transformer layers
- 14 attention heads for queries and 2 for keys/values (grouped-query attention, GQA)
- 32,768-token context length
- RoPE positional embeddings, SwiGLU activations, and RMSNorm

The model was fine-tuned on the Synthia-v1.5-II dataset, which is designed to enhance instruction following and conversational ability. Hyperparameters were chosen to preserve the base model's general capabilities while optimizing for natural dialogue and instruction following.
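
The architecture figures above can be checked against the base model's published configuration. This is a minimal sketch using the `transformers` `AutoConfig` API; it only reads configuration metadata and does not download the model weights.

```python
from transformers import AutoConfig

# Load only the configuration of the base model (no weights are fetched).
config = AutoConfig.from_pretrained("Qwen/Qwen2.5-0.5B")

print(config.num_hidden_layers)        # 24 transformer layers
print(config.num_attention_heads)      # 14 query heads
print(config.num_key_value_heads)      # 2 key/value heads (GQA)
print(config.max_position_embeddings)  # 32768 context length
```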

## Intended Uses & Limitations

This model is intended for:
- Conversational AI applications
- Instruction-following tasks
- Coherent text generation
- Multi-turn dialogue systems

Limitations:
- The model inherits the 32K-token context window from the base model
- As a 0.5B-parameter model, it may not match larger models on complex reasoning tasks
- Performance in non-English languages may be limited
- Users should be aware of potential biases present in the training data

## Training and Evaluation Data

The model was fine-tuned on the Synthia-v1.5-II dataset, which is specifically designed for instruction following and conversational AI. The training setup used:
- 95% of the data for training
- 5% for validation
- Instruction format: `[INST] {instruction} [/INST]` (see the usage sketch below)
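
The snippet below is a usage sketch based on the instruction format listed above. The repository id `artificialguybr/qwen2.5-0.5b-ft-synthia15-i` is a placeholder for wherever this checkpoint is hosted, and the generation settings are illustrative rather than the settings used during training or evaluation.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id: substitute the actual location of this fine-tuned checkpoint.
model_id = "artificialguybr/qwen2.5-0.5b-ft-synthia15-i"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Wrap the user instruction in the format used during fine-tuning.
instruction = "Explain grouped-query attention in two sentences."
prompt = f"[INST] {instruction} [/INST]"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```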

## Training Procedure

### Training Hyperparameters

Key hyperparameters (an approximate `transformers` equivalent is sketched after this list):
- Learning rate: 1e-05
- Batch size: 40 (5 micro-batch × 8 gradient accumulation steps)
- Training epochs: 3
- Optimizer: AdamW (β1=0.9, β2=0.999, ε=1e-8)
- Learning rate scheduler: cosine with 100 warmup steps
- Sequence length: 4096
- Sample packing: enabled
- Mixed precision: BF16
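
The run itself was built with Axolotl, so the snippet below is only an approximate translation of the listed hyperparameters into Hugging Face `TrainingArguments`; it does not reproduce Axolotl-specific behaviour such as sample packing, and the `output_dir` simply mirrors the name in the model index.

```python
from transformers import TrainingArguments

# Approximate equivalent of the listed hyperparameters (sample packing not reproduced).
training_args = TrainingArguments(
    output_dir="outputs/qwen2.5-0.5b-ft-synthia15-i",
    learning_rate=1e-5,
    per_device_train_batch_size=5,   # micro-batch size
    gradient_accumulation_steps=8,   # 5 x 8 = effective batch size of 40
    num_train_epochs=3,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    bf16=True,
)
```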

### Training Results

The model was trained for 672 steps over 3 epochs, showing consistent improvement throughout the training process.

### Framework Versions

- Transformers 4.46.0
- PyTorch 2.3.1+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1

## Citation

If you use this model, please cite both the original Qwen2.5 work and this fine-tuned version: