Triangle104 committed
Commit 4fa965b · verified · 1 Parent(s): 8685669

Update README.md

Files changed (1): README.md (+202 -0)
README.md CHANGED
@@ -17,6 +17,208 @@ language:
  This model was converted to GGUF format from [`Spestly/Athena-1-14B`](https://huggingface.co/Spestly/Athena-1-14B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/Spestly/Athena-1-14B) for more details on the model.
 
+ ---
+ ## Model details
+ 
+ Athena 1 is a state-of-the-art language model fine-tuned from Qwen/Qwen2.5-14B-Instruct. Designed to excel in instruction-following tasks, Athena 1 delivers advanced capabilities in text generation, coding, mathematics, and long-context understanding. It is optimized for a wide variety of use cases, including conversational AI, structured data interpretation, and multilingual applications. It outperforms Ava 1.5 in many respects, making Athena-1 the superior model.
+ 
+ ### Key Features
+ 
+ #### 🚀 Enhanced Capabilities
+ 
+ - **Instruction Following:** Athena 1 has been fine-tuned for superior adherence to user prompts, making it ideal for chatbots, virtual assistants, and guided workflows.
+ - **Coding and Mathematics:** Specialized fine-tuning enhances coding problem-solving and mathematical reasoning.
+ - **Long-Context Understanding:** Handles input contexts of up to 128K tokens and generates up to 8K tokens.
+ 
+ #### 🌐 Multilingual Support
+ 
+ Supports 29+ languages, including:
+ 
+ - English, Chinese, French, Spanish, Portuguese, German, Italian, Russian
+ - Japanese, Korean, Vietnamese, Thai, Arabic, and more.
+ 
+ #### 📊 Structured Data & Outputs
+ 
+ - **Structured Data Interpretation:** Understands and processes structured formats like tables and JSON.
+ - **Structured Output Generation:** Generates well-formatted outputs, including JSON, XML, and other structured formats (see the sketch below).
+ 
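+ As a quick illustration of structured output generation, a minimal sketch using the chat pipeline follows; the system prompt, token limit, and output indexing are illustrative assumptions, not part of the original card:
+ 
+ ```python
+ # Illustrative sketch: asking Athena 1 for a JSON-only reply via the chat pipeline.
+ # The prompt wording and max_new_tokens are assumptions, not from the original card.
+ from transformers import pipeline
+ 
+ pipe = pipeline("text-generation", model="Spestly/Athena-1-14B")
+ messages = [
+     {"role": "system", "content": "Respond with valid JSON only."},
+     {"role": "user", "content": "Return the capitals of France, Spain and Italy as a JSON object."},
+ ]
+ out = pipe(messages, max_new_tokens=128)
+ print(out[0]["generated_text"][-1]["content"])  # the assistant's JSON reply
+ ```
+ 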
+ ### Model Details
+ 
+ - Base Model: Qwen/Qwen2.5-14B-Instruct
+ - Architecture: Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias.
+ - Parameters: 14.7B total (13.1B non-embedding).
+ - Layers: 48
+ - Attention Heads: 40 for Q, 8 for KV.
+ - Context Length: Up to 131,072 tokens.
+ 
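+ These figures can be cross-checked against the published configuration; a minimal sketch, assuming the standard Qwen2 config attribute names in transformers:
+ 
+ ```python
+ # Sketch: read the architecture details from the model's config on the Hub.
+ from transformers import AutoConfig
+ 
+ config = AutoConfig.from_pretrained("Spestly/Athena-1-14B")
+ print(config.num_hidden_layers)        # layers (expected 48)
+ print(config.num_attention_heads)      # query heads (expected 40)
+ print(config.num_key_value_heads)      # KV heads (expected 8)
+ print(config.max_position_embeddings)  # context length (expected 131072)
+ ```
+ 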
+ ### Applications
+ 
+ Athena 1 is designed for a wide range of use cases:
+ 
+ - Conversational AI and chatbots.
+ - Code generation, debugging, and explanation.
+ - Mathematical problem-solving.
+ - Large-document summarization and analysis.
+ - Multilingual text generation and translation.
+ - Structured data processing (e.g., tables, JSON).
+ 
+ ### Quickstart
+ 
+ Below is an example of how to use Athena 1 for text generation:
+ 
+ ```bash
+ # Log in to the Hugging Face Hub (needed for gated or private checkpoints)
+ huggingface-cli login
+ ```
+ 
+ ```python
+ # Use a pipeline as a high-level helper
+ from transformers import pipeline
+ 
+ messages = [
+     {"role": "user", "content": "Who are you?"},
+ ]
+ pipe = pipeline("text-generation", model="Spestly/Athena-1-14B")
+ pipe(messages)
+ 
+ # Load model directly
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ 
+ tokenizer = AutoTokenizer.from_pretrained("Spestly/Athena-1-14B")
+ model = AutoModelForCausalLM.from_pretrained("Spestly/Athena-1-14B")
+ ```
+ 
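+ The direct-load example stops after loading the model and tokenizer; a minimal generation sketch continuing from there (the dtype, device placement, and token limit are illustrative choices, not from the original card):
+ 
+ ```python
+ # Sketch: generate a chat completion with the directly loaded model and tokenizer.
+ # torch_dtype, device_map and max_new_tokens below are illustrative assumptions.
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ 
+ tokenizer = AutoTokenizer.from_pretrained("Spestly/Athena-1-14B")
+ model = AutoModelForCausalLM.from_pretrained(
+     "Spestly/Athena-1-14B", torch_dtype=torch.bfloat16, device_map="auto"
+ )
+ 
+ messages = [{"role": "user", "content": "Who are you?"}]
+ inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+ 
+ outputs = model.generate(inputs, max_new_tokens=512)
+ print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
+ ```
+ 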
+ ### Performance
+ 
+ Athena 1 has been optimized for efficiency and performance on modern GPUs. For detailed evaluation metrics (e.g., throughput, accuracy, and memory requirements), refer to the Qwen2.5 performance benchmarks.
+ 
+ ### Requirements
+ 
+ To use Athena 1, ensure the following (a quick check is sketched after this list):
+ 
+ - Python >= 3.8
+ - Transformers >= 4.37.0 (to support Qwen models)
+ - PyTorch >= 2.0
+ - GPU with BF16 support for optimal performance.
+ 
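+ An environment check along these lines can confirm the versions and BF16 support; this snippet is an illustrative sketch, not part of the original card:
+ 
+ ```python
+ # Sketch: verify the environment meets the requirements listed above.
+ import sys
+ import torch
+ import transformers
+ 
+ print(sys.version_info >= (3, 8))      # Python >= 3.8
+ print(transformers.__version__)        # should be >= 4.37.0
+ print(torch.__version__)               # should be >= 2.0
+ print(torch.cuda.is_available() and torch.cuda.is_bf16_supported())  # BF16-capable GPU
+ ```
+ 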
+ ### Citation
+ 
+ If you use Athena 1 in your research or projects, please cite its base model Qwen2.5 as follows:
+ 
+ ```bibtex
+ @misc{qwen2.5,
+     title  = {Qwen2.5: A Party of Foundation Models},
+     url    = {https://qwenlm.github.io/blog/qwen2.5/},
+     author = {Qwen Team},
+     month  = {September},
+     year   = {2024}
+ }
+ ```
+ 
+ ---
  ## Use with llama.cpp
  Install llama.cpp through brew (works on Mac and Linux)