Triangle104 committed
Commit 07e0b60 · verified · 1 Parent(s): e88540a

Update README.md

Files changed (1): README.md (+1 -100)
cases, including conversational AI, structured data interpretation, and multilingual applications. It outperforms Ava 1.5 in many aspects, making Athena-1 the superior model.

## Key Features
### 🚀 Enhanced Capabilities

- **Instruction Following:** Athena-1 has been fine-tuned for superior adherence to user prompts, making it ideal for chatbots, virtual assistants, and guided workflows.
- **Coding and Mathematics:** Specialized fine-tuning enhances coding problem-solving and mathematical reasoning.
- **Long-Context Understanding:** Handles input contexts up to 128K tokens and generates up to 8K tokens.
 
### 🌐 Multilingual Support

Supports 29+ languages, including:

- English, Chinese, French, Spanish, Portuguese, German, Italian, Russian
- Japanese, Korean, Vietnamese, Thai, Arabic, and more.
 
### 📊 Structured Data & Outputs

- **Structured Data Interpretation:** Understands and processes structured formats like tables and JSON.
- **Structured Output Generation:** Generates well-formatted outputs, including JSON, XML, and other structured formats.
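To illustrate consuming such structured output downstream (a generic sketch — `raw_output` below is a stand-in string, not an actual model response, and the prompt wording in the comment is hypothetical):

```python
import json

# Stand-in for a model completion after prompting, e.g.:
# "Return ONLY a JSON object with keys 'name' and 'languages'."
raw_output = '{"name": "Athena-1", "languages": ["English", "Chinese", "French"]}'

def parse_structured_output(text: str) -> dict:
    """Validate that a completion is a well-formed JSON object before using it."""
    data = json.loads(text)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data

result = parse_structured_output(raw_output)
print(result["name"])            # → Athena-1
print(len(result["languages"]))  # → 3
```

Validating model output this way lets you retry or re-prompt when the model emits malformed JSON instead of crashing downstream code.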
 
## Model Details

- **Base Model:** Qwen/Qwen2.5-14B-Instruct
- **Architecture:** Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias.
- **Parameters:** 14.7B total (13.1B non-embedding).
- **Layers:** 48
- **Attention Heads:** 40 for Q, 8 for KV.
- **Context Length:** Up to 131,072 tokens.
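The context-length and attention figures above imply a sizable KV cache at full context. A back-of-envelope sketch (the head dimension of 128 is an assumption — it is not stated in this card):

```python
# Estimate the BF16 KV-cache size at full context from the "Model Details"
# figures. head_dim = 128 is an assumed value, not stated in this card.
layers = 48          # hidden layers
kv_heads = 8         # GQA: 8 KV heads serve the 40 query heads
head_dim = 128       # assumed
seq_len = 131_072    # maximum context length
bytes_per_value = 2  # BF16

# Factor of 2 covers both the K and the V tensors.
kv_cache_bytes = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value
print(kv_cache_bytes / 2**30, "GiB")  # → 24.0 GiB
```

Under these assumptions the cache alone is about 24 GiB at the full 131K context, on top of roughly 29 GB of BF16 weights (14.7B parameters × 2 bytes), so long-context serving needs substantially more memory than the weights alone suggest.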
 
## Applications

Athena-1 is designed for a wide range of use cases:

- Large-document summarization and analysis.
- Multilingual text generation and translation.
- Structured data processing (e.g., tables, JSON).
 
## Quickstart

Below is an example of how to use Athena-1 for text generation:

```shell
huggingface-cli login
```

```python
# Use a pipeline as a high-level helper
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Spestly/Athena-1-14B")
model = AutoModelForCausalLM.from_pretrained("Spestly/Athena-1-14B")
```
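For completeness, a pipeline-based sketch of generation (assumptions: the standard transformers `text-generation` pipeline with chat-style messages, `accelerate` installed for `device_map="auto"`, and an illustrative prompt and token budget — none of these settings come from this card):

```python
from transformers import pipeline

def run_pipeline_demo() -> str:
    # Downloads ~30 GB of weights on first use; device_map requires accelerate.
    pipe = pipeline(
        "text-generation",
        model="Spestly/Athena-1-14B",
        torch_dtype="auto",
        device_map="auto",
    )
    messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
    result = pipe(messages, max_new_tokens=256)
    # Chat-style pipelines return the whole conversation; take the last turn.
    return result[0]["generated_text"][-1]["content"]

if __name__ == "__main__":
    print(run_pipeline_demo())
```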
 
## Performance

Athena-1 has been optimized for efficiency and performance on modern GPUs. For detailed evaluation metrics (e.g., throughput, accuracy, and memory requirements), refer to the Qwen2.5 performance benchmarks.
 
## Requirements

To use Athena-1, ensure the following:

- Python >= 3.8
- Transformers >= 4.37.0 (to support Qwen models)
- PyTorch >= 2.0
- A GPU with BF16 support for optimal performance.
 
## Citation

If you use Athena-1 in your research or projects, please cite its base model Qwen2.5 as follows:

```bibtex
@misc{qwen2.5,
    title = {Qwen2.5: A Party of Foundation Models},
    url = {https://qwenlm.github.io/blog/qwen2.5/},
```
 