---
base_model: Qwen/Qwen2.5-0.5B-Instruct
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
license: apache-2.0
language:
- en
---

![Header](https://raw.githubusercontent.com/Aayan-Mishra/Images/refs/heads/main/Athena.png)

# Athena-1 0.5B

Athena-1 0.5B is a fine-tuned, instruction-following large language model derived from [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct). Designed for ultra-lightweight applications, it keeps a small footprint while remaining capable on everyday instruction-following tasks, which makes it suitable for environments with limited computational resources.

---

## Key Features

### ⚡ Ultra-Lightweight and Efficient

* **Compact Size:** With just **500 million parameters**, Athena-1 0.5B is ideal for edge devices and low-resource environments.
* **Instruction Following:** Fine-tuned for reliable adherence to user instructions.
* **Coding and Mathematics:** Capable of handling basic coding and mathematical tasks.

### Contextual Understanding

* **Context Length:** Supports up to **16,384 tokens**, enabling processing of moderately sized conversations or documents.
* **Token Generation:** Can generate up to **4K tokens** of coherent output.
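
Because the prompt and the generated text share one context window, it can help to work out how many new tokens are safe to request before calling the model. The sketch below is a minimal illustration using the limits quoted above; `generation_budget` is a hypothetical helper (not part of any library), and it assumes "4K" means 4,096 tokens.

```python
# Limits quoted in this model card; adjust them if your deployment differs.
CONTEXT_LIMIT = 16_384   # maximum tokens in the context window
MAX_GENERATION = 4_096   # "4K tokens" of output, assumed here to mean 4,096


def generation_budget(prompt_tokens: int, requested_new_tokens: int) -> int:
    """Return how many new tokens can safely be requested.

    The budget is capped by the generation limit and by the space left in
    the context window after the prompt. (Hypothetical helper.)
    """
    if prompt_tokens < 0 or requested_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    remaining = max(CONTEXT_LIMIT - prompt_tokens, 0)
    return min(requested_new_tokens, MAX_GENERATION, remaining)
```

In practice, `prompt_tokens` could be obtained with `len(tokenizer(text)["input_ids"])`, and the result passed to generation as `max_new_tokens`.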

### Multilingual Support

* Supports **20+ languages**, including:
  * English, Chinese, French, Spanish, German, Italian, Russian
  * Japanese, Korean, Vietnamese, Thai, and more.

### Structured Data & Outputs

* **Structured Data Interpretation:** Handles formats like tables and JSON effectively.
* **Structured Output Generation:** Produces well-formatted outputs for data-specific tasks.
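
When asking a small model for JSON, the reply may still wrap the payload in extra prose or a code fence, so it is usually worth parsing defensively. The following is a minimal sketch of one way to do that with the standard library; `extract_json` is a hypothetical helper name, not something shipped with the model.

```python
import json


def extract_json(reply: str):
    """Return the first JSON object or array found in `reply`, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(reply):
        if char not in "{[":
            continue  # JSON objects and arrays can only start here
        try:
            # raw_decode parses one JSON value beginning at `start`
            # and ignores whatever text follows it.
            value, _end = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this position; keep scanning
        return value
    return None
```

If the function returns `None`, the caller can retry the request or fall back to treating the reply as plain text.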

---

## Model Details

* **Base Model:** [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)
* **Architecture:** Transformer with RoPE, SwiGLU, RMSNorm, attention QKV bias, and tied word embeddings.
* **Parameters:** About 500M in total.
* **Layers:** Inherited from the base model (24 in Qwen2.5-0.5B-Instruct).
* **Attention Heads:** Inherited from the base model (grouped-query attention with 14 query heads and 2 key/value heads in Qwen2.5-0.5B-Instruct).
* **Context Length:** Up to **16,384 tokens**.
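
A fine-tune normally inherits its architecture from its base model, so the layer and head counts can be read from the published configuration rather than taken on trust. The sketch below assumes the repository id used in the Quickstart and the standard Qwen2 config field names; `summarize_config` and `load_summary` are hypothetical helper names.

```python
def summarize_config(config) -> dict:
    """Collect the architecture fields listed in this section."""
    return {
        "layers": config.num_hidden_layers,
        "attention_heads": config.num_attention_heads,
        # Key/value head count differs from the query head count under GQA.
        "kv_heads": getattr(config, "num_key_value_heads", None),
        "max_position_embeddings": config.max_position_embeddings,
        "tied_embeddings": getattr(config, "tie_word_embeddings", None),
    }


def load_summary(repo_id: str = "Spestly/Athena-1-0.5B") -> dict:
    """Fetch the model configuration from the Hub and summarize it."""
    # Imported lazily so the helper above works without `transformers`.
    from transformers import AutoConfig

    return summarize_config(AutoConfig.from_pretrained(repo_id))
```

Calling `load_summary()` requires network access and the `transformers` package; it fetches the configuration, not the model weights.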

---

## Applications

Athena-1 0.5B is optimized for:

* **Conversational AI:** Powering lightweight, responsive chatbots.
* **Code Assistance:** Basic code generation, debugging, and explanation.
* **Mathematical Assistance:** Solving fundamental math problems.
* **Document Processing:** Summarizing and analyzing shorter documents.
* **Multilingual Tasks:** Serving multilingual use cases with a compact model.
* **Structured Data:** Reading and generating structured formats such as JSON and tables.

---

## Quickstart

Here's how you can use Athena-1 0.5B for quick text generation:

```python
# Option 1: use a pipeline as a high-level helper
from transformers import pipeline

messages = [
    {"role": "user", "content": "What can you do?"},
]
pipe = pipeline("text-generation", model="Spestly/Athena-1-0.5B")
print(pipe(messages))

# Option 2: load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Spestly/Athena-1-0.5B")
model = AutoModelForCausalLM.from_pretrained("Spestly/Athena-1-0.5B")
```
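
Loading the tokenizer and model directly does not produce any text by itself. The sketch below shows one common way to generate a reply from them using the tokenizer's chat template; `build_messages` and `generate_reply` are hypothetical helper names, and the default of 256 new tokens is an arbitrary choice for illustration.

```python
from typing import Optional


def build_messages(user_prompt: str, system_prompt: Optional[str] = None) -> list:
    """Build a chat in the role/content format used in the Quickstart."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


def generate_reply(model, tokenizer, messages, max_new_tokens: int = 256) -> str:
    """Generate one assistant reply for `messages`."""
    # Render the conversation with the chat template and append the marker
    # that tells the model to begin the assistant turn.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With the `tokenizer` and `model` loaded as in the Quickstart, a call such as `generate_reply(model, tokenizer, build_messages("What can you do?"))` returns the assistant's text as a plain string.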