---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
---

# BabyMistral Model Card

## Model Overview

**BabyMistral** is a compact language model for text generation. It is built on the Mistral architecture and aims for efficient generation at a relatively small size of 1.5 billion parameters.

### Key Specifications

- **Parameters:** 1.5 billion
- **Training Data:** 1.5 trillion tokens
- **Architecture:** Based on Mistral
- **Training Duration:** 70 days
- **Hardware:** 4x NVIDIA A100 GPUs

## Model Details

### Architecture

BabyMistral uses the Mistral architecture, a decoder-only transformer design that, as introduced with Mistral 7B, relies on grouped-query attention and sliding-window attention to reduce inference cost. The model applies this architecture at 1.5 billion parameters to balance capability against compute and memory requirements.
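
This card does not list BabyMistral's layer count or hidden sizes, so the dimensions in the example below are not this model's. As a rough sketch of how a parameter count follows from a Mistral-style configuration, the helper `count_mistral_params` (a hypothetical name) assumes grouped-query attention, a gated MLP with three weight matrices, RMSNorm, no bias terms, and untied input and output embeddings. It is checked here against the published Mistral 7B configuration.

```python
def count_mistral_params(vocab_size, hidden_size, num_layers, num_heads,
                         num_kv_heads, intermediate_size, tie_embeddings=False):
    """Approximate parameter count for a Mistral-style decoder (assumptions in the text)."""
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim  # grouped-query attention shrinks K and V

    # Attention: q_proj and o_proj are hidden x hidden, k_proj and v_proj are hidden x kv_dim
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # Gated MLP: gate_proj, up_proj, down_proj
    mlp = 3 * hidden_size * intermediate_size
    # Two RMSNorm weight vectors per layer
    norms = 2 * hidden_size
    per_layer = attention + mlp + norms

    embeddings = vocab_size * hidden_size
    lm_head = 0 if tie_embeddings else embeddings
    final_norm = hidden_size
    return embeddings + num_layers * per_layer + final_norm + lm_head


# Sanity check with the Mistral 7B configuration (NOT BabyMistral's dimensions)
mistral_7b_params = count_mistral_params(
    vocab_size=32000,
    hidden_size=4096,
    num_layers=32,
    num_heads=32,
    num_kv_heads=8,
    intermediate_size=14336,
)
print(f"{mistral_7b_params:,}")  # 7,241,732,096
```

Substituting the values from BabyMistral's own `config.json` should give a count close to 1.5 billion, provided the same structural assumptions hold for this model.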

### Training

- **Dataset Size:** 1.5 trillion tokens
- **Training Approach:** Trained from scratch
- **Hardware:** 4x NVIDIA A100 GPUs
- **Duration:** 70 days of continuous training
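
The training figures above imply a sustained throughput that can be worked out with simple arithmetic. The sketch below uses only the numbers stated in this card and assumes the 70 days were spent entirely on training with all four GPUs active; the variable names are illustrative.

```python
# Figures taken from this model card
PARAMS = 1.5e9    # model parameters
TOKENS = 1.5e12   # training tokens
NUM_GPUS = 4      # NVIDIA A100 GPUs
DAYS = 70         # continuous training time

seconds = DAYS * 24 * 3600
gpu_hours = NUM_GPUS * DAYS * 24

# Throughput needed to see every training token once in the stated time
tokens_per_second = TOKENS / seconds
tokens_per_second_per_gpu = tokens_per_second / NUM_GPUS

# Ratio of training tokens to model parameters
tokens_per_param = TOKENS / PARAMS

print(f"GPU-hours:            {gpu_hours:,}")                       # 6,720
print(f"Tokens/second:        {tokens_per_second:,.0f}")            # 248,016
print(f"Tokens/second/GPU:    {tokens_per_second_per_gpu:,.0f}")    # 62,004
print(f"Tokens per parameter: {tokens_per_param:,.0f}")             # 1,000
```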

### Capabilities

BabyMistral is designed for a range of natural language processing tasks, including:

- Text completion and generation
- Creative writing assistance
- Dialogue systems
- Question answering
- Language understanding tasks

## Usage

### Getting Started

To use BabyMistral with the Hugging Face Transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("Aarifkhan/BabyMistral")
tokenizer = AutoTokenizer.from_pretrained("Aarifkhan/BabyMistral")

# Define the chat input
chat = [
    # { "role": "system", "content": "You are BabyMistral" },
    { "role": "user", "content": "Hey there! How are you?" }
]

# Render the conversation with the model's chat template and tokenize it
inputs = tokenizer.apply_chat_template(
    chat,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Generate text
outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, skipping the prompt
response = outputs[0][inputs.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))

# Example output (varies between runs because sampling is enabled):
# I am doing well! How can I assist you today?
```
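
The `temperature=0.6` and `top_p=0.9` arguments above control how the next token is sampled. The following is a conceptual sketch in plain Python of what these two settings do to a vector of logits. It is not the Transformers implementation, and `top_p_filter` is a hypothetical helper written only for illustration.

```python
import math
import random


def top_p_filter(logits, temperature=0.6, top_p=0.9):
    """Return {token_index: probability} for the tokens kept by temperature + top-p."""
    # Temperature below 1.0 sharpens the distribution; above 1.0 flattens it
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose cumulative mass reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the kept tokens so the probabilities sum to 1
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


# Toy logits for a four-token vocabulary
nucleus = top_p_filter([2.0, 1.0, 0.0, -1.0])
next_token = random.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]
print(sorted(nucleus), next_token)
```

With these toy logits only the two most likely tokens survive the filter, so the sampled token is always index 0 or 1. Lowering `top_p` or `temperature` narrows the choice further, while raising them admits less likely tokens.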

### Ethical Considerations

Users should be aware of the model's limitations and potential biases:

- The model may reproduce biases present in its training data
- It should not be used as a sole source of factual information
- Generated content should be reviewed for accuracy and appropriateness

### Limitations

- May struggle with very specialized or technical domains
- Has no knowledge of events or information beyond its training data
- May generate plausible-sounding but incorrect information