|
--- |
|
license: mit |
|
datasets: |
|
- wikimedia/wikipedia |
|
language: |
|
- es |
|
base_model: |
|
- openai-community/gpt2 |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
|
|
|
|
# ST3: Simple Transformer 3 |
|
|
|
## Model description |
|
ST3 (Simple Transformer 3) is a lightweight transformer-based model derived from OpenAI's GPT-2 architecture. It was specifically designed to enable quick fine-tuning and experimentation, making it a great choice for researchers and developers seeking an efficient model for downstream tasks. |
|
|
|
### Key features: |
|
- **Architecture:** GPT-2-based model with 3 attention heads and 3 layers (see the configuration sketch after this list).
|
- **Embedding size:** 288-dimensional token embeddings.
|
- **Context size:** 2048 tokens, allowing for extended input/output sequences. |
|
- **Pretrained on:** Wikimedia/Wikipedia subset "20231101.es" (Spanish text corpus). |
|
- **Parameters:** 4 million (stored in FP32).
|
- **Batch size:** 32. |
|
- **Training environment:** 1 epoch on a Kaggle P100 GPU. |
|
- **Tokenizer:** Custom WordPiece tokenizer ("ST3") that marks subword units with a "##" prefix.
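
For reference, here is a minimal sketch of how the features above map onto a Hugging Face `GPT2Config`. The field names are the standard GPT-2 ones; the vocabulary size is not stated on this card, so it is read from the ST3 tokenizer:

```python
from transformers import AutoTokenizer, GPT2Config, GPT2LMHeadModel

# The vocabulary size is not listed above, so take it from the ST3 tokenizer.
tokenizer = AutoTokenizer.from_pretrained("BueormLLC/ST3")

# Hyperparameters from the feature list, expressed as a GPT2Config.
config = GPT2Config(
    vocab_size=tokenizer.vocab_size,
    n_positions=2048,  # context size
    n_embd=288,        # embedding dimension
    n_layer=3,         # transformer blocks
    n_head=3,          # attention heads per block
)

model = GPT2LMHeadModel(config)
print(f"~{model.num_parameters() / 1e6:.1f}M parameters")
```

Instantiating the config this way only illustrates the model's size; to use the released weights, load the checkpoint as shown in the Usage section below.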
|
|
|
## Intended use |
|
ST3 is not as capable as larger transformer models, but it can be used for:
|
- Quick fine-tuning on small datasets (a minimal fine-tuning sketch follows below).
|
- Research purposes to test new ideas. |
|
- Educational and experimentation purposes. |
|
|
|
This model has not been fine-tuned on any downstream task or evaluated with performance metrics, as it is not designed for state-of-the-art results.
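
Since quick fine-tuning is the primary intended use, here is a minimal, hedged sketch of how one might continue training ST3 on a small text corpus with the Hugging Face `Trainer`. The corpus file name, sequence length, and training hyperparameters below are placeholders, not values from this card:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("BueormLLC/ST3")
model = AutoModelForCausalLM.from_pretrained("BueormLLC/ST3")

# Precaution: make sure a padding token exists before batching.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder corpus: any plain-text file with one example per line.
dataset = load_dataset("text", data_files={"train": "my_corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# Causal language modeling: the collator derives labels from the input ids.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="st3-finetuned",
    per_device_train_batch_size=8,
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```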
|
|
|
### Usage |
|
To use the ST3 model, you can follow this example: |
|
|
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
# Load the ST3 tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("BueormLLC/ST3")
model = AutoModelForCausalLM.from_pretrained("BueormLLC/ST3")
|
|
|
def clean_wordpiece_tokens(text):
    # Strip the "##" continuation markers added by the WordPiece tokenizer.
    return text.replace(" ##", "").replace("##", "")
|
|
|
input_text = "Esto es un ejemplo" |
|
inputs = tokenizer(input_text, return_tensors="pt") |
|
|
|
# Generation is capped at the model's 2048-token context window.
outputs = model.generate(inputs.input_ids, max_length=2048, num_return_sequences=1)
|
|
|
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True) |
|
cleaned_text = clean_wordpiece_tokens(generated_text) |
|
|
|
print(cleaned_text) |
|
``` |
|
|
|
### Explanation |
|
The ST3 tokenizer uses the WordPiece algorithm, which prefixes subword continuations with "##". The `clean_wordpiece_tokens` helper above strips these markers to produce cleaner output text, as the short example below shows.
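
A short, self-contained illustration (the decoded string below is made up for demonstration):

```python
# Hypothetical raw output containing WordPiece "##" continuation markers.
raw = "Esto es un ej ##em ##plo de sal ##ida"

print(clean_wordpiece_tokens(raw))
# -> Esto es un ejemplo de salida
```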
|
|
|
## Limitations |
|
- **Performance:** ST3 lacks the power of larger models and may not perform well on complex language tasks. |
|
- **No evaluation:** The model has not been benchmarked with any evaluation metrics.
|
- **Not suitable for production use** without further fine-tuning. |
|
|
|
## Training details |
|
- **Dataset:** Wikimedia/Wikipedia subset "20231101.es" (see the loading sketch after this list).
|
- **Number of layers:** 3. |
|
- **Number of attention heads:** 3. |
|
- **Embedding size:** 288. |
|
- **Parameters:** 4 million. |
|
- **Training:** The model was trained for one epoch with a batch size of 32 on a P100 GPU provided by Kaggle. |
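
The pretraining corpus named above can be loaded with the `datasets` library; a minimal sketch (the `text` field is the standard article-text column of this dataset):

```python
from datasets import load_dataset

# Spanish Wikipedia dump used for pretraining, per the details above.
wiki_es = load_dataset("wikimedia/wikipedia", "20231101.es", split="train")

print(wiki_es[0]["text"][:200])  # first 200 characters of the first article
```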
|
|
|
## Developer and publisher |
|
- **Developed by:** BueormAI. |
|
- **Published by:** BueormLLC. |
|
|
|
## Acknowledgments |
|
Thank you for using ST3! Your feedback and support are appreciated as we continue to develop and improve our models. |
|
|
|
If you find this model useful and would like to support further development, please consider making a donation to: |
|
|
|
- [Patreon](https://patreon.com/bueom) |
|
- [PayPal](https://paypal.me/bueorm) |
|
|
|
--- |
|
|
|
*Contributions to this project are always welcome!* |
|
|