|
--- |
|
license: bigscience-openrail-m |
|
datasets: |
|
- mc4 |
|
language: |
|
- sv |
|
library_name: transformers |
|
--- |
|
|
|
# SweCTRL-Mini |
|
|
|
<!-- Provide a quick summary of what the model is/does. --> |
|
|
|
SweCTRL-Mini is a large Swedish language model that can be used for inference and fine-tuning on a single consumer-grade GPU. The model is based on the CTRL architecture by Keskar, McCann, Varshney, Xiong, and Socher (2019), which means that users of SweCTRL-Mini can control the genre of the generated text by inserting special control tokens into the generation prompts (a brief illustration follows the list below). Crucially, note that this model is:
|
|
|
- **NOT** trained to follow GPT-like instructions
|
- **NOT** trained for conversations, unlike ChatGPT
|
- **NOT** trained on any multi-modal data. The only modality is text, more than 99% of which is in Swedish.
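
To illustrate the control mechanism, the hypothetical snippet below shows how a control token could be prepended to a generation prompt to steer the genre. The token shown is only a placeholder; the actual set of control tokens is documented in the associated paper and Technical note.

```python
# Hypothetical illustration of CTRL-style prompting. ":wiki:" is a
# placeholder, not necessarily a real SweCTRL-Mini control token; see the
# associated paper for the actual list of control tokens.
control_token = ":wiki:"                           # placeholder control token
prompt = "Stockholm är huvudstaden i Sverige och"  # plain Swedish prompt
controlled_prompt = f"{control_token} {prompt}"    # genre-controlled prompt
print(controlled_prompt)
```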
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
- **Developed by:** Dmytro Kalpakchi (with supervision from Johan Boye) |
|
- **Shared by:** Dmytro Kalpakchi |
|
- **Model type:** Transformer-based language model trained by predicting the next token |
|
- **Language(s) (NLP):** Swedish |
|
- **License:** BigScience Open RAIL-M |
|
- **Finetuned from model:** None, trained from scratch |
|
|
|
### Model Sources
|
|
|
<!-- Provide the basic links for the model. --> |
|
|
|
- **Website:** https://swectrl.dev/ |
|
- **Repository:** https://github.com/dkalpakchi/SweCTRL-Mini |
|
- **Paper:** https://arxiv.org/pdf/2304.13994.pdf |
|
- **Technical note:** https://zenodo.org/record/7868205 |
|
|
|
## Uses |
|
|
|
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> |
|
|
|
### Direct Use |
|
|
|
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. --> |
|
The model is intended to be used for generating texts of various genres in Swedish.
|
|
|
|
|
### Out-of-Scope Use |
|
|
|
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. --> |
|
Please refer to Appendix A of the license file for information on use restrictions. The model has a limited context window of 256 tokens, so it will most probably not work well for text summarization. Additionally, the vast majority of its training data was in Swedish (although it contains tokens in other languages as well), so tasks such as machine translation would require further fine-tuning.
|
|
|
## Bias, Risks, and Limitations |
|
|
|
<!-- This section is meant to convey both technical and sociotechnical limitations. --> |
|
To mitigate the inclusion of personally identifiable data, we attempted to remove sources that could contain such data to the best of our ability (see the Technical note for more details on the data filtering process). However, we have still noted that the model can generate text that includes various forms of biases, which is why we strongly recommend human curation of the generated texts. We have not yet conducted any systematic investigation into either the kinds of biases that appear in the generated texts or how frequently they occur. Contributions from the community on this matter would be very welcome.
|
|
|
### Recommendations |
|
|
|
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. --> |
|
|
|
For further recommendations on the use of the model, please see the associated paper. |
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model. |
|
|
|
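Below is a minimal sketch of text generation with the Hugging Face `transformers` library. The repository identifier `dkalpakchi/SweCTRL-Mini` and the generation settings are assumptions for illustration; adjust them to your setup, and see the associated paper for the available control tokens.

```python
# A minimal sketch, assuming the model is hosted on the Hugging Face Hub
# under "dkalpakchi/SweCTRL-Mini" (an assumption; replace with the actual
# repository path if it differs).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "dkalpakchi/SweCTRL-Mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

# A plain Swedish prompt; a control token can be prepended to steer the
# genre of the output (see the associated paper for the available tokens).
prompt = "Stockholm är huvudstaden i Sverige och"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,  # keep within the 256-token context window
        do_sample=True,
        top_p=0.9,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If a GPU is available, move the model and the tokenized inputs to it (e.g. with `.to("cuda")`) for faster generation.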
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. --> |
|
|
|
The training data includes a *subset* of the cleaned Swedish mC4, as well as some documents from Project Runeberg. Extensive information on the training data is provided in Section 1 of the Technical note. An interface for partially mining the training data is available at: https://swectrl.dev/data
|
|
|
### Training Procedure |
|
|
|
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. --> |
|
|
|
#### Preprocessing
|
|
|
See Section 1 of the Technical note. |
|
|
|
|
|
#### Training Hyperparameters |
|
|
|
- **Training regime:** fp32 |
|
|
|
|
|
## Evaluation |
|
|
|
See Sections 5.3, 6, and 7 in the associated paper, and Section 3 of the Technical note. |
|
|
|
## Environmental Impact |
|
|
|
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly --> |
|
|
|
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). |
|
|
|
- **Hardware Type:** 8 A100 GPUs |
|
- **Hours used:** 11907.6 GPU-hours for training and experimentation |
|
- **Provider:** Berzelius supercomputer
|
- **Carbon Emitted:** No public data on the provider's carbon efficiency is available, so emissions are hard to estimate
|
|
|
## Technical Specifications |
|
See Section 3 of the associated paper.
|
|
|
## Citation
|
|
|
**BibTeX:** |
|
```bibtex |
|
@article{kalpakchi2023swectrl, |
|
title={SweCTRL-Mini: a data-transparent Transformer-based large language model for controllable text generation in Swedish}, |
|
author={Kalpakchi, Dmytro and Boye, Johan}, |
|
journal={arXiv preprint arXiv:2304.13994}, |
|
year={2023} |
|
} |
|
``` |
|
|
|
**APA:** |
|
|
|
Kalpakchi, D., & Boye, J. (2023). SweCTRL-Mini: a data-transparent Transformer-based large language model for controllable text generation in Swedish. arXiv preprint arXiv:2304.13994. |
|
|
|
## Model Card Authors |
|
|
|
Dmytro Kalpakchi (dmytroka@kth.se) |
|
|
|
## Model Card Contact |
|
|
|
Dmytro Kalpakchi (dmytroka@kth.se) |
|
|
|
# References |
|
Keskar, N. S., McCann, B., Varshney, L. R., Xiong, C., & Socher, R. (2019). CTRL: A conditional transformer language model for controllable generation. arXiv preprint arXiv:1909.05858.