license: other
license_name: gpt-sw3
license_link: LICENSE
base_model: AI-Sweden-Models/gpt-sw3-6.7b
ICELANDIC GPT-SW3 FOR SPELL AND GRAMMAR CHECKING
This is a model for correcting spelling and grammar errors in Icelandic text. It is a GPT-SW3 model (https://huggingface.co/AI-Sweden-Models/gpt-sw3-6.7b) fine-tuned on Icelandic text, and specifically on the spell and grammar checking task.
Provided here is the model along with a script for running it through a Hugging Face endpoint. An authorized Hugging Face API key is required to do so. Once you have obtained an API key and it has been authorized, add it to your environment as "HF_API_KEY".
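As an illustration of how a script might pick up the key, here is a minimal sketch that reads "HF_API_KEY" from the environment and builds a bearer-token header of the kind Hugging Face endpoints generally expect. The helper name build_auth_headers is hypothetical and not taken from run_model.py; the provided script may read the key differently.

```python
import os


def build_auth_headers(env_var="HF_API_KEY"):
    """Return HTTP headers carrying the API key stored in the environment.

    Hypothetical helper for illustration; not part of run_model.py.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message instead of sending an
        # unauthenticated request to the endpoint.
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return {"Authorization": f"Bearer {api_key}"}
```

The variable has to be set in the shell that launches Python, so that it is visible to the process when the script starts.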
To run the model you will need a Python 3 environment. Install the required dependencies by running:
pip install -r requirements.txt
The current version of transformers includes a bug in the GPTSw3Tokenizer class which causes it to use the wrong BOS and PAD tokens if the tokenizer is loaded through AI-Sweden-Models/gpt-sw3-6.7b. Load the tokenizer through mideind/icelandic-gpt-sw3-6.7b-gec instead to avoid this bug.
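The workaround can be sketched as follows, assuming the tokenizer is loaded locally with the transformers AutoTokenizer class. The function name load_tokenizer is hypothetical, and passing the key from "HF_API_KEY" as the token argument is an assumption that only matters if the repository requires authentication; the token keyword is available in recent transformers releases.

```python
import os

# Repository that carries the corrected BOS and PAD token configuration.
TOKENIZER_REPO = "mideind/icelandic-gpt-sw3-6.7b-gec"


def load_tokenizer(repo_id=TOKENIZER_REPO):
    """Load the tokenizer from the fine-tuned repository, not the base model.

    Hypothetical helper for illustration; not part of run_model.py.
    """
    # Imported lazily so the module can be imported without transformers.
    from transformers import AutoTokenizer

    # Assumption: the key in HF_API_KEY also grants access to the repository.
    # If the variable is unset, None is passed and no explicit token is used.
    return AutoTokenizer.from_pretrained(
        repo_id, token=os.environ.get("HF_API_KEY")
    )
```

After loading, inspecting tokenizer.bos_token and tokenizer.pad_token is a quick way to check which special tokens are in effect.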
The model is fine-tuned on the following three tasks. Output examples for each task are shown in ./example_outputs.
- Task 1: The model evaluates one text with regard to, for example, grammar and spelling, and returns all errors in the input text as a list, with their positions in the text and their corrections.
- Task 2: The model evaluates two texts and chooses which one is better with regard to, for example, grammar and spelling.
- Task 3: The model evaluates one text with regard to, for example, grammar and spelling, and returns a corrected version of the text.
Run the model with
python run_model.py
Input text(s) and the task type need to be specified in the script.