---
license: cc-by-sa-4.0
datasets:
- cjvt/cc_gigafida
- cjvt/solar3
- cjvt/sloleks
language:
- sl
tags:
- word spelling correction
---

# T5-incorrect-word-spelling-corrector

This T5 model identifies and corrects misspelled words in Slovenian text.

## Model Output Example

Consider the following Slovenian text:

_Model v besedlu popravi napaake v nepravilno črkovanih besedah._

The model might return the following text (this prediction was chosen for illustration and is not guaranteed to be reproducible):

_Model v besedilu popravi napake v nepravilno črkovanih besedah._

We observe that in the input sentence, the words `besedlu` and `napaake` are incorrectly spelled, so the model corrects them to `besedilu` and `napake`.
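The example above can be tried with a minimal Python sketch along the following lines. It rests on several assumptions that this card does not confirm: that the checkpoint loads through the generic `transformers` seq2seq classes, that it accepts raw text without a task prefix or annotation markers, and that the caller supplies the correct repository id as `model_id`. The helper names `correct_text` and `changed_words` are hypothetical and only serve the illustration.

```python
from __future__ import annotations


def changed_words(original: str, corrected: str) -> list[tuple[str, str]]:
    """Pair up words that differ between two texts.

    Assumes both texts have the same number of whitespace-separated words;
    anything else would need a proper alignment step.
    """
    src, out = original.split(), corrected.split()
    if len(src) != len(out):
        raise ValueError("word counts differ; a real alignment would be needed")
    return [(a, b) for a, b in zip(src, out) if a != b]


def correct_text(text: str, model_id: str) -> str:
    """Run a seq2seq checkpoint on `text` (needs `transformers` and `torch`).

    Assumption: the model takes plain text as input. The expected input
    format is not documented in this card.
    """
    # Imported lazily so that changed_words works without these packages.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `changed_words` on the input and output sentences shown above yields the pairs `("besedlu", "besedilu")` and `("napaake", "napake")`. `correct_text` downloads the checkpoint on first use, so it needs network access and the repository id of this model.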

## More details

Testing the model on **generated** test sets gives the following results (combining detection and correction of misspelled words):

- `Precision`: 0.986
- `Recall`: 0.935
- `F1`: 0.960
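As a quick sanity check, F1 is the harmonic mean of precision and recall, so the third figure can be recomputed from the first two. The sketch below uses the standard formula; the function name `f1_score` is only illustrative and not part of any evaluation code released with this model.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the generated test sets.
print(f"{f1_score(0.986, 0.935):.3f}")
```

Reported scores are usually computed from unrounded precision and recall, so recomputing from rounded values can differ in the last digit.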

Testing the model in combination with **cjvt/SloBERTa-slo-word-spelling-annotator** on test sets constructed from the **Šolar Eval** dataset gives the following results (combining detection and correction of misspelled words):

- `Precision`: 0.823
- `Recall`: 0.796
- `F1`: 0.810

## Acknowledgement

The authors acknowledge the financial support from the Slovenian Research and Innovation Agency - research core funding No. P6-0411: Language Resources and Technologies for Slovene and research project No. J7-3159: Empirical foundations for digitally-supported development of writing skills.

## Authors

Thanks to Martin Božič, Marko Robnik-Šikonja and Špela Arhar Holdt for developing these models.