Update README.md
README.md
CHANGED
@@ -4,5 +4,14 @@ language:
 base_model:
 - cis-lmu/glot500-base
 ---
+The model Glot500-m-iuseg is a fine-tuned version of the Glot500-m model. It was fine-tuned to segment Inuktitut words by morpheme boundaries and is intended to be used as a pre-processing tool for the language.
+
+
+The model found in this repository is our best performing fine-tuned model described in the paper: "Surface-Level Morphological Segmentation of Low-resource Inuktitut Using Pre-trained Large Language Models" (link will be added when published)
+
+**Datasets used:**
+The Nunavut Hansard Inuktitut–English Parallel Corpus 3.0 with Preliminary Machine Translation Results: https://aclanthology.org/2020.lrec-1.312/
+
+**Method used:**
+LLMSegm: Surface-level Morphological Segmentation Using Large Language Model: https://aclanthology.org/2024.lrec-main.933/
 
-Inuktitut morphological segmenter model reported in the paper: ADD WHEN PUBLISHED
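
The new README text describes the model as segmenting words at morpheme boundaries for use in pre-processing. As a rough illustration of what surface-level segmentation produces, the sketch below turns per-gap boundary decisions into segments. It does not load or call the model: the function name `split_at_boundaries`, the one-label-per-gap input format, and the English toy word are all assumptions made for illustration, and the labels are hand-written rather than model output.

```python
from typing import List, Sequence


def split_at_boundaries(word: str, labels: Sequence[int]) -> List[str]:
    """Split `word` using one 0/1 label per gap between adjacent characters.

    labels[i] == 1 means a morpheme boundary falls between
    word[i] and word[i + 1]. This input format is hypothetical.
    """
    if len(labels) != max(len(word) - 1, 0):
        raise ValueError("expected one label per gap between characters")
    segments: List[str] = []
    start = 0
    for i, label in enumerate(labels):
        if label:
            # Close the current segment just before the boundary.
            segments.append(word[start:i + 1])
            start = i + 1
    # Whatever remains after the last boundary is the final segment.
    segments.append(word[start:])
    return segments


# Toy English example with hand-written labels (not model output).
word = "unbreakable"
labels = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0]  # boundaries after "un" and "break"
print("-".join(split_at_boundaries(word, labels)))
```

Because the segments are plain substrings of the input, joining them without a separator always reproduces the original word, which is what makes the segmentation surface-level.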