Glot500-m-iuseg is a fine-tuned version of the Glot500-m model. It was fine-tuned to segment Inuktitut words at morpheme boundaries and is intended as a pre-processing tool for the language.

The model in this repository is our best-performing fine-tuned model described in the paper "Surface-Level Morphological Segmentation of Low-resource Inuktitut Using Pre-trained Large Language Models" (link will be added when published).

Dataset used: The Nunavut Hansard Inuktitut–English Parallel Corpus 3.0 with Preliminary Machine Translation Results: https://aclanthology.org/2020.lrec-1.312/

Method used: LLMSegm: Surface-level Morphological Segmentation Using Large Language Model: https://aclanthology.org/2024.lrec-main.933/
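
To illustrate what surface-level segmentation produces, here is a minimal sketch that assumes segmentation is framed as a yes/no boundary decision at each position inside a word. It does not load or call this model: `segment_word` and `toy_classifier` are hypothetical names, the boundary positions are hard-coded, and the example word is English rather than Inuktitut. In practice the boundary decision would come from the fine-tuned model, whose exact input format is described in the linked papers.

```python
from typing import Callable, List


def segment_word(word: str, is_boundary: Callable[[str, int], bool]) -> List[str]:
    """Split `word` before every index i (1 <= i < len(word)) flagged as a boundary.

    Surface-level segmentation keeps the characters unchanged, so joining
    the returned segments always gives back the original word.
    """
    segments = []
    start = 0
    for i in range(1, len(word)):
        if is_boundary(word, i):
            segments.append(word[start:i])
            start = i
    segments.append(word[start:])
    return segments


def toy_classifier(word: str, i: int) -> bool:
    """Hypothetical stand-in for a model prediction (hard-coded toy data)."""
    toy_boundaries = {"unlocked": {2, 6}}
    return i in toy_boundaries.get(word, set())


segments = segment_word("unlocked", toy_classifier)
print(segments)  # ['un', 'lock', 'ed']
```

Because the segments are substrings of the input, the output can be joined with a separator of your choice before being passed to a downstream tool.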
