arxiv:2406.06263

MaskLID: Code-Switching Language Identification through Iterative Masking

Published on Jun 10

Submitted by kargaranamir on Jun 17

Authors:

Amir Hossein Kargaran ,

Abstract

We present MaskLID, a simple, yet effective, code-switching (CS) language identification (LID) method. MaskLID does not require any training and is designed to complement current high-performance sentence-level LIDs. Sentence-level LIDs are classifiers trained on monolingual texts to provide single labels, typically using a softmax layer to turn scores into probabilities. However, in cases where a sentence is composed in both L1 and L2 languages, the LID classifier often only returns the dominant label L1. To address this limitation, MaskLID employs a strategy to mask text features associated with L1, allowing the LID to classify the text as L2 in the next round. This method uses the LID itself to identify the features that require masking and does not rely on any external resource. In this work, we explore the use of MaskLID for two open-source LIDs (GlotLID and OpenLID), which are both based on the FastText architecture. Code and demo are available at https://github.com/cisnlp/MaskLID.
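The iterative masking loop described in the abstract can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the authors' implementation: the lexicon `TOY_WEIGHTS`, the helper names (`word_scores`, `predict`, `mask_lid`), and the stopping parameters `max_rounds` and `min_words` are all hypothetical, and a hand-written word-score table stands in for a real FastText-based LID such as GlotLID or OpenLID.

```python
import math

# Hypothetical toy lexicon (an assumption for illustration, not from the paper):
# each known word carries a score for one language.
TOY_WEIGHTS = {
    "the": {"en": 2.0}, "weather": {"en": 2.0}, "is": {"en": 1.5},
    "nice": {"en": 1.5}, "today": {"en": 2.0},
    "pero": {"es": 2.0}, "hace": {"es": 2.0}, "frio": {"es": 2.0},
}
LANGS = ("en", "es")


def word_scores(word):
    """Per-language scores for one word; unknown words score 0 everywhere."""
    entry = TOY_WEIGHTS.get(word.lower(), {})
    return {lang: entry.get(lang, 0.0) for lang in LANGS}


def predict(words):
    """Toy sentence-level LID: sum word scores, softmax, return the top label."""
    totals = {lang: sum(word_scores(w)[lang] for w in words) for lang in LANGS}
    z = sum(math.exp(v) for v in totals.values())
    probs = {lang: math.exp(v) / z for lang, v in totals.items()}
    best = max(probs, key=probs.get)
    return best, probs[best]


def mask_lid(text, max_rounds=2, min_words=2):
    """Predict the dominant label, mask the words that support it, repeat."""
    words = text.split()
    found = []
    for _ in range(max_rounds):
        if len(words) < min_words:
            break  # too little text left to classify
        label, _ = predict(words)
        if label in found:
            break  # no new language surfaced in this round
        # Words attributed to `label`: positive score that is also the word's maximum.
        hits = [
            w for w in words
            if word_scores(w)[label] > 0
            and word_scores(w)[label] == max(word_scores(w).values())
        ]
        if len(hits) < min_words:
            break  # the prediction is not backed by enough words
        found.append(label)
        # Mask: drop the supporting words so the next round can see another language.
        words = [w for w in words if w not in hits]
    return found


print(mask_lid("the weather is nice today pero hace frio"))
```

In the first round the toy classifier returns only the dominant label for the mixed sentence; once the words supporting that label are masked, the second round classifies the remainder as the other language. A real setup would replace `predict` and `word_scores` with calls into the chosen LID model.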

Community

kargaranamir (Paper author, Paper submitter), Jun 17

MaskLID is a simple code-switching language identification method that uses current high-performance sentence-level language identification models. It requires no training and can be adapted to any text setup with any mixture of languages.

Code: https://github.com/cisnlp/MaskLID
Space: https://huggingface.co/spaces/cis-lmu/MaskLID


