arXiv:2402.10712

An Empirical Study on Cross-lingual Vocabulary Adaptation for Efficient Generative LLM Inference

Published on Feb 16, 2024

Abstract

The development of state-of-the-art generative large language models (LLMs) relies disproportionately on English-centric tokenizers, vocabularies, and pre-training data. Although some LLMs have multilingual capabilities, recent studies have shown that their inference efficiency deteriorates when generating text in languages other than English, resulting in increased inference time and cost. Cross-lingual vocabulary adaptation methods have been proposed to adapt models to a target language with the aim of improving downstream performance. However, their effectiveness in improving the inference efficiency of generative LLMs has yet to be explored. In this paper, we conduct an empirical study of various cross-lingual vocabulary adaptation methods on five generative LLMs (including monolingual and multilingual models) across four typologically diverse languages and four natural language understanding tasks. We find that cross-lingual vocabulary adaptation substantially speeds up LLM inference, by up to 271.5%. We also show that adapting LLMs that have been pre-trained on more balanced multilingual data yields downstream performance comparable to the original models.
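
The efficiency gap the abstract describes can be made concrete with a quick token-count comparison. Below is a minimal sketch (not from the paper; the tokenizer choice and sentences are illustrative assumptions) using the Hugging Face transformers library. An English-centric BPE tokenizer such as GPT-2's typically segments non-English text into many more tokens per character, and since autoregressive decoding cost scales with the number of tokens generated, producing the same content in those languages is slower and more expensive.

```python
# Illustrative sketch of tokenizer fragmentation across languages.
# Not the paper's method: "gpt2" and the sample sentences are
# assumptions chosen to demonstrate the general effect.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # an English-centric BPE tokenizer

samples = {
    "English": "The weather is nice today.",
    "Japanese": "今日は天気がいいですね。",  # roughly the same sentence
}

for language, text in samples.items():
    # encode() returns the token-id sequence; its length is what
    # autoregressive generation would have to pay for step by step.
    n_tokens = len(tokenizer.encode(text))
    print(f"{language}: {n_tokens} tokens for {len(text)} characters")
```

Running this typically shows the Japanese sentence costing several times more tokens per character than the English one. Cross-lingual vocabulary adaptation addresses exactly this: the vocabulary (and the corresponding embedding rows) is replaced or extended so that target-language text segments into fewer tokens, which is where the reported inference speedups come from.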
