Papers
arxiv:2310.04799

Chat Vector: A Simple Approach to Equip LLMs With New Language Chat Capabilities

Published on Oct 7, 2023
Authors:
Abstract

With the advancement of conversational AI such as ChatGPT, this paper focuses on developing Large Language Models (LLMs) for non-English languages, with particular emphasis on alignment with human preferences. We introduce a computationally efficient method, leveraging chat vectors, to synergize pre-existing knowledge and behaviors in LLMs, restructuring the conventional training paradigm from continual pre-train -> SFT -> RLHF to continual pre-train + chat vector. Our empirical studies, primarily focused on Traditional Chinese, employ LLaMA2 as the base model and acquire the chat vector by subtracting the pre-trained weights of LLaMA2 from the weights of LLaMA2-chat. Evaluation across three distinct facets (toxicity, instruction following, and multi-turn dialogue) demonstrates the chat vector's superior efficacy in chatting. To confirm the adaptability of our approach, we extend our experiments to models pre-trained on Korean and Simplified Chinese, illustrating the versatility of our methodology. Overall, the chat vector offers an efficient solution for aligning LLMs with human preferences across various languages.
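The core operation the abstract describes is plain weight arithmetic: subtract the base model's parameters from the chat model's to obtain the chat vector, then add that vector to a continually pre-trained model. Below is a minimal sketch of that idea using toy NumPy arrays in place of real model `state_dict` tensors; the dictionary keys and values here are illustrative, not the paper's actual parameters.

```python
import numpy as np

def chat_vector(chat_weights, base_weights):
    """Chat vector: chat-model weights minus base-model weights, per parameter."""
    return {k: chat_weights[k] - base_weights[k] for k in base_weights}

def apply_chat_vector(cp_weights, vector):
    """Add the chat vector to a continually pre-trained (CP) model's weights."""
    return {k: cp_weights[k] + vector[k] for k in cp_weights}

# Toy two-parameter "models"; in practice these would be LLaMA2 state_dicts.
base = {"w": np.array([1.0, 2.0]), "b": np.array([0.5])}   # stands in for LLaMA2
chat = {"w": np.array([1.5, 1.0]), "b": np.array([0.0])}   # stands in for LLaMA2-chat
cp   = {"w": np.array([2.0, 2.0]), "b": np.array([1.0])}   # continually pre-trained model

vec = chat_vector(chat, base)         # {"w": [0.5, -1.0], "b": [-0.5]}
chat_cp = apply_chat_vector(cp, vec)  # {"w": [2.5, 1.0], "b": [0.5]}
```

Because the method is pure parameter addition, it requires no gradient updates: the CP model gains chat behavior without any SFT or RLHF training run.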

