arXiv:2406.16377

On the Transformations across Reward Model, Parameter Update, and In-Context Prompt

Published on Jun 24 · Submitted by jcyk on Jun 26

Abstract

Despite the general capabilities of pre-trained large language models (LLMs), they still need further adaptation to better serve practical applications. In this paper, we demonstrate the interchangeability of three popular and distinct adaptation tools: parameter updating, reward modeling, and in-context prompting. This interchangeability establishes a triangular framework with six transformation directions, each of which facilitates a variety of applications. Our work offers a holistic view that unifies numerous existing studies and suggests potential research directions. We envision our work as a useful roadmap for future research on LLMs.

Community

Paper author · Paper submitter

The primary contribution of this paper is a holistic view of the triangular framework depicted in the paper's overview figure, which encompasses six distinct transformation directions in total. We systematically analyze each transformation by first formally defining its objectives, then investigating the transformation methods, and finally reviewing existing works that apply the transformation for various purposes. Our work spans a substantial breadth of the current frontier in LLM research and establishes connections among prior studies that may at first seem unrelated, advancing the understanding of the current landscape of LLM research. Beyond this survey of existing applications, we outline several promising avenues for future research within each transformation direction.
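
To make the structure of the framework concrete: the six transformation directions are simply the ordered pairs over the three adaptation tools. The following is a minimal illustrative sketch (not code from the paper) that enumerates them:

```python
from itertools import permutations

# The three adaptation tools the paper treats as interchangeable.
TOOLS = ("reward model", "parameter update", "in-context prompt")

# Each ordered pair of distinct tools is one transformation direction,
# yielding the 3 * 2 = 6 edges of the triangular framework.
directions = list(permutations(TOOLS, 2))
assert len(directions) == 6

for source, target in directions:
    print(f"{source} -> {target}")
```

For example, the "reward model -> parameter update" direction covers reward-based fine-tuning methods such as RLHF, where a reward model's preferences are baked into the model's weights.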
