Abstract
Reinforcement Learning (RL) mathematically formulates decision-making as a Markov Decision Process (MDP). With MDPs, researchers have achieved remarkable breakthroughs across various domains, including games, robotics, and language models. This paper explores a new possibility, Natural Language Reinforcement Learning (NLRL), by extending the traditional MDP to a natural language-based representation space. Specifically, NLRL redefines RL principles, including task objectives, policy, value function, Bellman equation, and policy iteration, as their language counterparts. With recent advancements in large language models (LLMs), NLRL can be practically implemented to achieve RL-like policy and value improvement through either pure prompting or gradient-based training. Experiments on Maze, Breakthrough, and Tic-Tac-Toe games demonstrate the effectiveness, efficiency, and interpretability of the NLRL framework across diverse use cases. Our code will be released at https://github.com/waterhorse1/Natural-language-RL.
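For readers less familiar with the classical concepts the abstract names (value function, Bellman equation, policy iteration), the following is a minimal sketch of traditional, scalar-valued policy iteration. It is background only and is not the NLRL method or the authors' code. The 4-cell corridor, the step cost of -1, the discount `GAMMA`, and all function names are hypothetical choices made for illustration.

```python
# Classical tabular policy iteration on a hypothetical 4-cell corridor.
# Everything here (states, rewards, GAMMA, names) is an illustrative
# assumption, not taken from the paper.
GAMMA = 0.9
N_STATES = 4          # cells 0..3; cell 3 is the terminal goal
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Deterministic transition: returns (next_state, reward)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, -1.0  # every move costs 1 until the goal is reached


def evaluate(policy, sweeps=100):
    """Policy evaluation: repeatedly apply the Bellman expectation backup."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES - 1):  # the terminal state keeps value 0
            nxt, reward = step(s, policy[s])
            values[s] = reward + GAMMA * values[nxt]
    return values


def improve(values):
    """Policy improvement: act greedily with respect to the current values."""
    policy = []
    for s in range(N_STATES - 1):
        def q(action, s=s):
            nxt, reward = step(s, action)
            return reward + GAMMA * values[nxt]
        policy.append(max(ACTIONS, key=q))
    return policy


def policy_iteration():
    """Alternate evaluation and improvement until the policy is stable."""
    policy = [-1] * (N_STATES - 1)  # start by always moving left
    while True:
        values = evaluate(policy)
        new_policy = improve(values)
        if new_policy == policy:
            return policy, values
        policy = new_policy


if __name__ == "__main__":
    final_policy, final_values = policy_iteration()
    print(final_policy, final_values)
```

In this scalar setting the value of a state is a single number. As the abstract describes it, NLRL replaces such quantities with natural language counterparts, so the sketch above only shows the baseline being generalized.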
Community
NLRL expands the scope of general sequential decision-making by moving beyond scalar rewards to leverage rich multimodal signals, particularly natural language. This approach enables agents to generalize across tasks and domains while generating high-quality interaction data. Though exemplified with language tasks, NLRL is a versatile framework that can scale to general decision-making scenarios in various modalities, improving interpretability and efficiency in solving complex sequential tasks.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning (2024)
- On the Modeling Capabilities of Large Language Models for Sequential Decision Making (2024)
- Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use (2024)
- From Reward Shaping to Q-Shaping: Achieving Unbiased Learning with LLM-Guided Knowledge (2024)
- MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions (2024)
- Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization (2024)
- Words as Beacons: Guiding RL Agents with High-Level Language Prompts (2024)