DeepSeek R1 Robotic Reasoning with Checkers
In this post, we explore the ability of DeepSeek R1, as well as other LLMs, to control a robotic arm to play checkers. We find that DeepSeek R1 performs better than other comparable open-source LLMs, but falls behind human and algorithmic players, underscoring the need for further advances in integrating LLMs with robotics.
Integration with LLMs
To let an LLM choose the next move, we need a way to encode the checkers game as text and then recover a valid move from the model's textual response. We therefore design a prompt that includes the rules, the current board state, and a list of valid moves:
prompt = f"""You are playing as black (● and ◎) in a game of checkers.
You need to choose the best move from the list of valid moves provided.
Rules to consider:
1. Regular pieces (●) can only move diagonally forward (upward)
2. King pieces (◎) can move diagonally in any direction
3. Getting pieces to the opposite end to make kings is advantageous
4. Look ahead to ensure your piece will not get captured in the next turn
Current board state:
0 1 2 3 4 5 6 7
0 - ○ - ○ - ○ - ○
1 ○ - ○ - ○ - ○ -
2 - ○ - ○ - ○ - ○
3 - - - - - - - -
4 - - - - - - - -
5 ● - ● - ● - ● -
6 - ● - ● - ● - ●
7 ● - ● - ● - ● -
Valid moves:
1. MOVE: (5, 0) → (4, 1)
2. MOVE: (5, 2) → (4, 1)
3. MOVE: (5, 2) → (4, 3)
4. MOVE: (5, 4) → (4, 3)
5. MOVE: (5, 4) → (4, 5)
6. MOVE: (5, 6) → (4, 5)
7. MOVE: (5, 6) → (4, 7)
Briefly analyze the board position and select the best move from the list
above. End your response with your chosen move on a new line starting
with "MOVE: ".
Example response:
MOVE: 3"""
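On the retrieval side, the post doesn't show the extraction code; below is a minimal sketch, assuming the response ends with a line like `MOVE: 3` and that `valid_moves` holds the `(src, dst)` pairs rendered into the prompt:

```python
import re

def parse_move(response: str, valid_moves: list) -> tuple:
    """Map the model's final "MOVE: <n>" line back to a (src, dst) pair.

    We take the last MOVE line in the response so that any "MOVE:"
    mentions inside the model's analysis are ignored.
    """
    matches = re.findall(r"MOVE:\s*(\d+)", response)
    if not matches:
        raise ValueError("no MOVE line found in response")
    index = int(matches[-1])
    if not 1 <= index <= len(valid_moves):
        raise ValueError(f"move index {index} is out of range")
    return valid_moves[index - 1]  # prompt indices are 1-based
```

If parsing fails or the index is out of range, a simple recovery strategy is to re-query the model.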
Integration with Robot Arm
We use DeepSeek R1 to control a ViperX 300 S robotic arm: we extract the selected move from DeepSeek R1's response and use it to execute a pick-and-place.
We also support jump moves, in which case the arm additionally removes any captured pieces from the board.
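The post doesn't include the motion code itself. Below is a rough sketch of how the pick-and-place could look with the Interbotix Python API for the ViperX 300 S; the board calibration constants, the `square_to_xy` mapping, and the off-board discard square are illustrative placeholders:

```python
from interbotix_xs_modules.arm import InterbotixManipulatorXS

# Hypothetical calibration: position of square (0, 0) in the arm's
# base frame and the side length of one square, both in meters.
BOARD_ORIGIN = (0.25, -0.14)
SQUARE_SIZE = 0.04
HOVER_Z, GRASP_Z = 0.12, 0.05  # heights for approach and grasp

def square_to_xy(square):
    """Map a (row, col) board square to (x, y) in the arm's base frame."""
    row, col = square
    return (BOARD_ORIGIN[0] + row * SQUARE_SIZE,
            BOARD_ORIGIN[1] + col * SQUARE_SIZE)

def pick_and_place(bot, src, dst):
    """Pick up the piece on `src` and set it down on `dst`."""
    for square, grasp in ((src, True), (dst, False)):
        x, y = square_to_xy(square)
        bot.arm.set_ee_pose_components(x=x, y=y, z=HOVER_Z)  # hover above
        bot.arm.set_ee_pose_components(x=x, y=y, z=GRASP_Z)  # descend
        if grasp:
            bot.gripper.close()
        else:
            bot.gripper.open()
        bot.arm.set_ee_pose_components(x=x, y=y, z=HOVER_Z)  # retreat

def execute_move(bot, src, dst, discard=(3, -2)):
    """Execute a move; for a jump, also clear the captured piece."""
    pick_and_place(bot, src, dst)
    if abs(dst[0] - src[0]) == 2:  # a single jump spans two rows
        captured = ((src[0] + dst[0]) // 2, (src[1] + dst[1]) // 2)
        pick_and_place(bot, captured, discard)  # move it off the board

bot = InterbotixManipulatorXS(robot_model="vx300s", group_name="arm",
                              gripper_name="gripper")
execute_move(bot, src=(5, 2), dst=(4, 3))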
Results
We compare DeepSeek R1 against other LLMs, the well-established Minimax algorithm, and human players.
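The Minimax baseline is classic game-tree search: it expands moves to a fixed depth, scores leaf positions with an evaluation function, and backs values up assuming both sides play optimally. The post doesn't specify the search depth or evaluation used, so the following is a generic sketch with alpha-beta pruning over a hypothetical game interface:

```python
def minimax(game, depth, alpha=float("-inf"), beta=float("inf"),
            maximizing=True):
    """Depth-limited minimax with alpha-beta pruning.

    Returns (value, best_move). The `game` interface (`is_over`,
    `evaluate`, `legal_moves`, `apply`) is illustrative, not the
    post's actual implementation.
    """
    if depth == 0 or game.is_over():
        return game.evaluate(), None
    best_move = None
    if maximizing:
        value = float("-inf")
        for move in game.legal_moves():
            score, _ = minimax(game.apply(move), depth - 1,
                               alpha, beta, maximizing=False)
            if score > value:
                value, best_move = score, move
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # opponent will never allow this branch
    else:
        value = float("inf")
        for move in game.legal_moves():
            score, _ = minimax(game.apply(move), depth - 1,
                               alpha, beta, maximizing=True)
            if score < value:
                value, best_move = score, move
            beta = min(beta, value)
            if alpha >= beta:
                break
    return value, best_move
```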
DeepSeek R1 vs. Other LLMs/Algorithms
To evaluate the performance of different players, we run a round-robin tournament with four players: DeepSeek R1 (deepseek-r1-distill-qwen-32b), Llama 3 (llama-3.3-70b-instruct), and Qwen 2.5 (qwen2.5-32b-instruct) as the three LLM players, plus the Minimax algorithm. We match every player against every other player in repeated matches, for a total of 120 games (a short sanity check on these counts follows the table). We report the win rate of each player below:
| Player      | Player Type | Win Rate |
|-------------|-------------|----------|
| Qwen 2.5    | LLM         | 26.6%    |
| Llama 3     | LLM         | 30.0%    |
| DeepSeek R1 | LLM         | 43.3%    |
| Minimax     | Algorithm   | 100.0%   |
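As a sanity check on the bookkeeping (the 20 games per pairing is inferred from the 120-game total rather than stated explicitly):

```python
from itertools import combinations

players = ["DeepSeek R1", "Llama 3", "Qwen 2.5", "Minimax"]
GAMES_PER_PAIRING = 20  # inferred from the 120-game total

pairings = list(combinations(players, 2))                  # 6 pairings
total_games = len(pairings) * GAMES_PER_PAIRING            # 120
games_per_player = (len(players) - 1) * GAMES_PER_PAIRING  # 60
print(total_games, games_per_player)

# Win rate = wins / games played, so DeepSeek R1's 43.3% corresponds
# to 26 wins out of its 60 games.
```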
DeepSeek R1 vs. Humans
We also play three games between DeepSeek R1 and a human player, showing the winner of each game:
| Player      | Game 1 | Game 2 | Game 3 |
|-------------|--------|--------|--------|
| Human       | ✓      | ✓      | ✓      |
| DeepSeek R1 |        |        |        |
We observe that the human and algorithmic players consistently beat DeepSeek R1 and the other LLMs, which we attribute to the fact that LLMs are not trained to play checkers. LLMs are trained for next-token prediction or, in the case of DeepSeek R1, for solving mathematical and software engineering problems. While checkers-related text is likely included in training datasets, we believe that full checkers games are scarce, leading to poor performance in actual gameplay. We hypothesize that training LLMs on checkers with supervised fine-tuning or reinforcement learning could significantly improve their performance.
Conclusion
This post examined how DeepSeek R1 and other LLMs can be integrated with a robotic arm to play checkers. While DeepSeek R1 outperforms comparable open-source models, it still lags behind human and algorithmic players. We believe that training LLMs specifically on the game of checkers could greatly enhance their performance and suggest this as a promising direction for future research.