---
license: apache-2.0
datasets:
- yuyouyu/BeyondDialogue
language:
- zh
- en
metrics:
- character
base_model: mistralai/Mistral-Nemo-Instruct-2407
pipeline_tag: question-answering
tags:
- text-generation-inference
- role-playing
---
# Mistral-Nemo-BD-RP

## Introduction 🎉

Mistral-Nemo-BD-RP is a large language model (LLM) fine-tuned on the BeyondDialogue dataset for role-playing. It is designed to generate high-quality, in-character responses across a variety of role-playing scenarios in both English and Chinese.

For more details, please refer to our paper (arXiv:2408.10903) and our GitHub repository.
## Training details 🚀

We performed full-parameter fine-tuning of Mistral-Nemo-Instruct-2407 on the BeyondDialogue dataset for 3 epochs (833 steps) with a global batch size of 128, a training sequence length of 4,096 tokens, and a learning rate of 3e-5.
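As a rough consistency check, the hyperparameters above imply how much data the model saw per epoch. The sketch below assumes every optimizer step consumes one full global batch of 128 sequences. Because 833 is not divisible by 3, the per-epoch figures are approximate (a partial final batch in each epoch would explain the remainder), and the token figure is an upper bound that assumes every sequence fills the full 4,096-token window.

```python
# Hyperparameters reported in the training details
EPOCHS = 3
TOTAL_STEPS = 833
GLOBAL_BATCH_SIZE = 128
SEQ_LEN = 4096

# Assumption: each optimizer step processes exactly one full global batch
steps_per_epoch = TOTAL_STEPS / EPOCHS             # about 277.7 steps per epoch
sequences_seen = TOTAL_STEPS * GLOBAL_BATCH_SIZE   # at most 106,624 sequences over all epochs
sequences_per_epoch = sequences_seen / EPOCHS      # about 35,541 sequences per epoch
max_tokens_per_step = GLOBAL_BATCH_SIZE * SEQ_LEN  # 524,288 tokens if every sequence is full length

print(f"steps/epoch      ~ {steps_per_epoch:.1f}")
print(f"sequences/epoch  ~ {sequences_per_epoch:,.0f}")
print(f"max tokens/step  = {max_tokens_per_step:,}")
```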
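## Requirements 📝

Support for Mistral models is included in recent releases of Hugging Face `transformers`. We advise installing `transformers>=4.42.0` to use this model (quote the requirement so the shell does not treat `>` as a redirect):

```bash
pip install "transformers>=4.42.0"
```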
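To confirm the installed version before loading the model, a small standard-library check can help. This is a sketch: `numeric_release`, `meets_minimum`, and `transformers_ok` are hypothetical helpers, and the simplified parser drops pre-release tags (so `4.42.0.dev0` counts as `4.42.0`). For a fully PEP 440-compliant comparison, `packaging.version.Version` is the more robust choice.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def numeric_release(v: str) -> tuple[int, ...]:
    """Leading numeric release components, e.g. "4.44.0.dev0" -> (4, 44, 0)."""
    parts: list[int] = []
    for piece in v.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # non-numeric component such as "dev0"
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # component such as "0rc1": keep the number, drop the suffix
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare release numbers component by component, padding the shorter one with zeros."""
    a, b = numeric_release(installed), numeric_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def transformers_ok(minimum: str = "4.42.0") -> bool:
    """True if transformers is installed and its release is at least `minimum`."""
    try:
        return meets_minimum(version("transformers"), minimum)
    except PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("transformers OK:", transformers_ok())
```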
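## Quickstart 💥

The snippet below uses the `transformers` `pipeline`, which loads the model and tokenizer and applies the model's chat template (`apply_chat_template`) internally, to show how to fill the role-playing system prompt and generate a response.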
```python
from transformers import pipeline

# Load the fine-tuned model as a chat-capable text-generation pipeline.
# device_map="auto" places the weights on the available GPU(s) (requires `accelerate`).
chatbot = pipeline("text-generation", model="yuyouyu/Mistral-Nemo-BD-RP", device_map="auto")

# Role-playing system prompt; each {placeholder} is filled from the role profile below.
system_prompt_temp = """I want you to answer questions as if you are {role_name}, assuming you live in the world of {world} and mimicking {role_name}'s personality and speaking style. Use the tone, manner, and vocabulary that {role_name} would use. Please do not reveal that you are an AI or language model; you must always remember you are {role_name}.
{role_name}'s character traits are {character}.
{role_name}'s MBTI personality type is {MBTI}.
{role_name}'s speaking style is {style}.
Current scene:
{scene}
role's emotion (0-10, the higher the value, the more pronounced the emotion):
{emotion}
Now, please act as {role_name} and reply with a brief sentence to {chat_role}. Your intimacy level with them is {relationship} (0-10, the higher the value, the closer the relationship). Accurately display the MBTI personality, character traits, speaking style, and emotion you have been assigned."""

# Role profile and scene used to fill the template
role_name = "Hamlet"
world = "8th Century Danish Royalty"
character = "extreme, strong, decisive"
MBTI = "Extraverted (E), Intuitive (N), Feeling (F), Judging (J)"
style = "indecisive, decisive, sentimental"
scene = "Inside the grand hall of Elsinore, lit by flickering torchlight, Hamlet paces anxiously as Elena conjures an ethereal mirage of the Danish landscape. Regal tapestries and opulent furnishings surround them, yet Hamlet's gaze is fixed on Elena's illusions. She gracefully weaves dissonance into the tapestry of reality, prompting Hamlet to clutch his chest in a moment of existential crisis. The weight of unspoken love and inner turmoil hangs in the air, thick with tension and anticipation."
emotion = "happiness: 1, sadness: 8, disgust: 5, fear: 7, surprise: 6, anger: 4"
chat_role = "Elena"
relationship = "7"

system_prompt = system_prompt_temp.format(
    role_name=role_name,
    world=world,
    character=character,
    MBTI=MBTI,
    style=style,
    scene=scene,
    emotion=emotion,
    chat_role=chat_role,
    relationship=relationship,
)

# The user turn is the other character's (Elena's) line
prompt = "Oh, dear Hamlet, dost thou see in these conjured whispers the paths unseen? Speak, for shadows may guide us to the truth bound within thy tormented soul."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt},
]

# The pipeline returns the whole conversation; its last message is the model's reply.
response = chatbot(
    messages,
    max_new_tokens=256,
    pad_token_id=chatbot.tokenizer.eos_token_id,
    do_sample=True,
    temperature=0.7,
)[0]["generated_text"][-1]["content"]
print(response)
```
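Because `str.format` raises `KeyError` for any placeholder without a matching keyword argument, it can help to check a template against the profile fields before calling the model. The sketch below uses hypothetical helpers (`template_fields`, `missing_fields`, `format_scores`; standard library only): it lists named placeholders with `string.Formatter().parse` and renders an emotion dictionary into the `name: value` string used in the quickstart, rejecting scores outside the 0-10 scale the prompt describes.

```python
from string import Formatter


def template_fields(template: str) -> set[str]:
    """Named {placeholders} in a str.format template (positional {} fields are ignored)."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion) tuples;
    # field_name is None for trailing literal text and "" for positional fields.
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def missing_fields(template: str, values: dict) -> set[str]:
    """Placeholders in `template` that have no entry in `values`."""
    return template_fields(template) - set(values)


def format_scores(scores: dict[str, int]) -> str:
    """Render {"happiness": 1, "sadness": 8} as "happiness: 1, sadness: 8"."""
    for name, value in scores.items():
        if not 0 <= value <= 10:
            raise ValueError(f"{name}={value} is outside the 0-10 scale")
    return ", ".join(f"{name}: {value}" for name, value in scores.items())


# Usage with a shortened template: the misspelled placeholder is reported before .format() fails
template = "{role_name}'s speaking style is {stryle}.\n{emotion}"
profile = {
    "role_name": "Hamlet",
    "style": "indecisive, decisive, sentimental",
    "emotion": format_scores({"happiness": 1, "sadness": 8, "fear": 7}),
}
print(missing_fields(template, profile))  # {'stryle'}
```

With the full quickstart template, passing a dict of the nine profile fields to `missing_fields` should return an empty set.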
Note: The examples for Mistral-Nemo-BD-RP use English role-playing. For Chinese examples, please refer to our other trained model repository: Qwen2-7B-BD-RP.
## Evaluation 🏆

We use objective questions to assess eight dimensions: Character, Style, Emotion, Relationship, Personality, Human-likeness, Coherence, and Role Consistency (the Role Choice column). The metric design is described in our paper, and the evaluation code is available in our GitHub repository. In the table, ↑ means higher is better and ↓ means lower is better. The results are shown below:
Model | Character ↑ | Style ↑ | Emotion ↓ | Relationship ↓ | Personality ↑ | Avg. ↑ | Human-likeness ↑ | Role Choice ↑ | Coherence ↑ |
---|---|---|---|---|---|---|---|---|---|
General Baselines (Proprietary) | |||||||||
GPT-4o | 74.32 ± 1.15 | 81.67 ± 1.51 | 16.31 ± 0.48 | 12.13 ± 0.66 | 66.58 ± 4.41 | 78.83 ± 1.64 | 67.33 ± 3.95 | 87.33 ± 3.86 | 99.67 ± 0.33 |
GPT-3.5-Turbo | 72.26 ± 1.27 | 73.66 ± 1.73 | 17.79 ± 0.56 | 14.17 ± 0.73 | 66.92 ± 4.85 | 76.18 ± 1.83 | 33.33 ± 4.43 | 83.00 ± 4.68 | 97.33 ± 1.17 |
Moonshot-v1-8k | 74.06 ± 1.19 | 80.64 ± 1.51 | 16.17 ± 0.47 | 13.42 ± 0.70 | 67.00 ± 4.87 | 78.42 ± 1.75 | 44.00 ± 4.33 | 86.67 ± 3.75 | 99.33 ± 0.46 |
Yi-Large-Turbo | 75.13 ± 1.22 | 79.18 ± 1.58 | 16.44 ± 0.49 | 13.48 ± 0.67 | 68.25 ± 4.61 | 78.53 ± 1.72 | 47.00 ± 4.60 | 84.33 ± 3.67 | 92.67 ± 2.39 |
Deepseek-Chat | 75.46 ± 1.14 | 81.49 ± 1.51 | 15.92 ± 0.46 | 12.42 ± 0.63 | 67.92 ± 4.57 | 79.30 ± 1.66 | 52.33 ± 4.95 | 83.00 ± 4.68 | 96.67 ± 1.00 |
Baichuan4 | 71.82 ± 1.25 | 76.92 ± 1.52 | 17.57 ± 0.52 | 12.30 ± 0.62 | 67.08 ± 4.75 | 77.19 ± 1.73 | 45.33 ± 4.31 | 82.33 ± 4.49 | 99.33 ± 0.46 |
Hunyuan | 73.77 ± 1.18 | 78.75 ± 1.56 | 17.24 ± 0.48 | 13.22 ± 0.68 | 67.00 ± 4.39 | 77.81 ± 1.66 | 53.00 ± 4.29 | 84.33 ± 4.52 | 98.33 ± 0.84 |
Role-play Expertise Baselines | |||||||||
Index-1.9B-Character | 73.33 ± 1.32 | 76.48 ± 1.50 | 17.99 ± 0.53 | 13.58 ± 0.71 | 66.33 ± 4.57 | 76.92 ± 1.73 | 21.67 ± 3.96 | 78.67 ± 5.14 | 69.67 ± 3.85 |
CharacterGLM-6B | 73.36 ± 1.28 | 76.08 ± 1.55 | 18.58 ± 0.55 | 14.27 ± 0.79 | 67.33 ± 4.34 | 76.79 ± 1.70 | 16.00 ± 2.38 | 81.00 ± 4.40 | 25.67 ± 3.48 |
Baichuan-NPC-Turbo | 75.19 ± 1.23 | 79.15 ± 1.38 | 17.24 ± 0.51 | 13.10 ± 0.69 | 65.33 ± 4.84 | 77.87 ± 1.73 | 56.00 ± 4.66 | 86.33 ± 4.90 | 99.00 ± 0.56 |
General Baselines (Open-source) | |||||||||
Yi-1.5-9B-Chat | 75.31 ± 1.20 | 76.78 ± 1.49 | 16.67 ± 0.52 | 12.75 ± 0.66 | 67.42 ± 4.63 | 78.02 ± 1.70 | 38.67 ± 4.39 | 84.00 ± 4.61 | 92.67 ± 1.79 |
GLM-4-9b-chat | 74.26 ± 1.19 | 78.40 ± 1.55 | 17.18 ± 0.50 | 14.48 ± 0.74 | 67.17 ± 4.93 | 77.63 ± 1.78 | 47.67 ± 4.25 | 83.33 ± 4.51 | 99.33 ± 0.46 |
Qwen2-7B-Instruct | 75.39 ± 1.13 | 77.68 ± 1.65 | 17.64 ± 0.56 | 13.43 ± 0.7 | 67.75 ± 4.44 | 77.95 ± 1.70 | 48.00 ± 4.66 | 83.33 ± 4.48 | 99.00 ± 0.56 |
Mistral-Nemo-Instruct-2407 | 74.12 ± 1.17 | 77.04 ± 1.48 | 17.00 ± 0.43 | 13.50 ± 0.67 | 67.00 ± 4.30 | 77.53 ± 1.61 | 53.67 ± 4.66 | 82.67 ± 4.77 | 74.33 ± 3.77 |
Mistral-Nemo-BD-RP | 74.58 ± 1.28 | 78.47 ± 1.45 | 16.62 ± 0.48 | 11.38 ± 0.67* | 69.08 ± 4.46 | 78.83 ± 1.67 | 59.00 ± 4.46 | 87.00 ± 4.73 | 92.67 ± 1.59 |
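
The card does not state how Avg. is computed. Judging from the reported numbers, it appears to be the mean of the five profile dimensions after converting the two ↓ metrics (Emotion, Relationship) to 100 - x; this is our inference, not a definition from the paper. The sketch below recomputes Avg. for a few rows; the results agree with the reported values to within about 0.01 (some rows elsewhere in the table differ in the last digit, presumably because the inputs are themselves rounded).

```python
def profile_avg(character, style, emotion, relationship, personality):
    """Inferred Avg.: Emotion and Relationship are lower-is-better, so use 100 - x for them."""
    return (character + style + (100 - emotion) + (100 - relationship) + personality) / 5


# model: (Character, Style, Emotion, Relationship, Personality, reported Avg.), copied from the table
ROWS = {
    "GPT-4o": (74.32, 81.67, 16.31, 12.13, 66.58, 78.83),
    "Qwen2-7B-Instruct": (75.39, 77.68, 17.64, 13.43, 67.75, 77.95),
    "Mistral-Nemo-Instruct-2407": (74.12, 77.04, 17.00, 13.50, 67.00, 77.53),
    "Mistral-Nemo-BD-RP": (74.58, 78.47, 16.62, 11.38, 69.08, 78.83),
}

for model, (*scores, reported) in ROWS.items():
    recomputed = profile_avg(*scores)
    print(f"{model}: recomputed {recomputed:.3f}, reported {reported:.2f}")
```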
## Citation 📖

Please cite our work if you find the resources in this repository useful:

```bibtex
@article{yu2024beyond,
  title   = {BEYOND DIALOGUE: A Profile-Dialogue Alignment Framework Towards General Role-Playing Language Model},
  author  = {Yu, Yeyong and Yu, Runsheng and Wei, Haojie and Zhang, Zhanqiu and Qian, Quan},
  year    = {2024},
  journal = {arXiv preprint arXiv:2408.10903},
}
```
## Acknowledgements 🥰
We would like to express our sincere gratitude to Tencent LightSpeed Studios for their invaluable support in this project. Their contributions and encouragement have been instrumental in the successful completion of our work.