Edit model card

Chat凉宫春日的对话抽取模型

我们希望有一个模型能够从小说的chunk中批量去提取摘要和对话

这个模型就是实现了这一点。模型使用了大约30k的中文小说数据和20k的英文小说数据进行训练,在qwen-1.8上进行了3个epoch的finetune。 原则上模型同时支持中文和英文小说的抽取

主项目链接 https://github.com/LC1332/Chat-Haruhi-Suzumiya

  • 李鲁鲁完成了数据的收集,以及进一步将inference程序扩展到连续的chunks
  • 刘崇寒完成了模型的训练
  • 米唯实测试并上传模型到hugging face

Chat Haruhi Suzumiya's Dialogue Extraction Model

We hope to have a model that can extract summaries and dialogues in batches from chunks of novels.

This model achieves just that. It was trained using approximately 30k Chinese novels and 20k English novels, and was fine-tuned on qwen-1.8 for three epochs. In principle, the model supports extracting for both Chinese and English novels.

Main project link: https://github.com/LC1332/Chat-Haruhi-Suzumiya

Inference Code

https://github.com/LC1332/Chat-Haruhi-Suzumiya/blob/main/notebook/Dialogue_Speaker_Extract_Test.ipynb

from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("silk-road/Haruhi-Dialogue-Speaker-Extract_qwen18", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("silk-road/Haruhi-Dialogue-Speaker-Extract_qwen18", device_map="auto", trust_remote_code=True)

sys_prompt = "给定input paragraph,抽取其中的对话,并输出为json格式 Let's think it step by step 1. summarize input paragraph into bullet format,存储在summary字段 2. 抽取每一句对话的内容 dialogue,判断每一句话的说话人 said by, 存储在conversations中"

text = "Your novel text"
response_str, history = model.chat(tokenizer, text, history=[], system=sys_prompt)

Official Prompt

Chinese:

给定input paragraph,抽取其中的对话,并输出为json格式 Let's think it step by step 1. summarize input paragraph into bullet format,存储在summary字段 2. 抽取每一句对话的内容 dialogue,判断每一句话的说话人 said by, 存储在conversations中

English:

Given an input paragraph, extract the dialogues within it, and output them in JSON format.

Let's think about it step by step:
- Summarize the input paragraph into bullet points and store it in the 'summary' field.
- Extract the content of each dialogue ('dialogue'), identify the speaker for each sentence ('said by'), and store these in 'conversations'.

TODO

  • 拓展到多chunks的inference
  • 提供英语的例子
  • 提供一个多章节并行inference的例子
  • 在json解析失败的时候尝试直接从raw字符串提取summary
  • 在失败的时候额外尝试调用openai进行推理

TODO

  • Expand to multi-chunk inference
  • Provide an English example
  • Provide an example of multi-chapter parallel inference
  • Try extracting summary directly from raw strings when JSON parsing fails
  • Additionally attempt to use OpenAI for inference when failing

Chinese Output Example

{'summary': '- 彭蠡不在家中,老刀感到担忧并等待着彭蠡回家的时间,同时观察周围环境和人们的消费行为,表现出内心的饥饿感和焦虑情绪。', 'conversations': [{'dialogue': '哎,你们知道那儿一盘回锅肉多少钱吗?', 'said_by': '小李'}, {'dialogue': '靠,菜里有沙子。', 'said_by': '小丁'}, {'dialogue': '人家那儿一盘回锅肉,就三百四。', 'said_by': '小李'}, {'dialogue': '什么玩意?这么贵。', 'said_by': '小丁'}, {'dialogue': '你吃不了这么多。', 'said_by': '小李'}]}
{'summary': '- 彭蠡在家等待彭蠡回家,表现出内心的饥饿感和焦虑情绪,同时对彭蠡的行为表示不满和失望。彭蠡则对老刀的行为表现出冷漠和不屑的态度。', 'conversations': [{'dialogue': '我没时间和你解释。我需要去第一空间,你告诉我怎么走。', 'said_by': '老刀'}, {'dialogue': '回我家说,要走也从那儿走。', 'said_by': '彭蠡'}, {'dialogue': '回家啦,回家啦。转换马上开始了。', 'said_by': '车上的人'}, {'dialogue': '你不告诉我为什么,我就不告诉你怎么走。', 'said_by': '彭蠡'}, {'dialogue': '你躲在垃圾道里?去第二空间?那你得等24小时啊。', 'said_by': '彭蠡'}, {'dialogue': '二十万块。等一礼拜也值啊。', 'said_by': '老刀'}, {'dialogue': '你就这么缺钱花?', 'said_by': '彭蠡'}, {'dialogue': '糖糖还有一年多该去幼儿园了。我来不及了。', 'said_by': '老刀'}, {'dialogue': '你别说了。', 'said_by': '彭蠡'}]}
{'summary': '- 彭蠡对彭蠡的行为表现出不满和失望,同时对老刀的行为表现出冷漠和不屑的态度。', 'conversations': [{'dialogue': '你真是作死,她又不是你闺女,犯得着吗。', 'said_by': '彭蠡'}, {'dialogue': '别说这些了。快告我怎么走。', 'said_by': '老刀'}, {'dialogue': '你可得知道,万一被抓着,可不只是罚款,得关上好几个月。', 'said_by': '彭蠡'}, {'dialogue': '你不是去过好多次吗?', 'said_by': '老刀'}, {'dialogue': '只有四次。第五次就被抓了。', 'said_by': '彭蠡'}, {'dialogue': '那也够了。我要是能去四次,抓一次也无所谓。', 'said_by': '老刀'}, {'dialogue': '别说了。你要是真想让我带你去,我就带你去。', 'said_by': '彭蠡'}]}

- 彭蠡不在家中,老刀感到担忧并等待着彭蠡回家的时间,同时观察周围环境和人们的消费行为,表现出内心的饥饿感和焦虑情绪。
小李 : 哎,你们知道那儿一盘回锅肉多少钱吗?
小丁 : 靠,菜里有沙子。
小李 : 人家那儿一盘回锅肉,就三百四。
小丁 : 什么玩意?这么贵。
小李 : 你吃不了这么多。
- 彭蠡在家等待彭蠡回家,表现出内心的饥饿感和焦虑情绪,同时对彭蠡的行为表示不满和失望。彭蠡则对老刀的行为表现出冷漠和不屑的态度。
老刀 : 我没时间和你解释。我需要去第一空间,你告诉我怎么走。
彭蠡 : 回我家说,要走也从那儿走。
车上的人 : 回家啦,回家啦。转换马上开始了。
彭蠡 : 你不告诉我为什么,我就不告诉你怎么走。
彭蠡 : 你躲在垃圾道里?去第二空间?那你得等24小时啊。
老刀 : 二十万块。等一礼拜也值啊。
彭蠡 : 你就这么缺钱花?
老刀 : 糖糖还有一年多该去幼儿园了。我来不及了。
彭蠡 : 你别说了。
- 彭蠡对彭蠡的行为表现出不满和失望,同时对老刀的行为表现出冷漠和不屑的态度。
彭蠡 : 你真是作死,她又不是你闺女,犯得着吗。
老刀 : 别说这些了。快告我怎么走。
彭蠡 : 你可得知道,万一被抓着,可不只是罚款,得关上好几个月。
老刀 : 你不是去过好多次吗?
彭蠡 : 只有四次。第五次就被抓了。
老刀 : 那也够了。我要是能去四次,抓一次也无所谓。
彭蠡 : 别说了。你要是真想让我带你去,我就带你去。

English Output Example

{'summary': "Snow-covered Paris, Kimura's workshop, artist and viewer engaging in conversation.", 'conversations': [{'dialogue': 'You should hear the stories they tell of you at the café. If Émile is to be believed, you arrived here as an ukiyo-e courtesan, nothing more than paper wrapped around a porcelain bowl. A painter—he will not say which of us it was, of course—bought the bowl and the print along with it.', 'said_by': 'Artist'}, {'dialogue': 'And the painter pulled me from the print with the sheer force of his imagination, I’m sure. Émile is a novelist and can hardly be trusted to give an accurate account. The reality of my conception is vastly more mundane, I assure you…though it does involve a courtesan.', 'said_by': 'Woman'}, {'dialogue': 'A grain of truth makes for the best fiction. nude, but leave the jewelry and the shoes. I’ll paint you on the chaise. We’ll have three hours in the proper light, and I will pay you four francs.', 'said_by': 'Artist'}, {'dialogue': 'Victorine gets five!', 'said_by': 'Woman'}, {'dialogue': 'Victorine is a redhead.', 'said_by': 'Artist'}, {'dialogue': 'My name is Mariko, by the way, but everyone calls me Mari.', 'said_by': 'Mariko'}]}
{'summary': "Snow-covered Paris, Kimura's workshop, artist and viewer engaged in conversation. Artist and viewer engage in intimate conversation and interaction.", 'conversations': [{'dialogue': 'I’m on the chaise', 'said_by': 'Artist'}, {'dialogue': 'Bring your left hip forward. No, not that far. Bend the leg a bit more, yes. Turn your head to face the canvas.', 'said_by': 'Artist'}, {'dialogue': 'Like a Manet', 'said_by': 'Artist'}, {'dialogue': 'Don’t like a model that talks while you work, huh?', 'said_by': 'Artist'}, {'dialogue': 'I don’t like being compared to other artists.', 'said_by': 'Artist'}, {'dialogue': 'Then you must paint me so well that I forget about the others.', 'said_by': 'Artist'}, {'dialogue': 'Tilt your head into the light. And look at me intently. Intently. As though I were the one naked on the chaise.', 'said_by': 'Artist'}, {'dialogue': 'You did better than I would have expected.', 'said_by': 'Artist'}, {'dialogue': 'There are other poses I could show you, if you like?', 'said_by': 'Artist'}, {'dialogue': 'But the sooner I get started on this portrait, the better.', 'said_by': 'Artist'}]}
{'summary': "Kimura's workshop, artist and viewer engaging in intimate conversation and interaction. Kimura responds with a strong, cold embrace, leading to a passionate physical exchange. Afterward, the artist falls asleep, leaving the narrator feeling incomplete and longing.", 'num': 14, 'conversations': [{'dialogue': 'I could show you other poses.', 'said_by': 'Kimura'}, {'dialogue': 'Yes.', 'said_by': 'Kimura'}, {'dialogue': 'See you tomorrow?', 'said_by': 'Artist'}]}

Snow-covered Paris, Kimura's workshop, artist and viewer engaging in conversation.
Artist : You should hear the stories they tell of you at the café. If Émile is to be believed, you arrived here as an ukiyo-e courtesan, nothing more than paper wrapped around a porcelain bowl. A painter—he will not say which of us it was, of course—bought the bowl and the print along with it.
Woman : And the painter pulled me from the print with the sheer force of his imagination, I’m sure. Émile is a novelist and can hardly be trusted to give an accurate account. The reality of my conception is vastly more mundane, I assure you…though it does involve a courtesan.
Artist : A grain of truth makes for the best fiction. nude, but leave the jewelry and the shoes. I’ll paint you on the chaise. We’ll have three hours in the proper light, and I will pay you four francs.
Woman : Victorine gets five!
Artist : Victorine is a redhead.
Mariko : My name is Mariko, by the way, but everyone calls me Mari.
Snow-covered Paris, Kimura's workshop, artist and viewer engaged in conversation. Artist and viewer engage in intimate conversation and interaction.
Artist : I’m on the chaise
Artist : Bring your left hip forward. No, not that far. Bend the leg a bit more, yes. Turn your head to face the canvas.
Artist : Like a Manet
Artist : Don’t like a model that talks while you work, huh?
Artist : I don’t like being compared to other artists.
Artist : Then you must paint me so well that I forget about the others.
Artist : Tilt your head into the light. And look at me intently. Intently. As though I were the one naked on the chaise.
Artist : You did better than I would have expected.
Artist : There are other poses I could show you, if you like?
Artist : But the sooner I get started on this portrait, the better.
Kimura's workshop, artist and viewer engaging in intimate conversation and interaction. Kimura responds with a strong, cold embrace, leading to a passionate physical exchange. Afterward, the artist falls asleep, leaving the narrator feeling incomplete and longing.
Kimura : I could show you other poses.
Kimura : Yes.
Artist : See you tomorrow?
Downloads last month
40
Safetensors
Model size
1.84B params
Tensor type
FP16
·
Inference API
Inference API (serverless) does not yet support model repos that contain custom code.