---
language: ko
tags:
- bart
datasets:
- korquad
license: mit
---

# Korean Question Generation Model

## Github

https://github.com/Seoneun/KoBART-Question-Generation

## Fine-tuning Dataset

KorQuAD 1.0

## Demo

https://huggingface.co/Sehong/kobart-QuestionGeneration

## How to use

```python
import torch
from transformers import PreTrainedTokenizerFast
from transformers import BartForConditionalGeneration

tokenizer = PreTrainedTokenizerFast.from_pretrained('Sehong/kobart-QuestionGeneration')
model = BartForConditionalGeneration.from_pretrained('Sehong/kobart-QuestionGeneration')

text = "1989λ…„ 2μ›” 15일 μ—¬μ˜λ„ 농민 폭λ ₯ μ‹œμœ„λ₯Ό μ£Όλ„ν•œ 혐의(폭λ ₯ν–‰μœ„λ“±μ²˜λ²Œμ—κ΄€ν•œλ²•λ₯ μœ„λ°˜)으둜 지λͺ…μˆ˜λ°°λ˜μ—ˆλ‹€. 1989λ…„ 3μ›” 12일 μ„œμšΈμ§€λ°©κ²€μ°°μ²­ κ³΅μ•ˆλΆ€λŠ” μž„μ’…μ„μ˜ μ‚¬μ „κ΅¬μ†μ˜μž₯을 λ°œλΆ€λ°›μ•˜λ‹€. 같은 ν•΄ 6μ›” 30일 평양좕전에 μž„μˆ˜κ²½μ„ λŒ€ν‘œλ‘œ νŒŒκ²¬ν•˜μ—¬ κ΅­κ°€λ³΄μ•ˆλ²•μœ„λ°˜ ν˜μ˜κ°€ μΆ”κ°€λ˜μ—ˆλ‹€. 경찰은 12μ›” 18일~20일 사이 μ„œμšΈ κ²½ν¬λŒ€ν•™κ΅μ—μ„œ μž„μ’…μ„μ΄ μ„±λͺ… λ°œν‘œλ₯Ό μΆ”μ§„ν•˜κ³  μžˆλ‹€λŠ” 첩보λ₯Ό μž…μˆ˜ν–ˆκ³ , 12μ›” 18일 μ˜€μ „ 7μ‹œ 40λΆ„ κ²½ κ°€μŠ€μ΄κ³Ό μ „μžλ΄‰μœΌλ‘œ 무μž₯ν•œ 특곡쑰 및 λŒ€κ³΅κ³Ό 직원 12λͺ… λ“± 22λͺ…μ˜ 사볡 경찰을 승용차 8λŒ€μ— λ‚˜λˆ„μ–΄ κ²½ν¬λŒ€ν•™κ΅μ— νˆ¬μž…ν–ˆλ‹€. 1989λ…„ 12μ›” 18일 μ˜€μ „ 8μ‹œ 15λΆ„ κ²½ μ„œμšΈμ²­λŸ‰λ¦¬κ²½μ°°μ„œλŠ” ν˜Έμœ„ 학생 5λͺ…κ³Ό ν•¨κ»˜ κ²½ν¬λŒ€ν•™κ΅ ν•™μƒνšŒκ΄€ 건물 계단을 λ‚΄λ €μ˜€λŠ” μž„μ’…μ„μ„ 발견, κ²€κ±°ν•΄ ꡬ속을 μ§‘ν–‰ν–ˆλ‹€. μž„μ’…μ„μ€ μ²­λŸ‰λ¦¬κ²½μ°°μ„œμ—μ„œ μ•½ 1μ‹œκ°„ λ™μ•ˆ 쑰사λ₯Ό 받은 λ’€ μ˜€μ „ 9μ‹œ 50λΆ„ κ²½ μ„œμšΈ μž₯μ•ˆλ™μ˜ μ„œμšΈμ§€λ°©κ²½μ°°μ²­ κ³΅μ•ˆλΆ„μ‹€λ‘œ μΈκ³„λ˜μ—ˆλ‹€. <unused0> 1989λ…„ 2μ›” 15일"

raw_input_ids = tokenizer.encode(text)
input_ids = [tokenizer.bos_token_id] + raw_input_ids + [tokenizer.eos_token_id]

summary_ids = model.generate(torch.tensor([input_ids]))
print(tokenizer.decode(summary_ids.squeeze().tolist(), skip_special_tokens=True))

# <unused0> is the sep_token; it separates the context from the answer span
```
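The input string above follows the pattern `context <unused0> answer`, where the answer span must occur in the context. A minimal sketch of building such an input programmatically (`build_qg_input` is a hypothetical helper, not part of this repository):

```python
def build_qg_input(context: str, answer: str, sep_token: str = "<unused0>") -> str:
    """Compose a question-generation input: context, sep token, then the answer span.

    The answer must appear verbatim in the context, since the model is trained
    to generate a question whose answer is that span.
    """
    assert answer in context, "answer span must occur verbatim in the context"
    return f"{context} {sep_token} {answer}"

# Example (English shown for brevity; the model itself expects Korean text):
text = build_qg_input("Barack Obama was born in Hawaii.", "Hawaii")
print(text)  # Barack Obama was born in Hawaii. <unused0> Hawaii
```

The resulting string can then be encoded and passed to `model.generate` exactly as in the snippet above.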