byeongal committed on
Commit
4b95cf0
•
1 Parent(s): ab74066

kobart-news

README.md ADDED
@@ -0,0 +1,37 @@
+ ---
+ language: ko
+ license: mit
+ tags:
+ - summarization
+ - bart
+ ---
+ # kobart-news
+ - This model is a [kobart](https://huggingface.co/hyunwoongko/kobart) model fine-tuned on the [문서요약 텍스트/신문기사](https://aihub.or.kr/aidata/8054) (document summarization text / newspaper articles) dataset from AI Hub, using [Ainize Teachable-NLP](https://ainize.ai/teachable-nlp).
+
+ ## Usage
+ ### Python Code
+ ```python
+ from transformers import PreTrainedTokenizerFast, BartForConditionalGeneration
+ # Load the model and tokenizer
+ tokenizer = PreTrainedTokenizerFast.from_pretrained("ainize/kobart-news")
+ model = BartForConditionalGeneration.from_pretrained("ainize/kobart-news")
+ # Encode the input text
+ input_text = '국내 전반적인 경기침체로 상가 건물주의 수익도 전국적인 감소세를 보이고 있는 것으로 나타났다. 수익형 부동산 연구개발기업 상가정보연구소는 한국감정원 통계를 분석한 결과 전국 중대형 상가 순영업소득(부동산에서 발생하는 임대수입, 기타수입에서 제반 경비를 공제한 순소득)이 1분기 ㎡당 3만4200원에서 3분기 2만5800원으로 감소했다고 17일 밝혔다. 수도권, 세종시, 지방광역시에서 순영업소득이 가장 많이 감소한 지역은 3분기 1만3100원을 기록한 울산으로, 1분기 1만9100원 대비 31.4% 감소했다. 이어 대구(-27.7%), 서울(-26.9%), 광주(-24.9%), 부산(-23.5%), 세종(-23.4%), 대전(-21%), 경기(-19.2%), 인천(-18.5%) 순으로 감소했다. 지방 도시의 경우도 비슷했다. 경남의 3분기 순영업소득은 1만2800원으로 1분기 1만7400원 대비 26.4% 감소했으며 제주(-25.1%), 경북(-24.1%), 충남(-20.9%), 강원(-20.9%), 전남(-20.1%), 전북(-17%), 충북(-15.3%) 등도 감소세를 보였다. 조현택 상가정보연구소 연구원은 "올해 내수 경기의 침체된 분위기가 유지되며 상가, 오피스 등을 비롯한 수익형 부동산 시장의 분위기도 경직된 모습을 보였고 오피스텔, 지식산업센터 등의 수익형 부동산 공급도 증가해 공실의 위험도 늘었다"며 "실제 올 3분기 전국 중대형 상가 공실률은 11.5%를 기록하며 1분기 11.3% 대비 0.2% 포인트 증가했다"고 말했다. 그는 "최근 소셜커머스(SNS를 통한 전자상거래), 음식 배달 중개 애플리케이션, 중고 물품 거래 애플리케이션 등의 사용 증가로 오프라인 매장에 영향을 미쳤다"며 "향후 지역, 콘텐츠에 따른 상권 양극화 현상은 심화될 것으로 보인다"고 덧붙였다.'
+ input_ids = tokenizer.encode(input_text, return_tensors="pt")
+ # Generate summary token ids
+ summary_text_ids = model.generate(
+     input_ids=input_ids,
+     bos_token_id=model.config.bos_token_id,
+     eos_token_id=model.config.eos_token_id,
+     length_penalty=2.0,
+     max_length=142,
+     min_length=56,
+     num_beams=4,
+ )
+ # Decode the generated summary
+ print(tokenizer.decode(summary_text_ids[0], skip_special_tokens=True))
+ ```
+ ### API and Demo
+ You can try out this model through the [ainize-api](https://ainize.ai/gkswjdzz/summarize-torchserve?branch=main) and the [ainize-demo](https://main-summarize-torchserve-gkswjdzz.endpoint.ainize.ai/).
+
+
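A note on the usage example above: the explicit generation arguments (`length_penalty=2.0`, `max_length=142`, `min_length=56`, `num_beams=4`) override the summarization defaults stored in `config.json`'s `task_specific_params` (added below in this same commit). The same checkpoint should therefore also work through the `transformers` summarization `pipeline`; this is an untested sketch, and the argument values are simply carried over from the README example:

```python
from transformers import pipeline

# Sketch: the pipeline resolves the model and tokenizer from the repo id.
summarizer = pipeline("summarization", model="ainize/kobart-news")

article = "..."  # a Korean news article, e.g. the input_text above
result = summarizer(article, max_length=142, min_length=56, num_beams=4)
print(result[0]["summary_text"])
```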
config.json ADDED
@@ -0,0 +1,60 @@
+ {
+   "_name_or_path": "kobart_news",
+   "activation_dropout": 0.0,
+   "activation_function": "gelu",
+   "add_bias_logits": false,
+   "add_final_layer_norm": false,
+   "architectures": [
+     "BartForConditionalGeneration"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 0,
+   "classif_dropout": 0.1,
+   "classifier_dropout": 0.1,
+   "d_model": 768,
+   "decoder_attention_heads": 16,
+   "decoder_ffn_dim": 3072,
+   "decoder_layerdrop": 0.0,
+   "decoder_layers": 6,
+   "decoder_start_token_id": 2,
+   "do_blenderbot_90_layernorm": false,
+   "dropout": 0.1,
+   "encoder_attention_heads": 16,
+   "encoder_ffn_dim": 3072,
+   "encoder_layerdrop": 0.0,
+   "encoder_layers": 6,
+   "eos_token_id": 1,
+   "extra_pos_embeddings": 2,
+   "force_bos_token_to_be_generated": false,
+   "forced_eos_token_id": 2,
+   "gradient_checkpointing": false,
+   "id2label": {
+     "0": "NEGATIVE",
+     "1": "POSITIVE"
+   },
+   "init_std": 0.02,
+   "is_encoder_decoder": true,
+   "label2id": {
+     "NEGATIVE": 0,
+     "POSITIVE": 1
+   },
+   "max_position_embeddings": 1026,
+   "model_type": "bart",
+   "normalize_before": false,
+   "normalize_embedding": true,
+   "num_hidden_layers": 6,
+   "pad_token_id": 3,
+   "scale_embedding": false,
+   "static_position_embeddings": false,
+   "transformers_version": "4.8.1",
+   "use_cache": true,
+   "task_specific_params": {
+     "summarization": {
+       "length_penalty": 1.0,
+       "max_length": 128,
+       "min_length": 12,
+       "num_beams": 4
+     }
+   },
+   "vocab_size": 30000
+ }
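The `task_specific_params.summarization` block above is what `transformers` falls back on when no generation arguments are passed explicitly. A minimal sketch of reading those defaults from the loaded config and feeding them to `generate` (the variable names are illustrative, not from this repo):

```python
from transformers import PreTrainedTokenizerFast, BartForConditionalGeneration

tokenizer = PreTrainedTokenizerFast.from_pretrained("ainize/kobart-news")
model = BartForConditionalGeneration.from_pretrained("ainize/kobart-news")

# The summarization defaults shipped in config.json:
# {"length_penalty": 1.0, "max_length": 128, "min_length": 12, "num_beams": 4}
summarization_defaults = model.config.task_specific_params["summarization"]

input_ids = tokenizer.encode("요약할 한국어 기사 본문 ...", return_tensors="pt")
summary_ids = model.generate(input_ids, **summarization_defaults)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```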
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cd2e832d56099465123486157f7a0c53b4ef3a6dadfdf48fb76fa089529859d2
+ size 495662867
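The diff above is not the weights themselves but a Git LFS pointer file: `oid` is the SHA-256 of the actual ~496 MB `pytorch_model.bin`, fetched by LFS on checkout. If you want to verify a downloaded copy against that digest, a small sketch (the repo id comes from the README; the rest is `huggingface_hub` plus the standard library):

```python
import hashlib
from huggingface_hub import hf_hub_download

# Download (or reuse a cached copy of) the real weights file.
path = hf_hub_download(repo_id="ainize/kobart-news", filename="pytorch_model.bin")

# Stream the file through SHA-256 and compare with the LFS pointer's oid.
digest = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

expected = "cd2e832d56099465123486157f7a0c53b4ef3a6dadfdf48fb76fa089529859d2"
print("match:", digest.hexdigest() == expected)
```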
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>", "sep_token": "</s>", "pad_token": "<pad>", "cls_token": "<s>", "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": false}}
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"unk_token": "<unk>", "bos_token": "<s>", "eos_token": "</s>", "add_prefix_space": false, "errors": "replace", "sep_token": "</s>", "cls_token": "<s>", "pad_token": "<pad>", "mask_token": "<mask>", "do_lower_case": false, "special_tokens_map_file": null, "tokenizer_class": "PreTrainedTokenizerFast"}