Commit dbcbd62 by krotima1
Parent: 2b70e5b

feat: add usage to readme

Files changed (1)
  1. README.md +55 -0
README.md CHANGED
@@ -34,6 +34,61 @@ metrics:
This model is a fine-tuned checkpoint of [facebook/m2m100_418M](https://huggingface.co/facebook/m2m100_418M), trained on the Multilingual large summarization dataset (which focuses on Czech texts) to produce multilingual summaries.
## Task
The model generates multi-sentence summaries in eight languages. By adding documents in other languages to a considerable amount of Czech documents, we aimed to improve the model's summarization in Czech. Supported languages: 'cs', 'en', 'de', 'es', 'fr', 'ru', 'tr', 'zh'.

## Usage
This example assumes you are running the provided MultilingualSummarizer.ipynb notebook together with the accompanying files from the git repository.

```python
from collections import OrderedDict

# MultiSummarizer is defined in the MultilingualSummarizer.ipynb notebook
# from the git repository; run this cell inside that notebook.

## Configuration of summarization pipeline
def summ_config():
    cfg = OrderedDict([

        ## summarization model - checkpoint
        # ctu-aic/m2m100-418M-multilingual-summarization-multilarge-cs
        # ctu-aic/mt5-base-multilingual-summarization-multilarge-cs
        # ctu-aic/mbart25-multilingual-summarization-multilarge-cs
        ("model_name", "ctu-aic/m2m100-418M-multilingual-summarization-multilarge-cs"),

        ## language of the summarization task
        # language : string : cs, en, de, fr, es, tr, ru, zh
        ("language", "en"),

        ## generation (decoding) parameters as a dictionary
        ("inference_cfg", OrderedDict([
            ("num_beams", 4),
            ("top_k", 40),
            ("top_p", 0.92),
            ("do_sample", True),
            ("temperature", 0.95),
            ("repetition_penalty", 1.23),
            ("no_repeat_ngram_size", None),
            ("early_stopping", True),
            ("max_length", 128),
            ("min_length", 10),
        ])),

        # texts to summarize: a list of strings, a single string, or a dataset
        ("texts", [
            "english text1 to summarize",
            "english text2 to summarize",
        ]),

        # OPTIONAL: target (reference) summaries: a list of strings, a single string, or None
        ("golds", [
            "target english text1",
            "target english text2",
        ]),
        # ("golds", None),  # use this instead when no reference summaries are available
    ])
    return cfg


cfg = summ_config()
mSummarize = MultiSummarizer(**cfg)  # build the pipeline from the configuration
ret = mSummarize(**cfg)              # run summarization with the same configuration
```
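The notebook's `MultiSummarizer` is not shown here, so how it validates its input is unknown. As a minimal sketch, assuming you want to catch configuration mistakes before loading a checkpoint, the hypothetical helper `check_summ_config` below (not part of the repository) checks the language code against the supported list, checks that `min_length` does not exceed `max_length`, checks that `texts` and `golds` pair up when both are lists, and returns only the generation parameters that are explicitly set. The parameter names match Hugging Face `generate()` arguments, but whether `MultiSummarizer` forwards them unchanged is an assumption.

```python
from collections import OrderedDict

# Codes from the "Supported languages" list; 'tr' (Turkish) as in the config comment.
SUPPORTED_LANGUAGES = {"cs", "en", "de", "es", "fr", "ru", "tr", "zh"}


def check_summ_config(cfg):
    """Hypothetical helper (not part of the repository): validate a
    summ_config()-style mapping and return the explicitly set generation
    parameters."""
    language = cfg["language"]
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")

    inference = dict(cfg["inference_cfg"])
    if inference["min_length"] > inference["max_length"]:
        raise ValueError("min_length must not exceed max_length")

    # 'golds' is optional; when both are lists they must pair up one-to-one.
    texts, golds = cfg["texts"], cfg.get("golds")
    if isinstance(texts, list) and isinstance(golds, list) and len(texts) != len(golds):
        raise ValueError("texts and golds must have the same length")

    # Keep only parameters that are actually set (drops e.g. no_repeat_ngram_size=None).
    return {name: value for name, value in inference.items() if value is not None}


cfg = OrderedDict([
    ("language", "en"),
    ("inference_cfg", OrderedDict([
        ("num_beams", 4),
        ("no_repeat_ngram_size", None),
        ("max_length", 128),
        ("min_length", 10),
    ])),
    ("texts", ["english text1 to summarize"]),
    ("golds", None),
])
print(check_summ_config(cfg))  # {'num_beams': 4, 'max_length': 128, 'min_length': 10}
```

Raising early on an unsupported language code or mismatched references is cheaper than discovering the problem after the model has been downloaded and loaded.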

## Dataset
The Multilingual large summarization dataset consists of 10 sub-datasets, mainly based on news and daily mails. Training used the entire training set and 72% of the validation set.
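The README does not say how the 72% of the validation set was chosen. The sketch below assumes a seeded random split; `build_training_pool` is a hypothetical name, and the toy example uses integer IDs in place of documents. It only illustrates the arithmetic: with 100 validation documents, 72 join the training pool and 28 stay held out.

```python
import random


def build_training_pool(train, validation, val_fraction=0.72, seed=0):
    """Hypothetical sketch: return (training pool, held-out validation).

    Assumption: the validation subset is drawn at random with a fixed seed;
    the README does not state how the 72% was selected.
    """
    shuffled = list(validation)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * val_fraction)
    return list(train) + shuffled[:cut], shuffled[cut:]


# Toy IDs stand in for documents: 1000 training, 100 validation.
train_pool, held_out = build_training_pool(range(1000), range(1000, 1100))
print(len(train_pool), len(held_out))  # 1072 28
```
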
  ```