rahular committed on
Commit e3f9170 • 1 Parent(s): 3e9388e

added docs
Files changed (2)
  1. README.md +2 -2
  2. ibleu.py +46 -43
README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-title: iBleu
+title: iBLEU
 emoji: 📊
 colorFrom: red
 colorTo: indigo
@@ -9,4 +9,4 @@ app_file: app.py
 pinned: false
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+iBLEU measures the adequacy and dissimilarity of generated paraphrases.
ibleu.py CHANGED
@@ -7,56 +7,59 @@ from packaging import version
 import evaluate
 
 
-_DESCRIPTION = """
-Accuracy is the proportion of correct predictions among the total number of cases processed. It can be computed with:
-Accuracy = (TP + TN) / (TP + TN + FP + FN)
-Where:
-TP: True positive
-TN: True negative
-FP: False positive
-FN: False negative
+_CITATION = """\
+@inproceedings{sun-zhou-2012-joint,
+    title = "Joint Learning of a Dual {SMT} System for Paraphrase Generation",
+    author = "Sun, Hong and
+      Zhou, Ming",
+    booktitle = "Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
+    month = jul,
+    year = "2012",
+    address = "Jeju Island, Korea",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/P12-2008",
+    pages = "38--42",
+}
+
 """
 
+_DESCRIPTION = """\
+iBLEU measures the adequacy and dissimilarity of generated paraphrases.
+"""
 
 _KWARGS_DESCRIPTION = """
+Produces the iBLEU score from an input and a prediction against one or more references.
 Args:
-    predictions (`list` of `int`): Predicted labels.
-    references (`list` of `int`): Ground truth labels.
-    normalize (`boolean`): If set to False, returns the number of correctly classified samples. Otherwise, returns the fraction of correctly classified samples. Defaults to True.
-    sample_weight (`list` of `float`): Sample weights Defaults to None.
+    inputs (`list` of `str`): list of model inputs. Each input should be tokenized into a list of tokens.
+    predictions (`list` of `str`): list of translations to score. Each translation should be tokenized into a list of tokens.
+    references (`list` of `list` of `str`): A list of lists of references. The contents of the first sub-list are the references for the first prediction, the contents of the second sub-list are for the second prediction, etc. Note that there must be the same number of references for each prediction (i.e. all sub-lists must be of the same length).
+    alpha (`float`): parameter for balancing between adequacy and dissimilarity; a smaller α value places a larger penalty on self-paraphrase.
+    smooth_method (`str`): The smoothing method to use, defaults to `'exp'`. Possible values are:
+        - `'none'`: no smoothing
+        - `'floor'`: increment zero counts
+        - `'add-k'`: increment num/denom by k for n>1
+        - `'exp'`: exponential decay
+    smooth_value (`float`): The smoothing value. Only valid when `smooth_method='floor'` (in which case `smooth_value` defaults to `0.1`) or `smooth_method='add-k'` (in which case `smooth_value` defaults to `1`).
+    tokenize (`str`): Tokenization method to use for iBLEU. If not provided, defaults to `'zh'` for Chinese, `'ja-mecab'` for Japanese and `'13a'` (mteval) otherwise. Possible values are:
+        - `'none'`: No tokenization.
+        - `'zh'`: Chinese tokenization.
+        - `'13a'`: mimics the `mteval-v13a` script from Moses.
+        - `'intl'`: International tokenization, mimics the `mteval-v14` script from Moses.
+        - `'char'`: Language-agnostic character-level tokenization.
+        - `'ja-mecab'`: Japanese tokenization. Uses the [MeCab tokenizer](https://pypi.org/project/mecab-python3).
+    lowercase (`bool`): If `True`, lowercases the input, enabling case-insensitivity. Defaults to `False`.
+    force (`bool`): If `True`, insists that your tokenized input is actually detokenized. Defaults to `False`.
+    use_effective_order (`bool`): If `True`, stops including n-gram orders for which precision is 0. This should be `True` if sentence-level BLEU will be computed. Defaults to `False`.
 Returns:
-    accuracy (`float` or `int`): Accuracy score. Minimum possible value is 0. Maximum possible value is 1.0, or the number of examples input, if `normalize` is set to `True`.. A higher score means higher accuracy.
+    'score': iBLEU score.
 Examples:
-    Example 1-A simple example
-        >>> accuracy_metric = evaluate.load("accuracy")
-        >>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0])
-        >>> print(results)
-        {'accuracy': 0.5}
-    Example 2-The same as Example 1, except with `normalize` set to `False`.
-        >>> accuracy_metric = evaluate.load("accuracy")
-        >>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], normalize=False)
-        >>> print(results)
-        {'accuracy': 3.0}
-    Example 3-The same as Example 1, except with `sample_weight` set.
-        >>> accuracy_metric = evaluate.load("accuracy")
-        >>> results = accuracy_metric.compute(references=[0, 1, 2, 0, 1, 2], predictions=[0, 1, 1, 2, 1, 0], sample_weight=[0.5, 2, 0.7, 0.5, 9, 0.4])
-        >>> print(results)
-        {'accuracy': 0.8778625954198473}
-"""
-
-
-_CITATION = """
-@article{scikit-learn,
-  title={Scikit-learn: Machine Learning in {P}ython},
-  author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
-         and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
-         and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
-         Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
-  journal={Journal of Machine Learning Research},
-  volume={12},
-  pages={2825--2830},
-  year={2011}
-}
+    >>> inputs = ["greetings general kenobi", "foo foo bar bar"]
+    >>> predictions = ["hello there general kenobi", "foo bar foobar"]
+    >>> references = [["hello there general kenobi", "hello there !"], ["foo bar foobar", "foo bar foobar"]]
+    >>> ibleu = evaluate.load("rahular/ibleu")
+    >>> results = ibleu.compute(inputs=inputs, predictions=predictions, references=references)
+    >>> print(results)
+    {'score': 60.41585343630594}
 """
 
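
For context on the docstrings added above: in the cited Sun and Zhou (2012) paper, iBLEU balances adequacy against dissimilarity as iBLEU = alpha * BLEU(prediction, references) - (1 - alpha) * BLEU(prediction, input), which is why a smaller alpha penalizes self-paraphrase more heavily. The sketch below illustrates that combination using sacrebleu; the helper name compute_ibleu and the alpha default are illustrative assumptions, not code from this repository.

# A minimal sketch of the iBLEU combination, assuming sacrebleu is installed.
# `compute_ibleu` and the alpha default are illustrative, not this repo's code.
from sacrebleu import corpus_bleu

def compute_ibleu(inputs, predictions, references, alpha=0.8):
    # Adequacy: BLEU of the predictions against the reference paraphrases.
    # sacrebleu expects per-reference streams, so transpose the sub-lists.
    ref_streams = [list(refs) for refs in zip(*references)]
    bleu_refs = corpus_bleu(predictions, ref_streams).score
    # Dissimilarity: BLEU of the predictions against their own inputs;
    # a high value means the model largely copied the input.
    bleu_inputs = corpus_bleu(predictions, [inputs]).score
    # iBLEU = alpha * BLEU(pred, refs) - (1 - alpha) * BLEU(pred, input)
    return alpha * bleu_refs - (1 - alpha) * bleu_inputs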