nbansal committed on
Commit
64ec1e5
1 Parent(s): 61ff8d5

Added description

Files changed (1)
  1. semf1.py +36 -4
semf1.py CHANGED
@@ -11,7 +11,7 @@
  # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  # See the License for the specific language governing permissions and
  # limitations under the License.
- # TODO: Add test cases, Provide an option to pass batch size when computing the embeddings
+ # TODO: Add test cases, Remove tokenize_sentences flag since it can be determined from the input itself.
  """SEM-F1 metric"""
 
  import abc
@@ -58,19 +58,51 @@ _KWARGS_DESCRIPTION = """
  SEM-F1 compares the system generated overlap summary with ground truth reference overlap.
 
  Args:
- predictions: List[List(str)] - List of predictions where each prediction is a list of sentences.
- references: List[List(str)] - List of references where each reference is a list of sentences.
+ predictions: list - List of predictions (Details below)
+ references: list - List of references (Details below)
  reference should be a string with tokens separated by spaces.
  model_type: str - Model to use. [pv1, stsb, use]
  Options:
- pv1 - paraphrase-distilroberta-base-v1
+ pv1 - paraphrase-distilroberta-base-v1 (Default)
  stsb - stsb-roberta-large
  use - Universal Sentence Encoder
+ tokenize_sentences: bool - Sentence tokenize the input document (prediction/reference). Default: True.
+ gpu: Union[bool, int] - Whether to use GPU or CPU.
+ Options:
+ False - CPU (Default)
+ True - GPU, device 0
+ n: int - GPU, device n
+ batch_size: int - Batch Size, Default = 32.
  Returns:
  precision: Precision.
  recall: Recall.
  f1: F1 score.
 
+ There are 4 possible cases for inputs corresponding to predictions and references arguments
+ Case 1: Multi-Ref = False, tokenize_sentences = False
+ predictions: List[List[str]] - List of predictions where each prediction is a list of sentences.
+ references: List[List[str]] - List of references where each reference is a list of sentences.
+ Case 2: Multi-Ref = False, tokenize_sentences = True
+ predictions: List[str] - List of predictions where each prediction is a document
+ references: List[str] - List of references where each reference is a document
+ Case 3: Multi-Ref = True, tokenize_sentences = False
+ predictions: List[List[str]] - List of predictions where each prediction is a list of sentences.
+ references: List[List[List[str]]] - List of multi-references i.e. [[r11, r12, ...], [r21, r22, ...], ...]
+ where each rij is further a list of sentences
+ Case 4: Multi-Ref = True, tokenize_sentences = True
+ predictions: List[str] - List of predictions where each prediction is a document
+ references: List[List[str]] - List of multi-references i.e. [[r11, r12, ...], [r21, r22, ...], ...]
+ where each rij is a document
+
+ This can be seen in the form of a truth table as follows:
+ Case | Multi-Ref | tokenize_sentences | predictions     | references
+ 1    | 0         | 0                  | List[List[str]] | List[List[str]]
+ 2    | 0         | 1                  | List[str]       | List[str]
+ 3    | 1         | 0                  | List[List[str]] | List[List[List[str]]]
+ 4    | 1         | 1                  | List[str]       | List[List[str]]
+
+ It is automatically determined whether it is the Multi-Ref case or the Single-Ref case.
+
  Examples:
 
  >>> import evaluate
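
The four input cases and the gpu options documented in this commit can be illustrated with a small standalone sketch. It is not code from semf1.py: `infer_case` and `resolve_device` are hypothetical helpers written for illustration, and the "cuda:n" device strings assume a PyTorch-style backend. The sketch also shows why the new TODO is plausible: in the truth table, predictions are List[str] exactly when tokenize_sentences is 1, so the flag could in principle be read off the input.

```python
from typing import Tuple, Union


def infer_case(predictions: list, references: list) -> Tuple[bool, bool]:
    """Return (multi_ref, tokenize_sentences) inferred from the input nesting.

    Hypothetical helper: it mirrors the truth table in the docstring,
    not the actual detection logic inside semf1.py.
    """
    if not predictions or not references:
        raise ValueError("predictions and references must be non-empty")

    # Cases 2 and 4: each prediction is a whole document (a plain string).
    tokenize_sentences = isinstance(predictions[0], str)

    first_ref = references[0]
    if tokenize_sentences:
        # Single-ref: a document (str). Multi-ref: a list of documents.
        multi_ref = isinstance(first_ref, list)
    else:
        # Single-ref: a list of sentences. Multi-ref: a list of such lists.
        multi_ref = bool(first_ref) and isinstance(first_ref[0], list)
    return multi_ref, tokenize_sentences


def resolve_device(gpu: Union[bool, int]) -> str:
    """Map the documented gpu argument to a device string (assumed naming)."""
    # bool is a subclass of int, so it must be checked first;
    # otherwise gpu=False would be read as "device 0".
    if isinstance(gpu, bool):
        return "cuda:0" if gpu else "cpu"
    if isinstance(gpu, int) and gpu >= 0:
        return f"cuda:{gpu}"
    raise ValueError(f"Unsupported gpu value: {gpu!r}")


if __name__ == "__main__":
    # One input per documented case, with a single prediction each.
    print(infer_case([["A cat sat."]], [["A cat sat."]]))                  # Case 1
    print(infer_case(["A cat sat. It slept."], ["A cat sat. It slept."]))  # Case 2
    print(infer_case([["A cat sat."]], [[["A cat sat."], ["It sat."]]]))   # Case 3
    print(infer_case(["A cat sat."], [["A cat sat.", "It sat."]]))         # Case 4
    print(resolve_device(False), resolve_device(True), resolve_device(1))
```

Note that the references alone do not separate Case 1 from Case 4, since both are List[List[str]]; the sketch resolves this by looking at the predictions first.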