nbansal committed on
Commit dfd7508
1 Parent(s): 4c522b3

Updated the README; minor modifications in the description section of semf1.py

Files changed (2)
  1. README.md +30 -21
  2. semf1.py +4 -3
README.md CHANGED
@@ -16,15 +16,15 @@ description: >-
  for more details.
  ---
 
- # Metric Card for SemF1
+ # Metric Card for Sem-F1
 
  ## Metric Description
- SEM-F1 metric leverages the pre-trained contextual embeddings and evaluates the model generated semantic overlap
+ Sem-F1 metric leverages the pre-trained contextual embeddings and evaluates the model generated semantic overlap
  summary with the reference overlap summary. It evaluates the semantic overlap summary at the sentence level and
  computes precision, recall and F1 scores.
 
  ## How to Use
- SEM-F1 takes 2 mandatory arguments:
+ Sem-F1 takes 2 mandatory arguments:
  `predictions`: (a list of system generated documents in the form of sentences i.e. List[List[str]]),
  `references`: (a list of ground-truth documents in the form of sentences i.e. List[List[str]])
 
@@ -42,32 +42,41 @@ metric = load("semf1")
  results = metric.compute(predictions=predictions, references=references)
  ```
 
- It also accepts multiple optional arguments:
- TODO: List optional arguments
+ It also accepts an optional argument:
+
+ `model_type: Optional[str]`:
+ The model to use for encoding the sentences.
+ Options are:
+ [`pv1`](https://huggingface.co/sentence-transformers/paraphrase-distilroberta-base-v1),
+ [`stsb`](https://huggingface.co/sentence-transformers/stsb-roberta-large),
+ [`use`](https://huggingface.co/sentence-transformers/use-cmlm-multilingual).
+ The default value is `use`.
 
- ### Inputs
- TODO:
- *List all input arguments in the format below*
- - **input_field** *(type): Definition of input, with explanation if necessary. State any default value(s).*
+ [//]: # (### Inputs)
+
+ [//]: # (*List all input arguments in the format below*)
+
+ [//]: # (- **input_field** *(type): Definition of input, with explanation if necessary. State any default value(s).*)
 
  ### Output Values
 
- BERTScore outputs a dictionary with the following values:
-
- `precision`: The [precision](https://huggingface.co/metrics/precision) for each system summary, which ranges from 0.0 to 1.0.
-
- `recall`: The [recall](https://huggingface.co/metrics/recall) for each system summary, which ranges from 0.0 to 1.0.
-
- `f1`: The [F1 score](https://huggingface.co/metrics/f1) for each system summary, which ranges from 0.0 to 1.0.
+ `precision`: The [precision](https://huggingface.co/metrics/precision) for each sentence from the `predictions` + `references` lists, which ranges from 0.0 to 1.0.
+
+ `recall`: The [recall](https://huggingface.co/metrics/recall) for each sentence from the `predictions` + `references` lists, which ranges from 0.0 to 1.0.
+
+ `f1`: The [F1 score](https://huggingface.co/metrics/f1) for each sentence from the `predictions` + `references` lists, which ranges from 0.0 to 1.0.
 
- #### Values from Popular Papers
- *Give examples, preferrably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*
+ [//]: # (#### Values from Popular Papers)
+
+ [//]: # (*Give examples, preferably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*)
 
- ### Examples
- *Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*
+ [//]: # (### Examples)
+
+ [//]: # (*Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*)
 
- ## Limitations and Bias
- *Note any known limitations or biases that the metric has, with links and references if possible.*
+ [//]: # (## Limitations and Bias)
+
+ [//]: # (*Note any known limitations or biases that the metric has, with links and references if possible.*)
 
  ## Citation
  ```bibtex
@@ -92,6 +101,6 @@ BERTScore outputs a dictionary with the following values:
  ```
 
  ## Further References
- TODO: Add links to the slides and video
  - [Paper](https://aclanthology.org/2022.emnlp-main.49/)
- - [Presentation Slides]()
+ - [Presentation Slides](https://auburn.box.com/s/rs5p7sttaonbvljnq0i5tk7xxw0vonn3)
+ - [Video](https://auburn.box.com/s/c1bmb8c0a2emc9xhnjfalvqo2100yxvf)
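For quick reference, a minimal end-to-end sketch of the usage the updated README describes. The `semf1` load path and the `compute` signature are taken from the README snippets in this diff (the fully-qualified Hub path may differ); the first document pair reuses the example sentences visible in semf1.py, while the second pair and `model_type="stsb"` are illustrative assumptions.

```python
import evaluate

# Each document is a list of sentences (List[List[str]]), as the README specifies.
predictions = [
    ["I go to School.", "You are stupid."],
    ["I love adventure sports."],  # illustrative second document
]
references = [
    ["I go to School.", "You are stupid."],
    ["I like adventure sports a lot."],  # illustrative second reference
]

metric = evaluate.load("semf1")  # load path as shown in the README snippet

# model_type is optional: "pv1", "stsb", or "use" (the default).
results = metric.compute(predictions=predictions,
                         references=references,
                         model_type="stsb")
print(results)  # per-document precision/recall/F1 scores in [0.0, 1.0]
```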
semf1.py CHANGED
@@ -66,8 +66,10 @@ Args:
      stsb - stsb-roberta-large
      use - Universal Sentence Encoder
  Returns:
-     accuracy: description of the first score,
-     another_score: description of the second score,
+     precision: Precision.
+     recall: Recall.
+     f1: F1 score.
+
  Examples:
 
      >>> import evaluate
@@ -85,7 +87,6 @@ Examples:
  [0.77, 0.56]
  """
 
- [["I go to School.", "You are stupid."]]
 
 class Encoder(metaclass=abc.ABCMeta):
     @abc.abstractmethod
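The updated docstring names the three returned scores but not how they are derived. As an intuition aid, here is an illustrative sketch, not the semf1.py implementation, of how sentence-level precision/recall/F1 can be computed from sentence embeddings in the way the README's metric description outlines: each prediction sentence is credited with its best-matching reference sentence by cosine similarity, and vice versa. The function name and the greedy max-matching are assumptions for illustration.

```python
import numpy as np

def sem_f1_sketch(pred_emb: np.ndarray, ref_emb: np.ndarray):
    """Illustrative only: pred_emb is (n_pred, d), ref_emb is (n_ref, d)."""
    # L2-normalize so dot products are cosine similarities.
    pred_emb = pred_emb / np.linalg.norm(pred_emb, axis=1, keepdims=True)
    ref_emb = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = pred_emb @ ref_emb.T                 # (n_pred, n_ref) similarity matrix
    precision = float(sim.max(axis=1).mean())  # best reference match per prediction sentence
    recall = float(sim.max(axis=0).mean())     # best prediction match per reference sentence
    # Harmonic mean; assumes precision + recall > 0.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```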