Spaces: Runtime error

davebulaval committed · Commit 9db8db7
1 Parent(s): 7cd4a72

initial metrics codebase

Browse files:
- README.md +63 -2
- app.py +5 -0
- meaningbert.py +134 -0
- requirements.txt +1 -0
README.md
CHANGED
@@ -1,5 +1,5 @@
 ---
-title:
+title: MeaningBERT
 emoji: π¦
 colorFrom: purple
 colorTo: indigo
@@ -9,4 +9,65 @@ app_file: app.py
 pinned: false
 ---
# Here is MeaningBERT

MeaningBERT is an automatic and trainable metric for assessing meaning preservation between sentences. MeaningBERT was proposed in our article [MeaningBERT: assessing meaning preservation between sentences](https://www.frontiersin.org/articles/10.3389/frai.2023.1223924/full). Its goal is to assess meaning preservation between two sentences in a way that correlates highly with human judgments and passes sanity checks. For more details, refer to our publicly available article.

> This public version of our model uses the best model trained (whereas in our article, we present the performance results of an average of 10 models) for a more extended period (1,000 epochs instead of 250). We later observed that the model can further reduce dev loss and increase performance.
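As a quick orientation, here is a minimal sketch of how the metric is meant to be called; the argument names and returned keys follow the metric's docstring in `meaningbert.py` below, and loading through `evaluate.load` mirrors `app.py`:

```
import evaluate

# Load this Space's metric module, exactly as app.py does.
meaning_bert = evaluate.load("meaningbert")

documents = ["hello there", "general kenobi"]
simplifications = ["hello there", "general kenobi"]

# compute() returns a dict with "scores" (one rating per
# document/simplification pair, in order) and "hashcode".
results = meaning_bert.compute(documents=documents, simplifications=simplifications)
print(results["scores"])
```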
## Sanity Check

Correlation to human judgment is one way to evaluate the quality of a meaning preservation metric. However, it is inherently subjective, since it uses human judgment as a gold standard, and expensive, since it requires a large dataset annotated by several humans. As an alternative, we designed two automated tests: evaluating meaning preservation between identical sentences (which should be 100% preserving) and between unrelated sentences (which should be 0% preserving). In these tests, the meaning preservation target value is not subjective and does not require human annotation to measure. They represent a trivial and minimal threshold that a good automatic meaning preservation metric should be able to achieve. Namely, a metric should minimally be able to return a perfect score (i.e., 100%) when two identical sentences are compared and a null score (i.e., 0%) when two sentences are completely unrelated.

### Identical sentences

The first test evaluates meaning preservation between identical sentences. To analyze a metric's ability to pass this test, we count the number of times its rating is greater than or equal to a threshold value X ∈ [95, 99] and divide that count by the number of sentences, giving the ratio of times the metric returns the expected rating. To account for computer floating-point inaccuracy, we round the ratings to the nearest integer and do not use a threshold value of 100%.
### Unrelated sentences

Our second test evaluates meaning preservation between a source sentence and an unrelated sentence generated by a large language model. The idea is to verify that the metric finds a meaning preservation rating of 0 when given a completely irrelevant sentence mainly composed of irrelevant words (also known as word soup). Since this test's expected rating is 0, we check that the metric rating is lower than or equal to a threshold value X ∈ [5, 1]. Again, to account for computer floating-point inaccuracy, we round the ratings to the nearest integer and do not use a threshold value of 0%.
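A minimal sketch of how both sanity-check ratios could be computed, assuming `scores` holds the metric's ratings on a 0-100 scale; the helper name and signature are ours, not part of this codebase:

```
from typing import List

def sanity_check_ratio(scores: List[float], threshold: int, expect_high: bool) -> float:
    # Round to the nearest integer to absorb floating-point inaccuracy.
    rounded = [round(score) for score in scores]
    if expect_high:
        # Identical sentences: the rating should reach the threshold, X in [95, 99].
        hits = sum(rating >= threshold for rating in rounded)
    else:
        # Unrelated sentences: the rating should stay at or below the threshold, X in [5, 1].
        hits = sum(rating <= threshold for rating in rounded)
    return hits / len(rounded)
```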
## Cite

Use the following citation to cite MeaningBERT:

```
@ARTICLE{10.3389/frai.2023.1223924,
  AUTHOR={Beauchemin, David and Saggion, Horacio and Khoury, Richard},
  TITLE={MeaningBERT: assessing meaning preservation between sentences},
  JOURNAL={Frontiers in Artificial Intelligence},
  VOLUME={6},
  YEAR={2023},
  URL={https://www.frontiersin.org/articles/10.3389/frai.2023.1223924},
  DOI={10.3389/frai.2023.1223924},
  ISSN={2624-8212},
}
```

## License

MeaningBERT is MIT licensed, as found in the [LICENSE file](https://github.com/GRAAL-Research/risc/blob/main/LICENSE).
app.py
ADDED
@@ -0,0 +1,5 @@
import evaluate
from evaluate.utils import launch_gradio_widget

module = evaluate.load("meaningbert")
launch_gradio_widget(module)
meaningbert.py
ADDED
@@ -0,0 +1,134 @@
# Copyright 2020 The HuggingFace Evaluate Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""MeaningBERT metric."""

from contextlib import contextmanager
from typing import Dict, List

import datasets
import evaluate
from transformers import AutoModelForSequenceClassification, AutoTokenizer


@contextmanager
def filter_logging_context():
    # Silence the expected "This IS expected if you are initializing" warning
    # emitted while the pretrained weights are loaded.
    def filter_log(record):
        return "This IS expected if you are initializing" not in record.msg

    logger = datasets.utils.logging.get_logger("transformers.modeling_utils")
    logger.addFilter(filter_log)
    try:
        yield
    finally:
        logger.removeFilter(filter_log)


_CITATION = """\
@ARTICLE{10.3389/frai.2023.1223924,
  AUTHOR={Beauchemin, David and Saggion, Horacio and Khoury, Richard},
  TITLE={MeaningBERT: assessing meaning preservation between sentences},
  JOURNAL={Frontiers in Artificial Intelligence},
  VOLUME={6},
  YEAR={2023},
  URL={https://www.frontiersin.org/articles/10.3389/frai.2023.1223924},
  DOI={10.3389/frai.2023.1223924},
  ISSN={2624-8212},
}
"""

_DESCRIPTION = """\
MeaningBERT is an automatic and trainable metric for assessing meaning preservation between sentences. MeaningBERT was
proposed in our article
[MeaningBERT: assessing meaning preservation between sentences](https://www.frontiersin.org/articles/10.3389/frai.2023.1223924/full).
Its goal is to assess meaning preservation between two sentences in a way that correlates highly with human judgments
and passes sanity checks. For more details, refer to our publicly available article.

See the project's README at https://github.com/GRAAL-Research/MeaningBERT for more information.
"""

_KWARGS_DESCRIPTION = """
MeaningBERT metric for assessing meaning preservation between sentences.

Args:
    documents (list of str): Document sentences.
    simplifications (list of str): Simplification sentences (same number of elements as documents).
    verbose (bool): Turn on intermediate status updates.

Returns:
    score: the meaning scores between the sentence pairs, in a list that respects the order of the documents and
        simplifications pairs.
    hashcode: Hashcode of the library.

Examples:

    >>> documents = ["hello there", "general kenobi"]
    >>> simplifications = ["hello there", "general kenobi"]
    >>> meaning_bert = evaluate.load("meaningbert")
    >>> results = meaning_bert.compute(documents=documents, simplifications=simplifications)
"""

_HASH = "21845c0cc85a2e8e16c89bb0053f489095cf64c5b19e9c3865d3e10047aba51b"


@evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
class MeaningBERTScore(evaluate.Metric):
    def _info(self):
        return evaluate.MetricInfo(
            description=_DESCRIPTION,
            citation=_CITATION,
            homepage="https://github.com/GRAAL-Research/MeaningBERT",
            inputs_description=_KWARGS_DESCRIPTION,
            features=[
                datasets.Features(
                    {
                        "documents": datasets.Value("string", id="sequence"),
                        "simplifications": datasets.Value("string", id="sequence"),
                    }
                )
            ],
            codebase_urls=["https://github.com/GRAAL-Research/MeaningBERT"],
            reference_urls=[
                "https://github.com/GRAAL-Research/MeaningBERT",
                "https://www.frontiersin.org/articles/10.3389/frai.2023.1223924/full",
            ],
        )

    def _compute(
        self,
        documents: List,
        simplifications: List,
        verbose: bool = False,
    ) -> Dict:
        assert len(documents) == len(
            simplifications
        ), "The number of documents is different from the number of simplifications."
        hashcode = _HASH

        # We load the MeaningBERT pretrained model
        scorer = AutoModelForSequenceClassification.from_pretrained("davebulaval/MeaningBERT")

        # We load the MeaningBERT tokenizer
        tokenizer = AutoTokenizer.from_pretrained("davebulaval/MeaningBERT")

        # We tokenize the texts as pairs and return PyTorch tensors
        tokenize_text = tokenizer(documents, simplifications, truncation=True, padding=True, return_tensors="pt")

        with filter_logging_context():
            # We score the sentence pairs
            scores = scorer(**tokenize_text)

        output_dict = {
            "scores": scores.logits.tolist(),
            "hashcode": hashcode,
        }
        return output_dict
requirements.txt
ADDED
@@ -0,0 +1 @@
evaluate