danf0 committed on
Commit b2dcfc6 · 1 Parent(s): f41a5ce

Update docs.

Files changed (2)
  1. README.md +125 -17
  2. vendiscore.py +6 -4
README.md CHANGED
@@ -5,7 +5,7 @@ datasets:
  tags:
  - evaluate
  - metric
- description: "TODO: add a description here"
  sdk: gradio
  sdk_version: 3.0.2
  app_file: app.py
@@ -14,37 +14,145 @@ pinned: false
  # Metric Card for VendiScore

- ***Module Card Instructions:*** *Fill out the following subsections. Feel free to take a look at existing metric cards if you'd like examples.*

  ## Metric Description
- *Give a brief overview of this metric, including what task(s) it is usually used for, if any.*

  ## How to Use
- *Give general statement of how to use the metric*

- *Provide simplest possible example for using the metric*

  ### Inputs
- *List all input arguments in the format below*
- - **input_field** *(type): Definition of input, with explanation if necessary. State any default value(s).*

  ### Output Values

- *Explain what this metric outputs and provide an example of what the metric output looks like. Modules should return a dictionary with one or multiple key-value pairs, e.g. {"bleu" : 6.02}*

- *State the range of possible values that the metric's output can take, as well as what in that range is considered good. For example: "This metric can take on any value between 0 and 100, inclusive. Higher scores are better."*

- #### Values from Popular Papers
- *Give examples, preferrably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*

- ### Examples
- *Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*

  ## Limitations and Bias
- *Note any known limitations or biases that the metric has, with links and references if possible.*

  ## Citation
- *Cite the source where this metric was introduced.*

- ## Further References
- *Add any useful further references.*
 
  tags:
  - evaluate
  - metric
+ description: "The Vendi Score is a metric for evaluating diversity in machine learning. See the project's README at https://github.com/vertaix/Vendi-Score for more information."
  sdk: gradio
  sdk_version: 3.0.2
  app_file: app.py
 
  # Metric Card for VendiScore

+ The Vendi Score (VS) is a metric for evaluating diversity in machine learning.
+ The input to the metric is a collection of samples and a pairwise similarity function, and the output is a number that can be interpreted as the effective number of unique elements in the sample.
+ See the project's README at https://github.com/vertaix/Vendi-Score for more information.
 
  ## Metric Description
+ The Vendi Score (VS) is a metric for evaluating diversity in machine learning.
+ The input to the metric is a collection of samples and a pairwise similarity function, and the output is a number that can be interpreted as the effective number of unique elements in the sample.
+ Specifically, given a positive semi-definite matrix $K \in \mathbb{R}^{n \times n}$ of similarity scores, the score is defined as:
+ $$\mathrm{VS}(K) = \exp\left(-\mathrm{tr}\left((K/n) \log (K/n)\right)\right) = \exp\left(-\sum_{i=1}^n \lambda_i \log \lambda_i\right),$$
+ where $\lambda_i$ are the eigenvalues of $K/n$ and $0 \log 0 = 0$.
+ That is, the Vendi Score is equal to the exponential of the von Neumann entropy of $K/n$, or the Shannon entropy of the eigenvalues, which is also known as the effective rank.
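For concreteness, here is a minimal NumPy sketch of the formula above (an illustration only, not the package implementation; the helper name `vendi_score_from_K` is made up here):

```python
import numpy as np

def vendi_score_from_K(K):
    # Vendi Score of a symmetric PSD similarity matrix K, following the definition above.
    n = K.shape[0]
    lam = np.linalg.eigvalsh(K / n)   # eigenvalues of K/n
    lam = lam[lam > 1e-12]            # convention: 0 log 0 = 0
    return np.exp(-np.sum(lam * np.log(lam)))

# Three items that are similar only to themselves -> effective number of 3.
print(vendi_score_from_K(np.eye(3)))  # 3.0
```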

  ## How to Use
+ The Vendi Score is available as a standalone Python package and as a HuggingFace `evaluate` module.
+ To use the Python package, see the instructions at https://github.com/vertaix/Vendi-Score.
+ To use the `evaluate` module, pass a list of samples along with either a similarity function or a string identifying a predefined class of similarity functions (see below).

+ ```python
+ >>> import evaluate
+ >>> vendiscore = evaluate.load("danf0/vendiscore")
+ >>> samples = ["Look, Jane.",
+                "See Spot.",
+                "See Spot run.",
+                "Run, Spot, run.",
+                "Jane sees Spot run."]
+ >>> results = vendiscore.compute(samples, k="ngram_overlap", ns=[1, 2])
+ >>> print(results)
+ {'VS': 3.90657...}
+ ```

  ### Inputs
+ - **samples**: an iterable containing $n$ samples to score, an n x n similarity matrix K, or an n x d feature matrix X.
+ - **k**: a pairwise similarity function, or a string identifying a predefined similarity function. If k is a pairwise similarity function, it should be symmetric and satisfy k(x, x) = 1. Options: ngram_overlap, text_embeddings, pixels, image_embeddings.
+ - **score_K**: if true, samples is an n x n similarity matrix K.
+ - **score_X**: if true, samples is an n x d feature matrix X (see the sketch after this list).
+ - **score_dual**: if true, samples is an n x d feature matrix X and the score is computed from the d x d covariance matrix X.T @ X.
+ - **normalize**: if true, normalize the similarity scores.
+ - **model (optional)**: if k is "text_embeddings", a model mapping sentences to embeddings (the output should be an object with an attribute called `pooler_output` or `last_hidden_state`). If k is "image_embeddings", a model mapping images to embeddings.
+ - **tokenizer (optional)**: if k is "text_embeddings" or "ngram_overlap", a tokenizer mapping strings to lists.
+ - **transform (optional)**: if k is "image_embeddings", a torchvision transform to apply to the samples.
+ - **model_path (optional)**: if k is "text_embeddings", the name of a model on the HuggingFace hub.
+ - **ns (optional)**: if k is "ngram_overlap", the values of n to calculate.
+ - **batch_size (optional)**: batch size to use if k is "text_embeddings" or "image_embeddings".
+ - **device (optional)**: a string (e.g. "cuda", "cpu") or torch.device identifying the device to use if k is "text_embeddings" or "image_embeddings".
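For a quick illustration of the matrix-style inputs, here is a short sketch that scores an n x d feature matrix directly with `score_X` (the feature matrix `X` below is made up for illustration):

```python
import evaluate
import numpy as np

vendiscore = evaluate.load("danf0/vendiscore")

# A toy feature matrix: 4 samples, 2 features; rows are already unit-normalized.
X = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])

# With score_X=True, similarities are taken to be dot products between the rows of X.
print(vendiscore.compute(X, score_X=True))  # should print {'VS': 2.0}: two effectively distinct rows
```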

  ### Output Values

+ The output is a dictionary with one key, "VS".
+ Given n samples, the value of the Vendi Score ranges between 1 and n, with higher numbers indicating that the sample is more diverse.
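To make the range concrete, here is a small hypothetical check of the two extremes, using an illustrative exact-match similarity function (not one of the predefined options):

```python
import evaluate

vendiscore = evaluate.load("danf0/vendiscore")
k = lambda a, b: float(a == b)  # illustrative 0/1 similarity

# n identical samples -> the minimum score of 1.
print(vendiscore.compute([7, 7, 7, 7], k))  # expected: {'VS': 1.0}

# n distinct samples -> the maximum score of n (here, 4).
print(vendiscore.compute([1, 2, 3, 4], k))  # expected: {'VS': 4.0}
```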
+
+ ### Examples

+ ```python
+ import evaluate
+ import numpy as np
+
+ vendiscore = evaluate.load("danf0/vendiscore")
+
+ samples = [0, 0, 10, 10, 20, 20]
+ k = lambda a, b: np.exp(-np.abs(a - b))
+
+ vendiscore.compute(samples, k)
+
+ # 2.9999
+ ```
+
+ If you have already precomputed a similarity matrix:
+ ```python
+ K = np.array([[1.0, 0.9, 0.0],
+               [0.9, 1.0, 0.0],
+               [0.0, 0.0, 1.0]])
+ vendiscore.compute(K, score_K=True)
+
+ # 2.1573
+ ```
+
+ If your similarity function is a dot product between normalized embeddings $X \in \mathbb{R}^{n \times d}$, and $d < n$, it is faster to compute the Vendi Score using the covariance matrix, $\frac{1}{n} \sum_i x_i x_i^{\top}$:
+ ```python
+ vendiscore.compute(X, score_dual=True)
+ ```
+ If the rows of $X$ are not normalized, set `normalize = True`.
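As a sanity check (on random, illustrative data), the dual computation should agree with scoring the full n x n similarity matrix directly:

```python
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # normalize the rows

direct = vendiscore.compute(X @ X.T, score_K=True)  # n x n similarity matrix
dual = vendiscore.compute(X, score_dual=True)       # d x d covariance; cheaper when d < n
# The two results should agree up to numerical precision.
```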
+
+ Images:
+ ```python
+ from torchvision import datasets
+
+ mnist = datasets.MNIST("data/mnist", train=False, download=True)
+ digits = [[x for x, y in mnist if y == c] for c in range(10)]
+ pixel_vs = [vendiscore.compute(imgs, k="pixels")["VS"] for imgs in digits]
+ # The default embeddings are from the pool-2048 layer of the torchvision
+ # Inception v3 model.
+ inception_vs = [vendiscore.compute(imgs, k="image_embeddings", batch_size=64, device="cuda")["VS"] for imgs in digits]
+ for y, (pvs, ivs) in enumerate(zip(pixel_vs, inception_vs)):
+     print(f"{y}\t{pvs:.02f}\t{ivs:.02f}")
+
+ # Output:
+ # 0   7.68  3.45
+ # 1   5.31  3.50
+ # 2  12.18  3.62
+ # 3   9.97  2.97
+ # 4  11.10  3.75
+ # 5  13.51  3.16
+ # 6   9.06  3.63
+ # 7   9.58  4.07
+ # 8   9.69  3.74
+ # 9   8.56  3.43
+ ```
+
+ Text:
+ ```python
+ sents = ["Look, Jane.",
+          "See Spot.",
+          "See Spot run.",
+          "Run, Spot, run.",
+          "Jane sees Spot run."]
+ ngram_vs = vendiscore.compute(sents, k="ngram_overlap", ns=[1, 2])["VS"]
+ bert_vs = vendiscore.compute(sents, k="text_embeddings", model_path="bert-base-uncased")["VS"]
+ simcse_vs = vendiscore.compute(sents, k="text_embeddings", model_path="princeton-nlp/unsup-simcse-bert-base-uncased")["VS"]
+ print(f"N-grams: {ngram_vs:.02f}, BERT: {bert_vs:.02f}, SimCSE: {simcse_vs:.02f}")
+
+ # N-grams: 3.91, BERT: 1.21, SimCSE: 2.81
+ ```
 
  ## Limitations and Bias
+ The Vendi Score depends on the choice of similarity function. Care should be taken to select a similarity function that reflects the features that are relevant for defining diversity in a given application.

  ## Citation
vendiscore.py CHANGED
@@ -22,15 +22,17 @@ from vendi_score import vendi, image_utils, text_utils
  # TODO: Add BibTeX citation
  _CITATION = ""
  _DESCRIPTION = """\
- A diversity evaluation metric for machine learning.
  """


  _KWARGS_DESCRIPTION = """
  Calculates the Vendi Score given samples and a similarity function.
  Args:
-     samples: list of n sentences to score, an n x n similarity matrix K, or
-         an n x d feature matrix X.
      k: a pairwise similarity function, or a string identifying a predefined
          similarity function.
          Options: ngram_overlap, text_embeddings, pixels, image_embeddings.
@@ -56,7 +58,7 @@ Args:
  Returns:
      VS: The Vendi Score.
  Examples:
-     >>> vendi_score = evaluate.load("vendi_score")
      >>> samples = ["Look, Jane.",
                     "See Spot.",
                     "See Spot run.",
 
  # TODO: Add BibTeX citation
  _CITATION = ""
  _DESCRIPTION = """\
+ The Vendi Score is a metric for evaluating diversity in machine learning.
+ The input to the metric is a collection of samples and a pairwise similarity function, and the output is a number that can be interpreted as the effective number of unique elements in the sample.
+ See the project's README at https://github.com/vertaix/Vendi-Score for more information.
  """


  _KWARGS_DESCRIPTION = """
  Calculates the Vendi Score given samples and a similarity function.
  Args:
+     samples: an iterable containing n samples to score, an n x n similarity
+         matrix K, or an n x d feature matrix X.
      k: a pairwise similarity function, or a string identifying a predefined
          similarity function.
          Options: ngram_overlap, text_embeddings, pixels, image_embeddings.
 
  Returns:
      VS: The Vendi Score.
  Examples:
+     >>> vendi_score = evaluate.load("danf0/vendiscore")
      >>> samples = ["Look, Jane.",
                     "See Spot.",
                     "See Spot run.",