Update docs.
- README.md +125 -17
- vendiscore.py +6 -4
README.md
CHANGED
@@ -5,7 +5,7 @@ datasets:
|
|
5 |
tags:
|
6 |
- evaluate
|
7 |
- metric
|
8 |
-
description: "
|
9 |
sdk: gradio
|
10 |
sdk_version: 3.0.2
|
11 |
app_file: app.py
|
@@ -14,37 +14,145 @@ pinned: false
|
|
14 |
|
15 |
# Metric Card for VendiScore
|
16 |
|
17 |
-
|
18 |
|
19 |
## Metric Description
|
20 |
-
|
21 |
|
22 |
## How to Use
|
23 |
-
|
24 |
|
25 |
-
|
26 |
|
27 |
### Inputs
|
28 |
-
|
29 |
-
|
30 |
|
31 |
### Output Values
|
32 |
|
33 |
-
|
34 |
|
35 |
-
|
36 |
|
37 |
-
|
38 |
-
|
39 |
|
40 |
-
|
41 |
-
|
42 |
|
43 |
## Limitations and Bias
|
44 |
-
|
45 |
|
46 |
## Citation
|
47 |
-
*Cite the source where this metric was introduced.*
|
48 |
|
49 |
-
## Further References
|
50 |
-
*Add any useful further references.*
|
|
|
5 |
tags:
|
6 |
- evaluate
|
7 |
- metric
|
8 |
+
description: "The Vendi Score is a metric for evaluating diversity in machine learning. See the project's README at https://github.com/vertaix/Vendi-Score for more information."
|
9 |
sdk: gradio
|
10 |
sdk_version: 3.0.2
|
11 |
app_file: app.py
|
|
|
14 |
|
15 |
# Metric Card for VendiScore
|
16 |
|
17 |
+
The Vendi Score (VS) is a metric for evaluating diversity in machine learning.
|
18 |
+
The input to the metric is a collection of samples and a pairwise similarity function, and the output is a number, which can be interpreted as the effective number of unique elements in the sample.
|
19 |
+
See the project's README at https://github.com/vertaix/Vendi-Score for more information.
|
20 |
|
21 |
## Metric Description
|
22 |
+
The Vendi Score (VS) is a metric for evaluating diversity in machine learning.
|
23 |
+
The input to the metric is a collection of samples and a pairwise similarity function, and the output is a number, which can be interpreted as the effective number of unique elements in the sample.
|
24 |
+
Specifically, given a positive semi-definite matrix $K \in \mathbb{R}^{n \times n}$ of similarity scores, the score is defined as:
|
25 |
+
$$\mathrm{VS}(K) = \exp\left(-\mathrm{tr}\left(\frac{K}{n} \log \frac{K}{n}\right)\right) = \exp\left(-\sum_{i=1}^n \lambda_i \log \lambda_i\right),$$
|
26 |
+
where $\lambda_i$ are the eigenvalues of $K/n$ and $0 \log 0 = 0$.
|
27 |
+
That is, the Vendi Score is the exponential of the von Neumann entropy of $K/n$, or equivalently of the Shannon entropy of the eigenvalues of $K/n$; this quantity is also known as the effective rank.
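
The definition above can be sketched directly in NumPy. This is a minimal illustration, not the package's implementation: the helper name `vendi_score_from_kernel` and the `1e-12` cutoff for treating an eigenvalue as zero are assumptions made for this example.

```python
import numpy as np

def vendi_score_from_kernel(K):
    """Illustrative helper (hypothetical, not the package API)."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    # Eigenvalues of K/n; eigvalsh assumes K is symmetric.
    eigvals = np.linalg.eigvalsh(K / n)
    # Apply the convention 0 log 0 = 0 by dropping (numerically) zero
    # eigenvalues; the 1e-12 cutoff is an assumption of this sketch.
    eigvals = eigvals[eigvals > 1e-12]
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))

K = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
vs = vendi_score_from_kernel(K)
print(vs)
```

With the identity matrix as input (all samples mutually dissimilar) the sketch returns $n$, and with an all-ones matrix (all samples identical) it returns 1.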
|
28 |
|
29 |
## How to Use
|
30 |
+
The Vendi Score is available as a standalone Python package or as a module for Hugging Face `evaluate`.
|
31 |
+
To use the Python package, see the instructions at https://github.com/vertaix/Vendi-Score.
|
32 |
+
To use the `evaluate` module, pass a list of samples and a similarity function or a string identifying a predefined class of similarity functions (see below).
|
33 |
|
34 |
+
```python
|
35 |
+
>>> import evaluate
>>> vendiscore = evaluate.load("danf0/vendiscore")
|
36 |
+
>>> samples = ["Look, Jane.",
|
37 |
+
"See Spot.",
|
38 |
+
"See Spot run.",
|
39 |
+
"Run, Spot, run.",
|
40 |
+
"Jane sees Spot run."]
|
41 |
+
>>> results = vendiscore.compute(samples, k="ngram_overlap", ns=[1, 2])
|
42 |
+
>>> print(results)
|
43 |
+
{'VS': 3.90657...}
|
44 |
+
```
|
45 |
|
46 |
### Inputs
|
47 |
+
- **samples**: an iterable containing $n$ samples to score, an n x n similarity
|
48 |
+
matrix K, or an n x d feature matrix X.
|
49 |
+
- **k**: a pairwise similarity function, or a string identifying a predefined
|
50 |
+
similarity function. If k is a pairwise similarity function, it should
|
51 |
+
be symmetric and satisfy k(x, x) = 1.
|
52 |
+
Options: ngram_overlap, text_embeddings, pixels, image_embeddings.
|
53 |
+
- **score_K**: if true, samples is an n x n similarity matrix K.
|
54 |
+
- **score_X**: if true, samples is an n x d feature matrix X.
|
55 |
+
- **score_dual**: if true, samples is an n x d feature matrix X and we will
|
56 |
+
compute the diversity score using the covariance matrix X.T @ X.
|
57 |
+
- **normalize**: if true, normalize the similarity scores.
|
58 |
+
- **model (optional)**: if k is "text_embeddings", a model mapping sentences to
|
59 |
+
embeddings (output should be an object with an attribute called
|
60 |
+
`pooler_output` or `last_hidden_state`). If k is "image_embeddings", a
|
61 |
+
model mapping images to embeddings.
|
62 |
+
- **tokenizer (optional)**: if k is "text_embeddings" or "ngram_overlap", a
|
63 |
+
tokenizer mapping strings to lists.
|
64 |
+
- **transform (optional)**: if k is "image_embeddings", a torchvision transform
|
65 |
+
to apply to the samples.
|
66 |
+
- **model_path (optional)**: if k is "text_embeddings", the name of a model on
|
67 |
+
the HuggingFace hub.
|
68 |
+
- **ns (optional)**: if k is "ngram_overlap", the values of n to calculate.
|
69 |
+
- **batch_size (optional)**: batch size to use if k is "text_embeddings" or
|
70 |
+
"image_embeddings".
|
71 |
+
- **device (optional)**: a string (e.g. "cuda", "cpu") or torch.device
|
72 |
+
identifying the device to use if k is "text_embeddings"
|
73 |
+
or "image_embeddings".
|
74 |
+
|
75 |
|
76 |
### Output Values
|
77 |
|
78 |
+
The output is a dictionary with one key, "VS".
|
79 |
+
Given n samples, the value of the Vendi Score ranges between 1 and n, with higher numbers indicating that the sample is more diverse.
|
80 |
+
|
81 |
+
### Examples
|
82 |
|
83 |
+
```python
|
84 |
+
import evaluate
import numpy as np
|
85 |
+
vendiscore = evaluate.load("danf0/vendiscore")
|
86 |
|
87 |
+
samples = [0, 0, 10, 10, 20, 20]
|
88 |
+
k = lambda a, b: np.exp(-np.abs(a - b))
|
89 |
|
90 |
+
vendiscore.compute(samples, k)
|
91 |
+
|
92 |
+
# 2.9999
|
93 |
+
```
|
94 |
+
|
95 |
+
If you have already precomputed a similarity matrix:
|
96 |
+
```python
|
97 |
+
K = np.array([[1.0, 0.9, 0.0],
|
98 |
+
[0.9, 1.0, 0.0],
|
99 |
+
[0.0, 0.0, 1.0]])
|
100 |
+
vendiscore.compute(K, score_K=True)
|
101 |
+
|
102 |
+
# 2.1573
|
103 |
+
```
|
104 |
+
|
105 |
+
If your similarity function is a dot product between normalized
|
106 |
+
embeddings $X\in\mathbb{R}^{n\times d}$, and $d < n$, it is faster
|
107 |
+
to compute the Vendi Score using the covariance matrix,
|
108 |
+
$\frac{1}{n} \sum_i x_i x_i^{\top}$:
|
109 |
+
```python
|
110 |
+
vendiscore.compute(X, score_dual=True)
|
111 |
+
```
|
112 |
+
If the rows of $X$ are not normalized, set `normalize=True`.
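
Why the two routes agree can be checked numerically: the $n \times n$ matrix $XX^{\top}/n$ and the $d \times d$ matrix $X^{\top}X/n$ share the same nonzero eigenvalues, so the entropy of the spectrum is identical. The sketch below uses plain NumPy rather than the `evaluate` module; the helper `entropy_exp`, the random data, and the `1e-12` cutoff are assumptions made for illustration.

```python
import numpy as np

def entropy_exp(eigvals, tol=1e-12):
    # Exponential of the Shannon entropy of the nonzero eigenvalues
    # (hypothetical helper; tol is an assumption of this sketch).
    eigvals = eigvals[eigvals > tol]
    return float(np.exp(-np.sum(eigvals * np.log(eigvals))))

rng = np.random.default_rng(0)
n, d = 50, 8
X = rng.normal(size=(n, d))
# Normalize rows so that the dot-product similarity satisfies k(x, x) = 1.
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Route 1: eigenvalues of the n x n similarity matrix.
vs_gram = entropy_exp(np.linalg.eigvalsh(X @ X.T / n))
# Route 2: eigenvalues of the d x d covariance matrix (cheaper when d < n).
vs_cov = entropy_exp(np.linalg.eigvalsh(X.T @ X / n))

print(vs_gram, vs_cov)
```

The second route decomposes a $d \times d$ matrix instead of an $n \times n$ one, which is where the saving comes from when $d < n$.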
|
113 |
+
|
114 |
+
Images:
|
115 |
+
```python
|
116 |
+
from torchvision import datasets
|
117 |
+
|
118 |
+
mnist = datasets.MNIST("data/mnist", train=False, download=True)
|
119 |
+
digits = [[x for x, y in mnist if y == c] for c in range(10)]
|
120 |
+
pixel_vs = [vendiscore.compute(imgs, k="pixels") for imgs in digits]
|
121 |
+
# The default embeddings are from the pool-2048 layer of the torchvision
|
122 |
+
# Inception v3 model.
|
123 |
+
inception_vs = [vendiscore.compute(imgs, k="image_embeddings", batch_size=64, device="cuda") for imgs in digits]
|
124 |
+
for y, (pvs, ivs) in enumerate(zip(pixel_vs, inception_vs)): print(f"{y}\t{pvs['VS']:.02f}\t{ivs['VS']:.02f}")
|
125 |
+
|
126 |
+
# Output:
|
127 |
+
# 0 7.68 3.45
|
128 |
+
# 1 5.31 3.50
|
129 |
+
# 2 12.18 3.62
|
130 |
+
# 3 9.97 2.97
|
131 |
+
# 4 11.10 3.75
|
132 |
+
# 5 13.51 3.16
|
133 |
+
# 6 9.06 3.63
|
134 |
+
# 7 9.58 4.07
|
135 |
+
# 8 9.69 3.74
|
136 |
+
# 9 8.56 3.43
|
137 |
+
```
|
138 |
+
|
139 |
+
Text:
|
140 |
+
```python
|
141 |
+
sents = ["Look, Jane.",
|
142 |
+
"See Spot.",
|
143 |
+
"See Spot run.",
|
144 |
+
"Run, Spot, run.",
|
145 |
+
"Jane sees Spot run."]
|
146 |
+
ngram_vs = vendiscore.compute(sents, k="ngram_overlap", ns=[1, 2])
|
147 |
+
bert_vs = vendiscore.compute(sents, k="text_embeddings", model_path="bert-base-uncased")
|
148 |
+
simcse_vs = vendiscore.compute(sents, k="text_embeddings", model_path="princeton-nlp/unsup-simcse-bert-base-uncased")
|
149 |
+
print(f"N-grams: {ngram_vs['VS']:.02f}, BERT: {bert_vs['VS']:.02f}, SimCSE: {simcse_vs['VS']:.02f}")
|
150 |
+
|
151 |
+
# N-grams: 3.91, BERT: 1.21, SimCSE: 2.81
|
152 |
+
```
|
153 |
|
154 |
## Limitations and Bias
|
155 |
+
The Vendi Score depends on the choice of similarity function. Care should be taken to select a similarity function that reflects the features that are relevant for defining diversity in a given application.
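
To make this dependence concrete, the sketch below scores the same six samples under two similarity functions that differ only in bandwidth. It uses plain NumPy; the helper `vendi_score_from_kernel` and the bandwidth of 100 are assumptions chosen for illustration, not part of the package.

```python
import numpy as np

def vendi_score_from_kernel(K):
    # Hypothetical helper: exp of the Shannon entropy of the eigenvalues of K/n.
    eigvals = np.linalg.eigvalsh(K / K.shape[0])
    eigvals = eigvals[eigvals > 1e-12]  # convention: 0 log 0 = 0
    return float(np.exp(-np.sum(eigvals * np.log(eigvals))))

samples = np.array([0, 0, 10, 10, 20, 20], dtype=float)
# Pairwise absolute distances between all samples.
dist = np.abs(samples[:, None] - samples[None, :])

# Narrow kernel: samples 10 apart are treated as almost unrelated.
vs_narrow = vendi_score_from_kernel(np.exp(-dist))
# Wide kernel (assumed bandwidth of 100): all samples look similar.
vs_wide = vendi_score_from_kernel(np.exp(-dist / 100.0))

print(vs_narrow, vs_wide)
```

The narrow kernel reports close to three effective elements, while the wide kernel reports a value much nearer to one, even though the samples are unchanged.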
|
156 |
|
157 |
## Citation
|
|
|
158 |
|
|
|
|
vendiscore.py
CHANGED
@@ -22,15 +22,17 @@ from vendi_score import vendi, image_utils, text_utils
|
|
22 |
# TODO: Add BibTeX citation
|
23 |
_CITATION = ""
|
24 |
_DESCRIPTION = """\
|
25 |
-
|
|
|
|
|
26 |
"""
|
27 |
|
28 |
|
29 |
_KWARGS_DESCRIPTION = """
|
30 |
Calculates the Vendi Score given samples and a similarity function.
|
31 |
Args:
|
32 |
-
samples:
|
33 |
-
an n x d feature matrix X.
|
34 |
k: a pairwise similarity function, or a string identifying a predefined
|
35 |
similarity function.
|
36 |
Options: ngram_overlap, text_embeddings, pixels, image_embeddings.
|
@@ -56,7 +58,7 @@ Args:
|
|
56 |
Returns:
|
57 |
VS: The Vendi Score.
|
58 |
Examples:
|
59 |
-
>>> vendi_score = evaluate.load("
|
60 |
>>> samples = ["Look, Jane.",
|
61 |
"See Spot.",
|
62 |
"See Spot run.",
|
|
|
22 |
# TODO: Add BibTeX citation
|
23 |
_CITATION = ""
|
24 |
_DESCRIPTION = """\
|
25 |
+
The Vendi Score is a metric for evaluating diversity in machine learning.
|
26 |
+
The input to the metric is a collection of samples and a pairwise similarity function, and the output is a number, which can be interpreted as the effective number of unique elements in the sample.
|
27 |
+
See the project's README at https://github.com/vertaix/Vendi-Score for more information.
|
28 |
"""
|
29 |
|
30 |
|
31 |
_KWARGS_DESCRIPTION = """
|
32 |
Calculates the Vendi Score given samples and a similarity function.
|
33 |
Args:
|
34 |
+
samples: an iterable containing n samples to score, an n x n similarity
|
35 |
+
matrix K, or an n x d feature matrix X.
|
36 |
k: a pairwise similarity function, or a string identifying a predefined
|
37 |
similarity function.
|
38 |
Options: ngram_overlap, text_embeddings, pixels, image_embeddings.
|
|
|
58 |
Returns:
|
59 |
VS: The Vendi Score.
|
60 |
Examples:
|
61 |
+
>>> vendi_score = evaluate.load("danf0/vendiscore")
|
62 |
>>> samples = ["Look, Jane.",
|
63 |
"See Spot.",
|
64 |
"See Spot run.",
|