Update README.md
README.md
CHANGED
@@ -48,9 +48,6 @@ fact-checking model, despite a small size.**
Please first clone our [GitHub Repo](https://github.com/Liyan06/MiniCheck) and install necessary packages from `requirements.txt`.

-### Throughput

-We speed up Llama-3.1-Bespoke-MiniCheck-7B inference with [vLLM](https://github.com/vllm-project/vllm). Based on our test on a single A6000 (48GB VRAM), Llama-3.1-Bespoke-MiniCheck-7B with vLLM and MiniCheck-Flan-T5-Large have throughputs > 500 docs/min.

### Below is a simple use case

@@ -64,13 +61,34 @@ claim_2 = "The students are on vacation."
# model_name can be one of:
# ['roberta-large', 'deberta-v3-large', 'flan-t5-large', 'Bespoke-MiniCheck-7B']
-scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', cache_dir='./ckpts')
+scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', enable_prefix_caching=False, cache_dir='./ckpts')
pred_label, raw_prob, _, _ = scorer.score(docs=[doc, doc], claims=[claim_1, claim_2])

print(pred_label) # [1, 0]
print(raw_prob) # [0.9840446675150499, 0.010986349594852094]
```

+### Throughput

+We speed up Llama-3.1-Bespoke-MiniCheck-7B inference with [vLLM](https://github.com/vllm-project/vllm). Based on our test on a single A6000 (48GB VRAM), Llama-3.1-Bespoke-MiniCheck-7B with vLLM and MiniCheck-Flan-T5-Large have throughputs > 500 docs/min.

+### Automatic Prefix Caching

+> Automatic Prefix Caching (APC in short) caches the KV cache of existing queries, so that a new query can directly reuse the KV cache if it shares the same prefix with one of the existing queries, allowing the new query to skip the computation of the shared part.

+To enable automatic prefix caching for `Bespoke-MiniCheck-7B`, simply set `enable_prefix_caching=True` when initializing the MiniCheck model (no other changes are needed):

+```python
+scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', enable_prefix_caching=True, cache_dir='./ckpts')
+```

+How automatic prefix caching affects the throughput and model performance can be found in the [GitHub Repo](https://github.com/Liyan06/MiniCheck).

### Test on our [LLM-AggreFact](https://huggingface.co/datasets/lytang/LLM-AggreFact) Benchmark

```python
@@ -85,7 +103,7 @@ df = pd.DataFrame(load_dataset("lytang/LLM-AggreFact")['test'])
docs = df.doc.values
claims = df.claim.values

-scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', cache_dir='./ckpts')
+scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', enable_prefix_caching=False, cache_dir='./ckpts')
pred_label, raw_prob, _, _ = scorer.score(docs=docs, claims=claims) # ~ 500 docs/min, depending on hardware
```
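For a sense of what the new `enable_prefix_caching` flag changes in practice, here is a small sketch built from the usage shown in the diff above. It scores several claims against the same document, which is the case prefix caching is designed for, since every query shares the document as its prefix. The import path follows the MiniCheck repository; the example document, claims, and timing helper are illustrative additions, not part of the model card.

```python
import time

from minicheck.minicheck import MiniCheck

doc = "A group of students gather in the library to study for their final exams."
claims = [
    "The students are preparing for exams.",
    "The students are on vacation.",
    "The students are in the library.",
]

# With prefix caching on, the KV cache built for the shared document prefix
# can be reused across claims instead of being recomputed for every query.
scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', enable_prefix_caching=True, cache_dir='./ckpts')

start = time.perf_counter()
pred_label, raw_prob, _, _ = scorer.score(docs=[doc] * len(claims), claims=claims)
elapsed = time.perf_counter() - start

print(pred_label)  # one binary support label per claim, e.g. [1, 0, 1]
print(raw_prob)    # one support probability per claim
print(f"~{len(claims) / (elapsed / 60):.0f} docs/min on this hardware")
```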
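The quoted definition above is vLLM's own description of Automatic Prefix Caching, so the flag presumably just gets forwarded to the underlying vLLM engine. As a point of reference only, this is roughly what enabling APC looks like directly in vLLM; the checkpoint id and the `trust_remote_code` setting are assumptions to check against the model card.

```python
from vllm import LLM

# vLLM engine with Automatic Prefix Caching enabled (reference only;
# MiniCheck users should set enable_prefix_caching on MiniCheck instead).
llm = LLM(
    model="bespokelabs/Bespoke-MiniCheck-7B",  # assumed checkpoint id
    enable_prefix_caching=True,
    trust_remote_code=True,                    # may be required by this checkpoint
)
```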
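Only the tail of the benchmark snippet appears in the last hunk, so here is a hedged, self-contained sketch of what the full evaluation could look like with this change applied, reporting balanced accuracy per source dataset as in the MiniCheck paper. The `label` and `dataset` column names follow the LLM-AggreFact dataset card and should be treated as assumptions.

```python
import pandas as pd
from datasets import load_dataset
from sklearn.metrics import balanced_accuracy_score

from minicheck.minicheck import MiniCheck

# Load the test split of the LLM-AggreFact benchmark.
df = pd.DataFrame(load_dataset("lytang/LLM-AggreFact")['test'])
docs = df.doc.values
claims = df.claim.values

scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', enable_prefix_caching=False, cache_dir='./ckpts')
pred_label, raw_prob, _, _ = scorer.score(docs=docs, claims=claims)  # ~ 500 docs/min, depending on hardware

df['pred_label'] = pred_label

# Balanced accuracy per source dataset, then the unweighted average,
# which is how LLM-AggreFact results are typically summarized.
per_dataset = df.groupby('dataset').apply(lambda g: balanced_accuracy_score(g.label, g.pred_label))
print(per_dataset.round(3))
print(f"Average BAcc: {per_dataset.mean():.3f}")
```

Running the same script with `enable_prefix_caching=True` is the natural way to compare throughput and accuracy with and without caching, as the README's new section suggests.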