Update README.md
README.md
CHANGED
@@ -48,9 +48,6 @@ fact-checking model, despite a small size.**
Please first clone our [GitHub Repo](https://github.com/Liyan06/MiniCheck) and install necessary packages from `requirements.txt`.

-### Throughput

-We speed up Llama-3.1-Bespoke-MiniCheck-7B inference with [vLLM](https://github.com/vllm-project/vllm). Based on our test on a single A6000 (48GB VRAM), Llama-3.1-Bespoke-MiniCheck-7B with vLLM and MiniCheck-Flan-T5-Large have throughputs > 500 docs/min.

### Below is a simple use case

@@ -64,13 +61,34 @@ claim_2 = "The students are on vacation."
# model_name can be one of:
# ['roberta-large', 'deberta-v3-large', 'flan-t5-large', 'Bespoke-MiniCheck-7B']
-scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', cache_dir='./ckpts')
+scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', enable_prefix_caching=False, cache_dir='./ckpts')
pred_label, raw_prob, _, _ = scorer.score(docs=[doc, doc], claims=[claim_1, claim_2])

print(pred_label) # [1, 0]
print(raw_prob) # [0.9840446675150499, 0.010986349594852094]
```

+### Throughput

+We speed up Llama-3.1-Bespoke-MiniCheck-7B inference with [vLLM](https://github.com/vllm-project/vllm). Based on our test on a single A6000 (48GB VRAM), Llama-3.1-Bespoke-MiniCheck-7B with vLLM and MiniCheck-Flan-T5-Large have throughputs > 500 docs/min.

+### Automatic Prefix Caching

+> Automatic Prefix Caching (APC in short) caches the KV cache of existing queries, so that a new query can directly reuse the KV cache if it shares the same prefix with one of the existing queries, allowing the new query to skip the computation of the shared part.

+To enable automatic prefix caching for `Bespoke-MiniCheck-7B`, simply set `enable_prefix_caching=True` when initializing the MiniCheck model (no other changes are needed):

+```python
+scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', enable_prefix_caching=True, cache_dir='./ckpts')
+```

+How automatic prefix caching affects the throughput and model performance can be found in the [GitHub Repo](https://github.com/Liyan06/MiniCheck).

### Test on our [LLM-AggreFact](https://huggingface.co/datasets/lytang/LLM-AggreFact) Benchmark

```python
@@ -85,7 +103,7 @@ df = pd.DataFrame(load_dataset("lytang/LLM-AggreFact")['test'])
docs = df.doc.values
claims = df.claim.values

-scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', cache_dir='./ckpts')
+scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', enable_prefix_caching=False, cache_dir='./ckpts')
pred_label, raw_prob, _, _ = scorer.score(docs=docs, claims=claims) # ~ 500 docs/min, depending on hardware
```
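For a sense of what the new `enable_prefix_caching` flag changes in practice, here is a small sketch built from the usage shown in the diff above. It scores several claims against the same document, which is the case prefix caching is designed for, since every query shares the document as its prefix. The import path follows the MiniCheck repository; the example document, claims, and timing helper are illustrative additions, not part of the model card.

```python
import time

from minicheck.minicheck import MiniCheck

doc = "A group of students gather in the library to study for their final exams."
claims = [
    "The students are preparing for exams.",
    "The students are on vacation.",
    "The students are in the library.",
]

# With prefix caching on, the KV cache built for the shared document prefix
# can be reused across claims instead of being recomputed for every query.
scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', enable_prefix_caching=True, cache_dir='./ckpts')

start = time.perf_counter()
pred_label, raw_prob, _, _ = scorer.score(docs=[doc] * len(claims), claims=claims)
elapsed = time.perf_counter() - start

print(pred_label)  # one binary support label per claim, e.g. [1, 0, 1]
print(raw_prob)    # one support probability per claim
print(f"~{len(claims) / (elapsed / 60):.0f} docs/min on this hardware")
```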
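The quoted definition above is vLLM's own description of Automatic Prefix Caching, so the flag presumably just gets forwarded to the underlying vLLM engine. As a point of reference only, this is roughly what enabling APC looks like directly in vLLM; the checkpoint id and the `trust_remote_code` setting are assumptions to check against the model card.

```python
from vllm import LLM

# vLLM engine with Automatic Prefix Caching enabled (reference only;
# MiniCheck users should set enable_prefix_caching on MiniCheck instead).
llm = LLM(
    model="bespokelabs/Bespoke-MiniCheck-7B",  # assumed checkpoint id
    enable_prefix_caching=True,
    trust_remote_code=True,                    # may be required by this checkpoint
)
```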
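Only the tail of the benchmark snippet appears in the last hunk, so here is a hedged, self-contained sketch of what the full evaluation could look like with this change applied, reporting balanced accuracy per source dataset as in the MiniCheck paper. The `label` and `dataset` column names follow the LLM-AggreFact dataset card and should be treated as assumptions.

```python
import pandas as pd
from datasets import load_dataset
from sklearn.metrics import balanced_accuracy_score

from minicheck.minicheck import MiniCheck

# Load the test split of the LLM-AggreFact benchmark.
df = pd.DataFrame(load_dataset("lytang/LLM-AggreFact")['test'])
docs = df.doc.values
claims = df.claim.values

scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', enable_prefix_caching=False, cache_dir='./ckpts')
pred_label, raw_prob, _, _ = scorer.score(docs=docs, claims=claims)  # ~ 500 docs/min, depending on hardware

df['pred_label'] = pred_label

# Balanced accuracy per source dataset, then the unweighted average,
# which is how LLM-AggreFact results are typically summarized.
per_dataset = df.groupby('dataset').apply(lambda g: balanced_accuracy_score(g.label, g.pred_label))
print(per_dataset.round(3))
print(f"Average BAcc: {per_dataset.mean():.3f}")
```

Running the same script with `enable_prefix_caching=True` is the natural way to compare throughput and accuracy with and without caching, as the README's new section suggests.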