lytang committed
Commit 2649554 (parent: 91d8634)

Update README.md

Files changed (1): README.md (+23 -5)
README.md CHANGED
@@ -48,9 +48,6 @@ fact-checking model, despite a small size.**
 
 Please first clone our [GitHub Repo](https://github.com/Liyan06/MiniCheck) and install necessary packages from `requirements.txt`.
 
-### Throughput
-
-We speed up Llama-3.1-Bespoke-MiniCheck-7B inference with [vLLM](https://github.com/vllm-project/vllm). Based on our test on a single A6000 (48GB VRAM), both Llama-3.1-Bespoke-MiniCheck-7B with vLLM and MiniCheck-Flan-T5-Large reach throughputs of > 500 docs/min.
 
 ### Below is a simple use case
 
@@ -64,13 +61,34 @@ claim_2 = "The students are on vacation."
 
 # model_name can be one of:
 # ['roberta-large', 'deberta-v3-large', 'flan-t5-large', 'Bespoke-MiniCheck-7B']
-scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', cache_dir='./ckpts')
+scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', enable_prefix_caching=False, cache_dir='./ckpts')
 pred_label, raw_prob, _, _ = scorer.score(docs=[doc, doc], claims=[claim_1, claim_2])
 
 print(pred_label) # [1, 0]
 print(raw_prob) # [0.9840446675150499, 0.010986349594852094]
 ```
 
+### Throughput
+
+We speed up Llama-3.1-Bespoke-MiniCheck-7B inference with [vLLM](https://github.com/vllm-project/vllm). Based on our test on
+a single A6000 (48GB VRAM), both Llama-3.1-Bespoke-MiniCheck-7B with vLLM and MiniCheck-Flan-T5-Large reach throughputs of > 500 docs/min.
+
+### Automatic Prefix Caching
+
+> Automatic Prefix Caching (APC for short) caches the KV cache of existing queries, so that a new query can directly reuse the KV
+> cache if it shares the same prefix with one of the existing queries, allowing the new query to skip the computation of the shared part.
+
+To enable automatic prefix caching for `Bespoke-MiniCheck-7B`, simply set `enable_prefix_caching=True` when initializing the
+MiniCheck model (no other changes are needed):
+
+```python
+scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', enable_prefix_caching=True, cache_dir='./ckpts')
+```
+
+Details on how automatic prefix caching affects throughput and model performance can be found in the [GitHub Repo](https://github.com/Liyan06/MiniCheck).
+
+
 ### Test on our [LLM-AggreFact](https://huggingface.co/datasets/lytang/LLM-AggreFact) Benchmark
 
 ```python
@@ -85,7 +103,7 @@ df = pd.DataFrame(load_dataset("lytang/LLM-AggreFact")['test'])
 docs = df.doc.values
 claims = df.claim.values
 
-scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', cache_dir='./ckpts')
+scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', enable_prefix_caching=False, cache_dir='./ckpts')
 pred_label, raw_prob, _, _ = scorer.score(docs=docs, claims=claims) # ~500 docs/min, depending on hardware
 ```
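For readers landing on this commit without the full file: the "simple use case" hunk above omits the imports and the first two example strings. Below is a minimal self-contained sketch of the snippet as of this commit; the import path follows the layout of the [GitHub Repo](https://github.com/Liyan06/MiniCheck), and `doc`/`claim_1` are illustrative stand-ins since only `claim_2` is visible in the hunk.

```python
# Self-contained sketch of the "simple use case" above.
# Assumptions: the import path matches the GitHub repo layout, and the
# doc / claim_1 strings are illustrative (only claim_2 appears in this diff).
from minicheck.minicheck import MiniCheck

doc = "A group of students gather in the library to study for an upcoming exam."  # illustrative
claim_1 = "The students are preparing for an exam."  # illustrative; expected label 1
claim_2 = "The students are on vacation."            # from the diff; expected label 0

# enable_prefix_caching=False is the default shown in this commit
scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', enable_prefix_caching=False, cache_dir='./ckpts')
pred_label, raw_prob, _, _ = scorer.score(docs=[doc, doc], claims=[claim_1, claim_2])

print(pred_label)  # [1, 0]
print(raw_prob)    # per-claim probability that the doc supports the claim
```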
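The APC description quoted in the new section maps directly onto fact-checking workloads: when many claims are verified against the same document, every prompt shares that document as a prefix, so its KV cache is computed once and reused. A hedged sketch of that pattern follows; the document, claim list, and timing code are illustrative, not from the repo.

```python
import time
from minicheck.minicheck import MiniCheck  # assumed repo import path, as above

# One long document shared by every claim: the setting where APC pays off
doc = " ".join(["Students met in the library to prepare for final exams."] * 200)
claims = [f"Illustrative claim number {i} about the document." for i in range(100)]

# Same call as in the README, with automatic prefix caching switched on
scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', enable_prefix_caching=True, cache_dir='./ckpts')

start = time.time()
pred_label, raw_prob, _, _ = scorer.score(docs=[doc] * len(claims), claims=claims)
print(f"throughput: {len(claims) / (time.time() - start) * 60:.0f} docs/min")
```

Re-running the same script with `enable_prefix_caching=False` gives a baseline for comparison; the GitHub Repo reports how APC affects both throughput and accuracy.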
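Finally, the benchmark hunk assembled into a runnable sketch, with a simple accuracy check added at the end; the gold `label` column is an assumption about the LLM-AggreFact schema, not something shown in this diff.

```python
import pandas as pd
from datasets import load_dataset
from minicheck.minicheck import MiniCheck  # assumed repo import path, as above

df = pd.DataFrame(load_dataset("lytang/LLM-AggreFact")['test'])
docs = df.doc.values
claims = df.claim.values

scorer = MiniCheck(model_name='Bespoke-MiniCheck-7B', enable_prefix_caching=False, cache_dir='./ckpts')
pred_label, raw_prob, _, _ = scorer.score(docs=docs, claims=claims)  # ~500 docs/min, depending on hardware

# Hypothetical evaluation step: assumes a binary gold `label` column
accuracy = (df.label.values == pred_label).mean()
print(f"accuracy: {accuracy:.4f}")
```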