Update README.md
README.md
CHANGED
@@ -84,3 +84,47 @@ Next, we evaluate the model in a cross-language context, with queries in French
As observed, the cross-language context does not significantly impact the behavior of our models. If the model is used in a reranking context along with filtering of the
Top-K results from a search, a threshold of 0.8 could be applied to filter the contexts outputted by the retriever, thereby reducing noise issues present in the contexts
for RAG-type applications.
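
The thresholding step described above can be sketched as follows. This is a minimal illustration, assuming the reranker yields one relevance score between 0 and 1 per context. The function name `filter_contexts`, the inclusive comparison, and the example scores are hypothetical and not part of the model's API.

```python
from typing import List, Tuple


def filter_contexts(
    scored_contexts: List[Tuple[str, float]],
    threshold: float = 0.8,
) -> List[Tuple[str, float]]:
    """Keep contexts scoring at or above the threshold, best first.

    Assumption: each score is a relevance probability in [0, 1].
    """
    kept = [(ctx, score) for ctx, score in scored_contexts if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical (context, score) pairs, for illustration only.
scored = [("ctx A", 0.93), ("ctx B", 0.41), ("ctx C", 0.80), ("ctx D", 0.79)]
filtered = filter_contexts(scored)
```

Only the contexts that survive the filter would then be passed on to the generator in a RAG-type application.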

How to Use Bloomz-3b-reranking
------------------------------

The following example uses the pipeline API of the Transformers library.

```python
import numpy as np
from transformers import pipeline
from scipy.spatial.distance import cdist

retriever = pipeline('feature-extraction', 'cmarkea/bloomz-3b-retriever')

# Important: take only the last token!
infer = lambda x: [ii[0][-1] for ii in retriever(x)]

list_of_contexts = [...]
# Stack the per-text vectors into a 2-D array (n_texts, hidden_size), as cdist requires.
emb_contexts = np.stack(infer(list_of_contexts), axis=0)
list_of_queries = [...]
emb_queries = np.stack(infer(list_of_queries), axis=0)

# Important: use the L2 distance!
dist = cdist(emb_queries, emb_contexts, 'euclidean')
top_k = lambda x: [
    [list_of_contexts[qq] for qq in ii]
    for ii in dist.argsort(axis=-1)[:,:x]
]

# top 5 nearest contexts for each query
top_contexts = top_k(5)
```
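
To make the two "Important" steps concrete without downloading the model, here is a dependency-free sketch on toy data. It assumes each text is represented by a list of per-token vectors and that the embedding is the vector of the last token. The helper names `last_token_embedding` and `top_k_l2`, and all the numbers, are hypothetical.

```python
import math
from typing import List, Sequence


def last_token_embedding(token_states: Sequence[Sequence[float]]) -> List[float]:
    """Return the vector of the last token as the text embedding."""
    return list(token_states[-1])


def top_k_l2(
    query_embs: List[List[float]],
    context_embs: List[List[float]],
    k: int,
) -> List[List[int]]:
    """For each query, return the indices of the k nearest contexts (L2 distance)."""
    ranked = []
    for query in query_embs:
        order = sorted(
            range(len(context_embs)),
            key=lambda j: math.dist(query, context_embs[j]),
        )
        ranked.append(order[:k])
    return ranked


# Toy per-token states: 3 contexts and 1 query, 2 tokens each, hidden size 2.
context_states = [
    [[0.0, 0.0], [1.0, 0.0]],
    [[5.0, 5.0], [0.0, 1.0]],
    [[9.0, 9.0], [4.0, 4.0]],
]
query_states = [[[7.0, 7.0], [0.9, 0.1]]]

emb_contexts = [last_token_embedding(s) for s in context_states]
emb_queries = [last_token_embedding(s) for s in query_states]
nearest = top_k_l2(emb_queries, emb_contexts, k=2)
```

Only the last row of each text influences the ranking here: the first-token vectors are ignored entirely, which mirrors the last-token pooling in the pipeline example.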

Citation
--------

```bibtex
@online{DeBloomzReranking,
  AUTHOR = {Cyrile Delestre},
  ORGANIZATION = {Cr{\'e}dit Mutuel Ark{\'e}a},
  URL = {https://huggingface.co/cmarkea/bloomz-3b-reranking},
  YEAR = {2024},
  KEYWORDS = {NLP ; Transformers ; LLM ; Bloomz},
}
```