---
license: bigscience-bloom-rail-1.0
datasets:
- unicamp-dl/mmarco
- rajpurkar/squad
language:
- fr
- en
pipeline_tag: sentence-similarity
---

# Bloomz-3b Reranking

This reranking model is built from the [cmarkea/bloomz-3b-dpo-chat](https://huggingface.co/cmarkea/bloomz-3b-dpo-chat) model and measures the semantic correspondence between
a question (query) and a context. With its normalized scoring, it helps filter the query/context matches returned by a retriever in an ODQA (Open-Domain Question Answering) setting.
It also allows the results to be reordered using a more effective modeling approach than the retriever's. However, this type of modeling is not suited to direct
database search because of its high computational cost.

Developed to be language-agnostic, this model supports both French and English. Consequently, it can score effectively in a cross-language context without being
influenced by its behavior in a monolingual context (English or French).

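The two-stage usage described above (a cheap retriever narrows the corpus, then the costly reranker scores only the survivors) can be sketched as follows. This is a minimal illustration, not the pipeline used by the authors: `retrieve_then_rerank`, `word_overlap`, and the `top_k` default are hypothetical names and values, and `rerank_score` stands in for any function returning the reranker score of a query/context pair.

```python
from typing import Callable, List, Tuple


def word_overlap(query: str, doc: str) -> float:
    """Toy stand-in for a retriever score: number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    cheap_score: Callable[[str, str], float],
    rerank_score: Callable[[str, str], float],
    top_k: int = 10,
) -> List[Tuple[float, str]]:
    # Stage 1: the cheap scorer sees every document of the corpus.
    candidates = sorted(
        corpus, key=lambda doc: cheap_score(query, doc), reverse=True
    )[:top_k]
    # Stage 2: the costly scorer only sees the top_k candidates.
    scored = [(rerank_score(query, doc), doc) for doc in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

With this layout, the reranker is called `top_k` times per query instead of once per document in the database, which is what keeps its computational cost manageable.
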
## Dataset

The training dataset is composed of the [mMARCO dataset](https://huggingface.co/datasets/unicamp-dl/mmarco), consisting of query/positive/hard negative triplets. Additionally,
we included [SQuAD](https://huggingface.co/datasets/rajpurkar/squad) data from the "train" split, also arranged as query/positive/hard negative triplets. To generate hard
negatives for SQuAD, we took contexts from the same theme as the query but attached to a different set of queries. Hence, the negative observations belong to the same
theme as the query but presumably do not contain the answer to the question.

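The SQuAD hard-negative construction described above can be sketched as follows. This is an illustration under assumptions: the records are taken to expose the `title`, `context`, and `question` fields of the Hugging Face SQuAD dataset, the function name `build_squad_triplets` is hypothetical, and drawing one negative uniformly at random is a guess, since the card does not say how the negative paragraph is picked.

```python
import random
from collections import defaultdict
from typing import Dict, List


def build_squad_triplets(
    records: List[Dict[str, str]], rng: random.Random
) -> List[Dict[str, str]]:
    # Group the distinct contexts (paragraphs) by theme (the `title` field).
    contexts_by_title = defaultdict(list)
    for rec in records:
        if rec["context"] not in contexts_by_title[rec["title"]]:
            contexts_by_title[rec["title"]].append(rec["context"])

    triplets = []
    for rec in records:
        # Candidates: same theme, but a paragraph attached to other questions.
        candidates = [
            c for c in contexts_by_title[rec["title"]] if c != rec["context"]
        ]
        if not candidates:
            continue  # theme with a single paragraph: no hard negative available
        triplets.append(
            {
                "query": rec["question"],
                "positive": rec["context"],
                # Assumption: one negative drawn uniformly at random.
                "negative": rng.choice(candidates),
            }
        )
    return triplets
```
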
Finally, the triplets are flattened into query/context pairs, labeled 1 for query/positive and 0 for query/negative. For each element of the
pair (query and context), the language, French or English, is chosen uniformly at random.

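The flattening step can be sketched as follows. It is only an illustration: it assumes each text of a triplet is available in both languages as a `{"fr": ..., "en": ...}` dictionary (a hypothetical layout), and the function name `flatten_triplets` is not taken from the authors' code.

```python
import random
from typing import Dict, List

LANGUAGES = ("fr", "en")


def flatten_triplets(
    triplets: List[Dict[str, Dict[str, str]]], rng: random.Random
) -> List[dict]:
    pairs = []
    for triplet in triplets:
        # One triplet yields two pairs: (query, positive) and (query, negative).
        for role, label in (("positive", 1), ("negative", 0)):
            # The language is drawn independently for the query and the context.
            query_lang = rng.choice(LANGUAGES)
            context_lang = rng.choice(LANGUAGES)
            pairs.append(
                {
                    "query": triplet["query"][query_lang],
                    "context": triplet[role][context_lang],
                    "label": label,
                }
            )
    return pairs
```

Drawing the two languages independently gives the four combinations (French/French, French/English, English/French, English/English) the same probability, so monolingual and cross-language pairs are equally represented.
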
## Evaluation

To assess the performance of the reranker, we use the "validation" split of the [SQuAD](https://huggingface.co/datasets/rajpurkar/squad) dataset. We select
the first question of each paragraph, together with the paragraph itself, which is the context an oracle model should rank Top-1. Notably,
the number of themes is limited, and every context within a theme that does not match the query is considered a hard negative (contexts from other themes are
simple negatives). The following table lists, for each theme, the number of contexts and associated queries:

| Theme name                                   | Number of contexts |
|---------------------------------------------:|:-------------------|
| Normans                                      | 39                 |
| Computational_complexity_theory              | 48                 |
| Southern_California                          | 39                 |
| Sky_(United_Kingdom)                         | 22                 |
| Victoria_(Australia)                         | 25                 |
| Huguenot                                     | 44                 |
| Steam_engine                                 | 46                 |
| Oxygen                                       | 43                 |
| 1973_oil_crisis                              | 24                 |
| European_Union_law                           | 40                 |
| Amazon_rainforest                            | 21                 |
| Ctenophora                                   | 31                 |
| Fresno,_California                           | 28                 |
| Packet_switching                             | 23                 |
| Black_Death                                  | 23                 |
| Geology                                      | 25                 |
| Pharmacy                                     | 26                 |
| Civil_disobedience                           | 26                 |
| Construction                                 | 22                 |
| Private_school                               | 26                 |
| Harvard_University                           | 30                 |
| Jacksonville,_Florida                        | 21                 |
| Economic_inequality                          | 44                 |
| University_of_Chicago                        | 37                 |
| Yuan_dynasty                                 | 47                 |
| Immune_system                                | 49                 |
| Intergovernmental_Panel_on_Climate_Change    | 24                 |
| Prime_number                                 | 31                 |
| Rhine                                        | 44                 |
| Scottish_Parliament                          | 39                 |
| Islamism                                     | 39                 |
| Imperialism                                  | 39                 |
| Warsaw                                       | 49                 |
| French_and_Indian_War                        | 46                 |
| Force                                        | 44                 |

The evaluation corpus consists of 1204 query/context pairs to be ranked.

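The construction of this evaluation corpus can be sketched as follows. This is a minimal reading of the description above, not the authors' script: the function names are hypothetical, and the records are assumed to expose the `title`, `context`, and `question` fields of the Hugging Face SQuAD dataset, in dataset order.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_eval_corpus(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first question of each paragraph, in dataset order."""
    seen = set()
    corpus = []
    for rec in records:
        key = (rec["title"], rec["context"])
        if key in seen:
            continue  # a question was already kept for this paragraph
        seen.add(key)
        corpus.append(
            {"title": rec["title"], "query": rec["question"], "context": rec["context"]}
        )
    return corpus


def contexts_per_theme(corpus: List[Dict[str, str]]) -> Counter:
    """Number of contexts (and queries) per theme."""
    return Counter(item["title"] for item in corpus)


def negative_kind(query_item: Dict[str, str], candidate: Dict[str, str]) -> str:
    """Classify a candidate context with respect to a query of the corpus."""
    if candidate["context"] == query_item["context"]:
        return "positive"
    if candidate["title"] == query_item["title"]:
        return "hard negative"
    return "simple negative"
```

The records could come, for instance, from `datasets.load_dataset("rajpurkar/squad", split="validation")`. Each query is then ranked against every context of the corpus, its own paragraph being the expected Top-1.
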
First, the evaluation scores were computed with both the query and the context in the same language (French/French).

| Model (French/French)         | Top-mean   | Top-std   | Top-1 (%) | Top-10 (%) | Top-100 (%) | MRR (x100) | mean score Top   | std score Top   |
|:-----------------------------:|:----------:|:---------:|:---------:|:----------:|:-----------:|:----------:|:----------------:|:---------------:|
| BM25                          | 14.47      | 92.19     | 69.77     | 92.03      | 98.09       | 77.74      | NA               | NA              |
| [CamemBERT](https://huggingface.co/antoinelouis/crossencoder-camembert-base-mmarcoFR) | 5.72 | 36.88 | 69.35 | 95.51 | 98.92 | 79.51 | 0.83 | 0.37 |
| [DistilCamemBERT](https://huggingface.co/antoinelouis/crossencoder-distilcamembert-mmarcoFR) | 5.54 | 25.90 | 66.11 | 92.77 | 99.17 | 76.00 | 0.80 | 0.39 |
| [mMiniLMv2-L12](https://huggingface.co/antoinelouis/crossencoder-mMiniLMv2-L12-mmarcoFR) | 4.43 | 30.27 | 71.51 | 95.68 | 99.42 | 80.17 | 0.78 | 0.38 |
| [RoBERTa (multilingual)](https://huggingface.co/abbasgolestani/ag-nli-DeTS-sentence-similarity-v2) | 15.13 | 60.39 | 57.23 | 83.87 | 96.18 | 66.21 | 0.53 | 0.11 |
| [cmarkea/bloomz-560m-reranking](https://huggingface.co/cmarkea/bloomz-560m-reranking) | 1.49 | 2.58 | 83.55 | 99.17 | 100 | 89.98 | 0.93 | 0.15 |
| [cmarkea/bloomz-3b-reranking](https://huggingface.co/cmarkea/bloomz-3b-reranking) | 1.22 | 1.06 | 89.37 | 99.75 | 100 | 93.79 | 0.94 | 0.10 |

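The card does not define the ranking columns of this table. The sketch below shows one plausible way to compute them, under the following assumptions: Top-mean and Top-std are the mean and standard deviation of the rank obtained by the expected context, Top-k (%) is the share of queries whose expected context is ranked within the first k positions, and MRR is the mean reciprocal rank multiplied by 100. The function names, the use of the population standard deviation, and the optimistic handling of ties are all choices made for this illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def rank_of_expected(scores: Sequence[float], expected_index: int) -> int:
    """1-based rank of the expected context when sorting scores in descending order.

    Ties are resolved optimistically: only strictly greater scores rank ahead.
    """
    expected_score = scores[expected_index]
    return 1 + sum(score > expected_score for score in scores)


def ranking_metrics(ranks: List[int]) -> Dict[str, float]:
    """Aggregate the ranks of the expected contexts over all queries."""

    def top_k(k: int) -> float:
        # Percentage of queries whose expected context is ranked within the top k.
        return 100 * sum(rank <= k for rank in ranks) / len(ranks)

    return {
        "top_mean": mean(ranks),
        "top_std": pstdev(ranks),  # assumption: population standard deviation
        "top_1": top_k(1),
        "top_10": top_k(10),
        "top_100": top_k(100),
        "mrr_x100": 100 * mean(1 / rank for rank in ranks),
    }
```
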
Then, we evaluated the model in a cross-language context, with queries in French and contexts in English.

| Model (French/English)        | Top-mean   | Top-std   | Top-1 (%) | Top-10 (%) | Top-100 (%) | MRR (x100) | mean score Top   | std score Top   |
|:-----------------------------:|:----------:|:---------:|:---------:|:----------:|:-----------:|:----------:|:----------------:|:---------------:|
| BM25                          | 288.04     | 371.46    | 21.93     | 41.93      | 55.15       | 28.41      | NA               | NA              |
| [CamemBERT](https://huggingface.co/antoinelouis/crossencoder-camembert-base-mmarcoFR) | 12.20 | 61.39 | 59.55 | 89.71 | 97.42 | 70.38 | 0.65 | 0.47 |
| [DistilCamemBERT](https://huggingface.co/antoinelouis/crossencoder-distilcamembert-mmarcoFR) | 40.97 | 104.78 | 25.66 | 64.78 | 88.62 | 38.83 | 0.53 | 0.49 |
| [mMiniLMv2-L12](https://huggingface.co/antoinelouis/crossencoder-mMiniLMv2-L12-mmarcoFR) | 6.91 | 32.16 | 59.88 | 89.95 | 99.09 | 70.39 | 0.61 | 0.46 |
| [RoBERTa (multilingual)](https://huggingface.co/abbasgolestani/ag-nli-DeTS-sentence-similarity-v2) | 79.32 | 153.62 | 27.91 | 49.50 | 78.16 | 35.41 | 0.40 | 0.12 |
| [cmarkea/bloomz-560m-reranking](https://huggingface.co/cmarkea/bloomz-560m-reranking) | 1.51 | 1.92 | 81.89 | 99.09 | 100 | 88.64 | 0.92 | 0.15 |
| [cmarkea/bloomz-3b-reranking](https://huggingface.co/cmarkea/bloomz-3b-reranking) | 1.22 | 0.98 | 89.20 | 99.84 | 100 | 93.63 | 0.94 | 0.10 |

As observed, the cross-language context does not significantly impact the behavior of our models. When the model is used to rerank and filter the
Top-K results of a search, a threshold of 0.8 can be applied to the scores to filter the contexts returned by the retriever, thereby reducing the noise in the contexts
passed to RAG-type applications.

How to Use Bloomz-3b-reranking
------------------------------

The following example uses the Pipeline API of the Transformers library.

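```python
from transformers import pipeline

# `text-classification` returns one score per label; `top_k=None` keeps all labels.
reranker = pipeline(
    task='text-classification',
    model='cmarkea/bloomz-3b-reranking',
    top_k=None
)

# Placeholder inputs: replace them with your own query and retrieved contexts.
query = "Quelle est la capitale de la France ?"
contexts = [
    "Paris is the capital and most populous city of France.",
    "The Rhine is one of the major European rivers."
]

similarities = reranker(
    [
        dict(
            text=context,    # the model was trained with context in `text`
            text_pair=query  # and query in `text_pair` argument.
        )
        for context in contexts
    ]
)

# Keep the LABEL_1 (query/positive) score of each query/context pair.
scores = [
    next(item['score'] for item in result if item['label'] == "LABEL_1")
    for result in similarities
]

# Rerank the contexts from most to least relevant.
contexts_reranked = sorted(
    zip(scores, contexts),
    key=lambda x: x[0],
    reverse=True
)

# Drop the contexts scoring below the 0.8 threshold.
contexts_cleaned = [
    (score, context)
    for score, context in contexts_reranked
    if score >= 0.8
]
```
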
Citation
--------

```bibtex
@online{DeBloomzReranking,
  AUTHOR = {Cyrile Delestre},
  ORGANIZATION = {Cr{\'e}dit Mutuel Ark{\'e}a},
  URL = {https://huggingface.co/cmarkea/bloomz-3b-reranking},
  YEAR = {2024},
  KEYWORDS = {NLP ; Transformers ; LLM ; Bloomz},
}
```