numb3r3 committed on
Commit 80d2ebe
1 Parent(s): eda82c8

chore: update readme

Files changed (1): README.md (+32 −3)
README.md CHANGED

@@ -23,6 +23,13 @@ license: cc-by-nc-4.0
 # jina-reranker-v2-base-multilingual
 
+Compared with state-of-the-art reranker models, including the previously released `jina-reranker-v1-base-en`, the **Jina Reranker v2** model has demonstrated competitive performance across a series of benchmarks targeting text retrieval, multilingual capability, function-calling-aware and text-to-SQL-aware reranking, and code retrieval tasks.
+
+The `jina-reranker-v2-base-multilingual` model handles long texts with a context length of up to `1024` tokens. For inputs that exceed 1024 tokens, the model uses a sliding-window approach to chunk the input text into smaller pieces and rerank each chunk separately.
+
+The model is also equipped with a flash attention mechanism, which significantly improves its performance.
+
 # Usage
 
 1. The easiest way to start using `jina-reranker-v2-base-multilingual` is through Jina AI's [Reranker API](https://jina.ai/reranker/).
@@ -57,9 +64,11 @@ curl https://api.jina.ai/v1/rerank \
 from transformers import AutoModelForSequenceClassification
 
 model = AutoModelForSequenceClassification.from_pretrained(
-    'jinaai/jina-reranker-v2-base-multilingual', trust_remote_code=True,
+    'jinaai/jina-reranker-v2-base-multilingual',
+    device_map="cuda",
+    torch_dtype="auto",
+    trust_remote_code=True,
 )
-model.to('cuda')
 
 # Example query and documents
 query = "Organic skincare products for sensitive skin"
@@ -79,7 +88,27 @@ documents = [
 # construct sentence pairs
 sentence_pairs = [[query, doc] for doc in documents]
 
-scores = model.compute_score(sentence_pairs)
+scores = model.compute_score(sentence_pairs, max_length=1024)
 ```
 
 That's it! You can now use the `jina-reranker-v2-base-multilingual` model in your projects.
+
+Note that by default, the `jina-reranker-v2-base-multilingual` model uses [flash attention](https://github.com/Dao-AILab/flash-attention), which requires certain types of GPU hardware to run. If you encounter any issues, you can try calling `AutoModelForSequenceClassification.from_pretrained()` with `use_flash_attn=False`, which falls back to the standard attention mechanism. You can also try running the model on a CPU by setting `device_map="cpu"`.
+
+In addition to the `compute_score()` function, the `jina-reranker-v2-base-multilingual` model provides a `model.rerank()` function for reranking documents against a query. You can use it as follows:
+
+```python
+result = model.rerank(
+    query,
+    documents,
+    max_query_length=512,
+    max_length=1024,
+    top_n=3
+)
+```
+
+The `result` object contains the reranked documents along with their scores, which you can use for further processing as needed.
+
+The `rerank()` function also automatically chunks input documents into smaller pieces if they exceed the model's maximum input length, so you can rerank long documents without running into memory issues. Specifically, it splits each document into chunks of size `max_length`, reranks each chunk separately, and combines the scores from all chunks into the final reranking results. The `max_query_length` and `max_length` parameters control the query length and the document length in each chunk, and the `overlap` parameter (default `80`) determines how much adjacent chunks overlap, which helps ensure the model has enough context at chunk boundaries to make accurate predictions.
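The sliding-window chunking described in the diff's closing paragraph can be sketched in plain Python. This is an illustrative approximation only, not the model's actual implementation: the `chunk_tokens` helper is a hypothetical name, and combining chunk scores by taking the maximum is one plausible strategy, not something the README specifies.

```python
def chunk_tokens(tokens, max_length=1024, overlap=80):
    """Split a token sequence into overlapping windows.

    Illustrative sketch of a sliding-window chunker: each window holds
    up to `max_length` tokens, and adjacent windows share `overlap`
    tokens so the model keeps context across chunk boundaries.
    """
    if max_length <= overlap:
        raise ValueError("max_length must be larger than overlap")
    step = max_length - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_length])
        if start + max_length >= len(tokens):
            break  # last window already reaches the end of the document
    return chunks

tokens = list(range(2500))  # stand-in for a tokenized long document
chunks = chunk_tokens(tokens, max_length=1024, overlap=80)

# A document-level score could then be combined from per-chunk scores,
# e.g. by taking the maximum (hypothetical scores, one possible strategy):
chunk_scores = [0.12, 0.87, 0.45]
doc_score = max(chunk_scores)
```

With `max_length=1024` and `overlap=80` the window advances by 944 tokens at a time, so a 2500-token document yields three chunks whose adjacent pairs share 80 tokens.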
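The effect of the `top_n` parameter in the `rerank()` call above can likewise be illustrated with plain Python. The scores below are invented for the example, and `rerank()`'s actual return structure may differ from these simple pairs:

```python
# Hypothetical (document, score) pairs as a reranker might produce them;
# the relevance scores are made up purely for illustration.
documents = [
    "aloe vera gel for sensitive skin",
    "history of the steam engine",
    "fragrance-free moisturizer for eczema",
    "organic night cream",
]
scores = [0.91, 0.03, 0.78, 0.64]

# Sort documents by descending score and keep only the top_n best matches
top_n = 3
ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
result = ranked[:top_n]
```

The least relevant document ("history of the steam engine") is dropped, and the remaining three come back ordered by score.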