SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model finetuned from BAAI/bge-m3 on the json dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json


Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
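The architecture amounts to three steps: run XLMRobertaModel over the tokens, keep the hidden state of the first token (`pooling_mode_cls_token: True`), and scale it to unit L2 norm (`Normalize()`). The NumPy sketch below mimics only the last two steps on random stand-in hidden states. It does not load the real model, and the function name and array shapes are illustrative assumptions.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """token_embeddings: (batch, seq_len, hidden) array of transformer outputs."""
    # Pooling with pooling_mode_cls_token=True: keep the first token's vector
    cls = token_embeddings[:, 0, :]
    # Normalize(): rescale each pooled vector to unit L2 norm
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / np.clip(norms, 1e-12, None)

rng = np.random.default_rng(0)
# Hypothetical stand-in for XLMRobertaModel output: 3 sequences, 12 tokens, hidden size 1024
hidden = rng.normal(size=(3, 12, 1024)).astype(np.float32)
emb = cls_pool_and_normalize(hidden)
print(emb.shape)                                        # (3, 1024)
print(np.allclose(np.linalg.norm(emb, axis=1), 1.0))    # every embedding has unit length
```

Because every output vector has unit length, cosine similarity and dot product give the same scores for this model's embeddings.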

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("adriansanz/sqv-v2")
# Run inference
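sentences = [
    # "Submitting the application does not entitle you to set up the stall."
    'La presentació de la sol·licitud no dona dret al muntatge de la parada.',
    # "What is the requirement for submitting the authorization application?"
    'Quin és el requisit per a la presentació de la sol·licitud d’autorització?',
    # "What is the reason for changing the holder of the funeral rights?"
    'Quin és el motiu per canviar la persona titular dels drets funeraris?',
]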
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
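Since the model was trained with MatryoshkaLoss over dimensions from 1024 down to 64, its embeddings are meant to stay usable when truncated to a leading prefix. A minimal NumPy sketch, assuming `embeddings` stands in for the output of `model.encode(sentences)` (random unit vectors here, not real model output): keep the first `dim` components, re-normalize, and compute the 3 × 3 similarity matrix. Recent versions of sentence-transformers also accept a `truncate_dim` argument in the `SentenceTransformer` constructor for the same purpose.

```python
import numpy as np

def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components (the Matryoshka prefix) and rescale to unit length."""
    prefix = embeddings[:, :dim]
    return prefix / np.linalg.norm(prefix, axis=1, keepdims=True)

rng = np.random.default_rng(42)
# Hypothetical stand-in for model.encode(sentences): 3 unit-length 1024-d vectors
embeddings = rng.normal(size=(3, 1024))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

results = {}
for dim in (1024, 256, 64):
    small = truncate_and_normalize(embeddings, dim)
    # For unit-length vectors, the dot product equals cosine similarity
    results[dim] = small @ small.T
    print(dim, results[dim].shape)  # e.g. "1024 (3, 3)"
```

Re-normalizing after truncation does not change cosine scores, but it keeps dot-product scoring equivalent to cosine, which matters if an index uses inner-product search.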

Evaluation

Metrics

Information Retrieval (dim_1024)

Metric Value
cosine_accuracy@1 0.044
cosine_accuracy@3 0.116
cosine_accuracy@5 0.18
cosine_accuracy@10 0.3507
cosine_precision@1 0.044
cosine_precision@3 0.0387
cosine_precision@5 0.036
cosine_precision@10 0.0351
cosine_recall@1 0.044
cosine_recall@3 0.116
cosine_recall@5 0.18
cosine_recall@10 0.3507
cosine_ndcg@10 0.1659
cosine_mrr@10 0.111
cosine_map@100 0.1341

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.0413
cosine_accuracy@3 0.116
cosine_accuracy@5 0.1787
cosine_accuracy@10 0.3627
cosine_precision@1 0.0413
cosine_precision@3 0.0387
cosine_precision@5 0.0357
cosine_precision@10 0.0363
cosine_recall@1 0.0413
cosine_recall@3 0.116
cosine_recall@5 0.1787
cosine_recall@10 0.3627
cosine_ndcg@10 0.169
cosine_mrr@10 0.1116
cosine_map@100 0.1341

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.0467
cosine_accuracy@3 0.116
cosine_accuracy@5 0.1787
cosine_accuracy@10 0.356
cosine_precision@1 0.0467
cosine_precision@3 0.0387
cosine_precision@5 0.0357
cosine_precision@10 0.0356
cosine_recall@1 0.0467
cosine_recall@3 0.116
cosine_recall@5 0.1787
cosine_recall@10 0.356
cosine_ndcg@10 0.1677
cosine_mrr@10 0.1121
cosine_map@100 0.1346

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.0387
cosine_accuracy@3 0.1067
cosine_accuracy@5 0.1707
cosine_accuracy@10 0.3413
cosine_precision@1 0.0387
cosine_precision@3 0.0356
cosine_precision@5 0.0341
cosine_precision@10 0.0341
cosine_recall@1 0.0387
cosine_recall@3 0.1067
cosine_recall@5 0.1707
cosine_recall@10 0.3413
cosine_ndcg@10 0.1587
cosine_mrr@10 0.1046
cosine_map@100 0.129

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.0493
cosine_accuracy@3 0.1227
cosine_accuracy@5 0.1987
cosine_accuracy@10 0.3667
cosine_precision@1 0.0493
cosine_precision@3 0.0409
cosine_precision@5 0.0397
cosine_precision@10 0.0367
cosine_recall@1 0.0493
cosine_recall@3 0.1227
cosine_recall@5 0.1987
cosine_recall@10 0.3667
cosine_ndcg@10 0.1759
cosine_mrr@10 0.119
cosine_map@100 0.142

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.0373
cosine_accuracy@3 0.0947
cosine_accuracy@5 0.1573
cosine_accuracy@10 0.34
cosine_precision@1 0.0373
cosine_precision@3 0.0316
cosine_precision@5 0.0315
cosine_precision@10 0.034
cosine_recall@1 0.0373
cosine_recall@3 0.0947
cosine_recall@5 0.1573
cosine_recall@10 0.34
cosine_ndcg@10 0.1535
cosine_mrr@10 0.0987
cosine_map@100 0.1226
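In every table, accuracy@k equals recall@k and precision@k is about accuracy@k divided by k (for example, 0.116 / 3 ≈ 0.0387). That pattern is what one would expect if each query has exactly one relevant passage. Under that assumption, all the metrics reduce to simple functions of the rank at which the relevant passage is retrieved, as in this sketch. It is an illustration, not the evaluator's actual code, and the function name and ranks are hypothetical.

```python
import math

def single_relevant_ir_metrics(ranks, k_values=(1, 3, 5, 10)):
    """ranks[i] is the 1-based rank of query i's single relevant passage,
    or None if it was not retrieved at all."""
    n = len(ranks)
    found = [r for r in ranks if r is not None]
    metrics = {}
    for k in k_values:
        hits = sum(1 for r in found if r <= k)
        metrics[f"accuracy@{k}"] = hits / n
        metrics[f"recall@{k}"] = hits / n          # one relevant passage per query
        metrics[f"precision@{k}"] = hits / (n * k)  # at most one hit among k results
    # Reciprocal rank, counted only when the hit is in the top 10
    metrics["mrr@10"] = sum(1 / r for r in found if r <= 10) / n
    # Ideal DCG is 1 with a single relevant passage, so NDCG = 1 / log2(rank + 1)
    metrics["ndcg@10"] = sum(1 / math.log2(r + 1) for r in found if r <= 10) / n
    # Average precision of a single hit at rank r is 1 / r
    metrics["map@100"] = sum(1 / r for r in found if r <= 100) / n
    return metrics

# Hypothetical ranks for five queries (None = not retrieved)
metrics = single_relevant_ir_metrics([1, 3, 12, None, 7])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

This also explains why map@100 is never below mrr@10 in the tables: with one relevant passage both average 1 / rank, but MAP@100 still credits hits ranked 11 to 100.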

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 6,749 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    • positive: string; min 6 tokens, mean 42.03 tokens, max 106 tokens
    • anchor: string; min 10 tokens, mean 20.32 tokens, max 54 tokens
  • Samples:
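    • positive: Aquest tràmit us permet compensar deutes de naturalesa pública a favor de l'Ajuntament, sigui quin sigui el seu estat (voluntari/executiu), amb crèdits reconeguts per aquest a favor del mateix deutor, i que el seu estat sigui pendent de pagament. ("This procedure lets you offset public-law debts owed to the City Council, whatever their status (voluntary/enforcement period), against credits the Council has recognized in favor of the same debtor, provided those credits are pending payment.")
      anchor: Quin és el benefici de la compensació de deutes amb crèdits? ("What is the benefit of offsetting debts against credits?")
    • positive: El seu objecte és que -prèviament a la seva execució material- l'Ajuntament comprovi l'adequació de l’actuació a la normativa i planejament, així com a les ordenances municipals sobre l’ús del sòl i edificació. ("Its purpose is for the City Council to check, before the work is physically carried out, that it complies with regulations and planning, as well as with the municipal bylaws on land use and building.")
      anchor: Quin és el paper de les ordenances municipals en aquest tràmit? ("What role do the municipal bylaws play in this procedure?")
    • positive: Comunicació prèvia del manteniment en espais, zones o instal·lacions comunitàries interiors dels edificis (reparació i/o millora de materials). ("Prior notification of maintenance in communal interior spaces, areas, or facilities of buildings (repair and/or improvement of materials).")
      anchor: Quin és el límit del manteniment en espais comunitaris interiors dels edificis? ("What is the limit on maintenance in communal interior spaces of buildings?")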
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
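MatryoshkaLoss wraps MultipleNegativesRankingLoss: for each listed dimension it truncates the embeddings to that prefix, applies the inner loss, and combines the results using the listed weights (all 1 here; n_dims_per_step: -1 means every dimension is used at every step). The NumPy sketch below is a simplified re-implementation for illustration only, assuming cosine similarity scaled by 20 (the library's default scale for MultipleNegativesRankingLoss) and a weighted sum across dimensions. It is not the library's code, computes no gradients, and uses random stand-in embeddings.

```python
import numpy as np

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Row i of `positives` is the positive for anchor i; the other rows
    in the batch serve as in-batch negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                    # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy where the correct "class" for anchor i is column i
    return float(-np.mean(np.diag(log_probs)))

def matryoshka_loss(anchors, positives,
                    dims=(1024, 768, 512, 256, 128, 64), weights=None):
    if weights is None:
        weights = [1.0] * len(dims)
    # Apply the inner loss to each truncated prefix and sum the weighted terms
    return sum(w * multiple_negatives_ranking_loss(anchors[:, :d], positives[:, :d])
               for d, w in zip(dims, weights))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 1024))
# Matched pairs: each positive is a slightly perturbed copy of its anchor
matched = anchors + 0.1 * rng.normal(size=anchors.shape)
# Mismatched pairs: shift positives by one row so every pairing is wrong
mismatched = np.roll(matched, 1, axis=0)

print(matryoshka_loss(anchors, matched), matryoshka_loss(anchors, mismatched))
```

Because every prefix length is optimized with the same objective, the shorter embeddings are trained directly rather than obtained by chance, which is what the per-dimension evaluation tables measure.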
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.2
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
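With these settings each optimizer step sees 16 × 16 = 256 pairs, assuming training ran on a single device (the card does not state the device count). The arithmetic below further assumes the Hugging Face Trainer's convention of floor-dividing the number of batches per epoch by gradient_accumulation_steps and rounding warmup steps up. Under those assumptions the schedule spans 130 optimizer steps, which matches the last step in the training log. The hypothetical `lr_at` helper sketches linear warmup followed by half-cosine decay, the shape selected by lr_scheduler_type: cosine.

```python
import math

num_samples = 6749
per_device_batch = 16
grad_accum = 16
epochs = 5
warmup_ratio = 0.2
base_lr = 2e-5

# Batches per epoch on one device (the last partial batch is kept)
batches_per_epoch = math.ceil(num_samples / per_device_batch)
# Optimizer updates per epoch used to size the schedule (floor division)
steps_per_epoch = batches_per_epoch // grad_accum
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def lr_at(step: int) -> float:
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

print(per_device_batch * grad_accum, steps_per_epoch, total_steps, warmup_steps)  # 256 26 130 26
```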

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.2
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
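The batch_sampler: no_duplicates setting matters for MultipleNegativesRankingLoss: if the same text appears twice in one batch, a copy of an anchor's own positive would be scored as a negative. The following is a simplified greedy sketch of such a sampler for illustration, not the sentence-transformers implementation; the function name and example pairs are hypothetical.

```python
import random

def no_duplicate_batches(pairs, batch_size, seed=42):
    """Greedy sketch: build batches in which no text appears twice.
    `pairs` is a list of (anchor, positive) tuples; returns lists of indices."""
    rng = random.Random(seed)
    remaining = list(range(len(pairs)))
    rng.shuffle(remaining)
    batches = []
    while remaining:
        batch, seen, deferred = [], set(), []
        for idx in remaining:
            texts = set(pairs[idx])
            # Accept the pair only if the batch has room and shares no text with it
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)  # retry in a later batch
        batches.append(batch)
        remaining = deferred
    return batches

# Hypothetical pairs: "p1" and "q1" each occur twice
pairs = [("q1", "p1"), ("q2", "p1"), ("q3", "p2"), ("q1", "p3"), ("q4", "p4")]
batches = no_duplicate_batches(pairs, batch_size=2)
print(batches)
```

Each round always accepts at least its first candidate, so the loop terminates; the last batch may be smaller than batch_size, mirroring dataloader_drop_last: False.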

Training Logs

Epoch Step Training Loss dim_1024_cosine_map@100 dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.3791 10 3.0867 - - - - - -
0.7583 20 2.4414 - - - - - -
0.9858 26 - 0.1266 0.1255 0.1232 0.1257 0.1091 0.1345
1.1351 30 1.7091 - - - - - -
1.5142 40 1.2495 - - - - - -
1.8934 50 0.9813 - - - - - -
1.9692 52 - 0.1315 0.1325 0.1285 0.1328 0.1218 0.1309
2.2701 60 0.6918 - - - - - -
2.6493 70 0.7146 - - - - - -
2.9905 79 - 0.1370 0.1344 0.1355 0.1338 0.1269 0.1363
3.0261 80 0.6002 - - - - - -
3.4052 90 0.4816 - - - - - -
3.7844 100 0.4949 - - - - - -
3.9739 105 - 0.1357 0.1393 0.1302 0.1347 0.1204 0.1354
4.1611 110 0.474 - - - - - -
4.5403 120 0.4692 - - - - - -
4.9194 130 0.4484 0.1341 0.142 0.129 0.1346 0.1226 0.1341
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.0
  • Transformers: 4.44.2
  • PyTorch: 2.4.0+cu121
  • Accelerate: 0.35.0.dev0
  • Datasets: 3.0.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Safetensors

  • Model size: 568M params
  • Tensor type: F32
Model tree for adriansanz/sqv-v2

  • Base model: BAAI/bge-m3
  • This model: fine-tuned from the base model