BGE base Financial Matryoshka

This is a sentence-transformers model finetuned from SQAI/bge-embedding-model2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: SQAI/bge-embedding-model2
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: 33.4M parameters (F32)
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
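
The three modules amount to a simple pipeline: the BERT encoder produces token embeddings, pooling keeps only the [CLS] token, and the result is L2-normalized. Below is a minimal sketch of that pipeline written against the Hugging Face transformers API directly, assuming the same repo id as the usage snippet below; the SentenceTransformer loader shown there is the supported path.

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("SQAI/bge-embedding-model")
encoder = AutoModel.from_pretrained("SQAI/bge-embedding-model")

batch = tokenizer(
    ["Name of streetlight failure"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 384)

cls = token_embeddings[:, 0]         # module (1): CLS-token pooling
embedding = F.normalize(cls, dim=1)  # module (2): L2 normalization
print(embedding.shape)               # torch.Size([1, 384])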

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("SQAI/bge-embedding-model")
# Run inference
sentences = [
    'Name of streetlight failure',
    'all failure types for geozone = 26 in streetlighting',
    'failure count for streetlight for time = last 3 months',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
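
Because the model was trained with MatryoshkaLoss (see Training Details below), embeddings can also be truncated to 256, 128, or 64 dimensions with only a modest quality drop. A minimal sketch using the truncate_dim argument (available in sentence-transformers 2.7+):

from sentence_transformers import SentenceTransformer

# Load the model so that encode() returns 128-dimensional embeddings;
# 384, 256, 128, and 64 are the dimensions this model was trained for.
model = SentenceTransformer("SQAI/bge-embedding-model", truncate_dim=128)
embeddings = model.encode([
    "Name of streetlight failure",
    "failure count for streetlight for time = last 3 months",
])
print(embeddings.shape)
# (2, 128)

Truncated embeddings are no longer exactly unit-length, so compare them with cosine similarity (the model's configured similarity function).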

Evaluation

Metrics

Information Retrieval

Each of the following five tables reports the same information-retrieval evaluation, most likely at a different Matryoshka embedding dimension; the card does not label which dimension each table corresponds to.

Metric                Value
cosine_accuracy@1     0.0
cosine_accuracy@3     0.0909
cosine_accuracy@5     0.0909
cosine_accuracy@10    0.1818
cosine_precision@1    0.0
cosine_precision@3    0.0303
cosine_precision@5    0.0182
cosine_precision@10   0.0182
cosine_recall@1       0.0
cosine_recall@3       0.0909
cosine_recall@5       0.0909
cosine_recall@10      0.1818
cosine_ndcg@10        0.0897
cosine_mrr@10         0.0606
cosine_map@100        0.0947

Information Retrieval

Metric                Value
cosine_accuracy@1     0.0
cosine_accuracy@3     0.0909
cosine_accuracy@5     0.0909
cosine_accuracy@10    0.1818
cosine_precision@1    0.0
cosine_precision@3    0.0303
cosine_precision@5    0.0182
cosine_precision@10   0.0182
cosine_recall@1       0.0
cosine_recall@3       0.0909
cosine_recall@5       0.0909
cosine_recall@10      0.1818
cosine_ndcg@10        0.0897
cosine_mrr@10         0.0606
cosine_map@100        0.0947

Information Retrieval

Metric                Value
cosine_accuracy@1     0.0
cosine_accuracy@3     0.0
cosine_accuracy@5     0.0909
cosine_accuracy@10    0.1818
cosine_precision@1    0.0
cosine_precision@3    0.0
cosine_precision@5    0.0182
cosine_precision@10   0.0182
cosine_recall@1       0.0
cosine_recall@3       0.0
cosine_recall@5       0.0909
cosine_recall@10      0.1818
cosine_ndcg@10        0.0676
cosine_mrr@10         0.0333
cosine_map@100        0.0701

Information Retrieval

Metric                Value
cosine_accuracy@1     0.0
cosine_accuracy@3     0.0
cosine_accuracy@5     0.0909
cosine_accuracy@10    0.1818
cosine_precision@1    0.0
cosine_precision@3    0.0
cosine_precision@5    0.0182
cosine_precision@10   0.0182
cosine_recall@1       0.0
cosine_recall@3       0.0
cosine_recall@5       0.0909
cosine_recall@10      0.1818
cosine_ndcg@10        0.0715
cosine_mrr@10         0.0379
cosine_map@100        0.0706

Information Retrieval

Metric                Value
cosine_accuracy@1     0.0909
cosine_accuracy@3     0.1818
cosine_accuracy@5     0.1818
cosine_accuracy@10    0.1818
cosine_precision@1    0.0909
cosine_precision@3    0.0606
cosine_precision@5    0.0364
cosine_precision@10   0.0182
cosine_recall@1       0.0909
cosine_recall@3       0.1818
cosine_recall@5       0.1818
cosine_recall@10      0.1818
cosine_ndcg@10        0.1364
cosine_mrr@10         0.1212
cosine_map@100        0.1479
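
These numbers come from sentence-transformers' InformationRetrievalEvaluator; the per-dimension columns in the Training Logs suggest the evaluation was repeated at several Matryoshka dimensions, and the small absolute values reflect the very small evaluation set (11 pairs). A sketch of how such an evaluation can be reproduced follows; the queries, corpus, and relevance labels are illustrative placeholders, not the actual evaluation data:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("SQAI/bge-embedding-model")

# Placeholder data: id -> text for queries and corpus, plus relevance labels.
queries = {"q1": "failure count for streetlight for time = last 3 months"}
corpus = {
    "d1": "event time of streetlight failure",
    "d2": "Geographical zone identifier in streetlight energy report",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries, corpus, relevant_docs,
    truncate_dim=384,  # evaluate at a chosen Matryoshka dimension
    name="dim_384",    # hypothetical evaluator name
)
results = evaluator(model)  # dict: cosine_accuracy@k, cosine_ndcg@10, cosine_map@100, ...
print(results)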

Training Details

Training Dataset

Unnamed Dataset

  • Size: 96 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    • positive: string; min: 6 tokens, mean: 10.04 tokens, max: 14 tokens
    • anchor: string; min: 12 tokens, mean: 17.74 tokens, max: 31 tokens
  • Samples:
    • positive: Number of power failures in streetlight failure report
      anchor: failure count for street name = 'Willow Lane', 'Cedar Road' for time = last 45 days in streetlighting in streetlighting
    • positive: Datetime of the streetlight failure
      anchor: failure count for streetlight for streets = 'Oak Street', 'Pine Avenue' for past days = 60
    • positive: type of streetlight failure event
      anchor: failure types and count for each for geozone = 5 in streetlighting in streetlighting
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            384,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
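
The parameters above map one-to-one onto the sentence-transformers loss classes. A minimal construction sketch (the base model id is taken from the Model Description):

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("SQAI/bge-embedding-model2")

# Wrap the in-batch-negatives ranking loss so it is applied at every
# Matryoshka dimension, with equal weight per dimension.
loss = MatryoshkaLoss(
    model,
    MultipleNegativesRankingLoss(model),
    matryoshka_dims=[384, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1],
    n_dims_per_step=-1,  # use all dimensions at every training step
)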
    

Evaluation Dataset

Unnamed Dataset

  • Size: 11 evaluation samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    • positive: string; min: 6 tokens, mean: 8.91 tokens, max: 12 tokens
    • anchor: string; min: 12 tokens, mean: 16.36 tokens, max: 24 tokens
  • Samples:
    • positive: event time of streetlight failure
      anchor: failure count for streetlight for time = last 3 months
    • positive: Geographical zone identifier in streetlight energy report
      anchor: current energy utilization for geozone = 233 in streetlighting
    • positive: event time of streetlight failure
      anchor: failure count for streetlight for streets = 'Oak Street', 'Pine Avenue' for past days = 60
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            384,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-06
  • weight_decay: 0.03
  • num_train_epochs: 100
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.2
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
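
Wired together, the values above correspond roughly to the following training setup. This is a reconstruction sketch, not the author's actual script: the one-row datasets stand in for the 96-pair training set and 11-pair evaluation set described earlier, the output directory is hypothetical, and save_strategy is assumed so that load_best_model_at_end can restore a checkpoint.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("SQAI/bge-embedding-model2")

# Illustrative single-pair datasets with the card's (positive, anchor) columns.
train_dataset = Dataset.from_dict({
    "positive": ["Datetime of the streetlight failure"],
    "anchor": ["failure count for streetlight for streets = 'Oak Street', 'Pine Avenue' for past days = 60"],
})
eval_dataset = Dataset.from_dict({
    "positive": ["event time of streetlight failure"],
    "anchor": ["failure count for streetlight for time = last 3 months"],
})

loss = MatryoshkaLoss(
    model,
    MultipleNegativesRankingLoss(model),
    matryoshka_dims=[384, 256, 128, 64],
)

args = SentenceTransformerTrainingArguments(
    output_dir="bge-financial-matryoshka",  # hypothetical output path
    num_train_epochs=100,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-6,
    weight_decay=0.03,
    lr_scheduler_type="cosine",
    warmup_ratio=0.2,
    bf16=True,
    tf32=True,
    eval_strategy="epoch",
    save_strategy="epoch",  # assumed, see note above
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()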

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-06
  • weight_decay: 0.03
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 100
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.2
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
1.0 1 1.5172 3.308 0.1019 0.0993 0.1004 0.0839 0.1004
2.0 2 5.0385 3.3083 0.1015 0.0987 0.1004 0.0830 0.1004
3.0 3 4.9605 - - - - - -
4.0 4 1.3564 3.2953 0.1015 0.0993 0.1004 0.0835 0.1004
4.0 5 3.8004 - - - - - -
6.0 6 2.4973 3.2846 0.1017 0.0986 0.0985 0.0835 0.0985
5.0 7 2.5891 - - - - - -
8.0 8 3.4016 3.2645 0.1023 0.0991 0.0990 0.0841 0.0990
6.0 9 1.5213 3.2505 0.1023 0.0990 0.0990 0.1616 0.0990
6.6667 10 4.5108 - - - - - -
7.0 11 0.4746 3.2425 0.1023 0.0984 0.0989 0.1629 0.0989
8.0 12 4.8935 - - - - - -
8.6667 13 0.9945 3.2139 0.1019 0.0984 0.0989 0.1625 0.0989
9.0 14 4.0429 - - - - - -
10.6667 15 2.0389 3.2113 0.1019 0.0975 0.0989 0.1615 0.0989
10.0 16 2.7488 - - - - - -
12.6667 17 3.0149 3.2042 0.0997 0.0980 0.0994 0.1626 0.0994
11.0 18 1.7699 3.1955 0.0993 0.0975 0.0989 0.1621 0.0989
11.3333 19 3.8838 - - - - - -
12.0 20 0.9574 3.1893 0.0724 0.0980 0.0994 0.1626 0.0994
13.0 21 4.9939 - - - - - -
13.3333 22 0.4745 3.1885 0.0721 0.0975 0.0994 0.1626 0.0994
14.0 23 4.2322 - - - - - -
15.3333 24 1.716 3.1807 0.0721 0.0975 0.0989 0.1621 0.0989
15.0 25 3.2712 - - - - - -
17.3333 26 2.8033 3.1809 0.0721 0.0980 0.0982 0.1626 0.0982
16.0 27 2.0541 - - - - - -
19.3333 28 3.371 3.1766 0.0721 0.0975 0.0989 0.1621 0.0989
17.0 29 1.5369 3.1774 0.0721 0.0975 0.0989 0.1626 0.0989
18.0 30 4.8708 3.1747 0.0721 0.0980 0.0995 0.1626 0.0995
1.0 1 1.5516 3.308 0.1019 0.0993 0.1004 0.0839 0.1004
2.0 2 5.0106 3.3059 0.1016 0.0987 0.1004 0.0830 0.1004
3.0 3 5.1464 - - - - - -
4.0 4 1.4257 3.3023 0.1018 0.0998 0.1015 0.0839 0.1015
4.0 5 3.7121 - - - - - -
6.0 6 2.4988 3.2836 0.1021 0.0997 0.0989 0.0835 0.0989
5.0 7 2.5883 - - - - - -
8.0 8 3.4871 3.2643 0.1019 0.0980 0.0985 0.1616 0.0985
6.0 9 1.5461 3.2566 0.1019 0.0980 0.0985 0.1626 0.0985
6.6667 10 4.5728 - - - - - -
7.0 11 0.4992 - - - 0.0986 - 0.0986
1.0 1 1.4962 3.2465 0.1019 0.0984 0.0986 0.1625 0.0986
2.0 2 4.9421 3.2406 0.1019 0.0984 0.0986 0.1623 0.0986
3.0 3 4.8646 - - - - - -
4.0 4 1.3279 3.2409 0.1019 0.0984 0.0986 0.1627 0.0986
4.0 5 3.7291 - - - - - -
6.0 6 2.4406 3.2408 0.1019 0.0984 0.0986 0.1625 0.0986
5.0 7 2.561 - - - - - -
8.0 8 3.3645 3.2243 0.1019 0.0984 0.0989 0.1625 0.0989
6.0 9 1.5068 3.2288 0.1023 0.0995 0.0994 0.1625 0.0994
6.6667 10 4.4725 - - - - - -
7.0 11 0.4709 3.2209 0.1019 0.0965 0.0994 0.1625 0.0994
8.0 12 4.8686 - - - - - -
8.6667 13 0.9881 3.1989 0.1022 0.0975 0.0989 0.1628 0.0989
9.0 14 4.0255 - - - - - -
10.6667 15 2.0269 3.1991 0.0735 0.0980 0.0994 0.1635 0.0994
10.0 16 2.7374 - - - - - -
12.6667 17 3.0016 3.1837 0.0721 0.0975 0.0989 0.1621 0.0989
11.0 18 1.7615 3.1818 0.0721 0.0975 0.0989 0.1621 0.0989
11.3333 19 3.8629 - - - - - -
12.0 20 0.9504 3.1707 0.0721 0.0975 0.0989 0.1621 0.0989
13.0 21 4.9551 - - - - - -
13.3333 22 0.4699 3.1482 0.0724 0.0975 0.0977 0.1621 0.0977
14.0 23 4.1848 - - - - - -
15.3333 24 1.6928 3.1363 0.0727 0.0975 0.0976 0.1626 0.0976
15.0 25 3.2244 - - - - - -
17.3333 26 2.7449 3.1242 0.0706 0.0979 0.0980 0.1633 0.0980
16.0 27 2.0236 - - - - - -
19.3333 28 3.2878 3.1061 0.0706 0.0974 0.0980 0.1633 0.0980
17.0 29 1.5054 3.0932 0.0702 0.0974 0.0975 0.1629 0.0975
18.0 30 4.7552 3.0935 0.0702 0.0974 0.0975 0.1628 0.0975
19.0 31 4.7242 - - - - - -
20.0 32 1.3348 3.0866 0.0719 0.0706 0.0980 0.1633 0.0980
20.0 33 3.4499 - - - - - -
22.0 34 2.2464 3.0752 0.0714 0.0701 0.0975 0.1628 0.0975
21.0 35 2.4421 - - - - - -
24.0 36 3.361 3.0733 0.0702 0.0701 0.0975 0.1619 0.0975
22.0 37 1.445 3.0611 0.0702 0.0701 0.0949 0.1619 0.0949
22.6667 38 4.2549 - - - - - -
23.0 39 0.4309 3.0643 0.0702 0.0701 0.0949 0.1468 0.0949
24.0 40 4.5979 - - - - - -
24.6667 41 0.9199 3.0597 0.0705 0.0701 0.0949 0.1468 0.0949
25.0 42 3.7976 - - - - - -
26.6667 43 2.0184 3.0376 0.0702 0.0701 0.0949 0.1468 0.0949
26.0 44 2.7693 - - - - - -
28.6667 45 2.8735 3.0315 0.0716 0.0701 0.0949 0.1472 0.0949
27.0 46 1.8392 3.0433 0.0716 0.0701 0.0949 0.1472 0.0949
27.3333 47 3.7241 - - - - - -
28.0 48 0.9107 3.0330 0.0716 0.0852 0.0949 0.1467 0.0949
29.0 49 4.6802 - - - - - -
29.3333 50 0.4656 3.0391 0.0716 0.0706 0.0949 0.1471 0.0949
30.0 51 4.1725 - - - - - -
31.3333 52 1.5874 3.0268 0.0725 0.0852 0.0954 0.1474 0.0954
31.0 53 2.8912 - - - - - -
33.3333 54 2.576 3.0398 0.0724 0.0706 0.0949 0.1478 0.0949
32.0 55 2.0706 - - - - - -
35.3333 56 3.2267 3.0450 0.0711 0.0852 0.0954 0.1474 0.0954
33.0 57 1.4441 3.0368 0.0727 0.0857 0.0954 0.1478 0.0954
34.0 58 4.5517 3.0355 0.0711 0.0858 0.0949 0.1474 0.0949
35.0 59 4.5666 - - - - - -
36.0 60 1.1933 3.0420 0.0704 0.0858 0.0949 0.1474 0.0949
36.0 61 3.3879 - - - - - -
38.0 62 2.1786 3.0319 0.0703 0.0858 0.0949 0.1472 0.0949
37.0 63 2.372 - - - - - -
40.0 64 2.9968 3.0433 0.0704 0.0858 0.0952 0.1470 0.0952
38.0 65 1.4256 3.0468 0.0699 0.0701 0.0947 0.1467 0.0947
38.6667 66 4.059 - - - - - -
39.0 67 0.4209 3.0429 0.0717 0.0701 0.0952 0.1462 0.0952
40.0 68 4.4681 - - - - - -
40.6667 69 0.8233 3.0443 0.0689 0.0701 0.0951 0.1467 0.0951
41.0 70 3.6476 - - - - - -
42.6667 71 1.9094 3.0405 0.0699 0.0707 0.0956 0.1467 0.0956
42.0 72 2.77 - - - - - -
44.6667 73 2.7653 3.0387 0.0706 0.0701 0.0947 0.1467 0.0947
43.0 74 1.6915 3.0436 0.0685 0.0701 0.0947 0.1462 0.0947
43.3333 75 3.7964 - - - - - -
44.0 76 0.8672 3.0414 0.0682 0.0701 0.0956 0.1479 0.0956
45.0 77 4.667 - - - - - -
45.3333 78 0.4751 3.0436 0.0682 0.0701 0.0952 0.1483 0.0952
46.0 79 4.207 - - - - - -
47.3333 80 1.5904 3.0396 0.0685 0.0701 0.0951 0.1482 0.0951
47.0 81 2.9637 - - - - - -
49.3333 82 2.537 3.0498 0.0685 0.0701 0.0947 0.1479 0.0947
48.0 83 2.0395 - - - - - -
51.3333 84 3.0573 3.0450 0.0682 0.0701 0.0952 0.1479 0.0952
49.0 85 1.4035 3.0411 0.0689 0.0701 0.0947 0.1479 0.0947
50.0 86 4.4484 3.0448 0.0682 0.0701 0.0678 0.1479 0.0678
51.0 87 4.5125 - - - - - -
52.0 88 1.2247 3.0401 0.0682 0.0701 0.0952 0.1479 0.0952
52.0 89 3.4724 - - - - - -
54.0 90 2.1704 3.0456 0.0685 0.0701 0.0947 0.1483 0.0947
53.0 91 2.417 - - - - - -
56.0 92 3.2605 3.0421 0.0685 0.0707 0.0678 0.1483 0.0678
54.0 93 1.4848 3.0397 0.0682 0.0701 0.0947 0.1479 0.0947
54.6667 94 4.1817 - - - - - -
55.0 95 0.4151 3.0415 0.0700 0.0701 0.0956 0.1483 0.0956
56.0 96 4.4844 - - - - - -
56.6667 97 0.7948 3.0465 0.0682 0.0701 0.0952 0.1479 0.0952
57.0 98 3.6202 - - - - - -
58.6667 99 1.8255 3.0401 0.0696 0.0701 0.0947 0.1479 0.0947
58.0 100 2.6248 3.0421 0.0706 0.0701 0.0947 0.1479 0.0947
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1
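
To pin a matching environment, the listed versions can be installed directly; pulling PyTorch 2.1.2 with CUDA 12.1 from the PyTorch wheel index is an assumption about how the original environment was built:

pip install sentence-transformers==3.0.1 transformers==4.41.2 accelerate==0.31.0 datasets==2.19.1 tokenizers==0.19.1
pip install torch==2.1.2 --index-url https://download.pytorch.org/whl/cu121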

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}