
SentenceTransformer based on vinai/phobert-base-v2

This is a sentence-transformers model finetuned from vinai/phobert-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: vinai/phobert-base-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
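
The same two-module stack can be assembled by hand with the sentence_transformers.models API. The sketch below is illustrative rather than the exact code used to build this checkpoint; it loads vinai/phobert-base-v2 as the Transformer backbone and adds the mean-pooling module shown above.

from sentence_transformers import SentenceTransformer, models

# Transformer backbone (a RobertaModel), truncating inputs at 256 tokens
word_embedding_model = models.Transformer("vinai/phobert-base-v2", max_seq_length=256)

# Mean pooling over token embeddings -> one 768-dimensional sentence vector
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode_mean_tokens=True,
)

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])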

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    # Inputs are pre-word-segmented Vietnamese (syllables joined by underscores), as PhoBERT-based models expect.
    # "However, if the condition does not heal on its own and bleeding continues, haemostatic therapies must be used to make up for the lost blood."
    'Tuy_nhiên , nếu bệnh không tự lành và vẫn tiếp_tục chảy_máu , cần phải sử_dụng các liệu_pháp cầm máu để bù lại lượng máu đã mất .',
    # "Some factors increase the risk of the disease, for example hormonal factors: it is common in women with late menarche and early menopause."
    'Một_số yếu_tố làm tăng nguy_cơ mắc bệnh như : Yếu_tố nội_tiết : bệnh thường gặp ở phụ_nữ chậm có kinh và sớm mãn_kinh .',
    # A person's name and year: "Nguyễn Thị Thanh Tuyền (1995)."
    'Nguyễn_Thị_Thanh_Tuyền ( 1995 ) .',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
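
Because the card lists semantic search among the intended uses, the similarity matrix can be used directly for simple ranking. The continuation below is a small sketch that assumes the sentences and similarities variables from the snippet above; it ranks every sentence against the first one.

import torch

# Sort the sentences by cosine similarity to the first sentence (highest first)
query_scores = similarities[0]
ranking = torch.argsort(query_scores, descending=True)
for idx in ranking.tolist():
    print(f"{query_scores[idx].item():.4f}  {sentences[idx]}")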

Training Details

Training Dataset

Unnamed Dataset

  • Size: 362,208 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_0: string; min: 3 tokens, mean: 22.64 tokens, max: 104 tokens
    • sentence_1: string; min: 3 tokens, mean: 23.25 tokens, max: 222 tokens
    • label: float; min: 0.1, mean: 0.82, max: 1.0
  • Samples (sentence_0 / sentence_1 / label):
    • "Hiệu_lực của vaccine AstraZeneca ra sao ?" / "Hiệu_lực của vaccine AstraZeneca ra sao ?" / 1.0
    • "Gần đây , tôi có quen một bạn gái , mỗi lần ngồi gần nhau có cử_chỉ thân_mật thì tôi gần như không kìm chế được có_thể nói là giống như hiện_tượng xuất_tinh sớm ." / "Chụp CT scanner sọ não : là hình_ảnh tốt nhất để đánh_giá tổn_thương não vì có_thể hiển_thị mô não hoặc xuất_huyết não hoặc nhũn_não ." / 0.6540138125419617
    • "Sốt siêu_vi sau quan_hệ tình_dục không an_toàn có phải đã nhiễm HIV không ?" / "Sốt siêu_vi sau quan_hệ tình_dục không an_toàn có phải đã nhiễm HIV không ?" / 1.0
  • Loss: ContrastiveLoss with these parameters:
    {
        "distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
        "margin": 0.5,
        "size_average": true
    }
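
For reference, these parameters correspond to the following loss construction in Sentence Transformers. This is a minimal sketch, not the original training script; model is assumed to be the SentenceTransformer being fine-tuned.

from sentence_transformers import losses

# Contrastive loss with cosine distance, a 0.5 margin, and the batch loss averaged rather than summed
train_loss = losses.ContrastiveLoss(
    model=model,
    distance_metric=losses.SiameseDistanceMetric.COSINE_DISTANCE,
    margin=0.5,
    size_average=True,
)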
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 4
  • multi_dataset_batch_sampler: round_robin
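
Under the Sentence Transformers 3.x trainer API, these values map onto training arguments roughly as follows. This is a hedged reconstruction rather than the command actually used: the output directory and the one-row dataset are placeholders, whereas the real run used the 362,208 pairs described above.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("vinai/phobert-base-v2")  # base model before fine-tuning

# Placeholder dataset with the same columns as the training data described above
train_dataset = Dataset.from_dict({
    "sentence_0": ["Hiệu_lực của vaccine AstraZeneca ra sao ?"],
    "sentence_1": ["Hiệu_lực của vaccine AstraZeneca ra sao ?"],
    "label": [1.0],
})

# ContrastiveLoss as described in the Training Dataset section
train_loss = losses.ContrastiveLoss(model=model, margin=0.5, size_average=True)

args = SentenceTransformerTrainingArguments(
    output_dir="output",                      # placeholder path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=4,
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=train_loss,
)
trainer.train()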

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss
0.0221 500 0.0168
0.0442 1000 0.0139
0.0663 1500 0.0142
0.0883 2000 0.0139
0.1104 2500 0.0137
0.1325 3000 0.0139
0.1546 3500 0.0137
0.1767 4000 0.0139
0.1988 4500 0.0136
0.2209 5000 0.0135
0.2430 5500 0.0137
0.2650 6000 0.0138
0.2871 6500 0.0136
0.3092 7000 0.0137
0.3313 7500 0.0138
0.3534 8000 0.0135
0.3755 8500 0.0138
0.3976 9000 0.0138
0.4196 9500 0.0141
0.4417 10000 0.0139
0.4638 10500 0.0139
0.4859 11000 0.0138
0.5080 11500 0.0141
0.5301 12000 0.0138
0.5522 12500 0.0138
0.5743 13000 0.0138
0.5963 13500 0.0138
0.6184 14000 0.0136
0.6405 14500 0.0139
0.6626 15000 0.0151
0.6847 15500 0.019
0.7068 16000 0.0184
0.7289 16500 0.018
0.7509 17000 0.0163
0.7730 17500 0.0164
0.7951 18000 0.0158
0.8172 18500 0.0155
0.8393 19000 0.0151
0.8614 19500 0.0151
0.8835 20000 0.0152
0.9056 20500 0.0152
0.9276 21000 0.0151
0.9497 21500 0.0148
0.9718 22000 0.015
0.9939 22500 0.0147
1.0160 23000 0.0149
1.0381 23500 0.0151
1.0602 24000 0.015
1.0823 24500 0.0148
1.1043 25000 0.0147
1.1264 25500 0.0149
1.1485 26000 0.0147
1.1706 26500 0.015
1.1927 27000 0.0146
1.2148 27500 0.0145
1.2369 28000 0.0147
1.2589 28500 0.0149
1.2810 29000 0.0147
1.3031 29500 0.0144
1.3252 30000 0.0147
1.3473 30500 0.0147
1.3694 31000 0.0145
1.3915 31500 0.0149
1.4136 32000 0.0147
1.4356 32500 0.0148
1.4577 33000 0.0148
1.4798 33500 0.0145
1.5019 34000 0.0149
1.5240 34500 0.0147
1.5461 35000 0.0146
1.5682 35500 0.0144
1.5902 36000 0.0146
1.6123 36500 0.0143
1.6344 37000 0.0145
1.6565 37500 0.0145
1.6786 38000 0.0146
1.7007 38500 0.0143
1.7228 39000 0.0149
1.7449 39500 0.0143
1.7669 40000 0.0146
1.7890 40500 0.0146
1.8111 41000 0.0146
1.8332 41500 0.0142
1.8553 42000 0.0144
1.8774 42500 0.0146
1.8995 43000 0.0147
1.9215 43500 0.0144
1.9436 44000 0.0145
1.9657 44500 0.0143
1.9878 45000 0.0146
2.0099 45500 0.0143
2.0320 46000 0.0147
2.0541 46500 0.0146
2.0762 47000 0.0144
2.0982 47500 0.0144
2.1203 48000 0.0144
2.1424 48500 0.0145
2.1645 49000 0.0144
2.1866 49500 0.0144
2.2087 50000 0.0141
2.2308 50500 0.0142
2.2528 51000 0.0145
2.2749 51500 0.0143
2.2970 52000 0.0141
2.3191 52500 0.0144
2.3412 53000 0.0143
2.3633 53500 0.0144
2.3854 54000 0.0144
2.4075 54500 0.0144
2.4295 55000 0.0145
2.4516 55500 0.0145
2.4737 56000 0.0144
2.4958 56500 0.0147
2.5179 57000 0.0145
2.5400 57500 0.0144
2.5621 58000 0.0143
2.5842 58500 0.0144
2.6062 59000 0.0143
2.6283 59500 0.0142
2.6504 60000 0.0143
2.6725 60500 0.0143
2.6946 61000 0.0143
2.7167 61500 0.0144
2.7388 62000 0.0143
2.7608 62500 0.0143
2.7829 63000 0.0146
2.8050 63500 0.0144
2.8271 64000 0.0141
2.8492 64500 0.0142
2.8713 65000 0.0143
2.8934 65500 0.0146
2.9155 66000 0.0143
2.9375 66500 0.0143
2.9596 67000 0.0141
2.9817 67500 0.0144
3.0038 68000 0.0143
3.0259 68500 0.0145
3.0480 69000 0.0142
3.0701 69500 0.0145
3.0921 70000 0.0142
3.1142 70500 0.0143
3.1363 71000 0.0142
3.1584 71500 0.0143
3.1805 72000 0.0143
3.2026 72500 0.014
3.2247 73000 0.0141
3.2468 73500 0.0142
3.2688 74000 0.0143
3.2909 74500 0.0141
3.3130 75000 0.0141
3.3351 75500 0.0143
3.3572 76000 0.0141
3.3793 76500 0.0143
3.4014 77000 0.0143
3.4234 77500 0.0146
3.4455 78000 0.0144
3.4676 78500 0.0143
3.4897 79000 0.0144
3.5118 79500 0.0145
3.5339 80000 0.0142
3.5560 80500 0.0144
3.5781 81000 0.0143
3.6001 81500 0.0142
3.6222 82000 0.0142
3.6443 82500 0.0142
3.6664 83000 0.014
3.6885 83500 0.0144
3.7106 84000 0.0141
3.7327 84500 0.0143
3.7547 85000 0.014
3.7768 85500 0.0146
3.7989 86000 0.0143
3.8210 86500 0.0142
3.8431 87000 0.0139
3.8652 87500 0.0143
3.8873 88000 0.0144
3.9094 88500 0.0143
3.9314 89000 0.0142
3.9535 89500 0.0142
3.9756 90000 0.0142

Framework Versions

  • Python: 3.10.13
  • Sentence Transformers: 3.1.0.dev0
  • Transformers: 4.39.3
  • PyTorch: 2.1.2
  • Accelerate: 0.29.3
  • Datasets: 2.18.0
  • Tokenizers: 0.15.2
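
A roughly matching environment can be recreated with pinned installs. Note that the Sentence Transformers version above is a development build, so the nearest released version is substituted here as an approximation:

pip install "torch==2.1.2" "transformers==4.39.3" "accelerate==0.29.3" "datasets==2.18.0" "tokenizers==0.15.2" "sentence-transformers>=3.1.0"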

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ContrastiveLoss

@inproceedings{hadsell2006dimensionality,
    author={Hadsell, R. and Chopra, S. and LeCun, Y.},
    booktitle={2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)}, 
    title={Dimensionality Reduction by Learning an Invariant Mapping}, 
    year={2006},
    volume={2},
    number={},
    pages={1735-1742},
    doi={10.1109/CVPR.2006.100}
}