
SentenceTransformer based on huudan123/stage1

This is a sentence-transformers model finetuned from huudan123/stage1. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: huudan123/stage1
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
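
The Pooling module applies mean pooling: the token embeddings produced by the RobertaModel are averaged (ignoring padding) to form the 768-dimensional sentence embedding; the CLS token is not used. As an illustration only, the sketch below reproduces that pooling step by hand with the transformers library, assuming the transformer weights and tokenizer sit at the repository root (the standard sentence-transformers layout); the SentenceTransformer API shown under Usage is the supported entry point.

import torch
from transformers import AutoModel, AutoTokenizer

# Assumes the underlying RobertaModel and its tokenizer can be loaded directly
# from the repository root, as is standard for sentence-transformers models.
tokenizer = AutoTokenizer.from_pretrained("huudan123/stage2")
model = AutoModel.from_pretrained("huudan123/stage2")

encoded = tokenizer(
    ["bạn tiếp_tục nhập thông_tin cơ_sở dữ_liệu"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state  # [batch, seq_len, 768]

# Mean pooling: average the token embeddings, ignoring padded positions.
mask = encoded["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print(sentence_embedding.shape)  # torch.Size([1, 768])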

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("huudan123/stage2")
# Run inference
sentences = [
    'bạn tiếp_tục nhập thông_tin cơ_sở dữ_liệu',
    'bạn mọi thứ bạn bắt_đầu_từ',
    'bạn tiếp_tục bạn nhập mọi thứ',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
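
Beyond pairwise similarity, the embeddings can be used directly for semantic search. The sketch below uses sentence_transformers.util.semantic_search; the corpus and query strings are made-up placeholders, not taken from this model's training data.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("huudan123/stage2")

# Placeholder corpus (word-segmented Vietnamese, as this model expects)
corpus = [
    'bạn tiếp_tục nhập thông_tin cơ_sở dữ_liệu',
    'bạn mọi thứ bạn bắt_đầu_từ',
    'bạn tiếp_tục bạn nhập mọi thứ',
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode('bạn nhập dữ_liệu', convert_to_tensor=True)

# Rank the corpus by cosine similarity to the query and keep the top 2 hits
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit['corpus_id']], hit['score'])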

Evaluation

Metrics

Semantic Similarity

Metric               Value
pearson_cosine       0.7133
spearman_cosine      0.7140
pearson_manhattan    0.6924
spearman_manhattan   0.6987
pearson_euclidean    0.6928
spearman_euclidean   0.6988
pearson_dot          0.6562
spearman_dot         0.6553
pearson_max          0.7133
spearman_max         0.7140
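
These values are Pearson and Spearman correlations between the model's similarity scores and gold similarity labels on an STS development set (the sts-dev column in the Training Logs below), computed for cosine, Manhattan, Euclidean, and dot-product similarity; pearson_max and spearman_max report the best of the four. As a hedged sketch, metrics of this kind can be computed with EmbeddingSimilarityEvaluator; the sentence pairs and gold scores below are placeholders, not the actual sts-dev data.

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("huudan123/stage2")

# Placeholder dev pairs with gold similarity scores scaled to [0, 1]
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=['bạn tiếp_tục nhập thông_tin cơ_sở dữ_liệu', 'bạn mọi thứ bạn bắt_đầu_từ'],
    sentences2=['bạn tiếp_tục bạn nhập mọi thứ', 'bạn nhập dữ_liệu'],
    scores=[0.8, 0.2],
    name="sts-dev",
)
results = evaluator(model)
print(results)  # Pearson/Spearman for cosine, Manhattan, Euclidean, and dot similarity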

Training Details

Training Dataset

Unnamed Dataset

  • Size: 254,546 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    anchor (string): min 3 tokens, mean 14.78 tokens, max 110 tokens
    positive (string): min 3 tokens, mean 14.78 tokens, max 110 tokens
    negative (string): min 3 tokens, mean 10.19 tokens, max 29 tokens
  • Samples:
    anchor positive negative
    conceptualy kem skiming hai kích_thước cơ_bản sản_phẩm địa_lý sản_phẩm địa_lý làm kem skiming làm_việc kem skiming hai tập_trung sản_phẩm địa_lý
    sản_phẩm địa_lý làm kem skiming làm_việc conceptualy kem skiming hai kích_thước cơ_bản sản_phẩm địa_lý kem skiming hai tập_trung sản_phẩm địa_lý
    bạn biết trong mùa giải tôi đoán ở mức_độ bạn bạn mất chúng đến mức_độ tiếp_theo họ quyết_định nhớ đội_ngũ cha_mẹ chiến_binh quyết_định gọi nhớ một người ba a một người đàn_ông đi đến thay_thế anh ta một người đàn_ông nào đi thay_thế anh ta recals thực_hiện thứ sáu anh mất mọi thứ ở mức_độ người dân nhớ
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
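
A minimal sketch of instantiating this loss with the parameters above, starting from the base checkpoint listed under Model Details:

from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("huudan123/stage1")  # base checkpoint for this stage

# MultipleNegativesRankingLoss scores each anchor against its positive and
# explicit negative; positives and negatives of other in-batch examples serve
# as additional negatives. Scores are cosine similarities scaled by 20.0.
loss = losses.MultipleNegativesRankingLoss(
    model=model,
    scale=20.0,
    similarity_fct=util.cos_sim,
)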
    

Evaluation Dataset

Unnamed Dataset

  • Size: 1,660 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    anchor (string): min 4 tokens, mean 13.54 tokens, max 51 tokens
    positive (string): min 4 tokens, mean 13.54 tokens, max 51 tokens
    negative (string): min 3 tokens, mean 8.78 tokens, max 22 tokens
  • Samples:
    anchor positive negative
    anh ấy nói mẹ con về nhà xuống xe_buýt trường anh ấy gọi mẹ anh nói mẹ anh về nhà
    xuống xe_buýt trường anh ấy gọi mẹ anh ấy nói mẹ con về nhà anh nói mẹ anh về nhà
    tôi biết mình hướng tới mục_đích báo_cáo một địa_chỉ ở washington tôi bao_giờ đến washington tôi chỉ_định ở tôi lạc cố_gắng tìm tôi hoàn_toàn chắc_chắn tôi làm tôi đi đến washington tôi giao báo_cáo
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • overwrite_output_dir: True
  • eval_strategy: epoch
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • num_train_epochs: 20
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.05
  • fp16: True
  • load_best_model_at_end: True
  • gradient_checkpointing: True
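
As a sketch, these non-default values correspond to the following SentenceTransformerTrainingArguments (Sentence Transformers 3.x); output_dir is a placeholder, and every option not shown keeps the library default listed under All Hyperparameters:

from sentence_transformers import SentenceTransformerTrainingArguments

training_args = SentenceTransformerTrainingArguments(
    output_dir="stage2",                  # placeholder output path
    overwrite_output_dir=True,
    eval_strategy="epoch",
    per_device_train_batch_size=256,
    per_device_eval_batch_size=256,
    num_train_epochs=20,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    fp16=True,
    load_best_model_at_end=True,
    gradient_checkpointing=True,
)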

All Hyperparameters

  • overwrite_output_dir: True
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 256
  • per_device_eval_batch_size: 256
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 20
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.05
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch   Step   Training Loss   Validation Loss   sts-dev_spearman_cosine
0 0 - - 0.5307
0.0503 50 9.1742 - -
0.1005 100 5.9716 - -
0.1508 150 4.6737 - -
0.2010 200 3.2819 - -
0.2513 250 2.8832 - -
0.3015 300 2.7327 - -
0.3518 350 2.6305 - -
0.4020 400 2.6239 - -
0.4523 450 2.5527 - -
0.5025 500 2.5271 - -
0.5528 550 2.4904 - -
0.6030 600 2.4987 - -
0.6533 650 2.4009 - -
0.7035 700 2.3944 - -
0.7538 750 2.5054 - -
0.8040 800 2.3989 - -
0.8543 850 2.4019 - -
0.9045 900 2.3638 - -
0.9548 950 2.3478 - -
1.0 995 - 3.0169 0.7322
1.0050 1000 2.4424 - -
1.0553 1050 2.2478 - -
1.1055 1100 2.2448 - -
1.1558 1150 2.205 - -
1.2060 1200 2.1811 - -
1.2563 1250 2.1794 - -
1.3065 1300 2.1495 - -
1.3568 1350 2.1548 - -
1.4070 1400 2.1299 - -
1.4573 1450 2.1335 - -
1.5075 1500 2.1388 - -
1.5578 1550 2.0999 - -
1.6080 1600 2.0859 - -
1.6583 1650 2.0959 - -
1.7085 1700 2.0334 - -
1.7588 1750 2.0647 - -
1.8090 1800 2.0261 - -
1.8593 1850 2.0133 - -
1.9095 1900 2.0517 - -
1.9598 1950 2.0152 - -
2.0 1990 - 3.1210 0.7187
2.0101 2000 1.924 - -
2.0603 2050 1.7472 - -
2.1106 2100 1.7485 - -
2.1608 2150 1.7536 - -
2.2111 2200 1.751 - -
2.2613 2250 1.7172 - -
2.3116 2300 1.7269 - -
2.3618 2350 1.7352 - -
2.4121 2400 1.7019 - -
2.4623 2450 1.7278 - -
2.5126 2500 1.7046 - -
2.5628 2550 1.6962 - -
2.6131 2600 1.6881 - -
2.6633 2650 1.6806 - -
2.7136 2700 1.6614 - -
2.7638 2750 1.6918 - -
2.8141 2800 1.6794 - -
2.8643 2850 1.6708 - -
2.9146 2900 1.6531 - -
2.9648 2950 1.6236 - -
3.0 2985 - 3.2556 0.7140
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.4
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1
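
To approximate this environment, the versions above can be pinned at install time, for example:

pip install sentence-transformers==3.0.1 transformers==4.42.4 torch==2.3.1 accelerate==0.32.1 datasets==2.20.0 tokenizers==0.19.1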

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}