metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:897
  - loss:TripletLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
  - source_sentence: >-
      Well driller/borer and related mining worker operates, assembles and
      monitors machines for cutting channels in a mine workface or for the
      drilling and sinking of wells, extraction of ore, liquids and gases or for
      a variety of other purposes.
    sentences:
      - >-
        Prepare detailed drawings of architectural and structural features of
        buildings or drawings and topographical relief maps used in civil
        engineering projects, such as highways, bridges, and public works. Use
        knowledge of building materials, engineering practices, and mathematics
        to complete drawings.
      - >-
        Operate self-propelled mining machines that rip coal, metal and nonmetal
        ores, rock, stone, or sand from the mine face and load it onto
        conveyors, shuttle cars, or trucks in a continuous operation.
      - >-
        Conduct investigations related to suspected violations of federal,
        state, or local laws to prevent or solve crimes.
  - source_sentence: >-
      Van driver drives a van to pick up and deliver non-mail documents and
      parcels.
    sentences:
      - >-
        Drive a light vehicle, such as a truck or van, with a capacity of less
        than 26,001 pounds Gross Vehicle Weight (GVW), primarily to pick up
        merchandise or packages from a distribution center and deliver. May load
        and unload vehicle.
      - >-
        Plan, direct, or coordinate human resources activities and staff of an
        organization.
      - >-
        Devise methods to improve oil and gas extraction and production and
        determine the need for new or modified tool designs. Oversee drilling
        and offer technical advice.
  - source_sentence: >-
      Library officer assists librarians by helping readers in the use of
      library catalogues, databases, and indexes to locate books and other
      materials. He/she also compiles records, sorts and shelves books or other
      media, removes or repairs damaged books or other media, registers patrons
      and checks materials in and out of the circulation process. He/she
      replaces materials in shelving areas.
    sentences:
      - >-
        Assist librarians by helping readers in the use of library catalogs,
        databases, and indexes to locate books and other materials; and by
        answering questions that require only brief consultation of standard
        reference. Compile records; sort and shelve books or other media; remove
        or repair damaged books or other media; register patrons; and check
        materials in and out of the circulation process. Replace materials in
        shelving area (stacks) or files. Includes bookmobile drivers who assist
        with providing services in mobile libraries.
      - >-
        Perform engineering duties in planning and designing tools, engines,
        machines, and other mechanically functioning equipment. Oversee
        installation, operation, maintenance, and repair of equipment such as
        centralized heat, gas, water, and steam systems.
      - >-
        Perform a variety of food preparation duties other than cooking, such as
        preparing cold foods and shellfish, slicing meat, and brewing coffee or
        tea.
  - source_sentence: >-
      Pre-press trades worker proofs, formats, sets and composes text and
      graphics into a form suitable for use in various printing processes and
      representation in other visual media.
    sentences:
      - >-
        Directly supervise and coordinate activities of workers engaged in
        landscaping or groundskeeping activities. Work may involve reviewing
        contracts to ascertain service, machine, and workforce requirements;
        answering inquiries from potential customers regarding methods,
        material, and price ranges; and preparing estimates according to labor,
        material, and machine costs.
      - >-
        Plan, direct, or coordinate transportation, storage, or distribution
        activities in accordance with organizational policies and applicable
        government laws or regulations. Includes logistics managers.
      - >-
        Engrave or etch metal, wood, rubber, or other materials. Includes such
        workers as etcher-circuit processors, pantograph engravers, and silk
        screen etchers.
  - source_sentence: >-
      Composer/Orchestrator writes musical compositions such as symphonies,
      sonatas or operas. He/she translates compositions into standard musical
      signs and symbols on scored music paper. He/she may write words to
      accompany music. He/she adapts melodies to suit the type and style of
      orchestras or bands and to produce various kinds of effects. He/she
      determines instruments to be employed, writes musical scores to produce
      the desired musical effect, rewrites music written for one instrument or
      purpose into suitable forms for other instruments or purposes.
    sentences:
      - >-
        Evaluate materials and develop machinery and processes to manufacture
        materials for use in products that must meet specialized design and
        performance specifications. Develop new uses for known materials.
        Includes those engineers working with composite materials or
        specializing in one type of material, such as graphite, metal and metal
        alloys, ceramics and glass, plastics and polymers, and naturally
        occurring materials. Includes metallurgists and metallurgical engineers,
        ceramic engineers, and welding engineers.
      - >-
        Plan, direct, or coordinate the actual distribution or movement of a
        product or service to the customer. Coordinate sales distribution by
        establishing sales territories, quotas, and goals and establish training
        programs for sales representatives. Analyze sales statistics gathered by
        staff to determine sales potential and inventory requirements and
        monitor the preferences of customers.
      - >-
        Conduct, direct, plan, and lead instrumental or vocal performances by
        musical artists or groups, such as orchestras, bands, choirs, and glee
        clubs; or create original works of music.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy
model-index:
  - name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
    results:
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: job description eval
          type: job-description-eval
        metrics:
          - type: cosine_accuracy
            value: 0.7288888692855835
            name: Cosine Accuracy

SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
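
The architecture listing shows mean-token pooling followed by Normalize(). As a rough illustration of what those two stages do to the per-token vectors produced by the Transformer module, here is a pure-Python sketch. The helper names `mean_pool` and `l2_normalize` and the toy 2-dimensional inputs are assumptions made for illustration; the real model works on 384-dimensional tensors inside the library.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in total]

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

# Toy input: three token vectors of width 2; the last one is padding.
tokens = [[1.0, 2.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_embedding = l2_normalize(mean_pool(tokens, mask))
print(sentence_embedding)  # approximately [0.6, 0.8]
```

Because the output is unit-length, cosine similarity between two embeddings reduces to their dot product.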

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ANGKJ1995/all-MiniLM-L6-v2-job-description")
# Run inference
sentences = [
    'Composer/Orchestrator writes musical compositions such as symphonies, sonatas or operas. He/she translates compositions into standard musical signs and symbols on scored music paper. He/she may write words to accompany music. He/she adapts melodies to suit the type and style of orchestras or bands and to produce various kinds of effects. He/she determines instruments to be employed, writes musical scores to produce the desired musical effect, rewrites music written for one instrument or purpose into suitable forms for other instruments or purposes.',
    'Conduct, direct, plan, and lead instrumental or vocal performances by musical artists or groups, such as orchestras, bands, choirs, and glee clubs; or create original works of music.',
    'Evaluate materials and develop machinery and processes to manufacture materials for use in products that must meet specialized design and performance specifications. Develop new uses for known materials. Includes those engineers working with composite materials or specializing in one type of material, such as graphite, metal and metal alloys, ceramics and glass, plastics and polymers, and naturally occurring materials. Includes metallurgists and metallurgical engineers, ceramic engineers, and welding engineers.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
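
The similarity scores above can drive a simple lookup: embed one query description, embed the candidate descriptions, and sort the candidates by cosine similarity. The sketch below shows only the ranking step in pure Python. The helper `rank_by_cosine`, the candidate labels, and the 3-dimensional toy vectors are assumptions standing in for real `model.encode(...)` outputs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_cosine(query, candidates):
    """Return (label, score) pairs sorted from most to least similar."""
    scored = [(label, cosine(query, vec)) for label, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for model.encode(...) outputs (assumed values).
query = [0.9, 0.1, 0.0]
candidates = {
    "candidate A": [0.8, 0.2, 0.1],
    "candidate B": [0.0, 0.3, 0.9],
}
ranking = rank_by_cosine(query, candidates)
print(ranking[0][0])  # candidate A
```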

Evaluation

Metrics

Triplet

| Metric          | Value  |
|-----------------|--------|
| cosine_accuracy | 0.7289 |
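
The cosine_accuracy figure can be read as the share of evaluation triplets in which the anchor is closer, by cosine similarity, to its positive than to its negative; 0.7289 is consistent with 164 of the 225 evaluation triplets. Below is a minimal pure-Python sketch of that computation. The helper name `triplet_cosine_accuracy` and the 2-dimensional vectors are made up for illustration and are not real embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def triplet_cosine_accuracy(triplets):
    """Fraction of (anchor, positive, negative) triplets ranked correctly."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine(anchor, positive) > cosine(anchor, negative)
    )
    return correct / len(triplets)

# Hypothetical toy triplets: (anchor, positive, negative).
triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),   # positive is closer
    ([1.0, 0.0], [0.1, 0.9], [0.8, 0.2]),   # negative is closer (an error)
    ([0.0, 1.0], [0.2, 0.8], [1.0, 0.1]),   # positive is closer
    ([1.0, 1.0], [1.0, 0.9], [1.0, -1.0]),  # positive is closer
]
accuracy = triplet_cosine_accuracy(triplets)
print(accuracy)  # 0.75
```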

Training Details

Training Dataset

Unnamed Dataset

  • Size: 897 training samples
  • Columns: SSOC_DESCRIPTION, ONET_DESCRIPTION, and shuffled_ONET_DESCRIPTION
  • Approximate statistics based on the first 897 samples:
    |         | SSOC_DESCRIPTION | ONET_DESCRIPTION | shuffled_ONET_DESCRIPTION |
    |---------|------------------|------------------|---------------------------|
    | type    | string | string | string |
    | details | min: 14 tokens, mean: 66.05 tokens, max: 166 tokens | min: 9 tokens, mean: 44.67 tokens, max: 161 tokens | min: 7 tokens, mean: 44.52 tokens, max: 161 tokens |
  • Samples:
    | SSOC_DESCRIPTION | ONET_DESCRIPTION | shuffled_ONET_DESCRIPTION |
    |------------------|------------------|---------------------------|
    | Consumer audio/video equipment/radar broadcasting/transmitting equipment fitter/mechanic fits, adjusts, installs and repairs radio, television, transmitters, receivers and radar equipment in factory, workshop or place of use. He/she specialises in television transmitters/receivers, radar equipment, radio transmitters/receivers and two way radio communications equipment. He/she examines drawings and wiring diagrams, and diagnoses faults with aid of testing equipment. | Repair, test, adjust, or install electronic equipment, such as industrial controls, transmitters, and antennas. | Conduct programs of compensation and benefits and job analysis for employer. May specialize in specific areas, such as position classification and pension programs. |
    | Window cleaner washes and polishes windows and other glass fittings. He/she uses cleaning tools such as sponges and detergents to clean and polish windows, mirrors and other glass surfaces of buildings, both on the interior and exterior. He/she uses specific ladders to clean taller buildings with safety belts for support. | Keep buildings in clean and orderly condition. Perform heavy cleaning duties, such as cleaning floors, shampooing rugs, washing walls and glass, and removing rubbish. Duties may include tending furnace and boiler, performing routine maintenance activities, notifying management of need for repairs, and cleaning snow or debris from sidewalk. | Service automobiles, buses, trucks, boats, and other automotive or marine vehicles with fuel, lubricants, and accessories. Collect payment for services and supplies. May lubricate vehicle, change motor oil, refill antifreeze, or replace lights or other accessories, such as windshield wiper blades or fan belts. May repair or replace tires. |
    | Instrumentalist plays one or more musical instruments as a soloist, accompanist or member of an orchestra, band or other musical group. He/she studies and rehearses scores, tunes instruments to the proper pitch, plays music by manipulating keys, bows, valves, strings or percussion devices, depending on the type of instrument being played. He/she may improvise or transpose music or compose or arrange music. In an orchestra, he/she is usually designated according to the instrument played such as violinist, drummer or pianist. | Play one or more musical instruments or sing. May perform on stage, for broadcasting, or for sound or video recording. | Drive a light vehicle, such as a truck or van, with a capacity of less than 26,001 pounds Gross Vehicle Weight (GVW), primarily to pick up merchandise or packages from a distribution center and deliver. May load and unload vehicle. |
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
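
For a single triplet, TripletLoss with a Euclidean metric and margin 5 evaluates max(d(anchor, positive) - d(anchor, negative) + 5, 0). The pure-Python sketch below shows that formula on toy unit-length vectors. The function names are hypothetical and the vectors are invented for illustration; the library applies the same formula to batches of tensors and averages the result.

```python
import math

def euclidean(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=5.0):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0) for one triplet."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)

# Unit-length toy vectors, mirroring what a Normalize() stage would emit.
anchor = [1.0, 0.0]
positive = [1.0, 0.0]    # identical to the anchor: distance 0
negative = [-1.0, 0.0]   # opposite direction: distance 2, the largest possible
best_case = triplet_loss(anchor, positive, negative)
print(best_case)  # 3.0
```

One consequence worth noting, assuming the Normalize() module is active during training: distances between unit vectors never exceed 2, so with a margin of 5 the per-triplet loss cannot fall below 3. That would fit the validation losses above 4 reported in the training logs, and it means the loss value is better read alongside cosine_accuracy than on its own.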
    

Evaluation Dataset

Unnamed Dataset

  • Size: 225 evaluation samples
  • Columns: SSOC_DESCRIPTION, ONET_DESCRIPTION, and shuffled_ONET_DESCRIPTION
  • Approximate statistics based on the first 225 samples:
    |         | SSOC_DESCRIPTION | ONET_DESCRIPTION | shuffled_ONET_DESCRIPTION |
    |---------|------------------|------------------|---------------------------|
    | type    | string | string | string |
    | details | min: 16 tokens, mean: 64.88 tokens, max: 130 tokens | min: 7 tokens, mean: 43.49 tokens, max: 161 tokens | min: 9 tokens, mean: 44.06 tokens, max: 161 tokens |
  • Samples:
    | SSOC_DESCRIPTION | ONET_DESCRIPTION | shuffled_ONET_DESCRIPTION |
    |------------------|------------------|---------------------------|
    | Salesperson (door-to-door) describes, demonstrates and sells goods and services and solicits business for establishments by approaching or visiting potential customers, usually residents in private homes, by going from door to door. He/she gives details of what establishment can supply and quotes prices and terms. | Contact new or existing customers to determine their solar equipment needs, suggest systems or equipment, or estimate costs. | Recruit, screen, interview, or place individuals within an organization. May perform other activities in multiple human resources areas. |
    | Secretary performs a variety of administrative tasks to help keep an organisation running smoothly. He/she answers telephone calls, drafts and sends e-mails, maintains diaries, arranges appointments, takes messages, files documents, organises and services meetings, and manages databases. | Perform secretarial duties using specific knowledge of medical terminology and hospital, clinic, or laboratory procedures. Duties may include scheduling appointments, billing patients, and compiling and recording medical charts, reports, and correspondence. | Set up, operate, or tend forging machines to taper, shape, or form metal or plastic parts. |
    | Purchasing agent buys machinery, equipment, raw materials, services and other supplies for use by the enterprise. He/she ascertains the requirements of the enterprise and studies market information on varieties and qualities available. He/she interviews vendors to ascertain their ability to meet the organisation’s specific requirements for design, performance, price and delivery. He/she may approve bills for payment. | Purchase machinery, equipment, tools, parts, supplies, or services necessary for the operation of an establishment. Purchase raw or semifinished materials for manufacturing. May negotiate contracts. | Evaluate and treat musculoskeletal injuries or illnesses. Provide preventive, therapeutic, emergency, and rehabilitative care. |
  • Loss: TripletLoss with these parameters:
    {
        "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
        "triplet_margin": 5
    }
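
The card does not say how the shuffled_ONET_DESCRIPTION column was produced. The column name suggests that the negatives are the positive descriptions in shuffled order, so the following is only one plausible construction, not the author's confirmed method. The function name `shuffled_negatives`, the seed, and the placeholder strings are all assumptions.

```python
import random

def shuffled_negatives(positives, seed=42):
    """Return a permutation of `positives` in which no item keeps its position."""
    if len(positives) < 2:
        raise ValueError("need at least two rows to build negatives")
    rng = random.Random(seed)
    indices = list(range(len(positives)))
    while True:
        # Reshuffle until every row is paired with some other row's positive.
        rng.shuffle(indices)
        if all(i != j for i, j in enumerate(indices)):
            return [positives[j] for j in indices]

# Placeholder rows standing in for (SSOC_DESCRIPTION, ONET_DESCRIPTION) pairs.
anchors = ["anchor 1", "anchor 2", "anchor 3", "anchor 4"]
positives = ["match 1", "match 2", "match 3", "match 4"]
negatives = shuffled_negatives(positives)
triplets = list(zip(anchors, positives, negatives))
print(triplets[0])
```

A shuffle like this guards only against a row receiving its own positive by position. If two occupations share the same target description, or are closely related, a shuffled negative can still be a valid or near-valid match, which would add label noise to both training and evaluation.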
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 1e-05
  • num_train_epochs: 16
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
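
To make the interaction between learning_rate, warmup_ratio, and num_train_epochs concrete, the sketch below reproduces the shape of a linear schedule with warmup in pure Python. It assumes a single device, no gradient accumulation, and that all 16 epochs run; the function name `linear_warmup_lr` is hypothetical, and the formula is an approximation of the scheduler's behaviour rather than the library code itself.

```python
import math

def linear_warmup_lr(step, total_steps, base_lr=1e-5, warmup_ratio=0.1):
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

# 897 training samples at batch size 16 give 57 steps per epoch,
# which matches the step counts in the Training Logs.
steps_per_epoch = math.ceil(897 / 16)
total_steps = steps_per_epoch * 16        # 912 if all 16 epochs run
warmup_steps = math.ceil(total_steps * 0.1)

for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, linear_warmup_lr(step, total_steps))
```

Under these assumptions the rate climbs from 0 to 1e-05 over the first 92 steps (a little past the first epoch) and then decays linearly to 0 at step 912.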

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 16
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

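| Epoch | Step | Validation Loss | job-description-eval_cosine_accuracy |
|-------|------|-----------------|--------------------------------------|
| -1    | -1   | -               | 0.1867                               |
| 1.0   | 57   | 4.5738          | 0.4844                               |
| 2.0   | 114  | 4.3775          | 0.7022                               |
| 3.0   | 171  | 4.2681          | 0.7289                               |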

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.4.1
  • Transformers: 4.48.3
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.3.2
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}