language:
  - en
license: apache-2.0
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:154
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
base_model: sentence-transformers/msmarco-distilbert-base-v4
widget:
  - source_sentence: Hey, what career opportunities do you provide?
    sentences:
      - >-
        TechChefz Digital is present in two countries. Its headquarters is in
        Noida, India, with additional offices in Delaware, United States, and
        Gauram Nagar, Delhi, India.
      - >
        Customer Experience & Marketing Technology

        Covering journey science, content architecture, personalization,
        campaign management, and conversion rate optimization, driving customer
        experiences and engagements


        Enterprise Platforms & Systems Integration

        Platform selection services in CMS, e-commerce, and learning management
        systems, with a focus on marketplace commerce


        Analytics, Data Science & Business Intelligence

        Engage in analytics, data science, and machine learning to derive
        insights. Implement intelligent search, recommendation engines, and
        predictive models for optimization and enhanced decision-making.
        TechChefz Digital seeks passionate individuals to join our innovative
        team. We offer dynamic work environments fostering creativity and
        expertise. Whether you're seasoned or fresh, exciting career
        opportunities await in technology, consulting, design, and more. Join us
        in shaping digital transformation and unlocking possibilities for
        clients and the industry.

        7+ Years Industry Experience


        300+ Enthusiasts


        80% Employee Retention Rate
      - >-
        How long does it take to develop an e-commerce website?

        The development time for an e-commerce website can vary widely depending
        on its complexity, features, and the platform chosen. A basic online
        store might take a few weeks to set up, while a custom, feature-rich
        site could take several months to develop. Clear communication of your
        requirements and timely decision-making can help streamline the process.
  - source_sentence: What technologies are used for web development?
    sentences:
      - >-
        Our Featured Insights

        Simplifying Image Loading in React with Lazy Loading and Intersection
        Observer API


        What Is React Js?


        The Role of Artificial Intelligence (AI) in Personalizing Digital
        Marketing Campaigns


        Mastering Personalization in Digital Marketing: Tailoring Campaigns for
        Success


        How Customer Experience Drives Your Business Growth


        Which is the best CMS for your Digital Transformation Journey?


        The Art of Test Case Creation Templates
      - >-
        DISCOVER TECHSTACK

        Empowering solutions

        with cutting-edge technology stacks

        Web & Mobile Development

        Crafting dynamic and engaging online experiences tailored to your
        brand's vision and objectives.

        Content Management Systems

        3D, AR & VR

        Learning Management System

        Commerce

        Analytics

        Personalization & Marketing Cloud

        Cloud & DevSecOps

        Tech Stack

        HTML, JS, CSS

        React JS

        Angular JS

        Vue JS

        Next JS

        React Native

        Flutter

        Node JS

        Python

        Frappe

        Java

        Spring Boot

        Go Lang

        Mongo DB

        PostgreSQL

        MySQL
      - >-
        Can you help migrate our existing infrastructure to a DevOps model?

        Yes, we specialize in transitioning traditional IT infrastructure to a
        DevOps model. Our process includes assessing your current setup,
        planning the migration, implementing the necessary tools and practices,
        and providing ongoing support to ensure a smooth transition.
  - source_sentence: Where is TechChefz based?
    sentences:
      - >-
        CLIENT TESTIMONIALS

        Worked with TCZ on two business critical website development projects.
        The TCZ team is a group of experts in their respective domains and have
        helped us with excellent end-to-end development of a website right from
        the conceptualization to implementation and maintenance. By Dr. Kunal
        Joshi - Healthcare Marketing & Strategy Professional


        TCZ helped us with our new website launch in a seamless manner. Through
        all our discussions, they made sure to have the website designed as we
        had envisioned it to be. Thank you team TCZ.

        By Dr. Sarita Ahlawat - Managing Director and Co-Founder, Botlab
        Dynamics 
      - >-
        TechChefz Digital is present in two countries. Its headquarters is in
        Noida, India, with additional offices in Delaware, United States, and
        Gauram Nagar, Delhi, India.
      - >2
          What we do

        Digital Strategy

        Creating digital frameworks that transform your digital enterprise and
        produce a return on investment.


        Platform Selection

        Helping you select the optimal digital experience, commerce, cloud and
        marketing platform for your enterprise.


        Platform Builds

        Deploying next-gen scalable and agile enterprise digital platforms,
        along with multi-platform integrations.


        Product Builds

        Help you ideate, strategize, and engineer your product with help of our
        enterprise frameworks 


        Team Augmentation

        Help you scale up and augment your existing team to solve your hiring
        challenges with our easy to deploy staff augmentation offerings .

        Managed Services

        Operate and monitor your business-critical applications, data, and IT
        workloads, along with Application maintenance and operations
  - source_sentence: Will you assess our current infrastructure before migrating?
    sentences:
      - >-
        Introducing the world of Global EdTech Firm.


        In this project, We implemented a comprehensive digital platform
        strategy to unify user experience across platforms, integrating diverse
        tech stacks and specialized platforms to enhance customer engagement and
        streamline operations.

        Develop tailored online tutoring and learning hub platforms, leveraging
        AI/ML for personalized learning experiences, thus accelerating user
        journeys and improving conversion rates.

        Provide managed services for seamless application support and platform
        stabilization, optimizing operational efficiency and enabling scalable
        B2B subscriptions for schools and districts, facilitating easy
        onboarding and growth across the US States.


        We also achieved 200% Improvement in Courses & Content being delivered
        to Students. 50% Increase in Student’s Retention 150%, Increase in
        Teacher & Tutor Retention.
      - >-
        TechChefz Digital has established its presence in two countries,
        showcasing its global reach and influence. The company’s headquarters is
        strategically located in Noida, India, serving as the central hub for
        its operations and leadership. In addition to the headquarters,
        TechChefz Digital has expanded its footprint with offices in Delaware,
        United States, allowing the company to cater to the North American
        market with ease and efficiency.
      - >-
        Can you help migrate our existing infrastructure to a DevOps model?

        Yes, we specialize in transitioning traditional IT infrastructure to a
        DevOps model. Our process includes assessing your current setup,
        planning the migration, implementing the necessary tools and practices,
        and providing ongoing support to ensure a smooth transition.
  - source_sentence: What steps do you take to understand a business's needs?
    sentences:
      - >-
        How do you customize your DevOps solutions for different industries?

        We understand that each industry has unique challenges and requirements.
        Our approach involves a thorough analysis of your business needs,
        industry standards, and regulatory requirements to tailor a DevOps
        solution that meets your specific objectives
      - >-
        Inception: Pioneering the Digital Frontier In our foundational year,
        TechChefz embarked on a journey of digital transformation, laying the
        groundwork for our future endeavors. We began working on Cab Accelerator
        Apps akin to Uber and Ola, deploying them across Europe, Africa, and
        Australia, marking our initial foray into global markets. Alongside, we
        successfully delivered technology trainings across USA & India. 


        Accelerating Momentum: A year of strategic partnerships & Transformative
        Projects. In 2018, TechChefz continued to build on its strong
        foundation, expanding its global footprint and forging strategic
        partnerships. Our collaboration with digital agencies and system
        integrators propelled us into enterprise accounts, focusing on digital
        experience development. This year marked significant collaborations with
        leading automotive brands and financial institutions, enhancing our
        portfolio and establishing TechChefz as a trusted partner in the
        industry. 
         
      - >-
        Our Vision Be a partner for industry verticals on the inevitable journey
        towards enterprise transformation and future readiness, by harnessing
        the growing power of Artificial Intelligence, Machine Learning, Data
        Science and emerging methodologies, with immediacy of impact and
        swiftness of outcome.Our Mission

        To decode data, and code new intelligence into products and automation,
        engineer, develop and deploy systems and applications that redefine
        experiences and realign business growth.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
model-index:
  - name: msmarco-distilbert-base-v4 Matryoshka
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 768
          type: dim_768
        metrics:
          - type: cosine_accuracy@1
            value: 0.03896103896103896
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.4805194805194805
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.5714285714285714
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.6493506493506493
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.03896103896103896
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.1601731601731602
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.11428571428571425
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.06493506493506492
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.03896103896103896
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.4805194805194805
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.5714285714285714
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.6493506493506493
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.3349468392248154
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.23376623376623376
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.24652168791713625
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 512
          type: dim_512
        metrics:
          - type: cosine_accuracy@1
            value: 0.025974025974025976
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.4935064935064935
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.5844155844155844
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.6493506493506493
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.025974025974025976
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.1645021645021645
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.11688311688311684
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.06493506493506492
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.025974025974025976
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.4935064935064935
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.5844155844155844
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.6493506493506493
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.3381817622000061
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.23697691197691195
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.2485755814005223
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 256
          type: dim_256
        metrics:
          - type: cosine_accuracy@1
            value: 0.05194805194805195
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.4675324675324675
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.5194805194805194
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.6233766233766234
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.05194805194805195
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.15584415584415587
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.1038961038961039
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.062337662337662324
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.05194805194805195
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.4675324675324675
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.5194805194805194
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.6233766233766234
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.3379715765084199
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.24577922077922074
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.2597360814073472
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 128
          type: dim_128
        metrics:
          - type: cosine_accuracy@1
            value: 0.05194805194805195
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.44155844155844154
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.5584415584415584
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.6623376623376623
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.05194805194805195
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.14718614718614723
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.11168831168831166
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.0662337662337662
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.05194805194805195
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.44155844155844154
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.5584415584415584
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.6623376623376623
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.34288867015255386
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.24065656565656557
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.2507978917088375
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 64
          type: dim_64
        metrics:
          - type: cosine_accuracy@1
            value: 0.06493506493506493
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.4155844155844156
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.5064935064935064
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.5974025974025974
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.06493506493506493
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.13852813852813856
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.1012987012987013
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.05974025974025971
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.06493506493506493
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.4155844155844156
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.5064935064935064
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.5974025974025974
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.32285221821950844
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.23481240981240978
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.24816289395996594
            name: Cosine Map@100

msmarco-distilbert-base-v4 Matryoshka

This is a sentence-transformers model finetuned from sentence-transformers/msmarco-distilbert-base-v4. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/msmarco-distilbert-base-v4
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Shashwat13333/msmarco-distilbert-base-v4")
# Run inference
sentences = [
    "What steps do you take to understand a business's needs?",
    'How do you customize your DevOps solutions for different industries?\nWe understand that each industry has unique challenges and requirements. Our approach involves a thorough analysis of your business needs, industry standards, and regulatory requirements to tailor a DevOps solution that meets your specific objectives',
    'Our Vision Be a partner for industry verticals on the inevitable journey towards enterprise transformation and future readiness, by harnessing the growing power of Artificial Intelligence, Machine Learning, Data Science and emerging methodologies, with immediacy of impact and swiftness of outcome.Our Mission\nTo decode data, and code new intelligence into products and automation, engineer, develop and deploy systems and applications that redefine experiences and realign business growth.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Information Retrieval

| Metric | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
|:--|--:|--:|--:|--:|--:|
| cosine_accuracy@1 | 0.039 | 0.026 | 0.0519 | 0.0519 | 0.0649 |
| cosine_accuracy@3 | 0.4805 | 0.4935 | 0.4675 | 0.4416 | 0.4156 |
| cosine_accuracy@5 | 0.5714 | 0.5844 | 0.5195 | 0.5584 | 0.5065 |
| cosine_accuracy@10 | 0.6494 | 0.6494 | 0.6234 | 0.6623 | 0.5974 |
| cosine_precision@1 | 0.039 | 0.026 | 0.0519 | 0.0519 | 0.0649 |
| cosine_precision@3 | 0.1602 | 0.1645 | 0.1558 | 0.1472 | 0.1385 |
| cosine_precision@5 | 0.1143 | 0.1169 | 0.1039 | 0.1117 | 0.1013 |
| cosine_precision@10 | 0.0649 | 0.0649 | 0.0623 | 0.0662 | 0.0597 |
| cosine_recall@1 | 0.039 | 0.026 | 0.0519 | 0.0519 | 0.0649 |
| cosine_recall@3 | 0.4805 | 0.4935 | 0.4675 | 0.4416 | 0.4156 |
| cosine_recall@5 | 0.5714 | 0.5844 | 0.5195 | 0.5584 | 0.5065 |
| cosine_recall@10 | 0.6494 | 0.6494 | 0.6234 | 0.6623 | 0.5974 |
| cosine_ndcg@10 | 0.3349 | 0.3382 | 0.338 | 0.3429 | 0.3229 |
| cosine_mrr@10 | 0.2338 | 0.237 | 0.2458 | 0.2407 | 0.2348 |
| cosine_map@100 | 0.2465 | 0.2486 | 0.2597 | 0.2508 | 0.2482 |

Training Details

Training Dataset

Unnamed Dataset

  • Size: 154 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 154 samples:
    |  | anchor | positive |
    |:--|:--|:--|
    | type | string | string |
    | details | min: 7 tokens<br>mean: 12.43 tokens<br>max: 20 tokens | min: 20 tokens<br>mean: 126.6 tokens<br>max: 378 tokens |
  • Samples:
    | anchor | positive |
    |:--|:--|
    | What kind of websites can you help us with? | CLIENT TESTIMONIALS<br>Worked with TCZ on two business critical website development projects. The TCZ team is a group of experts in their respective domains and have helped us with excellent end-to-end development of a website right from the conceptualization to implementation and maintenance. By Dr. Kunal Joshi - Healthcare Marketing & Strategy Professional<br><br>TCZ helped us with our new website launch in a seamless manner. Through all our discussions, they made sure to have the website designed as we had envisioned it to be. Thank you team TCZ.<br>By Dr. Sarita Ahlawat - Managing Director and Co-Founder, Botlab Dynamics |
    | What does DevSecOps mean? | How do you ensure the security of our DevOps pipeline?<br>Security is a top priority in our DevOps solutions. We implement DevSecOps practices, integrating security measures into the CI/CD pipeline from the outset. This includes automated security scans, compliance checks, and vulnerability assessments to ensure your infrastructure is secure |
    | do you work with tech like nlp ? | What AI solutions does Techchefz specialize in?<br>We specialize in a range of AI solutions including recommendation engines, NLP, computer vision, customer segmentation, predictive analytics, operational efficiency through machine learning, risk management, and conversational AI for customer service. |
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • gradient_accumulation_steps: 4
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • push_to_hub: True
  • hub_model_id: Shashwat13333/msmarco-distilbert-base-v4_1
  • push_to_hub_model_id: msmarco-distilbert-base-v4_1
  • batch_sampler: no_duplicates

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 4
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: Shashwat13333/msmarco-distilbert-base-v4_1
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: msmarco-distilbert-base-v4_1
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

| Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|:--|--:|--:|--:|--:|--:|--:|--:|
| 0.2 | 1 | 4.0076 | - | - | - | - | - |
| 1.0 | 5 | 4.8662 | 0.3288 | 0.3390 | 0.3208 | 0.3246 | 0.2749 |
| 2.0 | 10 | 4.1825 | 0.3288 | 0.3456 | 0.3306 | 0.3405 | 0.2954 |
| 3.0 | 15 | 3.048 | 0.3329 | 0.3313 | 0.3346 | 0.3392 | 0.3227 |
| **4.0** | **20** | **2.5029** | **0.3349** | **0.3382** | **0.338** | **0.3429** | **0.3229** |
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 3.3.1
  • Transformers: 4.47.1
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}