The model is trained by knowledge distillation, with "princeton-nlp/unsup-simcse-roberta-large" as the teacher and "prajjwal1/bert-mini" as the student, on the 'ffgcc/NEWS5M' dataset.

Inference can be performed with the Hugging Face AutoModel class.
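
A minimal sketch of loading the model and embedding two sentences is shown below. The repository id zen-E/bert-mini-sentence-distil-unsupervised is taken from this card; using the pooled output as the sentence embedding is an assumption based on the training forward function further down.

```python
from transformers import AutoModel, AutoTokenizer
import torch
import torch.nn.functional as F

model_name = "zen-E/bert-mini-sentence-distil-unsupervised"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

sentences = ["A man is playing a guitar.", "Someone plays an instrument."]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    # Assumed pooling: the pooled output, matching the training forward pass below.
    embeddings = outputs.pooler_output

# Cosine similarity between the two sentence embeddings.
score = F.cosine_similarity(embeddings[0:1], embeddings[1:2]).item()
print(score)
```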

The model achieves a Pearson correlation of 0.825 and a Spearman correlation of 0.83 on the STS-B test set.
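
A sketch of how such scores are typically computed (an assumed evaluation recipe, not necessarily the exact script used here): embed both sentences of each STS-B test pair, score each pair with cosine similarity, and correlate the predictions with the gold annotations.

```python
from scipy.stats import pearsonr, spearmanr

def sts_correlations(predicted_similarities, gold_scores):
    """Pearson and Spearman correlations between predicted and gold similarity scores."""
    return (pearsonr(predicted_similarities, gold_scores)[0],
            spearmanr(predicted_similarities, gold_scores)[0])
```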

For more training detail, the training config and the PyTorch forward function are as follows:

import torch.nn as nn
import torch.nn.functional as F

config = {
  'epoch': 200,
  'learning_rate': 3e-4,
  'batch_size': 12288,
  'temperature': 0.05
}

def forward_cos_mse_kd_unsup(self, sentences, teacher_sentence_embs):
    """Forward function for the unsupervised NEWS5M dataset."""
    # o: pooled sentence embeddings from the student encoder (tuple-style model output)
    _, o = self.bert(**sentences)

    # Cosine similarity between the first half of the batch and the second half;
    # cosine_sim returns a [half_batch, half_batch] pairwise similarity matrix.
    half_batch = o.size(0) // 2
    higher_half = half_batch * 2  # skip the last datapoint when the batch size is odd

    cos_sim = cosine_sim(o[:half_batch], o[half_batch:higher_half])
    cos_sim_teacher = cosine_sim(teacher_sentence_embs[:half_batch],
                                 teacher_sentence_embs[half_batch:higher_half])

    # KL divergence between the student and teacher similarity distributions.
    soft_teacher_probs = F.softmax(cos_sim_teacher / self.temperature, dim=1)
    kd_contrastive_loss = F.kl_div(F.log_softmax(cos_sim / self.temperature, dim=1),
                                   soft_teacher_probs,
                                   reduction='batchmean')

    # MSE between student and teacher embeddings, scaled down by a factor of 3.
    kd_mse_loss = nn.MSELoss()(o, teacher_sentence_embs) / 3

    # Equal weight for the two losses.
    total_loss = kd_contrastive_loss * 0.5 + kd_mse_loss * 0.5

    return total_loss, kd_contrastive_loss, kd_mse_loss
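
The cosine_sim helper is not defined in this card; given the softmax over dim=1, it presumably returns a pairwise similarity matrix between the two half-batches. A minimal sketch under that assumption:

```python
import torch
import torch.nn.functional as F

def cosine_sim(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine similarity matrix of shape [a.size(0), b.size(0)]."""
    a_norm = F.normalize(a, p=2, dim=1)
    b_norm = F.normalize(b, p=2, dim=1)
    return a_norm @ b_norm.t()
```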
