---
language:
- en
library_name: transformers
datasets:
- facebook/anli
- zen-E/ANLI-simcse-roberta-large-embeddings-pca-256
metrics:
- spearmanr
- pearsonr
---

This model is trained by knowledge distillation from "princeton-nlp/unsup-simcse-roberta-large" (teacher) to "zen-E/bert-mini-sentence-distil-unsupervised" (student) on the ANLI dataset. The model can be loaded for inference with `AutoModel` (see the usage sketch at the end of this card). It achieves 0.836 Pearson and 0.840 Spearman correlation on the STS-B test set.

The training config and the PyTorch forward function are given below. The teacher's features are first reduced to 256 dimensions by the PCA object in "zen-E/ANLI-simcse-roberta-large-embeddings-pca-256", which can be loaded by:

```python
import joblib

# Load the fitted PCA object and project the teacher embeddings to 256 dimensions
pca = joblib.load('ANLI-simcse-roberta-large-embeddings-pca-256/pca_model.sav')
features_256 = pca.transform(features)
```

```python
config = {
    'epoch': 10,
    'learning_rate': 5e-5,
    'batch_size': 512,
    'temperature': 0.05,
}
```

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def forward_cos_mse_kd(self, sentence1s, sentence2s, sentence3s,
                       teacher_sentence1_embs, teacher_sentence2_embs, teacher_sentence3_embs):
    """Forward function for the ANLI dataset."""
    # Student sentence embeddings (pooled BERT outputs)
    _, o1 = self.bert(**sentence1s)
    _, o2 = self.bert(**sentence2s)
    _, o3 = self.bert(**sentence3s)

    # Compute student's cosine similarity between sentences
    cos_o1_o2 = cosine_sim(o1, o2)
    cos_o1_o3 = cosine_sim(o1, o3)

    # Compute teacher's cosine similarity between sentences
    cos_o1_o2_t = cosine_sim(teacher_sentence1_embs, teacher_sentence2_embs)
    cos_o1_o3_t = cosine_sim(teacher_sentence1_embs, teacher_sentence3_embs)

    cos_sim = torch.cat((cos_o1_o2, cos_o1_o3), dim=-1)
    cos_sim_t = torch.cat((cos_o1_o2_t, cos_o1_o3_t), dim=-1)

    # KL divergence between temperature-scaled student and teacher similarity distributions
    soft_teacher_probs = F.softmax(cos_sim_t / self.temperature, dim=1)
    kd_cos_loss = F.kl_div(F.log_softmax(cos_sim / self.temperature, dim=1),
                           soft_teacher_probs, reduction='batchmean')

    # MSE loss between student embeddings and the PCA-reduced teacher embeddings
    o = torch.cat([o1, o2, o3], dim=0)
    teacher_embs = torch.cat([teacher_sentence1_embs, teacher_sentence2_embs,
                              teacher_sentence3_embs], dim=0)
    kd_mse_loss = nn.MSELoss()(o, teacher_embs) / 3

    # Equal weight for the two losses
    total_loss = kd_cos_loss * 0.5 + kd_mse_loss * 0.5
    return total_loss, kd_cos_loss, kd_mse_loss
```
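The `cosine_sim` helper used in the forward function above is not included in this card. A minimal sketch of one consistent implementation, assuming it returns the row-wise cosine similarity as a `(batch, 1)` column so that the two concatenated similarities form the `(batch, 2)` tensor the softmax/KL step expects (the exact shape is an assumption, not taken from the original code):

```python
import torch.nn.functional as F

def cosine_sim(a, b):
    # Assumed helper: row-wise cosine similarity between two batches of
    # embeddings, returned as a (batch, 1) column vector so that
    # torch.cat((cos_o1_o2, cos_o1_o3), dim=-1) yields a (batch, 2) tensor.
    return F.cosine_similarity(a, b, dim=-1).unsqueeze(-1)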
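```

A minimal inference sketch with `AutoModel`, assuming a BERT-style checkpoint whose pooled output serves as the sentence embedding (this matches the `_, o = self.bert(...)` unpacking in the training code; the repo id below is a placeholder to replace with this model's actual id):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = 'zen-E/bert-mini-sentence-distil-unsupervised'  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

sentences = ['A man is playing a guitar.', 'Someone is performing music.']
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)

# Pooled output as the sentence embedding (assumption)
embeddings = outputs.pooler_output
similarity = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(similarity.item())
```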
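The STS-B numbers above are the Pearson and Spearman correlations between predicted cosine similarities and the gold scores. A sketch of how such scores might be computed with scipy; `embed` is a hypothetical helper mapping a list of sentences to a tensor of embeddings, e.g. built from the inference sketch above:

```python
import torch
from scipy.stats import pearsonr, spearmanr

def evaluate_stsb(embed, pairs, gold_scores):
    # pairs: list of (sentence1, sentence2); gold_scores: list of floats.
    preds = []
    for s1, s2 in pairs:
        e1, e2 = embed([s1, s2])
        preds.append(torch.nn.functional.cosine_similarity(e1, e2, dim=0).item())
    # Correlation between predicted similarities and gold similarity scores
    return pearsonr(gold_scores, preds)[0], spearmanr(gold_scores, preds)[0]
```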