Clarifying the competition's metric
Thank you for organizing this competition :) @srishtiy
I have a few questions about the competition's evaluation metric.
I hope the answers can help me to improve my local validation scheme.
- Which embeddings are being used to compute cosine similarity?
- Given that there are 7766 masks in the test set and the cosine similarity for each mask is in [-1, 1], why are the scores on the LB so low? I assumed you compute a simple sum over the similarities. Is this assumption wrong?
Thanks for the questions, @gregorlied :) Please find the answers below:
| Which embeddings are being used to compute cosine similarity?
We use the cosine similarity of the predicted words with the ground truth, details here: https://huggingface.co/spaces/competitions/news-unmasked.
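For reference, a minimal sketch of the cosine similarity itself for two word vectors (assuming plain NumPy arrays; this is not the official scoring code):

import numpy as np

def cosine_similarity(u, v):
    # cos(theta) = (u . v) / (||u|| * ||v||)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))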
| Given that there are 7766 masks in the test set and the cosine similarity for each mask is in [-1, 1], why are the scores on the LB so low? I assumed you compute a simple sum over the similarities. Is this assumption wrong?
Good question. Cosine similarities are indeed in [-1, 1]. However, for the final accuracy we apply a threshold of 0.5, i.e. if the cosine similarity between the predicted word and the masked word is >= 0.5, it is counted as a correct prediction (this helps give equal weight to words like "death" and "buried", for example). Final accuracy is (total correct predictions) / (total words). I will add these details where we mention the metric. Apologies for missing this detail; it's now on the page.
A pseudocode snippet would be as below:
def calculate_accuracy(cosine_sims, cosine_sim_threshold=0.5):
    # cosine_sims: cosine similarities between the predicted and masked words
    # cosine_sim_threshold: a prediction counts as correct at >= 0.5
    num_correct = 0
    total_pairs = 0
    for cosine_sim in cosine_sims:
        if cosine_sim >= cosine_sim_threshold:
            num_correct += 1
        total_pairs += 1
    # Accuracy of the model based on the cosine similarity threshold, in %
    accuracy = num_correct / total_pairs * 100
    return accuracy
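For example, with made-up similarity values for three predicted/masked pairs (values chosen only for illustration):

sims = [0.82, 0.47, 0.66]        # hypothetical cosine similarities
print(calculate_accuracy(sims))  # 2 of 3 pairs clear the 0.5 threshold -> ~66.7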
Hi @srishtiy - since cosine similarity can be used for any kind of embedding - is there a specific one used for this competition? Thanks!
@soarerz @gregorlied
Apologies for missing the detail. We use en_core_web_lg from spaCy. Link: https://spacy.io/models/en#en_core_web_lg. Please let me know if that helps and if there are more questions.
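In case it helps with local validation, here is a minimal sketch of scoring one predicted/masked pair with en_core_web_lg, assuming spaCy's built-in .similarity() (which returns the cosine similarity of the vectors); this is not the official scoring script:

import spacy

nlp = spacy.load("en_core_web_lg")  # requires: python -m spacy download en_core_web_lg

predicted = nlp("buried")   # predicted word
masked = nlp("death")       # masked ground-truth word

cosine_sim = predicted.similarity(masked)  # cosine similarity of the word vectors
print(cosine_sim, "correct" if cosine_sim >= 0.5 else "incorrect")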