Does not work at all, I tried to calculate CoLA

#2 opened by zokica

I tried to calculate CoLA as described here (https://huggingface.co/sileod/deberta-v3-base-tasksource-adapters), but it does not work at all.

I get this error:

reshaped_logits = logits.view(-1, num_choices)
RuntimeError: shape '[-1, 6]' is invalid for input of size 4

#############################################################
from tasknet import Adapter
from transformers import AutoModelForMultipleChoice, AutoTokenizer
import torch
import time

model_name = "sileod/deberta-v3-base-tasksource-nli"
tokenizer3 = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model3 = AutoModelForMultipleChoice.from_pretrained(
    model_name,
    ignore_mismatched_sizes=True,
    cache_dir="/root/Desktop/models/",
    low_cpu_mem_usage=True,
)
adapter = Adapter.from_pretrained(model_name.replace('nli', 'adapters'))
model_for_rlhf = adapter.adapt_model_to_task(model3, 'glue/cola')  # glue/cola # hh-rlhf

def cola(sentences1):
    inputs = tokenizer3(sentences1, return_tensors="pt", padding=True,
                        truncation=True, max_length=40).to("cpu")
    with torch.no_grad():
        # the RuntimeError above is raised inside this forward call
        outputs = model_for_rlhf(**inputs)

    the_cola_scores = []
    print("outputs.logits", outputs.logits)
    for aout in outputs.logits:
        # probability of the "acceptable" class
        cola_prediction = torch.nn.functional.softmax(aout, dim=-1)[1].item()
        the_cola_scores.append(round(cola_prediction, 2))
    return the_cola_scores

timea = time.time()
sentences1 = ["I likes apples", "I love apples."]
scores = cola(sentences1)
print(scores, time.time() - timea)
#############################################################

Hi!
For CoLA, you have to use AutoModelForSequenceClassification.
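For example, a minimal sketch, reusing the model_name and adapter from the code above:

from transformers import AutoModelForSequenceClassification

model3 = AutoModelForSequenceClassification.from_pretrained(
    model_name, ignore_mismatched_sizes=True)
model_for_rlhf = adapter.adapt_model_to_task(model3, 'glue/cola')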

Hi, thanks, it works when I use model3 = AutoModelForSequenceClassification.from_pretrained(...).

However, the results are pretty bad, not sure why:

sentences1 = ["I was there because I like apples.", "I are there because i loves apples."]
[0.59, 0.65]

So it comes out as if the second sentence is better than the first one. I checked other models, and there the probability for the second sentence is low (around 0.5), while the first is around 0.99, as it should be.

The logits for the first and second sentence are:

outputs.logits = tensor([[ 0.0077,  0.3910],
                         [-0.1509,  0.4893]])
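(Those scores are just the softmax of these logits; the second column is the "acceptable" probability:)

import torch
logits = torch.tensor([[0.0077, 0.3910], [-0.1509, 0.4893]])
print(torch.softmax(logits, dim=-1)[:, 1])  # tensor([0.5947, 0.6548]) -> rounds to [0.59, 0.65]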

It is possible that CoLA was undertrained. There are 500 tasks, and CoLA is relatively small (8k samples).
I am currently running a much longer tasksource pretraining. But it could also be due to a failure in the tasknet code to load the adapter. I'll look into it Monday and get back to you.

PS: I advise you to use a pipeline for what you are doing.

# !pip install tasknet
from tasknet import Adapter
from transformers import AutoModelForSequenceClassification, TextClassificationPipeline, AutoTokenizer

model_name = "sileod/deberta-v3-base-tasksource-nli"
model = AutoModelForSequenceClassification.from_pretrained(model_name, ignore_mismatched_sizes=True)
adapter = Adapter.from_pretrained(model_name.replace('nli', 'adapters'))
model = adapter.adapt_model_to_task(model, 'glue/cola')  # attach the CoLA head

pipe = TextClassificationPipeline(
    model=model,
    tokenizer=AutoTokenizer.from_pretrained(model_name))
pipe(["We yelled ourselves.", "We yelled ourselves hoarse."])
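(The pipeline returns one {'label': ..., 'score': ...} dict per input; depending on your transformers version, passing return_all_scores=True or top_k=None gives the scores for both classes.)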

OK, thanks very much for the fast answer. I think this model should be state of the art for CoLA; other models are trained on just this dataset, so it should be sufficient data.

What is strange is that I get the same result with the adapter model and with the model without adapt_model_to_task. However, when I replaced 'glue/cola' with another task, it gave a different output.

I tried the pipeline; it yields the same result.

I think that FINE-TUNING this model on CoLA will lead to state of the art (for base size).
You can see that it is ranked first on CoLA: https://ibm.github.io/model-recycling/microsoft_deberta-v3-base_table

In its current state, it should have good results, but not better than a fine-tuned model: deberta-v3-base-tasksource is currently trained for less than one epoch, as multi-task training takes much longer.
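A minimal sketch of what such a fine-tuning run could look like (the hyperparameters are illustrative, not the author's recipe):

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "sileod/deberta-v3-base-tasksource-nli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2, ignore_mismatched_sizes=True)

# CoLA: single sentences labeled acceptable (1) / unacceptable (0)
dataset = load_dataset("glue", "cola").map(
    lambda batch: tokenizer(batch["sentence"], truncation=True, max_length=128),
    batched=True)

args = TrainingArguments(
    output_dir="deberta-cola",
    learning_rate=2e-5,  # illustrative hyperparameters
    num_train_epochs=3,
    per_device_train_batch_size=16,
)
Trainer(model=model, args=args,
        train_dataset=dataset["train"],
        eval_dataset=dataset["validation"],
        tokenizer=tokenizer,  # enables dynamic padding via DataCollatorWithPadding
        ).train()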

Yes, DeBERTa is even better than RoBERTa.

Hi.
There was a problem with the adapters.
The poolers were not shared during the multi-task training. They should have been included in the adapter. I am re-running the training from scratch (with more tasks as well) with a single pooler.
There will be a new version in a week, and pipeline loading will be much cleaner.

from tasknet import load_pipeline

model_name = "sileod/deberta-v3-base-tasksource-nli"  # as above
task_name = "glue/cola"
pipeline = load_pipeline(model_name, task_name)
print(pipeline.model, pipeline('example to test'))

Awesome, thanks.

I'm looking forward to your update as well :)

Did you do the update?

(It is done)

sileod changed discussion status to closed
