|
--- |
|
language: |
|
- ar |
|
- fr |
|
- es |
|
- de |
|
- el |
|
- bg |
|
- ru |
|
- tr |
|
- vi |
|
- th |
|
- zh |
|
- hi |
|
- sw |
|
- ur |
|
datasets: |
|
- xnli |
|
- Babelscape/REDFM |
|
widget: |
|
- text: >- |
|
The Red Hot Chili Peppers were formed in Los Angeles by Kiedis, Flea, Hillel |
|
Slovak and Jack Irons. [SEP] Jack Irons place of birth Los Angeles |
|
--- |
|
|
|
# Model Card for mdeberta-v3-base-triplet-critic-xnli |
|
|
|
<!-- Provide a quick summary of what the model is/does. [Optional] --> |
|
This is the Triplit Critic model presented in the ACL 2023 paper [RED^{FM}: a Filtered and Multilingual Relation Extraction Dataset](https://arxiv.org/abs/2306.09802). If you use the model, please reference this work in your paper: |
|
|
|
@inproceedings{huguet-cabot-et-al-2023-redfm-dataset, |
|
title = "RED$^{\rm FM}$: a Filtered and Multilingual Relation Extraction Dataset", |
|
author = "Huguet Cabot, Pere-Llu{\'\i}s and Tedeschi, Simone and Ngonga Ngomo, Axel-Cyrille and |
|
Navigli, Roberto", |
|
booktitle = "Proc. of the 61st Annual Meeting of the Association for Computational Linguistics: ACL 2023", |
|
month = jul, |
|
year = "2023", |
|
address = "Toronto, Canada", |
|
publisher = "Association for Computational Linguistics", |
|
url = "https://arxiv.org/abs/2306.09802", |
|
} |
|
|
|
The Triplit Critic is based on mdeberta-v3-base and it was trained as a multitask system to filter triplets as well as on the XNLI dataset. The model weights contain the two classification heads, however loading it using the huggingface library will only load those for Triplet filtering (ie. a binary classification head), if one wants to use it for XNLI it needs a custom script. While it is defined and trained as a classification system, we use the positive score (ie. Label_1) as the confidence score for a triplet. For SRED<sup>FM</sup> the confidence score thresshold was set at 0.75. |
|
|
|
|
|
|
|
|
|
To load the multitask model: |
|
```python |
|
from transformers import DebertaV2PreTrainedModel, DebertaV2Model |
|
from torch import nn |
|
from transformers.models.deberta_v2.modeling_deberta_v2 import * |
|
from transformers.file_utils import ModelOutput |
|
|
|
@dataclass |
|
class TXNLIClassifierOutput(ModelOutput): |
|
""" |
|
Base class for outputs of sentence classification models. |
|
|
|
Args: |
|
loss (:obj:`torch.FloatTensor` of shape :obj:`(1,)`, `optional`, returned when :obj:`labels` is provided): |
|
Classification (or regression if config.num_labels==1) loss. |
|
logits (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.num_labels)`): |
|
Classification (or regression if config.num_labels==1) scores (before SoftMax). |
|
logits_xnli (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, config.num_labels)`): |
|
Classification (or regression if config.num_labels==1) scores (before SoftMax). |
|
hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``): |
|
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer) |
|
of shape :obj:`(batch_size, sequence_length, hidden_size)`. |
|
|
|
Hidden-states of the model at the output of each layer plus the initial embedding outputs. |
|
attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``): |
|
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape :obj:`(batch_size, num_heads, |
|
sequence_length, sequence_length)`. |
|
|
|
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention |
|
heads. |
|
""" |
|
|
|
loss: Optional[torch.FloatTensor] = None |
|
logits: torch.FloatTensor = None |
|
logits_xnli: torch.FloatTensor = None |
|
hidden_states: Optional[Tuple[torch.FloatTensor]] = None |
|
attentions: Optional[Tuple[torch.FloatTensor]] = None |
|
|
|
class DebertaV2ForTripletClassification(DebertaV2PreTrainedModel): |
|
def __init__(self, config): |
|
super().__init__(config) |
|
|
|
num_labels = getattr(config, "num_labels", 2) |
|
self.num_labels = num_labels |
|
|
|
self.deberta = DebertaV2Model(config) |
|
self.pooler = ContextPooler(config) |
|
output_dim = self.pooler.output_dim |
|
|
|
self.classifier = nn.Linear(output_dim, num_labels) |
|
drop_out = getattr(config, "cls_dropout", None) |
|
drop_out = self.config.hidden_dropout_prob if drop_out is None else drop_out |
|
self.dropout = StableDropout(drop_out) |
|
self.classifier_xnli = nn.Linear(output_dim, 3) |
|
|
|
# Initialize weights and apply final processing |
|
self.post_init() |
|
|
|
def get_input_embeddings(self): |
|
return self.deberta.get_input_embeddings() |
|
|
|
def set_input_embeddings(self, new_embeddings): |
|
self.deberta.set_input_embeddings(new_embeddings) |
|
|
|
@add_start_docstrings_to_model_forward(DEBERTA_INPUTS_DOCSTRING.format("batch_size, sequence_length")) |
|
def forward( |
|
self, |
|
input_ids=None, |
|
attention_mask=None, |
|
token_type_ids=None, |
|
position_ids=None, |
|
inputs_embeds=None, |
|
labels=None, |
|
output_attentions=None, |
|
output_hidden_states=None, |
|
return_dict=None, |
|
): |
|
r""" |
|
labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*): |
|
Labels for computing the sequence classification/regression loss. Indices should be in `[0, ..., config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), |
|
If `config.num_labels > 1` a classification loss is computed (Cross-Entropy). |
|
""" |
|
return_dict = return_dict if return_dict is not None else self.config.use_return_dict |
|
|
|
outputs = self.deberta( |
|
input_ids, |
|
token_type_ids=token_type_ids, |
|
attention_mask=attention_mask, |
|
position_ids=position_ids, |
|
inputs_embeds=inputs_embeds, |
|
output_attentions=output_attentions, |
|
output_hidden_states=output_hidden_states, |
|
return_dict=return_dict, |
|
) |
|
|
|
encoder_layer = outputs[0] |
|
pooled_output = self.pooler(encoder_layer) |
|
pooled_output = self.dropout(pooled_output) |
|
logits = self.classifier(pooled_output) |
|
logits_xnli = self.classifier_xnli(pooled_output) |
|
|
|
loss = None |
|
if labels is not None: |
|
if labels.dtype != torch.bool: |
|
loss_fct = CrossEntropyLoss() |
|
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1)) |
|
else: |
|
loss_fct = BCEWithLogitsLoss() |
|
loss = loss_fct(logits_xnli.view(-1, 3), labels.view(-1).long()) |
|
if not return_dict: |
|
output = (logits,) + outputs[1:] |
|
return ((loss,) + output) if loss is not None else output |
|
|
|
return TXNLIClassifierOutput( |
|
loss=loss, logits=logits, logits_xnli=logits_xnli, hidden_states=outputs.hidden_states, attentions=outputs.attentions |
|
) |
|
``` |
|
|
|
|
|
## License |
|
|
|
This model is licensed under the CC BY-SA 4.0 license. The text of the license can be found [here](https://creativecommons.org/licenses/by-sa/4.0/). |