license: unknown
datasets:
- raicrits/YouTube_RAI_dataset
language:
- it
pipeline_tag: text-classification
tags:
- LLM
- Italian
- Classification
- BERT
- Topics
library_name: transformers
Model Card raicrits/BERT_ChangeOfTopic
bert-base-multilingual-cased finetuned to detect a change of topic in a given text.
Model Description
The model is finetuned for the specific task of detecting a change of topic in a given text. Given a text, the model outputs "1" if it detects a change of topic and "0" otherwise. It was trained on the chapters of the YouTube videos contained in the train split of the dataset raicrits/YouTube_RAI_dataset.
- Developed by: Stefano Scotta (stefano.scotta@rai.it)
- Model type: LLM finetuned on the specific task of detecting a change of topic in a given text
- Language(s) (NLP): Italian
- License: unknown
- Finetuned from model: bert-base-multilingual-cased
Uses
The model can be used to check whether a change of topic occurs in a given text.
How to Get Started with the Model
Use the code below to get started with the model.
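import numpy as np
import torch
from transformers import AutoTokenizer, BertForSequenceClassification

# Run on GPU when available, otherwise on CPU.
device_bert = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Assumes the repository stores the weights in the standard transformers format.
# If it holds a pickled model instead, download the file and call torch.load on the local path.
model_bert = BertForSequenceClassification.from_pretrained('raicrits/BERT_ChangeOfTopic')
model_bert = model_bert.to(device_bert)
model_bert.eval()

tokenizer_bert = AutoTokenizer.from_pretrained('bert-base-multilingual-cased')

# Tokenize the input, padding or truncating it to 256 tokens.
encoded_dict = tokenizer_bert(
    '<text>',  # replace with the text to classify
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length',
    return_attention_mask=True,
    return_tensors='pt',
)

input_ids = encoded_dict['input_ids'].to(device_bert)
input_mask = encoded_dict['attention_mask'].to(device_bert)

# Inference only, so gradients are not needed.
with torch.no_grad():
    output = model_bert(input_ids,
                        token_type_ids=None,
                        attention_mask=input_mask)

logits = output.logits.detach().cpu().numpy()
pred_flat = np.argmax(logits, axis=1).flatten()
print(pred_flat[0])  # 1 = change of topic detected, 0 = no change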
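Since the model labels a single text with "1" or "0", one possible way to locate topic changes in a longer transcript is to slide a window over its sentences and classify each window. The sketch below is only an illustration of that idea: the names `find_topic_changes` and `toy_classifier` are hypothetical, and the window size is an assumption, because the card does not state how long the training texts were. In practice `classify` would wrap the tokenizer and model calls shown above.

```python
from typing import Callable, List


def find_topic_changes(
    sentences: List[str],
    classify: Callable[[str], int],
    window: int = 4,
) -> List[int]:
    """Return the index of the last sentence of every window flagged with 1."""
    boundaries = []
    for end in range(window, len(sentences) + 1):
        # Join `window` consecutive sentences into one text for the classifier.
        chunk = " ".join(sentences[end - window:end])
        if classify(chunk) == 1:
            boundaries.append(end - 1)
    return boundaries


# Stand-in classifier for demonstration only (not the real model):
# it flags a chunk that mentions both "calcio" and "meteo".
def toy_classifier(text: str) -> int:
    return int("calcio" in text and "meteo" in text)


sentences = [
    "Oggi parliamo di sport.",
    "La partita di calcio è finita due a zero.",
    "Passiamo ora al meteo.",
    "Domani è prevista pioggia.",
]
flagged = find_topic_changes(sentences, toy_classifier, window=2)
print(flagged)  # [2]
```

Passing the classifier as a function keeps the windowing logic independent of how the model is loaded, so the same helper can be tried with different window sizes.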
Training Details
Training Data
Chapters of the YouTube videos contained in the train split of the dataset raicrits/YouTube_RAI_dataset.
Training Procedure
Training settings:
- train epochs = 18
- learning_rate = 2e-05
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: 1 NVIDIA A100 (40 GB)
- Hours used: 20
- Cloud Provider: Private Infrastructure
- Carbon Emitted: 2.38 kg CO2 eq.
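As a rough sanity check on these figures, calculators of this kind typically estimate emissions as hardware power draw times hours of use times the carbon intensity of the electricity supply. The sketch below works backwards from the reported values. The 250 W power draw is an assumption (the TDP of the PCIe variant of the A100 40 GB; the card does not say which variant was used), so the implied carbon intensity is illustrative only.

```python
# Values reported in the model card.
hours_used = 20
reported_emissions_kg = 2.38

# Assumption: 250 W TDP (A100 40 GB, PCIe variant), running at full power.
gpu_power_kw = 0.250

# Energy consumed by one GPU over the whole training run.
energy_kwh = gpu_power_kw * hours_used

# Carbon intensity that would reproduce the reported emissions.
implied_intensity = reported_emissions_kg / energy_kwh

print(f"Energy: {energy_kwh:.1f} kWh")
print(f"Implied carbon intensity: {implied_intensity:.3f} kg CO2 eq./kWh")
```

Under this assumption the run uses 5 kWh, which would correspond to about 0.476 kg CO2 eq. per kWh.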
Model Card Authors
Stefano Scotta (stefano.scotta@rai.it)