Model Card for raicrits/BERT_ChangeOfTopic

bert-base-multilingual-cased finetuned to detect a change of topic in a given text.

Model Description

The model is finetuned for the specific task of detecting a change of topic in a given text. Given an input text, the model answers "1" if it detects a change of topic and "0" otherwise. It was trained on the chapters of the YouTube videos contained in the train split of the dataset raicrits/YouTube_RAI_dataset.

  • Developed by: Stefano Scotta (stefano.scotta@rai.it)
  • Model type: LLM finetuned for the specific task of detecting a change of topic in a given text
  • Language(s) (NLP): Italian
  • License: unknown
  • Finetuned from model: bert-base-multilingual-cased

Uses

The model can be used to check whether a change of topic occurs in a given text.
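As an illustration of this use, the sketch below scans a longer transcript with overlapping windows of sentences and records which windows a classifier labels "1". The windowing scheme, the function names `sliding_windows` and `find_topic_changes`, and the stand-in `toy_classifier` are all assumptions made for this example; the card does not say how input texts should be assembled, and the stand-in would be replaced by a call to the actual model.

```python
from typing import Callable, List


def sliding_windows(sentences: List[str], size: int = 4, stride: int = 1) -> List[str]:
    """Join consecutive sentences into overlapping text windows."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if not sentences:
        return []
    last_start = max(len(sentences) - size, 0)
    return [" ".join(sentences[i:i + size]) for i in range(0, last_start + 1, stride)]


def find_topic_changes(
    sentences: List[str],
    classify: Callable[[str], int],
    size: int = 4,
    stride: int = 1,
) -> List[int]:
    """Return the start index of every window that the classifier labels 1."""
    windows = sliding_windows(sentences, size, stride)
    return [i * stride for i, window in enumerate(windows) if classify(window) == 1]


# Stand-in classifier for illustration only (hypothetical rule, not the model).
def toy_classifier(text: str) -> int:
    return 1 if "passiamo ora" in text.lower() else 0


sentences = [
    "Il governo ha approvato la manovra.",
    "Il voto finale arriva domani.",
    "Passiamo ora allo sport.",
    "La nazionale ha vinto due a zero.",
]
print(find_topic_changes(sentences, toy_classifier, size=2))
```

A smaller window localizes the change more precisely but gives the classifier less context; which trade-off suits the model is not stated in the card and would need to be checked empirically.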

How to Get Started with the Model

Use the code below to get started with the model.


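import numpy as np
import torch
from transformers import AutoTokenizer

# Run on the GPU when one is available, otherwise on the CPU.
device_bert = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# torch.load reads a serialized model from a local file, so this path must
# resolve to a checkpoint downloaded from the raicrits/BERT_ChangeOfTopic
# repository. With PyTorch 2.6 or later, loading a fully pickled model may
# also require passing weights_only=False.
model_bert = torch.load('raicrits/BERT_ChangeOfTopic', map_location=device_bert)
model_bert = model_bert.to(device_bert)
model_bert.eval()  # disable dropout for inference

tokenizer_bert = AutoTokenizer.from_pretrained('bert-base-multilingual-cased')

# Tokenize the input text, padding or truncating it to 256 tokens.
# max_length can be raised up to BERT's limit of 512 tokens.
encoded_dict = tokenizer_bert.encode_plus(
    '<text>',
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length',
    return_attention_mask=True,
    return_tensors='pt',
)
input_ids = encoded_dict['input_ids'].to(device_bert)
input_mask = encoded_dict['attention_mask'].to(device_bert)

# Forward pass without gradient tracking.
with torch.no_grad():
    output = model_bert(input_ids,
                        token_type_ids=None,
                        attention_mask=input_mask)

logits = output.logits.detach().cpu().numpy()
pred_flat = np.argmax(logits, axis=1).flatten()
print(pred_flat[0])  # 1 = change of topic detected, 0 = no change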
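The argmax step above returns only the predicted label. When a confidence score is also useful, the logits can be passed through a softmax first. The following is a minimal sketch in plain Python, assuming the classifier emits one logit per class (two here); the helper names and the example logits are made up for illustration and are not output from the model.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits: Sequence[float]) -> Tuple[int, float]:
    """Return the argmax label together with its softmax probability."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


# Made-up logits for a single input: index 0 = no change, index 1 = change.
label, confidence = label_with_confidence([-1.2, 2.3])
print(label, round(confidence, 3))
```

A threshold on the probability of class 1 could then replace the plain argmax if fewer false positives are wanted; any such threshold would have to be tuned on held-out data.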

Training Details

Training Data

Chapters of the YouTube videos contained in the train split of the dataset raicrits/YouTube_RAI_dataset.

Training Procedure

Training setting:

  • train epochs=18
  • learning_rate=2e-05
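For anyone reproducing the run, the two settings above can be collected in a small configuration object. This is only a sketch: the class name `FineTuneConfig`, the batch size, and the dataset size are hypothetical, since the card reports none of them, and the step count shown depends entirely on those assumed values.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    epochs: int = 18              # from the model card
    learning_rate: float = 2e-05  # from the model card
    batch_size: int = 16          # hypothetical: not stated in the card

    def total_steps(self, num_examples: int) -> int:
        """Number of optimizer steps over the whole run (no gradient accumulation)."""
        steps_per_epoch = math.ceil(num_examples / self.batch_size)
        return steps_per_epoch * self.epochs


config = FineTuneConfig()
# 1000 is a made-up dataset size used only to exercise the helper.
print(config.total_steps(num_examples=1000))
```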

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: 1 NVIDIA A100 (40 GB)
  • Hours used: 20
  • Cloud Provider: Private Infrastructure
  • Carbon Emitted: 2.38 kg CO2 eq.
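The figures above can be related with the usual back-of-the-envelope estimate: emissions = power draw × hours × carbon intensity of the electricity. The sketch below works backwards from the reported values. The average power draw of 0.25 kW is an assumption made for illustration, not a number from the card, so the implied carbon intensity is only indicative.

```python
def estimate_emissions_kg(power_kw: float, hours: float, intensity_kg_per_kwh: float) -> float:
    """Emissions in kg CO2 eq. = power (kW) * time (h) * carbon intensity (kg/kWh)."""
    return power_kw * hours * intensity_kg_per_kwh


HOURS_USED = 20.0        # from the model card
REPORTED_KG = 2.38       # from the model card
ASSUMED_POWER_KW = 0.25  # assumption: average GPU draw, not stated in the card

# Emission rate implied by the reported figures alone.
kg_per_hour = REPORTED_KG / HOURS_USED

# Carbon intensity that would reproduce the reported total under the assumed draw.
implied_intensity = REPORTED_KG / (ASSUMED_POWER_KW * HOURS_USED)

print(f"{kg_per_hour:.3f} kg CO2 eq. per hour")
print(f"{implied_intensity:.3f} kg CO2 eq. per kWh (under the assumed power draw)")
```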

Model Card Authors

Stefano Scotta (stefano.scotta@rai.it)

Model Card Contact

stefano.scotta@rai.it
