|
--- |
|
license: unknown |
|
datasets: |
|
- raicrits/YouTube_RAI_dataset |
|
language: |
|
- it |
|
pipeline_tag: text-classification |
|
tags: |
|
- LLM |
|
- Italian |
|
- Classification |
|
- BERT |
|
- Topics |
|
library_name: transformers |
|
--- |
|
|
|
|
|
|
# Model Card for raicrits/BERT_ChangeOfTopic
|
|
|
|
|
|
[bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) fine-tuned to detect a change of topic in a given text.
|
|
|
|
|
### Model Description |
|
|
|
|
|
The model is fine-tuned for the specific task of detecting a change of topic in a given text: given a text, it answers "1" if it detects a change of topic and "0" otherwise.

The training was done using the chapters of the YouTube videos contained in the train split of the dataset [raicrits/YouTube_RAI_dataset](https://huggingface.co/datasets/raicrits/YouTube_RAI_dataset).
|
|
|
|
|
- **Developed by:** Stefano Scotta (stefano.scotta@rai.it) |
|
- **Model type:** LLM fine-tuned on the specific task of detecting a change of topic in a given text
|
- **Language(s) (NLP):** Italian |
|
- **License:** unknown |
|
- **Finetuned from model:** [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased)
|
|
|
|
|
## Uses |
|
|
|
The model can be used to check whether a change of topic occurs in a given text.
|
|
|
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model.
|
```python
import numpy as np
import torch
from transformers import AutoTokenizer

# run on GPU if available
device_bert = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# load the fine-tuned classifier and move it to the chosen device
model_bert = torch.load('raicrits/BERT_ChangeOfTopic')
model_bert = model_bert.to(device_bert)
model_bert.eval()

# the tokenizer is the one of the base model
tokenizer_bert = AutoTokenizer.from_pretrained('bert-base-multilingual-cased')

encoded_dict = tokenizer_bert.encode_plus(
    '<text>',
    add_special_tokens=True,
    max_length=256,
    truncation=True,
    padding='max_length',
    return_attention_mask=True,
    return_tensors='pt',
)
input_ids = encoded_dict['input_ids'].to(device_bert)
input_mask = encoded_dict['attention_mask'].to(device_bert)

with torch.no_grad():
    output = model_bert(input_ids,
                        token_type_ids=None,
                        attention_mask=input_mask)

logits = output.logits.detach().cpu().numpy()
pred_flat = np.argmax(logits, axis=1).flatten()
print(pred_flat[0])  # "1": change of topic detected, "0": no change of topic
```
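For longer inputs such as full transcripts, one possible pattern (an illustrative sketch, not part of the original card) is to slide a window of sentences over the text and run the classifier on each window. The helper names below (`detect_topic_change`, `segment_text`) are hypothetical and assume that `model_bert`, `tokenizer_bert` and `device_bert` have been loaded as in the snippet above; the window size of 6 sentences is an arbitrary choice.

```python
def detect_topic_change(text, model, tokenizer, device, max_length=256):
    """Return 1 if the model detects a change of topic in `text`, else 0."""
    enc = tokenizer.encode_plus(
        text,
        add_special_tokens=True,
        max_length=max_length,
        truncation=True,
        padding='max_length',
        return_attention_mask=True,
        return_tensors='pt',
    )
    with torch.no_grad():
        out = model(enc['input_ids'].to(device),
                    token_type_ids=None,
                    attention_mask=enc['attention_mask'].to(device))
    return int(out.logits.argmax(dim=1).item())


def segment_text(sentences, model, tokenizer, device, window=6):
    """Scan a list of sentences with a sliding window and return the start
    indices of the windows in which a change of topic is detected."""
    boundaries = []
    for i in range(len(sentences) - window + 1):
        chunk = ' '.join(sentences[i:i + window])
        if detect_topic_change(chunk, model, tokenizer, device) == 1:
            boundaries.append(i)
    return boundaries
```

A call such as `segment_text(sentences, model_bert, tokenizer_bert, device_bert)` then returns the window start indices at which the model flags a change of topic.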
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
Chapters of the YouTube videos contained in the train split of the dataset [raicrits/YouTube_RAI_dataset](https://huggingface.co/datasets/raicrits/YouTube_RAI_dataset).
|
|
|
### Training Procedure |
|
|
|
|
|
**Training settings:**

- train epochs: 18

- learning rate: 2e-05
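The original training script is not included in this card. The sketch below shows one plausible way to reproduce a comparable setup with the Hugging Face `Trainer`, using the epochs and learning rate listed above; the dataset column names (`text`, `label`), the batch size, and the assumption that the train split can be used as-is are not taken from the card, which states that the examples were built from video chapters and may require preprocessing.

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, BertForSequenceClassification,
                          Trainer, TrainingArguments)

# assumption: the dataset exposes a 'text' column and a binary 'label' column
dataset = load_dataset('raicrits/YouTube_RAI_dataset')
tokenizer = AutoTokenizer.from_pretrained('bert-base-multilingual-cased')

def tokenize(batch):
    return tokenizer(batch['text'], truncation=True,
                     padding='max_length', max_length=256)

tokenized = dataset.map(tokenize, batched=True)

# binary head: "1" = change of topic, "0" = no change
model = BertForSequenceClassification.from_pretrained(
    'bert-base-multilingual-cased', num_labels=2)

args = TrainingArguments(
    output_dir='bert_change_of_topic',
    num_train_epochs=18,             # value reported in the card
    learning_rate=2e-5,              # value reported in the card
    per_device_train_batch_size=16,  # assumption: batch size not reported
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized['train'])
trainer.train()
```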
|
|
|
|
|
## Environmental Impact |
|
|
|
|
|
|
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). |
|
|
|
- **Hardware Type:** 1x NVIDIA A100 40 GB
|
- **Hours used:** 20 |
|
- **Cloud Provider:** Private Infrastructure |
|
- **Carbon Emitted:** 2.38 kg CO2 eq.
|
|
|
## Model Card Authors |
|
|
|
Stefano Scotta (stefano.scotta@rai.it) |
|
|
|
## Model Card Contact |
|
|
|
stefano.scotta@rai.it |