hse-teddy-bear/xlm-roberta-russian-stock-sentiment

	---
	language:
	- ru
	- en
	license: mit
	tags:
	- finance
	- sentiment
	- stocks
	metrics:
	- accuracy
	widget:
	- text: Нуу, эту папиру надо лонговать!
	  example_title: long sentiment
	- text: Не уверен. Нужно подумать, перед тем, как брать.
	  example_title: neutral sentiment
	- text: Такое только хомяки берут. Нужно сливать эту бумажку поскорее.
	  example_title: short sentiment
	---

	## Model Details

	### Model Description

	- Developed by: Alexander Nikitin
	- Model type: XLM-RoBERTa-base fine-tuned on my labelled dataset
	- Language(s) (NLP): Russian, English
	- License: MIT
	- Finetuned from model: FacebookAI/xlm-roberta-base

	## Dataset

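	This transformer model was fine-tuned on comments collected from "Tinkoff Pulse".

	First step:
	Comments were preprocessed: for each stock ticker mentioned in a comment, the sub-comment referring to that ticker was extracted.
	Example: "{$GAZP} {$TCSG} {$RTKM} По газрому все хорошо. По Ростелекому не очень. Тинек идет вниз!" -> "{$GAZP} По газрому все хорошо." (in English: "Gazprom is doing fine. Rostelecom, not so much. Tinkoff is going down!" -> "Gazprom is doing fine.")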
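
The card does not say how sub-comments were matched to tickers, so the following is only a minimal sketch of one possible approach: strip the `{$TICKER}` tags, split the text into sentences, and keep the sentences that mention a known alias of the ticker. The `TICKER_ALIASES` table and the `extract_subcomment` function are hypothetical names introduced for illustration, not the author's code.

```python
import re

# Hypothetical alias stems per ticker. The card does not describe the real
# matching rules, so this table is an assumption made for illustration.
TICKER_ALIASES = {
    "GAZP": ("газ",),
    "RTKM": ("ростел",),
    "TCSG": ("тин",),
}

# Matches ticker tags such as {$GAZP}
TICKER_TAG = re.compile(r"\{\$([A-Z0-9]+)\}")


def extract_subcomment(comment: str, ticker: str) -> str:
    """Return the ticker tag followed by the sentences that mention it."""
    # Drop every ticker tag so only the free text remains
    text = TICKER_TAG.sub("", comment).strip()
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text)
    aliases = TICKER_ALIASES.get(ticker, ())
    # Keep only sentences containing one of the ticker's alias stems
    kept = [s for s in sentences if any(a in s.lower() for a in aliases)]
    return ("{$" + ticker + "} " + " ".join(kept)).strip()


comment = (
    "{$GAZP} {$TCSG} {$RTKM} По газрому все хорошо. "
    "По Ростелекому не очень. Тинек идет вниз!"
)
print(extract_subcomment(comment, "GAZP"))  # {$GAZP} По газрому все хорошо.
```

Substring matching on alias stems is crude (short stems can match unrelated words), so a real pipeline would likely need a richer alias list or a proper entity matcher.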

	Next step:
	A dataset of 10K preprocessed comments, evenly distributed across 10 Russian stocks, was labelled.
	The Mistral-7B LLM was used to assign each comment to one of 3 categories: "buy" if the author wants or encourages others to buy (long), "sell" if the author wants or encourages others to sell or short, and "neutral" if the comment is news or the stance cannot be determined.
	Plans for further research: label 100K comments and train on them.
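
The exact prompt and decoding setup used with Mistral-7B are not given in this card. As a rough sketch of how such LLM labelling could be wired up, the code below builds a prompt from the three category definitions and parses the model's free-text answer into a label. `build_prompt`, `parse_label`, `label_comments`, and the `generate` callable (any function that maps a prompt string to the LLM's text output) are all hypothetical.

```python
LABELS = ("buy", "sell", "neutral")


def build_prompt(comment: str) -> str:
    """Assemble an illustrative prompt; the real prompt is not published."""
    return (
        "Classify the author's stance in the stock comment below.\n"
        '"buy": the author wants or encourages to buy (long).\n'
        '"sell": the author wants or encourages to sell or short.\n'
        '"neutral": the comment is news or the stance is unclear.\n'
        "Answer with exactly one word: buy, sell or neutral.\n\n"
        f"Comment: {comment}\nAnswer:"
    )


def parse_label(response: str) -> str:
    """Map the LLM's raw answer to one of LABELS, defaulting to 'neutral'."""
    words = response.strip().lower().split()
    # Take the first word and strip surrounding punctuation or quotes
    first = words[0].strip(".,!\"'") if words else ""
    return first if first in LABELS else "neutral"


def label_comments(comments, generate):
    """Label each comment using a caller-supplied text-generation function."""
    return [parse_label(generate(build_prompt(c))) for c in comments]


# Usage with a stand-in generator (a real one would call Mistral-7B):
print(label_comments(["{$GAZP} По газрому все хорошо."], lambda prompt: " Buy."))
```

Falling back to "neutral" when the answer cannot be parsed mirrors the card's definition of "neutral" as the category for cases where the stance cannot be determined, but that choice is an assumption of this sketch.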

	## Bias, Risks, and Limitations

	1. The model is trained on Russian and English comments;
	2. The model struggles to extract sentiment from comments that contain strong keywords pointing in opposite directions, such as "I wanna sell. But probably I should buy back later.";
	3. The model performs well on short to medium-length texts such as comments, which usually lean clearly to one side (strong buy or strong sell).

	## How to Get Started with the Model

	Load the model with the Hugging Face `pipeline` and use it!

	Labels:
	- LABEL_0 = SELL
	- LABEL_1 = NEUTRAL
	- LABEL_2 = BUY
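
A minimal sketch of this usage, assuming the `transformers` library and a backend such as PyTorch are installed. The model ID is this repository's name, the label mapping is the one listed above, and the helper names `to_readable` and `classify` are introduced here for illustration.

```python
MODEL_ID = "hse-teddy-bear/xlm-roberta-russian-stock-sentiment"

# Mapping from the raw pipeline labels to the names listed in this card
LABEL_NAMES = {"LABEL_0": "SELL", "LABEL_1": "NEUTRAL", "LABEL_2": "BUY"}


def to_readable(prediction: dict) -> dict:
    """Replace a raw label such as 'LABEL_2' with its readable name."""
    label = LABEL_NAMES.get(prediction["label"], prediction["label"])
    return {"label": label, "score": prediction["score"]}


def classify(comments: list) -> list:
    """Run the model on a list of comments and return readable predictions."""
    # Imported lazily so the helpers above work without transformers installed
    from transformers import pipeline

    classifier = pipeline("text-classification", model=MODEL_ID)
    return [to_readable(p) for p in classifier(comments)]
```

Calling `classify(["Нуу, эту папиру надо лонговать!"])` downloads the model on first use and returns a list of dictionaries with a `label` (SELL, NEUTRAL or BUY) and a `score`.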

	## Evaluation

	- Accuracy on the validation dataset: 0.786
	- Note: this accuracy was measured on ~1.5k comments.
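
To give a feel for what the validation set size implies, the sketch below computes accuracy from label lists and a rough 95% interval using the normal approximation to the binomial. It assumes exactly 1500 independent validation comments, which is only an approximation of the "~1.5k" stated above; the function names are illustrative.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def normal_approx_half_width(acc, n, z=1.96):
    """Half-width of a normal-approximation confidence interval for accuracy."""
    return z * math.sqrt(acc * (1.0 - acc) / n)


# Assumption: n = 1500 stands in for the "~1.5k" validation comments
half_width = normal_approx_half_width(0.786, 1500)
print(f"0.786 +/- {half_width:.3f}")  # 0.786 +/- 0.021
```

Under these assumptions the reported accuracy is uncertain by roughly two percentage points in either direction.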

	## Model Card Authors

	https://t.me/pivo_txt