---
license: mit
language:
- en
pipeline_tag: text-classification
tags:
- finance
- topic-classification
library_name: transformers
widget:
- text: unemployment hits record low as job opportunities soar
---

`Topic-xDistil` is a model based on [`xtremedistil-l12-h384-uncased`](https://huggingface.co/microsoft/xtremedistil-l12-h384-uncased), fine-tuned to classify the topic of news headlines on a dataset annotated by [GPT-3.5](https://platform.openai.com/docs/models/gpt-3-5).

It is built, together with [`Sentiment-xDistil`](https://huggingface.co/hakonmh/sentiment-xdistil-uncased), as a tool for filtering out financial news headlines and classifying their sentiment. The code used to train both models and build the dataset can be found [here](https://github.com/hakonmh/distilnews).

*Notes*:

- The output labels are either `Economics` or `Other`.
- The model supports English only.

## Performance Results

Here are the performance metrics for both models on the test set:

| Model | Test Set Size | Accuracy | F1 Score |
| --- | --- | --- | --- |
| `topic-xdistil-uncased` | 32 799 | 94.44 % | 92.59 % |
| `sentiment-xdistil-uncased` | 17 527 | 94.59 % | 93.44 % |

## Data

The training data consists of ~600k news headlines and tweets, annotated by [GPT-3.5](https://platform.openai.com/docs/models/gpt-3-5), which has been shown to [outperform crowd-workers on text-annotation tasks](https://arxiv.org/pdf/2303.15056.pdf). The labels are defined by the GPT-3.5 prompt as follows:

```python
"""
[...]
- Economic headlines generally cover topics such as financial markets, \
business, financial assets, trade, employment, GDP, inflation, or fiscal \
and monetary policy.
- Non-economic headlines might include sports, entertainment, politics, \
science, weather, health, or other unrelated news events.
[...]
"""
```

## Example Usage

Here's a simple example:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("hakonmh/topic-xdistil-uncased")
tokenizer = AutoTokenizer.from_pretrained("hakonmh/topic-xdistil-uncased")
SENTENCE = "Global Growth Surges as New Technologies Drive Innovation and Productivity!"

inputs = tokenizer(SENTENCE, return_tensors="pt")
output = model(**inputs).logits
predicted_label = model.config.id2label[output.argmax(-1).item()]
print(predicted_label)
```

```text
Economics
```

Or, as a pipeline together with `Sentiment-xDistil`:

```python
from transformers import pipeline

topic_classifier = pipeline(
    "text-classification",
    model="hakonmh/topic-xdistil-uncased",
    tokenizer="hakonmh/topic-xdistil-uncased",
)
sentiment_classifier = pipeline(
    "text-classification",
    model="hakonmh/sentiment-xdistil-uncased",
    tokenizer="hakonmh/sentiment-xdistil-uncased",
)
SENTENCE = "Global Growth Surges as New Technologies Drive Innovation and Productivity!"

print(topic_classifier(SENTENCE))
print(sentiment_classifier(SENTENCE))
```

```text
[{'label': 'Economics', 'score': 0.9970171451568604}]
[{'label': 'Positive', 'score': 0.9997037053108215}]
```

Tested on `transformers` 4.30.1 and `torch` 2.0.0.
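Since the two models are intended to be used together — filtering out economics-related headlines, then scoring their sentiment — here is a sketch of a small batching helper that chains them. The `filter_and_score` function is a hypothetical helper for illustration, not part of the released code, and the second headline is an invented non-economic example:

```python
from transformers import pipeline

def filter_and_score(headlines, topic_clf, sentiment_clf):
    """Keep only economics-related headlines and pair each with a sentiment label."""
    topics = topic_clf(headlines)
    economic = [h for h, t in zip(headlines, topics) if t["label"] == "Economics"]
    if not economic:
        return []
    sentiments = sentiment_clf(economic)
    return [(h, s["label"]) for h, s in zip(economic, sentiments)]

topic_classifier = pipeline("text-classification", model="hakonmh/topic-xdistil-uncased")
sentiment_classifier = pipeline("text-classification", model="hakonmh/sentiment-xdistil-uncased")

headlines = [
    "Global Growth Surges as New Technologies Drive Innovation and Productivity!",
    "Local team wins championship after dramatic overtime finish",
]
# Non-economic headlines are dropped before sentiment classification,
# so the sentiment model only runs on the filtered subset.
print(filter_and_score(headlines, topic_classifier, sentiment_classifier))
```

Passing a list to a `pipeline` runs the whole batch in one call, which is cheaper than classifying headlines one at a time.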