---
license: mit
language:
- en
pipeline_tag: text-classification
tags:
- finance
- topic-classification
library_name: transformers
widget:
- text: unemployment hits record low as job opportunities soar
---

`Topic-xDistil` is a model based on [`xtremedistil-l12-h384-uncased`](https://huggingface.co/microsoft/xtremedistil-l12-h384-uncased), fine-tuned to classify the topic of news headlines on a dataset annotated by [GPT-3.5](https://platform.openai.com/docs/models/gpt-3-5).

It is built, together with [`Sentiment-xDistil`](https://huggingface.co/hakonmh/sentiment-xdistil-uncased), as a tool for filtering out financial news headlines and classifying their sentiment. The code used to train both models and build the dataset can be found [here](https://github.com/hakonmh/distilnews).

*Notes*:

- The output labels are either `Economics` or `Other`.
- The model supports English only.

## Performance Results

Here are the performance metrics for both models on the test set:

| Model | Test Set Size | Accuracy | F1 Score |
| --- | --- | --- | --- |
| `topic-xdistil-uncased` | 32 799 | 94.44 % | 92.59 % |
| `sentiment-xdistil-uncased` | 17 527 | 94.59 % | 93.44 % |

## Data

The training data consists of ~600k news headlines and tweets, annotated by [GPT-3.5](https://platform.openai.com/docs/models/gpt-3-5), which has been shown to [outperform crowd-workers on text-annotation tasks](https://arxiv.org/pdf/2303.15056.pdf). The labels are defined by the GPT-3.5 prompt as follows:

```python
"""
[...]
- Economic headlines generally cover topics such as financial markets, \
business, financial assets, trade, employment, GDP, inflation, or fiscal \
and monetary policy.
- Non-economic headlines might include sports, entertainment, politics, \
science, weather, health, or other unrelated news events.
[...]
"""
```

## Example Usage

Here's a simple example:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("hakonmh/topic-xdistil-uncased")
tokenizer = AutoTokenizer.from_pretrained("hakonmh/topic-xdistil-uncased")
SENTENCE = "Global Growth Surges as New Technologies Drive Innovation and Productivity!"

inputs = tokenizer(SENTENCE, return_tensors="pt")
output = model(**inputs).logits
predicted_label = model.config.id2label[output.argmax(-1).item()]
print(predicted_label)
```

```text
Economics
```

Or, as a pipeline together with `Sentiment-xDistil`:

```python
from transformers import pipeline

topic_classifier = pipeline(
    "text-classification",
    model="hakonmh/topic-xdistil-uncased",
    tokenizer="hakonmh/topic-xdistil-uncased",
)
sentiment_classifier = pipeline(
    "text-classification",
    model="hakonmh/sentiment-xdistil-uncased",
    tokenizer="hakonmh/sentiment-xdistil-uncased",
)
SENTENCE = "Global Growth Surges as New Technologies Drive Innovation and Productivity!"

print(topic_classifier(SENTENCE))
print(sentiment_classifier(SENTENCE))
```

```text
[{'label': 'Economics', 'score': 0.9970171451568604}]
[{'label': 'Positive', 'score': 0.9997037053108215}]
```

Tested on `transformers` 4.30.1 and `torch` 2.0.0.
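Since the two models are intended to be used together — filtering out economics-related headlines, then scoring their sentiment — here is a sketch of a small batching helper that chains them. The `filter_and_score` function is a hypothetical helper for illustration, not part of the released code, and the second headline is an invented non-economic example:

```python
from transformers import pipeline

def filter_and_score(headlines, topic_clf, sentiment_clf):
    """Keep only economics-related headlines and pair each with a sentiment label."""
    topics = topic_clf(headlines)
    economic = [h for h, t in zip(headlines, topics) if t["label"] == "Economics"]
    if not economic:
        return []
    sentiments = sentiment_clf(economic)
    return [(h, s["label"]) for h, s in zip(economic, sentiments)]

topic_classifier = pipeline("text-classification", model="hakonmh/topic-xdistil-uncased")
sentiment_classifier = pipeline("text-classification", model="hakonmh/sentiment-xdistil-uncased")

headlines = [
    "Global Growth Surges as New Technologies Drive Innovation and Productivity!",
    "Local team wins championship after dramatic overtime finish",
]
# Non-economic headlines are dropped before sentiment classification,
# so the sentiment model only runs on the filtered subset.
print(filter_and_score(headlines, topic_classifier, sentiment_classifier))
```

Passing a list to a `pipeline` runs the whole batch in one call, which is cheaper than classifying headlines one at a time.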