---
language:
- en
pipeline_tag: text-classification
widget:
- text: "And it was great to see how our Chinese team very much aware of that and of shifting all the resourcing to really tap into these opportunities."
  example_title: "Exemplary Transformation Sentence"
- text: "But we will continue to recruit even after that because we expect that the volumes are going to continue to grow."
  example_title: "Exemplary Non-Transformation Sentence"
- text: "So and again, we'll be disclosing the current taxes that are there in Guyana, along with that revenue adjustment."
  example_title: "Exemplary Non-Transformation Sentence"
---

# TransformationTransformer

**TransformationTransformer** is a fine-tuned [distilroberta](https://huggingface.co/distilroberta-base) model. It is trained and evaluated on 10,000 manually annotated sentences gleaned from the Q&A section of quarterly earnings conference calls. In particular, it was trained on sentences issued by firm executives to discriminate between sentences that allude to **business transformation** and those that discuss other topics. More details about the training procedure can be found [below](#model-training).


## Background

Context on the project. 


## Usage

The model is intended to be used for sentence classification: it creates a contextual text representation from the input sentence and outputs a probability value. `LABEL_1` refers to a sentence that is predicted to contain transformation-related content (and vice versa for `LABEL_0`). The query should consist of a single sentence.


## Usage (API)

```python
import json
import requests

API_TOKEN = "<insert-API-token-here>"  # your Hugging Face API token
API_URL = "https://api-inference.huggingface.co/models/simonschoe/TransformationTransformer"

headers = {"Authorization": f"Bearer {API_TOKEN}"}

def query(payload):
    """Send a sentence to the hosted inference API and return the parsed response."""
    data = json.dumps(payload)
    response = requests.post(API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

query({"inputs": "<insert-sentence-here>"})
```

## Usage (transformers)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("simonschoe/TransformationTransformer")
model = AutoModelForSequenceClassification.from_pretrained("simonschoe/TransformationTransformer")

classifier = pipeline('text-classification', model=model, tokenizer=tokenizer)
classifier('<insert-sentence-here>')
```
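The classifier returns a list with one dictionary per input sentence, containing the predicted label and its probability. A minimal sketch of how to read the result (the input sentence and the score shown in the comment are purely illustrative):

```python
result = classifier("We are fundamentally transforming our sales organization.")  # hypothetical example sentence
# e.g., [{'label': 'LABEL_1', 'score': 0.97}]  -> LABEL_1 = transformation-related, LABEL_0 = other content
print(result[0]['label'], result[0]['score'])
```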


## Model Training

The model has been trained on text data stemming from earnings call transcripts. The data is restricted to a call's question-and-answer (Q&A) section and, within it, to the remarks made by firm executives. The data has been segmented into individual sentences using [`spacy`](https://spacy.io/).
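
A minimal sketch of such a sentence-segmentation step, assuming a standard `spacy` English pipeline (the model name `en_core_web_sm` is an assumption, not the documented preprocessing setup):

```python
import spacy

# Load a standard English pipeline (assumed; the actual spacy model used is not documented here)
nlp = spacy.load("en_core_web_sm")

def segment_sentences(transcript: str) -> list[str]:
    """Split a raw Q&A transcript passage into individual sentences."""
    doc = nlp(transcript)
    return [sent.text.strip() for sent in doc.sents]

segment_sentences(
    "We will continue to invest in the platform. At the same time, we are transforming our sales organization."
)
```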

**Statistics of Training Data:**
- Labeled sentences: 10,000
- Data distribution: xxx
- Inter-coder agreement: xxx

The following code snippet presents the training pipeline:
<link to script>
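
In the absence of the linked script, a minimal fine-tuning sketch using the Hugging Face `Trainer` could look as follows. The dataset loading, hyperparameters, and column names below are illustrative assumptions, not the actual training configuration:

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)
from datasets import Dataset

# Illustrative in-memory dataset; the 10,000 manually labeled sentences are not reproduced here
data = Dataset.from_dict({
    "text": ["<sentence 1>", "<sentence 2>"],
    "label": [1, 0],  # 1 = transformation-related, 0 = other content
})

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForSequenceClassification.from_pretrained("distilroberta-base", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = data.map(tokenize, batched=True)

# Hyperparameters are assumptions for illustration only
args = TrainingArguments(
    output_dir="transformation-transformer",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized)
trainer.train()
```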