File size: 7,059 Bytes
1b0d91b
 
 
dec0d0e
1b0d91b
 
dec0d0e
19aab71
dec0d0e
19aab71
dec0d0e
4490f2c
dec0d0e
19aab71
e882bc6
 
19aab71
 
e882bc6
 
 
 
 
 
 
 
 
 
 
 
47243b7
 
e882bc6
 
 
 
 
 
 
 
 
 
 
 
 
 
 
19aab71
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
e882bc6
 
 
19aab71
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
e882bc6
 
19aab71
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
3ffd10a
 
dec0d0e
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
---
language:
- en
license: apache-2.0
---

# LLM user flow classification

This a ONNX quantized model and is fined-tuned version of [MiniLMv2-L6-H384](https://huggingface.co/nreimers/MiniLMv2-L6-H384-distilled-from-RoBERTa-Large).

This model identifies common events and patterns within the conversation flow. Such events include, for example, complaint, when a user expresses dissatisfaction.

This model is used *only* for the user texts. For the LLM texts in the dialog use this [agent model](https://huggingface.co/minuva/MiniLMv2-agentflow-v2-onnx). 


# Optimum

## Installation

Install from source: 
```bash
python -m pip install optimum[onnxruntime]@git+https://github.com/huggingface/optimum.git
```


## Run the Model
```py
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model = ORTModelForSequenceClassification.from_pretrained('minuva/MiniLMv2-userflow-v2-onnx', provider="CPUExecutionProvider")
tokenizer = AutoTokenizer.from_pretrained('minuva/MiniLMv2-userflow-v2-onnx', use_fast=True, model_max_length=256, truncation=True, padding='max_length')

pipe = pipeline(task='text-classification', model=model, tokenizer=tokenizer, )
texts = ["that's wrong", "can you please answer me?"]
pipe(texts)
# [{'label': 'model_wrong_or_try_again', 'score': 0.9737648367881775},
# {'label': 'user_wants_agent_to_answer', 'score': 0.9105103015899658}]
```


# ONNX Runtime only

A lighter solution for deployment

## Installation 

```bash
pip install tokenizers
pip install onnxruntime
git clone https://huggingface.co/minuva/MiniLMv2-userflow-v2-onnx
```


## Run the Model

```py
import os
import numpy as np
import json

from tokenizers import Tokenizer
from onnxruntime import InferenceSession


model_name = "minuva/MiniLMv2-userflow-v2-onnx"

tokenizer = Tokenizer.from_pretrained(model_name)
tokenizer.enable_padding(
    pad_token="<pad>",
    pad_id=1,
)
tokenizer.enable_truncation(max_length=256)
batch_size = 16

texts = ["that's wrong", "can you please answer me?"]


outputs = []
model = InferenceSession("MiniLMv2-userflow-v2-onnx/model_optimized_quantized.onnx", providers=['CPUExecutionProvider'])

with open(os.path.join("MiniLMv2-userflow-v2-onnx", "config.json"), "r") as f:
            config = json.load(f)

output_names = [output.name for output in model.get_outputs()]
input_names = [input.name for input in model.get_inputs()]

for subtexts in np.array_split(np.array(texts), len(texts) // batch_size + 1):
            encodings = tokenizer.encode_batch(list(subtexts))
            inputs = {
                "input_ids": np.vstack(
                    [encoding.ids for encoding in encodings],
                ),
                "attention_mask": np.vstack(
                    [encoding.attention_mask for encoding in encodings],
                ),
                "token_type_ids": np.vstack(
                    [encoding.type_ids for encoding in encodings],
                ),
            }

            for input_name in input_names:
                if input_name not in inputs:
                    raise ValueError(f"Input name {input_name} not found in inputs")

            inputs = {input_name: inputs[input_name] for input_name in input_names}
            output = np.squeeze(
                np.stack(
                    model.run(output_names=output_names, input_feed=inputs)
                ),
                axis=0,
            )
            outputs.append(output)

outputs = np.concatenate(outputs, axis=0)
scores = 1 / (1 + np.exp(-outputs))
results = []
for item in scores:
    labels = []
    scores = []
    for idx, s in enumerate(item):
        labels.append(config["id2label"][str(idx)])
        scores.append(float(s))
    results.append({"labels": labels, "scores": scores})


res = []

for result in results:
    joined = list(zip(result['labels'], result['scores']))
    max_score = max(joined, key=lambda x: x[1])    
    res.append(max_score)

res
#[('model_wrong_or_try_again', 0.9737648367881775),
# ('user_wants_agent_to_answer', 0.9105103015899658)]
```

# Categories Explanation

<details>
  <summary>Click to expand!</summary>
  
  - OTHER: Responses that do not fit into any predefined categories or are outside the scope of the specific interaction types listed.

  - agrees_praising_thanking: When the user agrees with the provided information, offers praise, or expresses gratitude.

  - asks_source: The user requests the source of the information or the basis for the answer provided.

  - continue: Indicates a prompt for the conversation to proceed or continue without a specific directional change.

   - continue_or_finnish_code: Signals either to continue with the current line of discussion or code execution, or to conclude it.

  - improve_or_modify_answer: The user requests an improvement or modification to the provided answer.

  -  lack_of_understandment: Reflects the user's or agent confusion or lack of understanding regarding the information provided.

   - model_wrong_or_try_again: Indicates that the model's response was incorrect or unsatisfactory, suggesting a need to attempt another answer.

   - more_listing_or_expand: The user requests further elaboration, expansion from the given list by the agent.

   - repeat_answers_or_question: The need to reiterate a previous answer or question.

  - request_example: The user asks for examples to better understand the concept or answer provided.

  - user_complains_repetition: The user notes that the information or responses are repetitive, indicating a need for new or different content.

  - user_doubts_answer: The user expresses skepticism or doubt regarding the accuracy or validity of the provided answer.

  - user_goodbye: The user says goodbye to the agent.

  - user_reminds_question: The user reiterates the question.

  - user_wants_agent_to_answer: The user explicitly requests a response from the agent, when the agent refuses to do so.

  - user_wants_explanation: The user seeks an explanation behind the information or answer provided.

  - user_wants_more_detail: Indicates the user's desire for more comprehensive or detailed information on the topic.

  - user_wants_shorter_longer_answer: The user requests that the answer be condensed or expanded to better meet their informational needs.

  - user_wants_simplier_explanation: The user seeks a simpler, more easily understood explanation.

  - user_wants_yes_or_no: The user is asking for a straightforward affirmative or negative answer, without additional detail or explanation.
</details>

<br>


# Metrics in our private test dataset
| Model (params)    |    Loss      |    Accuracy |  F1 |
|--------------------|-------------|----------|--------| 
| minuva/MiniLMv2-userflow-v2 (33M) |   0.6738 | 0.7236 | 0.7313 |
| minuva/MiniLMv2-userflow-v2-onnx (33M) |   - | 0.7195 | 0.7189 |

# Deployment

Check [our llm-flow-classification repository](https://github.com/minuva/llm-flow-classification) for a FastAPI and ONNX based server to deploy this model on CPU devices.