Update README.md
README.md
CHANGED
@@ -13,19 +13,19 @@ widget:
|
|
13 |
DistilCamemBERT-Sentiment
|
14 |
=========================
|
15 |
|
16 |
-
We present DistilCamemBERT-Sentiment which is [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base) fine
|
17 |
|
18 |
-
This modelization is close to [tblard/tf-allocine](https://huggingface.co/tblard/tf-allocine) based on [CamemBERT](https://huggingface.co/camembert-base) model. The problem of the modelizations based on CamemBERT is at the scaling moment, for the production phase for example. Indeed, inference cost can be a technological issue. To counteract this effect, we propose this modelization which **divides the inference time by
|
19 |
|
20 |
Dataset
|
21 |
-------
|
22 |
|
23 |
-
The dataset
|
24 |
-
* 1 star: represents a
|
25 |
* 2 stars: bad appreciation,
|
26 |
* 3 stars: neutral appreciation,
|
27 |
* 4 stars: good appreciation,
|
28 |
-
* 5 stars:
|
29 |
|
30 |
Evaluation results
|
31 |
------------------
|
@@ -48,10 +48,10 @@ where \\(\hat{f}_l\\) is the l-th largest predicted label, \\(y\\) the true labe
|
|
48 |
Benchmark
|
49 |
---------
|
50 |
|
51 |
-
This model is compared to 3 reference models (see below). As each model doesn't have the
|
52 |
|
53 |
#### bert-base-multilingual-uncased-sentiment
|
54 |
-
[nlptown/bert-base-multilingual-uncased-sentiment](https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment) is based on BERT model in the multilingual and uncased version. This sentiment analyzer is trained on Amazon reviews
|
55 |
|
56 |
| **model** | **time (ms)** | **exact accuracy (%)** | **top-2 acc (%)** |
|
57 |
| :-------: | :-----------: | :--------------------: | :---------------: |
|
@@ -96,6 +96,24 @@ result
|
|
96 |
'score': 0.13417290151119232}]
|
97 |
```
|
98 |
|
99 |
Citation
|
100 |
--------
|
101 |
```bibtex
|
|
|
13 |
DistilCamemBERT-Sentiment
|
14 |
=========================
|
15 |
|
16 |
+
We present DistilCamemBERT-Sentiment, which is [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base) fine-tuned for sentiment analysis in French. The model is trained on two datasets, [Amazon Reviews](https://huggingface.co/datasets/amazon_reviews_multi) and [Allociné.fr](https://huggingface.co/datasets/allocine), to minimize bias: Amazon reviews are relatively short and similar to one another, whereas Allociné reviews are long, rich texts.
|
17 |
|
18 |
+
This model is close to [tblard/tf-allocine](https://huggingface.co/tblard/tf-allocine), which is based on the [CamemBERT](https://huggingface.co/camembert-base) model. The problem with CamemBERT-based models appears at scaling time, for example in the production phase, where inference cost can become a technological issue. To counteract this effect, we propose this model, which **divides the inference time by two** at the same power consumption thanks to [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base).
|
19 |
|
20 |
Dataset
|
21 |
-------
|
22 |
|
23 |
+
The dataset comprises 204,993 training reviews and 4,999 test reviews from Amazon, and 235,516 training and 4,729 test reviews from the [Allociné website](https://www.allocine.fr/). The dataset is labeled into five categories:
|
24 |
+
* 1 star: represents a terrible appreciation,
|
25 |
* 2 stars: bad appreciation,
|
26 |
* 3 stars: neutral appreciation,
|
27 |
* 4 stars: good appreciation,
|
28 |
+
* 5 stars: excellent appreciation.
|
29 |
|
30 |
Evaluation results
|
31 |
------------------
|
|
|
48 |
Benchmark
|
49 |
---------
|
50 |
|
51 |
+
This model is compared to 3 reference models (see below). Since the models do not share the same target definitions, we detail the performance measure used for each. Mean inference time was measured on an **AMD Ryzen 5 4500U @ 2.3GHz with 6 cores**.
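
As an illustration of the mean inference time measure mentioned above, here is a minimal sketch of one way to time a classifier call by call. The helper `mean_inference_time_ms`, its `warmup` parameter, and the `dummy_predict` stand-in are hypothetical names introduced for this example; the README does not describe the benchmark script, so this is not necessarily how the reported timings were produced.

```python
import time
from statistics import mean
from typing import Callable, Sequence


def mean_inference_time_ms(
    predict: Callable[[str], object],
    texts: Sequence[str],
    warmup: int = 1,
) -> float:
    """Return the mean wall-clock time per call to `predict`, in milliseconds."""
    if not texts:
        raise ValueError("texts must not be empty")
    # Warm-up calls are run first and excluded from the measurement.
    for text in texts[:warmup]:
        predict(text)
    durations_ms = []
    for text in texts:
        start = time.perf_counter()
        predict(text)
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return mean(durations_ms)


def dummy_predict(text: str) -> int:
    # Hypothetical stand-in for a real classifier call (assumption).
    return len(text) % 5 + 1


sample_texts = ["Très bon produit", "Film décevant"]
elapsed_ms = mean_inference_time_ms(dummy_predict, sample_texts)
print(f"mean inference time: {elapsed_ms:.4f} ms")
```

In practice `predict` would wrap the real model call, and the measured value depends on the hardware, the batch size, and the text length.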
|
52 |
|
53 |
#### bert-base-multilingual-uncased-sentiment
|
54 |
+
[nlptown/bert-base-multilingual-uncased-sentiment](https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment) is based on the multilingual, uncased version of BERT. This sentiment analyzer is trained on Amazon reviews, like our model, so the targets and their definitions are the same.
|
55 |
|
56 |
| **model** | **time (ms)** | **exact accuracy (%)** | **top-2 acc (%)** |
|
57 |
| :-------: | :-----------: | :--------------------: | :---------------: |
|
|
|
96 |
'score': 0.13417290151119232}]
|
97 |
```
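
The benchmark reports both exact and top-2 accuracy. The sketch below shows how the two could be computed from per-label scores, assuming each prediction is a list of `{'label': ..., 'score': ...}` dictionaries like the result above. The helper names and the sample labels and scores are made up for illustration and are not taken from the model's actual output.

```python
from typing import Any, Dict, List, Sequence


def top_k_labels(scores: Sequence[Dict[str, Any]], k: int = 2) -> List[str]:
    """Return the k labels with the highest scores, best first."""
    ranked = sorted(scores, key=lambda item: item["score"], reverse=True)
    return [str(item["label"]) for item in ranked[:k]]


def top_k_accuracy(
    predictions: Sequence[Sequence[Dict[str, Any]]],
    true_labels: Sequence[str],
    k: int = 2,
) -> float:
    """Fraction of samples whose true label is among the k best-scored labels."""
    if not true_labels or len(predictions) != len(true_labels):
        raise ValueError("predictions and true_labels must be non-empty and aligned")
    hits = sum(
        true in top_k_labels(scores, k)
        for scores, true in zip(predictions, true_labels)
    )
    return hits / len(true_labels)


# Hypothetical scores for two reviews (assumption, not real model output).
sample_predictions = [
    [
        {"label": "1 star", "score": 0.1},
        {"label": "4 stars", "score": 0.5},
        {"label": "5 stars", "score": 0.4},
    ],
    [
        {"label": "2 stars", "score": 0.7},
        {"label": "3 stars", "score": 0.2},
        {"label": "1 star", "score": 0.1},
    ],
]
sample_truth = ["5 stars", "3 stars"]

exact_acc = top_k_accuracy(sample_predictions, sample_truth, k=1)
top2_acc = top_k_accuracy(sample_predictions, sample_truth, k=2)
print(exact_acc, top2_acc)
```

With `k=1` the function gives the exact accuracy; with `k=2` a prediction also counts as correct when the true label is the second-best score.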
|
98 |
|
99 |
+
### Optimum + ONNX
|
100 |
+
|
101 |
+
```python
|
102 |
+
from optimum.onnxruntime import ORTModelForSequenceClassification
|
103 |
+
from transformers import AutoTokenizer, pipeline
|
104 |
+
|
105 |
+
HUB_MODEL = "cmarkea/distilcamembert-base-sentiment"
|
106 |
+
|
107 |
+
tokenizer = AutoTokenizer.from_pretrained(HUB_MODEL)
|
108 |
+
model = ORTModelForSequenceClassification.from_pretrained(HUB_MODEL)
|
109 |
+
onnx_classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
|
110 |
+
|
111 |
+
# Quantized ONNX model
|
112 |
+
quantized_model = ORTModelForSequenceClassification.from_pretrained(
|
113 |
+
HUB_MODEL, file_name="model_quantized.onnx"
|
114 |
+
)
|
115 |
+
```
|
116 |
+
|
117 |
Citation
|
118 |
--------
|
119 |
```bibtex
|