Merge pull request #27 from dahongj/milestone-4
README.md
CHANGED
---

# csuy-4613-Project

## Milestone 1

The operating system being used is Windows 10 Home. In order to run Docker on this operating system, the Windows Subsystem for Linux (WSL) must be used.

![docker](https://user-images.githubusercontent.com/33811542/227808275-baf0dec3-181c-4b04-beeb-b42c35667edb.jpg)
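For reference, the usual setup on Windows 10 Home looks roughly like this (a sketch only; it assumes Docker Desktop with the WSL 2 backend, and exact commands may vary by Windows build):

```shell
# From an elevated PowerShell prompt: install WSL (a reboot is required)
wsl --install

# After installing Docker Desktop and enabling its WSL 2 backend,
# verify that Docker is working
docker run hello-world
```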
## Milestone 2

Hugging Face URL:
https://huggingface.co/spaces/dahongj/sentiment-analysis

https://huggingface.co/finiteautomata/bertweet-base-sentiment-analysis

https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest
In order to use a HuggingFace Space for our application, we had to create an empty Space on HuggingFace initially. From there, we included the information box shown at the top of this README. We created a secret key token on GitHub that is linked with our HuggingFace account and used that key to create an action file (.yml). This file ensures that every time there is an update to main, the website on HuggingFace starts rebuilding based on the updated code.
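The action file generally follows HuggingFace's documented pattern for syncing a GitHub repository to a Space; the secret name `HF_TOKEN` and the workflow filename are assumptions here, while the Space path comes from the URL above:

```yaml
# .github/workflows/sync-to-hub.yml (sketch)
name: Sync to Hugging Face Space
on:
  push:
    branches: [main]

jobs:
  sync-to-hub:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Push to Hugging Face
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: git push --force https://dahongj:$HF_TOKEN@huggingface.co/spaces/dahongj/sentiment-analysis main
```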
By using the streamlit library, we were able to incorporate the pipeline function, which allows us to access and use a pretrained model from HuggingFace with ease. We created an app interface that includes a textbox and a selection menu, allowing users to input any text into the textbox before selecting the model that they would like to use. The models used are listed above. Each of the models outputs a sentiment analysis of the input text as well as a probability score, which we displayed, with the help of the pipeline functionality, back on HuggingFace's interface for the user to see. This was done for all three models.
## Milestone 3

Finetuned Model URL: https://huggingface.co/dahongj/finetuned_toxictweets
We then used the multi-label version of the distilbert-base-uncased model, because there are 6 forms of toxicity included in the dataset that we want to finetune for. Using the native PyTorch method of training as demonstrated in the HuggingFace documentation, the model was trained and evaluated. Both the finetuned model and its tokenizer were saved and uploaded to HuggingFace.
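The finetuning setup described here can be sketched as below. This is an illustration under stated assumptions, not the actual training script: the `ToxicDataset` class name and output directory are hypothetical, and the training loop itself is elided.

```python
# Sketch of multi-label finetuning of distilbert-base-uncased on the
# 6-label toxic-comments data mentioned above (names are assumptions).
import torch
from torch.utils.data import Dataset
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

class ToxicDataset(Dataset):
    """Wraps tokenized texts and 6-dimensional float label vectors."""
    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[idx], dtype=torch.float)
        return item

def build_model():
    tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
    model = DistilBertForSequenceClassification.from_pretrained(
        "distilbert-base-uncased",
        num_labels=len(LABELS),
        problem_type="multi_label_classification",  # sigmoid per label, BCE loss
    )
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = build_model()
    # ... native PyTorch training loop as in the HuggingFace docs ...
    model.save_pretrained("finetuned_toxictweets")
    tokenizer.save_pretrained("finetuned_toxictweets")
```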
## Milestone 4

Results:

The resulting web application on HuggingFace is a sentiment analysis application that allows users to input text of any kind and receive results on its toxicity levels. The first three pretrained models have only two variances in their output, stating whether the text is majority positive or negative, as well as the degree to which it is so. The fourth option on the selection bar allows users to select our finetuned model, which determines six levels of toxicity: toxic, severe_toxic, obscene, insult, threat, and identity_hate. This option reports the highest toxicity level as well as the second-highest level of toxicity. An example of 10 texts and their results is shown as an image on the website.

The landing page for the application and a video demonstrating how to use the application are included on this GitHub repository.
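The top-two toxicity readout described above can be sketched in pure Python (the scores here are placeholder values for illustration, not model output):

```python
# Given per-label scores from the finetuned model (placeholder values),
# report the highest and second-highest toxicity categories.
scores = {
    "toxic": 0.91, "severe_toxic": 0.12, "obscene": 0.74,
    "insult": 0.33, "threat": 0.05, "identity_hate": 0.08,
}

top_two = sorted(scores, key=scores.get, reverse=True)[:2]
print(top_two)  # -> ['toxic', 'obscene']
```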