Himanshu Mohanty committed on
Commit 3ecbc02 · unverified · 1 Parent(s): 8766819

Create README.md

  1. README.md +101 -0
README.md ADDED
@@ -0,0 +1,101 @@
+ # TOSRoberta: Terms of Service Analyzer 📜🤖
+
+ [![GitHub license](https://img.shields.io/github/license/HimanshuMohanty-Git24/TOSRoberta.svg)](https://github.com/HimanshuMohanty-Git24/TOSRoberta/blob/main/LICENSE)
+ [![GitHub stars](https://img.shields.io/github/stars/HimanshuMohanty-Git24/TOSRoberta.svg)](https://github.com/HimanshuMohanty-Git24/TOSRoberta/stargazers)
+ [![GitHub forks](https://img.shields.io/github/forks/HimanshuMohanty-Git24/TOSRoberta.svg)](https://github.com/HimanshuMohanty-Git24/TOSRoberta/network)
+ [![GitHub issues](https://img.shields.io/github/issues/HimanshuMohanty-Git24/TOSRoberta.svg)](https://github.com/HimanshuMohanty-Git24/TOSRoberta/issues)
+
+ TOSRoberta is an advanced Terms of Service (ToS) analyzer powered by a fine-tuned RoBERTa-large model. It classifies clauses in ToS documents based on their fairness level, helping users quickly identify potentially unfair terms.
+
+ ![image](https://github.com/HimanshuMohanty-Git24/TOSRoberta/assets/94133298/c4f6a293-9109-4e63-86e6-766dc16ad589)
+
+
+ ## 🌟 Features
+
+ - 📊 Analyzes ToS documents and classifies clauses into three categories:
+   - ✅ Clearly Fair
+   - ⚠️ Potentially Unfair
+   - ❌ Clearly Unfair
+ - 📁 Supports both PDF and text file uploads
+ - 💻 User-friendly web interface built with Streamlit
+ - 🧠 Powered by a fine-tuned RoBERTa-large model (CodeHima/Tos-Roberta)
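
The classification feature listed above can be tried outside the Streamlit app. The following is a minimal sketch, assuming the `CodeHima/Tos-Roberta` checkpoint loads through the standard Hugging Face `pipeline` API for text classification. The helper names `load_classifier` and `summarize` are hypothetical and not taken from the repository, and the label strings the model emits are not documented in this README, so the code makes no assumption about them.

```python
from collections import Counter


def load_classifier(model_name="CodeHima/Tos-Roberta"):
    """Build a text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the rest of the sketch works without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_name)


def summarize(predictions):
    """Count how many clauses received each predicted label.

    `predictions` is a list of dicts with a "label" key, which is the shape
    a text-classification pipeline returns for a list of input strings.
    """
    return Counter(p["label"] for p in predictions)


# Example usage (requires transformers and network access, so left commented out):
# classifier = load_classifier()
# predictions = classifier(["We may terminate your account at any time without notice."])
# print(summarize(predictions))
```

Keeping the model load in its own function means the summary logic can be exercised with stub predictions before the large checkpoint is downloaded.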
+
+ ## 🚀 Model Performance
+
+ Our Tos-Roberta model demonstrates strong performance on the task of ToS clause classification:
+
+ - **Validation Accuracy**: 89.64%
+ - **Test Accuracy**: 85.84%
+
+ Detailed performance metrics per epoch:
+
+ | Epoch | Training Loss | Validation Loss | Accuracy | F1 Score | Precision | Recall |
+ |-------|---------------|-----------------|----------|----------|-----------|----------|
+ | 1 | 0.443500 | 0.398950 | 0.874699 | 0.858838 | 0.862516 | 0.874699 |
+ | 2 | 0.416400 | 0.438409 | 0.853012 | 0.847317 | 0.849916 | 0.853012 |
+ | 3 | 0.227700 | 0.505879 | 0.896386 | 0.893325 | 0.891521 | 0.896386 |
+ | 4 | 0.052600 | 0.667532 | 0.891566 | 0.893167 | 0.895115 | 0.891566 |
+ | 5 | 0.124200 | 0.747090 | 0.884337 | 0.887412 | 0.891807 | 0.884337 |
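
One way to read the table is to pick the epoch with the highest validation accuracy. The sketch below hard-codes four columns from the table above and selects that epoch. The variable names are illustrative, and the README does not state how the published checkpoint was actually chosen.

```python
# (epoch, training_loss, validation_loss, accuracy) copied from the table above.
rows = [
    (1, 0.443500, 0.398950, 0.874699),
    (2, 0.416400, 0.438409, 0.853012),
    (3, 0.227700, 0.505879, 0.896386),
    (4, 0.052600, 0.667532, 0.891566),
    (5, 0.124200, 0.747090, 0.884337),
]

# Epoch with the highest validation accuracy.
best = max(rows, key=lambda r: r[3])
# Epoch with the lowest validation loss, for comparison.
lowest_val_loss = min(rows, key=lambda r: r[2])

print(f"best accuracy: epoch {best[0]} ({best[3]:.2%})")
print(f"lowest validation loss: epoch {lowest_val_loss[0]}")
```

Epoch 3 matches the 89.64% validation accuracy reported above, while validation loss is lowest at epoch 1 and rises in every later epoch, a pattern commonly associated with overfitting. Because the Recall column equals Accuracy in every row, the metrics appear to be weighted averages, though the README does not say so.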
+
+ ## 📁 Project Structure
+
+ ```
+ tos-analyzer/
+ │
+ ├── app.py
+ ├── requirements.txt
+ ├── utils/
+ │   ├── __init__.py
+ │   ├── text_processing.py
+ │   └── model_utils.py
+ └── README.md
+ ```
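
The README does not show what `utils/text_processing.py` contains. As a rough sketch of the kind of preprocessing such a module might perform, the hypothetical function below splits raw ToS text into clause-like units on sentence boundaries. `split_into_clauses` and its regular expressions are assumptions for illustration, not code from the repository.

```python
import re


def split_into_clauses(text):
    """Split text into sentence-like clauses, dropping empty pieces."""
    # Collapse runs of whitespace so hard line breaks (common in PDF text)
    # do not split a clause in the middle.
    normalized = re.sub(r"\s+", " ", text).strip()
    # Break after '.', '!' or '?' when followed by whitespace and then an
    # uppercase letter or a digit.
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z0-9])", normalized)
    return [p for p in parts if p]


sample = "We may change these terms.\nYou agree to\narbitration. Fees are non-refundable."
for clause in split_into_clauses(sample):
    print(clause)
```

A rule this simple will split incorrectly after abbreviations such as "Inc." when they are followed by a capitalized word, so a real pipeline would likely need a more careful segmenter.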
+
+ ## 🛠️ Installation
+
+ 1. Clone the repository:
+    ```
+    git clone https://github.com/HimanshuMohanty-Git24/TOSRoberta.git
+    cd TOSRoberta
+    ```
+
+ 2. Install the required dependencies:
+    ```
+    pip install -r requirements.txt
+    ```
+
+ 3. Run the Streamlit app:
+    ```
+    streamlit run app.py
+    ```
+
+ ## 📊 Training Visualization
+
+ We used Weights & Biases for monitoring the training process. Here's a glimpse of our training metrics:
+
+ ![image](https://github.com/HimanshuMohanty-Git24/TOSRoberta/assets/94133298/d28bbd84-9008-4d19-bff1-4b62874a5faa)
+
+
+ ## 🤝 Contributing
+
+ Contributions are welcome! Please feel free to submit a Pull Request.
+
+ ## 📄 License
+
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+
+ ## 🙏 Acknowledgements
+
+ - [Hugging Face](https://huggingface.co/) for the Transformers library
+ - [Streamlit](https://streamlit.io/) for the easy-to-use web app framework
+ - [Weights & Biases](https://wandb.ai/) for experiment tracking
+
+ ## 📬 Contact
+
+ Himanshu Mohanty - [CodingHima](https://x.com/CodingHima) - codehimanshu24@gmail.com
+
+ Project Link: [https://github.com/HimanshuMohanty-Git24/TOSRoberta](https://github.com/HimanshuMohanty-Git24/TOSRoberta)
+
+ ---
+
+ ⭐️ If you find this project useful, please consider giving it a star!