rrayhka committed on
Commit
7df7730
·
1 Parent(s): e3dcddf
Files changed (8)
  1. .gitignore +1 -0
  2. Dockerfile +27 -0
  3. LICENSE +21 -0
  4. README.md +69 -1
  5. app.py +27 -0
  6. download_model.py +6 -0
  7. requirements.txt +4 -0
  8. templates/index.html +217 -0
.gitignore ADDED
@@ -0,0 +1 @@
+ model/*
Dockerfile ADDED
@@ -0,0 +1,27 @@
+ # Use a base image with Python
+ FROM python:3.11
+
+ # Keep Python from writing .pyc files and from buffering its output
+ ENV PYTHONDONTWRITEBYTECODE=1
+ ENV PYTHONUNBUFFERED=1
+
+ # Create the working directory for the application
+ WORKDIR /app
+
+ # Copy requirements.txt into the container
+ COPY requirements.txt /app/
+
+ # Install the Python dependencies
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy all application files into the container
+ COPY . /app/
+
+ # Run the script that downloads the model from Hugging Face
+ RUN python download_model.py
+
+ # Expose the port that app.py listens on (5007)
+ EXPOSE 5007
+
+ # Command that starts the Flask application
+ CMD ["python", "app.py"]
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2024 akhyarr
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md CHANGED
@@ -8,4 +8,72 @@ pinned: false
  short_description: 'Simple Pos Tagger '
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # POS Tagger
+
+ This project is a simple implementation of a POS (Part-of-Speech) tagger that uses a model from Hugging Face. The application uses Flask for the backend and HTML/CSS for the frontend.
+
+ ## Requirements
+
+ Make sure you have installed:
+
+ - Python 3.6 or higher (the Dockerfile uses Python 3.11)
+ - pip (Python package installer)
+
+ ## How to Run
+
+ Follow these steps to run the project on your local machine:
+
+ ### 1. Clone the Repository
+
+ Clone this repository into a local directory with the following command:
+
+ ```bash
+ git clone https://github.com/rrayhka/pos-tagger.git
+ ```
+
+ ### 2. Install Dependencies
+
+ Change into the newly cloned project directory and install the required dependencies:
+
+ ```bash
+ cd pos-tagger
+ pip install -r requirements.txt
+ ```
+
+ ### 3. Download the Model
+
+ Run the `download_model.py` script to download the model and tokenizer from Hugging Face and save them locally in the `model/` directory:
+
+ ```bash
+ python download_model.py
+ ```
+
+ ### 4. Run the Application
+
+ Start the Flask application with the following command. The server listens on port 5007:
+
+ ```bash
+ python app.py
+ ```
+
+ ## Main Files
+
+ - `app.py`: Script that runs the Flask backend.
+ - `download_model.py`: Script that downloads and saves the model and tokenizer from Hugging Face.
+ - `templates/index.html`: HTML file for the application's frontend.
+ - `requirements.txt`: List of dependencies required to run the application.
+
+ ## Using the Application
+
+ 1. Open `http://127.0.0.1:5007/` in your browser; the Flask app serves `templates/index.html` at this address.
+ 2. Enter Indonesian text in the textarea provided.
+ 3. Click the "Tag POS" button.
+ 4. The POS tagging results are displayed below the input form.
+
+ ## Contributing
+
+ If you would like to contribute to this project, please create a pull request or open an issue in the GitHub repository.
+
+ ## License
+
+ This project is licensed under the MIT License. See the `LICENSE` file for more information.
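The README steps end with the browser form, but the same `/pos_tag` endpoint can also be called from a script. The following is a minimal sketch using only the standard library. It assumes the Flask app from `app.py` is already running locally on port 5007; the helper names `build_pos_tag_request` and `fetch_tags` are hypothetical and not part of the repository.

```python
import json
import urllib.request

# Assumption: the Flask app from app.py is running locally on port 5007.
POS_TAG_URL = "http://127.0.0.1:5007/pos_tag"


def build_pos_tag_request(text, url=POS_TAG_URL):
    """Build the JSON POST request that the /pos_tag route expects."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_tags(text, url=POS_TAG_URL):
    """Send the text to the server and return a list of (word, tag) pairs."""
    with urllib.request.urlopen(build_pos_tag_request(text, url)) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return [(token["word"], token["tag"]) for token in body["tagged_tokens"]]
```

With the server running, `fetch_tags("Saya makan nasi goreng.")` should return one `(word, tag)` pair per token produced by the model. An empty text makes the server answer with HTTP 400, which `urllib` raises as `urllib.error.HTTPError`.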
app.py ADDED
@@ -0,0 +1,27 @@
+ from flask import Flask, request, jsonify, render_template
+ from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
+ from flask_cors import CORS
+ import logging
+
+ logging.basicConfig(level=logging.DEBUG)
+
+ app = Flask(__name__)
+ CORS(app)
+
+ # Load the model and tokenizer saved locally by download_model.py.
+ model_path = "./model"
+ model = AutoModelForTokenClassification.from_pretrained(model_path)
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+ nlp = pipeline("token-classification", model=model, tokenizer=tokenizer)
+
+
+ @app.route('/')
+ def home():
+     return render_template('index.html')
+
+
+ @app.route('/pos_tag', methods=['POST'])
+ def pos_tag():
+     # silent=True returns None instead of raising when the body is not valid JSON.
+     payload = request.get_json(silent=True)
+     text = payload.get('text') if isinstance(payload, dict) else None
+     if not text or not isinstance(text, str):
+         return jsonify({"error": "Masukkan Teks"}), 400
+     results = nlp(text)
+     # Strip the 'Ġ' space marker from words and the B-/I- prefixes from tags.
+     tagged_tokens = [
+         {"word": res["word"].replace('Ġ', ''),
+          "tag": res["entity"].replace('B-', '').replace('I-', '')}
+         for res in results
+     ]
+     return jsonify({"tagged_tokens": tagged_tokens})
+
+
+ if __name__ == '__main__':
+     app.run(port=5007, debug=True)
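The list comprehension in `pos_tag()` turns each pipeline result into a `{"word", "tag"}` pair by removing the `Ġ` marker and the `B-`/`I-` prefixes. The sketch below isolates that step so it can be tried without loading the model. The helper name `clean_token` and the sample results are hypothetical; real pipeline results carry additional keys such as `score`. Unlike `app.py`, which calls `replace` on the whole tag, this version strips `B-`/`I-` only when it appears as a prefix.

```python
def clean_token(result):
    """Convert one token-classification result into a {"word", "tag"} dict."""
    # 'Ġ' is the marker the tokenizer puts in front of tokens that follow a space.
    word = result["word"].replace("Ġ", "")
    tag = result["entity"]
    # Drop a leading B-/I- prefix only; app.py removes it anywhere in the tag.
    if tag.startswith(("B-", "I-")):
        tag = tag[2:]
    return {"word": word, "tag": tag}


# Hypothetical pipeline output, reduced to the two keys that app.py reads.
sample_results = [
    {"word": "ĠSaya", "entity": "B-PRN"},
    {"word": "Ġmakan", "entity": "B-VBT"},
]
tagged_tokens = [clean_token(res) for res in sample_results]
```

The resulting `tagged_tokens` list has the same shape as the `tagged_tokens` field in the JSON response that `templates/index.html` renders.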
download_model.py ADDED
@@ -0,0 +1,6 @@
+ from transformers import AutoModelForTokenClassification, AutoTokenizer
+
+ # Download the model and tokenizer from the Hugging Face Hub.
+ model_name = "w11wo/indonesian-roberta-base-posp-tagger"
+ model = AutoModelForTokenClassification.from_pretrained(model_name)
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ # Save both to ./model, the directory that app.py loads from.
+ model.save_pretrained('./model')
+ tokenizer.save_pretrained('./model')
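Because `app.py` loads from `./model` at import time, it fails at startup if `download_model.py` has not been run first. A small pre-flight check can make that dependency explicit. This is only a sketch: the helper name `model_is_downloaded` is hypothetical, and it relies on `save_pretrained` writing `config.json` for the model and `tokenizer_config.json` for the tokenizer; it does not verify the weight files.

```python
from pathlib import Path


def model_is_downloaded(model_dir="./model"):
    """Return True if the directory holds the model and tokenizer config files."""
    path = Path(model_dir)
    # config.json comes from model.save_pretrained,
    # tokenizer_config.json from tokenizer.save_pretrained.
    required = ("config.json", "tokenizer_config.json")
    return all((path / name).is_file() for name in required)
```

A caller could check `model_is_downloaded()` before importing `app.py` and print a hint to run `python download_model.py` when it returns `False`.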
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ transformers
+ torch
+ flask
+ flask-cors
templates/index.html ADDED
@@ -0,0 +1,217 @@
+ <!DOCTYPE html>
+ <html lang="en">
+
+ <head>
+     <meta charset="UTF-8">
+     <meta name="viewport" content="width=device-width, initial-scale=1.0">
+     <title>POS Tagger</title>
+     <style>
+         body {
+             font-family: Arial, sans-serif;
+             margin: 0;
+             padding: 20px;
+             background-color: #f4f4f4;
+         }
+
+         .container {
+             max-width: 600px;
+             margin: auto;
+             padding: 20px;
+             background: #fff;
+             box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);
+         }
+
+         textarea {
+             width: 95%;
+             height: 100px;
+             padding: 10px;
+             margin-bottom: 10px;
+         }
+
+         button {
+             display: inline-block;
+             padding: 10px 20px;
+             background: #007bff;
+             color: #fff;
+             border: none;
+             cursor: pointer;
+         }
+
+         button:hover {
+             background: #0056b3;
+         }
+
+         .results {
+             margin-top: 20px;
+         }
+
+         .results span {
+             display: inline-block;
+             padding: 5px;
+             margin-right: 3.5px;
+             margin-bottom: 6.5px;
+             border-radius: 4px;
+
+         }
+
+         .PPO {
+             background-color: #ffadad;
+             color: #000;
+         }
+
+         .KUA {
+             background-color: #ffd6a5;
+             color: #000;
+         }
+
+         .ADV {
+             background-color: #fdffb6;
+             color: #000;
+         }
+
+         .PRN {
+             background-color: #caffbf;
+             color: #000;
+         }
+
+         .VBI {
+             background-color: #9bf6ff;
+             color: #000;
+         }
+
+         .PAR {
+             background-color: #a0c4ff;
+             color: #000;
+         }
+
+         .VBP {
+             background-color: #bdb2ff;
+             color: #000;
+         }
+
+         .NNP {
+             background-color: #ffc6ff;
+             color: #000;
+         }
+
+         .UNS {
+             background-color: #fffffc;
+             color: #000;
+         }
+
+         .VBT {
+             background-color: #d0f4de;
+             color: #000;
+         }
+
+         .VBL {
+             background-color: #a2d2ff;
+             color: #000;
+         }
+
+         .NNO {
+             background-color: #ffadad;
+             color: #000;
+         }
+
+         .ADJ {
+             background-color: #ffd6a5;
+             color: #000;
+         }
+
+         .PRR {
+             background-color: #fdffb6;
+             color: #000;
+         }
+
+         .PRK {
+             background-color: #caffbf;
+             color: #000;
+         }
+
+         .CCN {
+             background-color: #9bf6ff;
+             color: #000;
+         }
+
+         .ADK {
+             background-color: #bdb2ff;
+             color: #000;
+         }
+
+         .ART {
+             background-color: #ffc6ff;
+             color: #000;
+         }
+
+         .CSN {
+             background-color: #979745;
+             color: #000;
+         }
+
+         .NUM {
+             background-color: #d0f4de;
+             color: #000;
+         }
+
+         .SYM {
+             background-color: #a2d2ff;
+             color: #000;
+         }
+
+         .INT {
+             background-color: #ffadad;
+             color: #000;
+         }
+
+         .NEG {
+             background-color: #ffd6a5;
+             color: #000;
+         }
+
+         .PRI {
+             background-color: #fdffb6;
+             color: #000;
+         }
+
+         .VBE {
+             background-color: #caffbf;
+             color: #000;
+         }
+     </style>
+ </head>
+
+ <body>
+     <div class="container">
+         <h1>POS Tagger</h1>
+         <form id="posForm">
+             <textarea id="textInput" placeholder="Masukkan teks bahasa Indonesia di sini" required></textarea>
+             <button type="submit">Tag POS</button>
+         </form>
+         <div class="results" id="results"></div>
+     </div>
+     <script>
+         document.getElementById('posForm').addEventListener('submit', async function (e) {
+             e.preventDefault();
+             const text = document.getElementById('textInput').value;
+             const response = await fetch('http://127.0.0.1:5007/pos_tag', {
+                 method: 'POST',
+                 headers: {
+                     'Content-Type': 'application/json',
+                 },
+                 body: JSON.stringify({ text }),
+             });
+             const data = await response.json();
+             const resultsDiv = document.getElementById('results');
+             resultsDiv.innerHTML = '';
+             // Show the server's error message instead of failing on a missing token list.
+             if (!response.ok || !data.tagged_tokens) {
+                 resultsDiv.textContent = data.error || 'Request failed';
+                 return;
+             }
+             data.tagged_tokens.forEach(token => {
+                 const span = document.createElement('span');
+                 span.textContent = `${token.word} (${token.tag})`;
+                 span.className = token.tag;
+                 resultsDiv.appendChild(span);
+             });
+         });
+     </script>
+ </body>
+
+ </html>