yitingliii committed on
Commit 9c9929c
1 Parent(s): e1bbe05

Update README.md

Files changed (1)
  1. README.md +43 -1
README.md CHANGED
@@ -2,16 +2,58 @@
  This repository provides a pre-trained Support Vector Machine (SVM) model for text classification with Term Frequency-Inverse Document Frequency (TF-IDF) features. The repository also includes utilities for data preprocessing and feature extraction.
 
  There are two ways to test our model:
- # 1.Colab
+ # 1. Colab (see the file for what the Colab looks like)
  ## Start
  <br> Download all the files.
  <br> Copy all the code below into Colab.
+ <br>Before running the code, ensure you have all the required libraries installed:
+
  ```python
  !pip install nltk beautifulsoup4 scikit-learn pandas datasets fsspec huggingface_hub
  ```
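To confirm the installs worked before running the later cells, a quick check can look each package up by the module it installs. This is a minimal sketch, not part of the repository; note that two of the pip names differ from their import names (`beautifulsoup4` installs `bs4`, `scikit-learn` installs `sklearn`).

```python
import importlib.util

# pip package name -> top-level module it installs
REQUIRED = {
    "nltk": "nltk",
    "beautifulsoup4": "bs4",
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "datasets": "datasets",
    "fsspec": "fsspec",
    "huggingface_hub": "huggingface_hub",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose top-level module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", " ".join(missing))
    else:
        print("All required packages are installed.")
```

If anything is reported as missing, rerun the `!pip install` cell and restart the Colab runtime.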
 
+ <br> Download the NLTK resources needed for preprocessing:
+
+ ```python
+ import nltk
+ nltk.download('stopwords')
+ nltk.download('wordnet')
+ nltk.download('omw-1.4')
+ ```
+ <br>Clean the dataset:
+ ```python
+ from data_cleaning import clean
+ import pandas as pd
+ ```
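The README does not show what `data_cleaning.clean` does internally. As a rough illustration of a typical headline-cleaning step, here is a hypothetical per-title stand-in (`clean_title`, not the repository's code): it strips HTML tags with the standard library's `html.parser` (the install list includes beautifulsoup4, which suggests the real code may use that instead), lowercases, keeps only letters, and drops stopwords. The tiny stopword set is illustrative; NLTK's `stopwords` corpus is much larger, and lemmatization with `wordnet` is omitted here.

```python
import re
from html.parser import HTMLParser

# Illustrative subset only; NLTK's English stopword list is far larger.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for", "is"}


class _TextExtractor(HTMLParser):
    """Collect only the text content of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def clean_title(raw: str) -> str:
    """Hypothetical stand-in for the repository's clean(), applied to one title."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()  # flush any buffered text (entities like &amp; are decoded)
    text = " ".join(parser.parts).lower()
    # Replace anything that is not a lowercase letter or whitespace with a space
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


print(clean_title("<b>The Fed</b> raises rates &amp; markets react!"))
# fed raises rates markets react
```

On a DataFrame, a per-string function like this would typically be applied column-wise, e.g. `df['title'].apply(clean_title)`.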
+
+ <br> You can use any dataset you want by changing the file path passed to ```pd.read_csv()```.
+ ```python
+ df = pd.read_csv("hf://datasets/CIS5190abcd/headlines_test/test_cleaned_headlines.csv")
 
 
+ cleaned_df = clean(df)
+ ```
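Because the TF-IDF step reads `cleaned_df['title']`, a replacement dataset presumably needs a `title` column (an inference from that line, not a documented requirement). A small standard-library check of a CSV header before loading it could look like this; `missing_columns` is a hypothetical helper.

```python
import csv
import io


def missing_columns(lines, required=("title",)):
    """Return the required column names absent from the CSV header row.

    `lines` is any iterable of CSV text lines, e.g. an open file or io.StringIO.
    """
    reader = csv.reader(lines)
    header = next(reader, [])  # empty input -> no columns
    present = {name.strip() for name in header}
    return [col for col in required if col not in present]


sample = io.StringIO("title,label\nStocks rally on jobs data,1\n")
print(missing_columns(sample))  # []
```

For a local file, open it with `newline=""` and pass the file object, e.g. `with open("my_headlines.csv", newline="") as f: print(missing_columns(f))` (the filename is a placeholder).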
+
+ - Extract TF-IDF features:
+ ```python
+ from tfidf import tfidf
+
+ # Transform the cleaned titles with the pre-trained TF-IDF vectorizer
+ X_new_tfidf = tfidf.transform(cleaned_df['title'])
+ ```
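The README does not say how `tfidf` was built. Assuming it is a fitted scikit-learn `TfidfVectorizer` with default settings, `transform` weights each known term by its raw count times a smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, and then L2-normalizes each row. The pure-Python sketch below reproduces that computation on a made-up toy corpus to show what a row of `X_new_tfidf` contains; it is an illustration, not the repository's code.

```python
import math
import re
from collections import Counter

# scikit-learn's default token pattern: runs of two or more word characters
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def fit_idf(corpus):
    """Learn the vocabulary and smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def transform(docs, idf):
    """Return one L2-normalized {term: weight} dict per document."""
    rows = []
    for doc in docs:
        tf = Counter(t for t in tokenize(doc) if t in idf)  # unseen terms are dropped
        row = {t: c * idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in row.values()))
        rows.append({t: w / norm for t, w in row.items()} if norm else {})
    return rows


train = ["stocks rally on jobs data", "jobs report beats forecasts", "storm hits coast"]
idf = fit_idf(train)
print(transform(["jobs data surprise"], idf))
```

"surprise" never appeared during fitting, so it gets no weight, and "data" (seen in one document) outweighs "jobs" (seen in two).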
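+
+ - Make predictions:
+ ```python
+ from svm import svm_model
+
+ # Predict a label for each headline (assumes svm_model follows the scikit-learn predict API)
+ predictions = svm_model.predict(X_new_tfidf)
+ ```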
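`svm_model.predict` hides the decision rule. Assuming a binary linear SVM (e.g. scikit-learn's `LinearSVC` or `SVC(kernel='linear')`; the README does not state the kernel or the class labels), each prediction is the sign of w · x + b computed on the TF-IDF vector. The toy sketch below uses made-up weights and hypothetical labels purely to show that rule.

```python
def decision_function(x, weights, bias):
    """Linear SVM score w . x + b over sparse {feature: value} vectors."""
    return sum(value * weights.get(feature, 0.0) for feature, value in x.items()) + bias


def predict(rows, weights, bias, labels=("negative", "positive")):
    """Map each score to labels[1] when it is positive, else labels[0]."""
    return [labels[1] if decision_function(x, weights, bias) > 0 else labels[0] for x in rows]


# Made-up weights for illustration only; the real ones come from the trained model.
weights = {"rally": 1.5, "beats": 1.2, "storm": -2.0}
bias = -0.1
rows = [{"rally": 0.8, "stocks": 0.6}, {"storm": 0.7, "coast": 0.7}]
print(predict(rows, weights, bias))  # ['positive', 'negative']
```

Features with no learned weight (here "stocks" and "coast") contribute nothing to the score, which is how a sparse linear model ignores them.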
+
 
 
  # 2. Terminal