# SVM Model with TF-IDF

Step-by-step instructions:

## Installation
Before running the code, make sure all required libraries are installed:

```bash
pip install nltk beautifulsoup4 scikit-learn pandas
```
Download the NLTK resources needed for preprocessing:

```python
import nltk

# Fetch the stop-word lists and the WordNet corpus
nltk.download('stopwords')
nltk.download('wordnet')
```

# How to Use:

1. Pre-Trained Model and Vectorizer
The repository includes:
- model.pkl: The pre-trained SVM model.
- tfidf.pkl: The saved TF-IDF vectorizer used to transform the text data.

2. Testing a new dataset
To test the model on a new dataset, follow these steps:

- Step 1: Prepare the dataset
Ensure the dataset is in CSV format and has three columns: title, outlet and labels. The title column contains the text data to be classified.

- Step 2: Preprocess the Data
Use the clean() function from data_cleaning.py to preprocess the text data:

```python
from data_cleaning import clean
import pandas as pd

# Load your data
df = pd.read_csv('test_data_random_subset.csv')

# Clean the data
cleaned_df = clean(df)
```

- Step 3: Load the pre-trained model and TF-IDF Vectorizer
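
A minimal sketch of how this step might look, assuming model.pkl and tfidf.pkl were written with Python's standard pickle module (if they were saved with joblib, `joblib.load` would be the equivalent call). The helper names `load_pickle` and `load_artifacts` are hypothetical and not part of the repository.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Read a single pickled object from the given file path."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


def load_artifacts(model_path="model.pkl", vectorizer_path="tfidf.pkl"):
    """Return (model, vectorizer) loaded from the two pickle files.

    The default file names are the ones listed in this README; the
    assumption is that both sit in the current working directory.
    """
    model = load_pickle(model_path)
    vectorizer = load_pickle(vectorizer_path)
    return model, vectorizer
```

Calling `model, tfidf = load_artifacts()` from the repository root would then return both objects. Only unpickle files from a source you trust, since pickle can execute arbitrary code while loading.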