⏱️ Q1: How long did it take to solve the problem?
The solution was developed in approximately 5 hours (excluding data collection and model training phases).
🔍 Q2: Can you explain your solution approach?
The solution implements a multi-stage document classification pipeline:

1. Direct URL Text Approach:
2. Baseline Approach (ML Model):
3. BERT Approach (DL Model):
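
The three stages above can be read as a cascade: try the cheapest signal first and fall back to a model only when it gives no answer. The sketch below illustrates that idea under stated assumptions. The keyword lists, the function names `classify_from_url` and `classify`, and the hand-off rule are all hypothetical, since the write-up does not say how the stages are connected.

```python
from typing import Callable, Optional
from urllib.parse import unquote, urlparse

# Hypothetical keyword lists; the actual terms used are not given in the write-up.
URL_KEYWORDS = {
    "Cable": ("cable",),
    "Fuses": ("fuse",),
    "Lighting": ("lighting", "luminaire"),
}


def classify_from_url(url: str) -> Optional[str]:
    """Return a category if one of its keywords appears in the URL path, else None."""
    path = unquote(urlparse(url).path).lower()
    for category, keywords in URL_KEYWORDS.items():
        if any(keyword in path for keyword in keywords):
            return category
    return None


def classify(url: str, text: str, model_predict: Callable[[str], str]) -> str:
    """Use the URL stage when it matches; otherwise defer to a text model."""
    category = classify_from_url(url)
    if category is not None:
        return category
    # model_predict stands in for the ML or DL stage (assumed interface).
    return model_predict(text)
```

Passing the model as a callable keeps the cascade independent of whether the fallback is the baseline or the BERT-based classifier.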
🤖 Q3: Which models did you use and why?
The baseline combines TF-IDF features with Logistic Regression; the main model is BERT-based:

Baseline Model:
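
A minimal sketch of what a TF-IDF plus Logistic Regression baseline can look like with scikit-learn. The training texts, the n-gram range, and the other hyperparameters are assumptions made for illustration, not the settings used in this project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data, invented for illustration only.
train_texts = [
    "three core copper power cable with pvc sheath",
    "armoured cable rated for underground installation",
    "cartridge fuse 10 A fast acting",
    "ceramic fuse link with high breaking capacity",
    "led lighting panel 4000 K luminous flux",
    "outdoor lighting floodlight ip65",
    "company brochure and general terms",
    "warranty terms and contact details",
]
train_labels = [
    "Cable", "Cable", "Fuses", "Fuses",
    "Lighting", "Lighting", "Others", "Others",
]

# Vectorizer and classifier are chained so the same preprocessing
# is applied at training and prediction time.
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    LogisticRegression(max_iter=1000),
)
baseline.fit(train_texts, train_labels)

predictions = baseline.predict(["flexible copper cable", "time delay fuse"])
```

A baseline like this trains quickly and gives a reference score against which a heavier model can be judged.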
BERT Model:
⚠️ Q4: What are the current limitations and potential improvements?
Current Implementation & Limitations:
Proposed Improvements:
📊 Q5: What is the model's performance on test data?
BERT Model Performance:

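| Category     | Precision | Recall | F1-Score | Support |
|--------------|-----------|--------|----------|---------|
| Cable        | 1.00      | 1.00   | 1.00     | 92      |
| Fuses        | 0.95      | 1.00   | 0.98     | 42      |
| Lighting     | 0.94      | 1.00   | 0.97     | 74      |
| Others       | 1.00      | 0.92   | 0.96     | 83      |
| Accuracy     |           |        | 0.98     | 291     |
| Macro Avg    | 0.97      | 0.98   | 0.98     | 291     |
| Weighted Avg | 0.98      | 0.98   | 0.98     | 291     |
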
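✨ Perfect scores (1.00 precision, recall, and F1) for the Cable category
📈 Recall of 1.00 for Cable, Fuses, and Lighting; 0.92 for Others
🎯 Overall accuracy of 98% on 291 test samples
⚖️ Macro and weighted averages agree to within 0.01 on every metric, indicating consistent performance across categories despite unequal support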
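
The two average rows can be recomputed from the per-category rows: the macro average is the plain mean over categories, and the weighted average weights each category by its support. The sketch below does this with the figures reported above. Because those figures are already rounded to two decimals, the results match the report only approximately. The names `report`, `macro`, and `weighted` are illustrative.

```python
# Per-category figures copied from the report: (precision, recall, f1, support)
report = {
    "Cable": (1.00, 1.00, 1.00, 92),
    "Fuses": (0.95, 1.00, 0.98, 42),
    "Lighting": (0.94, 1.00, 0.97, 74),
    "Others": (1.00, 0.92, 0.96, 83),
}

total_support = sum(row[3] for row in report.values())


def macro(index: int) -> float:
    """Unweighted mean of one metric across categories."""
    return sum(row[index] for row in report.values()) / len(report)


def weighted(index: int) -> float:
    """Mean of one metric across categories, weighted by support."""
    return sum(row[index] * row[3] for row in report.values()) / total_support


# Index 0 = precision, 1 = recall, 2 = F1
macro_avg = [macro(i) for i in range(3)]
weighted_avg = [weighted(i) for i in range(3)]

for name, m, w in zip(("precision", "recall", "f1"), macro_avg, weighted_avg):
    print(f"{name:<9} macro={m:.4f} weighted={w:.4f}")
```

For single-label classification, support-weighted recall equals accuracy, which is why the Accuracy row and the weighted-average recall agree.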
📈 Q6: Why did you choose these particular metrics?
Our metric selection was driven by the dataset characteristics:

Key Considerations:
Selected Metrics:
import pandas as pd

# Headline test-set metrics for the baseline and the BERT model
metrics = {
    'Metric': ['Accuracy', 'Precision', 'Recall', 'F1-Score'],
    'Baseline': [0.85, 0.83, 0.84, 0.83],
    'BERT': [0.98, 0.97, 0.98, 0.98],
}
df = pd.DataFrame(metrics)
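
To make the comparison between the two models explicit, the same table can be extended with the absolute gain per metric. This is a small sketch built on the figures above; the `Gain` column name is an assumption added for illustration.

```python
import pandas as pd

# Same figures as in the comparison table above.
metrics = {
    "Metric": ["Accuracy", "Precision", "Recall", "F1-Score"],
    "Baseline": [0.85, 0.83, 0.84, 0.83],
    "BERT": [0.98, 0.97, 0.98, 0.98],
}
df = pd.DataFrame(metrics)

# Absolute improvement of BERT over the baseline, rounded to two decimals
# to hide floating-point noise from the subtraction.
df["Gain"] = (df["BERT"] - df["Baseline"]).round(2)

print(df.to_string(index=False))
```

The gain is between 0.13 and 0.15 on every metric, so the improvement is not confined to a single measure.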