WebashalarForML committed on
Commit e31d2da • 1 Parent(s): 4956b17

Update README2.md

Files changed (1)
  1. README2.md +21 -78
README2.md CHANGED
@@ -55,102 +55,45 @@ integrating the Mistral-Nemo-Instruct-2407 model for primary parsing.
  # File Structure Overview:
  Spacy_Model_creator/
  │
- ├── Spacy_Models/
  │   └── ner_model_05_3 # Pretrained spaCy model directory for resume parsing
  │
  ├── templates/
- │   ├── index.html # UI for file upload
- │   └── result.html # Display parsed results in structured JSON
  │
- ├── uploads/ # Directory for uploaded resume files
  │
  ├── utils/
- │   ├── mistral.py # Code for calling Mistral API and handling responses
- │   ├── spacy.py # spaCy fallback model for parsing resumes
- │   ├── error.py # Error handling utilities
- │   └── fileTotext.py # Functions to extract text from different file formats (PDF, DOCX, etc.)
  │
  ├── venv/ # Virtual environment
  │
  ├── .env # Environment variables file (contains Hugging Face token)
  │
- ├── main.py # Flask app handling API routes for uploading and processing resumes
  │
  └── requirements.txt # Dependencies required for the project

-
- # Program Overview:
-
- # Mistral Integration (utils/mistral.py)
- - Mistral API Calls: Uses Hugging Face's Mistral-Nemo-Instruct-2407 model to parse resumes.
- - Personal and Professional Extraction: Two functions extract personal and professional information in structured JSON format.
- - Fallback Mechanism: If Mistral fails, spaCy's NER model is used as a fallback.
-
- # SpaCy Integration (utils/spacy.py)
- - Custom Trained Model: Uses a spaCy model (ner_model_05_3) trained specifically for resume parsing.
- - Named Entity Recognition: Extracts key information like Name, Email, Contact, Location, Skills, Experience, etc., from resumes.
- - Validation: Includes validation for extracted emails and contacts.
-
- # File Conversion (utils/fileTotext.py)
- - Text Extraction: Handles different resume formats (PDF, DOCX, ODT, RSF, and images like PNG, JPG, JPEG) and extracts text for further processing.
- - PDF Files: Uses PyMuPDF to extract text and, if necessary, Tesseract-OCR for image-based PDF content.
- - DOCX Files: Uses `python-docx` to extract structured text from Word documents.
- - ODT Files: Uses `odfpy` to extract text from ODT (OpenDocument) files.
- - RSF Files: Reads plain text from RSF files.
- - Images (PNG, JPG, JPEG): Uses Tesseract-OCR to extract text from image-based resumes.
- Note: For Tesseract-OCR, install it locally by following the [installation guide](https://github.com/UB-Mannheim/tesseract/wiki).
- - Hyperlink Extraction: Extracts hyperlinks from PDF files, capturing any embedded URLs during the parsing process.
-
-
- # Error Handling (utils/error.py)
- - Manages API response errors, file format issues, and ensures smooth fallbacks without crashing the app.
-
- # Flask API (main.py)
- Endpoints:
- - /upload for uploading resumes.
- - Displays parsed results in JSON format on the results page.
- - UI: Simple interface for uploading resumes and viewing the parsing results.
-
-
- # Tree map of program:
-
- main.py
- ├── Handles API side
- ├── File upload/remove
- ├── Process resumes
- └── Show result
- utils
- ├── fileTotext.py
- │   └── Converts files to text
- │       ├── PDF
- │       ├── DOCX
- │       ├── RTF
- │       ├── ODT
- │       ├── PNG
- │       ├── JPG
- │       └── JPEG
- ├── mistral.py
- │   ├── Mistral API Calls
- │   │   └── Uses Mistral-Nemo-Instruct-2407 model
- │   ├── Personal and Professional Extraction
- │   │   ├── Extracts personal information
- │   │   └── Extracts professional information
- │   └── Fallback Mechanism
- │       └── Uses spaCy NER model if Mistral fails
- └── spacy.py
-     ├── Custom Trained Model
-     │   └── Uses spaCy model (ner_model_05_3)
-     ├── Named Entity Recognition
-     │   └── Extracts key information (Name, Email, Contact, etc.)
-     └── Validation
-         └── Validates emails and contacts
-
-
  # References:

  - [Flask Documentation](https://flask.palletsprojects.com/)
  - [spaCy Documentation](https://spacy.io/usage)
- - [Mistral Documentation](https://docs.mistral.ai/)
  - [Hugging Face Hub API](https://huggingface.co/docs/huggingface_hub/index)
  - [PyMuPDF (MuPDF) Documentation](https://pymupdf.readthedocs.io/en/latest/)
  - [python-docx Documentation](https://python-docx.readthedocs.io/en/latest/)
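
The Program Overview removed in this commit describes a fallback: if the Mistral call fails, the spaCy NER model is used instead. A minimal sketch of that control flow follows. The names `parse_resume`, `parse_with_mistral`, and `parse_with_spacy` are hypothetical stand-ins; the real functions in `utils/mistral.py` and `utils/spacy.py` are not shown in this diff, so treat this as an illustration of the pattern rather than the project's code.

```python
from typing import Callable, Dict

ParseFn = Callable[[str], Dict[str, object]]


def parse_resume(text: str, primary: ParseFn, fallback: ParseFn) -> Dict[str, object]:
    """Try the primary parser; on any failure, use the fallback parser."""
    try:
        result = primary(text)
        # Treat an empty result as a failure so the fallback still runs.
        if not result:
            raise ValueError("primary parser returned no data")
        return {"source": "primary", "data": result}
    except Exception:
        # The broad catch is deliberate in this sketch: the README says the
        # app should fall back rather than crash.
        return {"source": "fallback", "data": fallback(text)}


# Hypothetical stand-in for the Mistral-based parser (simulates an API error).
def parse_with_mistral(text: str) -> Dict[str, object]:
    raise RuntimeError("simulated API failure")


# Hypothetical stand-in for the spaCy-based parser (takes the first word).
def parse_with_spacy(text: str) -> Dict[str, object]:
    return {"NAME": text.split()[0]}
```

Recording which parser produced the result (`source`) is one way to make fallbacks visible to the caller; whether the project does this is not stated in the README.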
 
  # File Structure Overview:
  Spacy_Model_creator/
  │
+ ├── Models/
  │   └── ner_model_05_3 # Pretrained spaCy model directory for resume parsing
+ │
+ ├── data/
+ │   └── Json_data.json
+ │   └── resume_text.txt
+ │   └── Spacy_data.spacy
  │
  ├── templates/
+ │   ├── anoter.html
+ │   └── result.html
+ │   └── guide.html
+ │   └── savejson.html
+ │   └── savespacy.html
+ │   └── text.html
+ │   └── upload.html
+ │   └── data_files.html
  │
+ ├── JSON/
+ │   └── Json_data.json
  │
  ├── utils/
+ │   ├── model.py # Code for calling Mistral API and handling responses
+ │   ├── json_to_spacy.py # spaCy fallback model for parsing resumes
+ │   ├── anoter_to_json.py # Error handling utilities
+ │   └── file_To_text.py # Functions to extract text from different file formats (PDF, DOCX, etc.)
  │
  ├── venv/ # Virtual environment
  │
  ├── .env # Environment variables file (contains Hugging Face token)
  │
+ ├── app.py # Flask app handling API routes for uploading and processing resumes
  │
  └── requirements.txt # Dependencies required for the project

  # References:

  - [Flask Documentation](https://flask.palletsprojects.com/)
  - [spaCy Documentation](https://spacy.io/usage)
  - [Hugging Face Hub API](https://huggingface.co/docs/huggingface_hub/index)
  - [PyMuPDF (MuPDF) Documentation](https://pymupdf.readthedocs.io/en/latest/)
  - [python-docx Documentation](https://python-docx.readthedocs.io/en/latest/)
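
The file tree lists `utils/file_To_text.py` as the place where text is extracted from different file formats. One way such a module might choose a strategy is a lookup by file extension, sketched below. The extension-to-library pairs come from the File Conversion notes removed in this commit; the names `EXTRACTORS` and `pick_extractor` are hypothetical, and the sketch only reports which library would be used without calling any of them.

```python
from pathlib import Path

# Extension -> library named in the README's (removed) File Conversion notes.
EXTRACTORS = {
    ".pdf": "PyMuPDF",
    ".docx": "python-docx",
    ".odt": "odfpy",
    ".png": "Tesseract-OCR",
    ".jpg": "Tesseract-OCR",
    ".jpeg": "Tesseract-OCR",
}


def pick_extractor(filename: str) -> str:
    """Return the name of the library that would handle this file."""
    # Lower-case the suffix so "resume.PDF" and "resume.pdf" match.
    suffix = Path(filename).suffix.lower()
    try:
        return EXTRACTORS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}") from None
```

Rejecting unknown extensions up front keeps unsupported uploads from reaching the parsers; how the project actually handles them is not described in this diff.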