Commit e31d2da by WebashalarForML
1 Parent(s): 4956b17

Update README2.md

README2.md CHANGED (+21 -78)
@@ -55,102 +55,45 @@ integrating the Mistral-Nemo-Instruct-2407 model for primary parsing.
  # File Structure Overview:
  Spacy_Model_creator/
  │
- ├──
  │   └── ner_model_05_3           # Pretrained spaCy model directory for resume parsing
  │
  ├── templates/
- │   ├──
- │   └── result.html
  │
- ├──
  │
  ├── utils/
- │   ├──
- │   ├──
- │   ├──
- │   └──
  │
  ├── venv/                        # Virtual environment
  │
  ├── .env                         # Environment variables file (contains Hugging Face token)
  │
- ├──
  │
  └── requirements.txt             # Dependencies required for the project

-
- # Program Overview:
-
- # Mistral Integration (utils/mistral.py)
- - Mistral API Calls: Uses Hugging Face's Mistral-Nemo-Instruct-2407 model to parse resumes.
- - Personal and Professional Extraction: Two functions extract personal and professional information in structured JSON format.
- - Fallback Mechanism: If Mistral fails, spaCy's NER model is used as a fallback.
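
The fallback behavior described in the removed bullets can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `parse_with_fallback` and its `primary` and `fallback` callables are hypothetical names, the primary parser is assumed to return JSON text, and any exception from it is treated as a failure.

```python
import json
from typing import Callable, Dict


def parse_with_fallback(
    text: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], Dict[str, object]],
) -> Dict[str, object]:
    """Try the primary (LLM) parser first; use the fallback (NER) parser on failure.

    Both callables are hypothetical stand-ins for the Mistral and spaCy parsers.
    """
    try:
        raw = primary(text)
        parsed = json.loads(raw)  # assumption: the primary parser returns JSON text
        if not isinstance(parsed, dict):
            raise ValueError("primary parser did not return a JSON object")
        parsed["source"] = "primary"
        return parsed
    except Exception:
        # Any API error or malformed response routes to the fallback parser.
        result = dict(fallback(text))
        result["source"] = "fallback"
        return result
```

Tagging the result with a `source` key is a design choice made for this sketch so that callers can tell which parser produced the output.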
-
- # SpaCy Integration (utils/spacy.py)
- - Custom Trained Model: Uses a spaCy model (ner_model_05_3) trained specifically for resume parsing.
- - Named Entity Recognition: Extracts key information like Name, Email, Contact, Location, Skills, Experience, etc., from resumes.
- - Validation: Includes validation for extracted emails and contacts.
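
The README does not say how emails and contacts were validated. One plausible approach is sketched below with the standard library only; the function names, the regular expressions, and the 7 to 15 digit range are all assumptions made for illustration.

```python
import re
from typing import Optional

# Deliberately simple pattern (an assumption): local part, "@", dotted domain, alphabetic TLD.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def valid_email(candidate: str) -> bool:
    """Return True if the extracted string looks like an email address."""
    return EMAIL_RE.fullmatch(candidate.strip()) is not None


def normalize_contact(
    candidate: str, min_digits: int = 7, max_digits: int = 15
) -> Optional[str]:
    """Return the phone number as digits (with a leading '+' if present), or None."""
    stripped = candidate.strip()
    # Allow only digits, whitespace, and common phone punctuation.
    if not re.fullmatch(r"\+?[\d\s().-]+", stripped):
        return None
    digits = re.sub(r"\D", "", stripped)
    if not (min_digits <= len(digits) <= max_digits):
        return None
    return ("+" if stripped.startswith("+") else "") + digits
```

Entities that fail either check could be dropped from the NER output rather than passed on as parsed fields.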
-
- # File Conversion (utils/fileTotext.py)
- - Text Extraction: Handles different resume formats (PDF, DOCX, ODT, RTF, and images like PNG, JPG, JPEG) and extracts text for further processing.
- - PDF Files: Uses PyMuPDF to extract text and, if necessary, Tesseract-OCR for image-based PDF content.
- - DOCX Files: Uses `python-docx` to extract structured text from Word documents.
- - ODT Files: Uses `odfpy` to extract text from ODT (OpenDocument) files.
- - RTF Files: Reads plain text from RTF files.
- - Images (PNG, JPG, JPEG): Uses Tesseract-OCR to extract text from image-based resumes.
- Note: For Tesseract-OCR, install it locally by following the [installation guide](https://github.com/UB-Mannheim/tesseract/wiki).
- - Hyperlink Extraction: Extracts hyperlinks from PDF files, capturing any embedded URLs during the parsing process.
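
The per-format handling listed above amounts to a dispatch on file extension. The sketch below shows only that routing step; `BACKENDS` and `pick_backend` are hypothetical names, the backend labels simply echo the libraries named in the list, and the `.rtf` key assumes the rich-text format named in the program's tree map. No extraction library is actually called here.

```python
from pathlib import Path

# Extension -> backend label (labels mirror the libraries named in the README).
BACKENDS = {
    ".pdf": "pymupdf",
    ".docx": "python-docx",
    ".odt": "odfpy",
    ".rtf": "plain-text",
    ".png": "tesseract",
    ".jpg": "tesseract",
    ".jpeg": "tesseract",
}


def pick_backend(filename: str) -> str:
    """Return the backend label for a resume file, based on its extension."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match
    try:
        return BACKENDS[suffix]
    except KeyError:
        raise ValueError(
            f"Unsupported resume format: {suffix or '(none)'}"
        ) from None
```

Raising `ValueError` for unknown extensions keeps unsupported uploads from reaching the parsers; an application could catch it and report a file format error.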
-
-
- # Error Handling (utils/error.py)
- - Manages API response errors and file format issues, and ensures smooth fallbacks without crashing the app.
-
- # Flask API (main.py)
- Endpoints:
- - /upload for uploading resumes.
- - Displays parsed results in JSON format on the results page.
- - UI: Simple interface for uploading resumes and viewing the parsing results.
-
-
- # Tree map of program:
-
- main.py
- ├── Handles API side
- ├── File upload/remove
- ├── Process resumes
- └── Show result
- utils
- ├── fileTotext.py
- │   └── Converts files to text
- │       ├── PDF
- │       ├── DOCX
- │       ├── RTF
- │       ├── ODT
- │       ├── PNG
- │       ├── JPG
- │       └── JPEG
- ├── mistral.py
- │   ├── Mistral API Calls
- │   │   └── Uses Mistral-Nemo-Instruct-2407 model
- │   ├── Personal and Professional Extraction
- │   │   ├── Extracts personal information
- │   │   └── Extracts professional information
- │   └── Fallback Mechanism
- │       └── Uses spaCy NER model if Mistral fails
- └── spacy.py
-     ├── Custom Trained Model
-     │   └── Uses spaCy model (ner_model_05_3)
-     ├── Named Entity Recognition
-     │   └── Extracts key information (Name, Email, Contact, etc.)
-     └── Validation
-         └── Validates emails and contacts
-
-
  # References:

  - [Flask Documentation](https://flask.palletsprojects.com/)
  - [spaCy Documentation](https://spacy.io/usage)
- - [Mistral Documentation](https://docs.mistral.ai/)
  - [Hugging Face Hub API](https://huggingface.co/docs/huggingface_hub/index)
  - [PyMuPDF (MuPDF) Documentation](https://pymupdf.readthedocs.io/en/latest/)
  - [python-docx Documentation](https://python-docx.readthedocs.io/en/latest/)

  # File Structure Overview:
  Spacy_Model_creator/
  │
+ ├── Models/
  │   └── ner_model_05_3           # Pretrained spaCy model directory for resume parsing
+ │
+ ├── data/
+ │   ├── Json_data.json
+ │   ├── resume_text.txt
+ │   └── Spacy_data.spacy
  │
  ├── templates/
+ │   ├── anoter.html
+ │   ├── result.html
+ │   ├── guide.html
+ │   ├── savejson.html
+ │   ├── savespacy.html
+ │   ├── text.html
+ │   ├── upload.html
+ │   └── data_files.html
  │
+ ├── JSON/
+ │   └── Json_data.json
  │
  ├── utils/
+ │   ├── model.py                 # Code for calling Mistral API and handling responses
+ │   ├── json_to_spacy.py         # spaCy fallback model for parsing resumes
+ │   ├── anoter_to_json.py        # Error handling utilities
+ │   └── file_To_text.py          # Functions to extract text from different file formats (PDF, DOCX, etc.)
  │
  ├── venv/                        # Virtual environment
  │
  ├── .env                         # Environment variables file (contains Hugging Face token)
  │
+ ├── app.py                       # Flask app handling API routes for uploading and processing resumes
  │
  └── requirements.txt             # Dependencies required for the project

  # References:

  - [Flask Documentation](https://flask.palletsprojects.com/)
  - [spaCy Documentation](https://spacy.io/usage)
  - [Hugging Face Hub API](https://huggingface.co/docs/huggingface_hub/index)
  - [PyMuPDF (MuPDF) Documentation](https://pymupdf.readthedocs.io/en/latest/)
  - [python-docx Documentation](https://python-docx.readthedocs.io/en/latest/)