T5-Base Job Description to Resume JSON
This model is a fine-tuned version of google/t5-base that converts job descriptions into structured resume JSON data.
Model description
This model is based on the T5-base architecture fine-tuned on a dataset of 10,000 job description and resume pairs. It takes a job description as input and generates a JSON representation of a resume tailored to that job.
Base model: google/t5-base
Fine-tuning task: Text-to-JSON conversion
Training data: 10,000 job description and resume pairs
Intended uses & limitations
Intended uses:
- Generating structured resume data from job descriptions
- Assisting job seekers in tailoring resumes to specific job postings
- Automating parts of the resume creation process
Limitations:
- The model's output quality depends on the input job description's detail and clarity
- Generated resumes may require human review and editing
- The model may not capture nuanced or industry-specific requirements
- The model does not output "{" or "}"; it emits "LB>" and "RB>" in their place, respectively, and these must be replaced with braces after decoding
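Because the placeholders have to be mapped back before the output is usable, a small post-processing helper can do the substitution and then try to parse the result. The sketch below is a minimal illustration, not part of the released code: the function names `restore_braces` and `parse_resume_output` are hypothetical, the sample string is hand-written rather than real model output, and nothing in this card guarantees that the decoded text is valid JSON.

```python
import json
from typing import Any, Optional


def restore_braces(text: str) -> str:
    """Map the model's placeholder strings back to curly braces."""
    return text.replace("LB>", "{").replace("RB>", "}")


def parse_resume_output(text: str) -> Optional[Any]:
    """Return the parsed JSON value, or None if the output is not valid JSON."""
    try:
        return json.loads(restore_braces(text))
    except json.JSONDecodeError:
        # Truncated or malformed generations end up here.
        return None


# Hand-written stand-in for a decoded model output (assumption, not real output).
sample = 'LB>"name": "Jane Doe", "skills": ["Python", "SQL"]RB>'
print(parse_resume_output(sample))
print(parse_resume_output("LB>truncated output"))
```

Returning `None` on a parse failure keeps the caller in control of what to do with a malformed generation, for example retrying or passing the raw text on for manual review.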
Training data
The model was trained on 10,000 pairs of job descriptions and corresponding resume JSON data. The data distribution and any potential biases in the training set are not specified.
Training procedure
The model was fine-tuned using the standard T5 text-to-text framework. Specific hyperparameters and training details are not provided.
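Although the preprocessing is not documented, the inference code in this card implies two conventions: inputs carry the prefix "generate resume JSON for the following job: ", and curly braces in the output are written as "LB>" and "RB>". The sketch below shows how a single training pair could be prepared under those assumptions. The helper names `encode_braces` and `build_training_pair` and the sample record are hypothetical, and the actual pipeline may differ.

```python
import json
from typing import Tuple

# Prefix copied from the inference example in this card.
PROMPT_PREFIX = "generate resume JSON for the following job: "


def encode_braces(text: str) -> str:
    """Replace curly braces with the placeholder strings the model emits."""
    return text.replace("{", "LB>").replace("}", "RB>")


def build_training_pair(job_description: str, resume: dict) -> Tuple[str, str]:
    """Return (source, target) strings for text-to-text fine-tuning."""
    source = PROMPT_PREFIX + job_description.strip()
    target = encode_braces(json.dumps(resume))
    return source, target


# Hypothetical record, for illustration only.
source, target = build_training_pair(
    "Data analyst with SQL experience",
    {"title": "Data Analyst", "skills": ["SQL"]},
)
print(source)
print(target)
```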
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import T5Tokenizer, T5ForConditionalGeneration


def load_model_and_tokenizer(model_path):
    """
    Load the tokenizer and model from the specified path.
    """
    tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-base")
    model = T5ForConditionalGeneration.from_pretrained(model_path)
    return tokenizer, model


def generate_text(prompt, tokenizer, model):
    """
    Generate text using the model based on the given prompt.
    """
    # Encode the input prompt to get the tensor
    input_ids = tokenizer(prompt, return_tensors="pt", padding=True).input_ids
    # Generate the output using the model
    outputs = model.generate(input_ids, max_length=512, num_return_sequences=1)
    # Decode the output tensor to human-readable text
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_text


def main():
    model_path = "nakamoto-yama/t5-resume-generation"
    print(f"Loading model and tokenizer from {model_path}")
    tokenizer, model = load_model_and_tokenizer(model_path)
    # Test the model with a prompt; type "exit" to quit
    while True:
        prompt = input("Enter a job description or title: ")
        if prompt.lower() == 'exit':
            break
        response = generate_text(f"generate resume JSON for the following job: {prompt}", tokenizer, model)
        # Map the placeholder strings back to curly braces
        response = response.replace("LB>", "{").replace("RB>", "}")
        print(f"Generated Response: {response}")


if __name__ == "__main__":
    main()
See the Hugging Face T5 docs and a Colab Notebook created by the model developers for more examples.
Ethical considerations
This model automates part of the resume creation process, which could have implications for job seeking and hiring practices. Users should be aware of potential biases in the training data that may affect the generated resumes.
Additional information
For more details on the base T5 model, refer to the T5 paper and the google/t5-base model card.