SetFit with sentence-transformers/paraphrase-mpnet-base-v2

This is a SetFit model for text classification. It uses sentence-transformers/paraphrase-mpnet-base-v2 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a Sentence Transformer with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.
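
The contrastive step works on pairs of training texts rather than on single texts. The following is a minimal sketch of how such pairs could be built from a few labeled examples; it is not the SetFit library's own sampler. The function name `make_contrastive_pairs`, the example texts, and the reading of label 1 as "ML job" are assumptions made for illustration.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Pair every two distinct examples; target is 1.0 if labels match, else 0.0."""
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(zip(texts, labels), 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Hypothetical few-shot set, invented for illustration (assumed: 1 = ML job, 0 = other).
texts = ["Data Scientist", "ML Engineer", "Driver Manager", "Forklift Operator"]
labels = [1, 1, 0, 0]

pairs = make_contrastive_pairs(texts, labels)
positives = [p for p in pairs if p[2] == 1.0]
negatives = [p for p in pairs if p[2] == 0.0]
print(len(positives), len(negatives))
```

Fine-tuning then pulls the embeddings of positive pairs together and pushes those of negative pairs apart. The classification head is fit afterwards on the embeddings of the individual texts.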

Model Details

Model Description

Model Type: SetFit
Sentence Transformer body: sentence-transformers/paraphrase-mpnet-base-v2
Classification head: a LogisticRegression instance
Maximum Sequence Length: 512 tokens
Number of Classes: 2

Model Sources

Repository: SetFit on GitHub
Paper: Efficient Few-Shot Learning Without Prompts
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts

Model Labels

Label	Examples
0	"Driver Manager - Days - Driver Manager - Olathe, KS\n\nThe role
1	'Intermediate Data Scientist - The School of Data Science (SDS) at the University of Virginia (UVA) seeks an Intermediate Data Scientist to work in collaboration with Don Brown, PhD and Sana Syed, MD, MS, focusing on understanding gut structure and function in common gastrointestinal (GI) diseases using cutting-edge machine learning and AI methods. The overarching goal of this work is to personalize care for pediatric patients suffering from chronic GI disease by improving diagnostics, predicting future disease complications, and identifying better disease biomarkers and novel drug targets. Details about the Gastro Science Lab and the Syed lab can be found at https://gastrodatasciencelab.org/ and https://med.virginia.edu/sana-syed-lab/.\n\nThis is a one year restricted position continuation is based on the availability of funding and satisfactory performance.\n\nData Scientists provide sophisticated data management and analysis to support University projects or programs. They focus primarily on high-level data projections and statistical analysis. They manage the design and programming of all data entry forms and the training and supervision of project research coders, student workers, and volunteers. 
They oversee regular assessments of reliability, submit data on a monthly basis, and assist with literature searches pertinent to various research project topics.\n\nThe Successful Candidate Will\n\nWork in a professional manner and have a strong willingness to learn and improve.Promote a culture of excellence by supporting others and generating new ideas to drive the lab forward.Act as a champion for the lab’s research at local, regional, and national conferences.Drive the collection of new data and the refinement of existing data for new purposes.Independently and creatively analyze data to test or refine hypotheses.Explore and examine data from multiple disparate sources in order to identify, analyze, and report trends in the data.Develop and execute of statistical mathematical and predictive models.Visualize and report data findings creatively in a variety of visual formats to support research presentations, manuscripts, and media write-ups.Establish links across existing data sources and find new interesting data correlations.Lead projects in concept formulation, determination of appropriate statistical methodology, data analysis, research evaluation, and final research reporting.Collaborate across faculty and staff to provide actionable data-driven insights.Formulate and define analytic scope and objectives through research and fact-finding as a self-starter.Be a leader of a lab data science team and provide guidance to less experienced data analysts/scientists.\n\nQualifications\n\nMaster's Degree and at least 3 years of relevant experience.Strong Organization and time line management skills .Experience in AI/ML modeling approaches such as: metabolic modeling, convolutional neural networks, and Gradient-weighted Class Activation Mapping.Understand all phases of the analytic process including data collection, preparation, modeling, evaluation, and deployment.\n\nAnticipated hiring range: $100,000 - $120,000 / annual\n\nTo Apply\n\nPlease visit UVA job board: 
https://jobs.virginia.edu and search for “R0056431”\n\nComplete An Application And Attach\n\nCover LetterCurriculum Vitae \n\nPlease note that multiple documents can be uploaded in the box.\n\nINTERNAL APPLICANTS: Please search for "find jobs" on your workday home page and apply using the internal job board.\n\nReview of applications will begin January 22, 2024 and continue until the position is filled.\n\nFor questions about the position, please contact: Adam Greene, Research Program Officer (arg7ef@virginia.edu) For questions about the application process, please contact: Rhiannon O'Coin (mo2r@virginia.edu)\n\nFor more information about the School of Data Science, please see www.datascience.virginia.edu\n\nFor more information about the University of Virginia and the Charlottesville community, please see www.virginia.edu/life/charlottesville and www.embarkuva.com\n\nThe selected candidate will be required to complete a background check at the time of the offer per University policy.\n\nPHYSICAL DEMANDS This is primarily a sedentary job involving extensive use of desktop computers. The job does occasionally require traveling some distance to attend meetings, and programs.\n\nThe University of Virginia, including the UVA Health System which represents the UVA Medical Center, Schools of Medicine and Nursing, UVA Physician’s Group and the Claude Moore Health Sciences Library, are fundamentally committed to the diversity of our faculty and staff. We believe diversity is excellence expressing itself through every person's perspectives and lived experiences. We are equal opportunity and affirmative action employers. 
All qualified applicants will receive consideration for employment without regard to age, color, disability, gender identity or expression, marital status, national or ethnic origin, political affiliation, race, religion, sex (including pregnancy), sexual orientation, veteran status, and family medical or genetic information.,' "Artificial Intelligence Engineer - Company Description Shake - social networking \n Role Description This is a part-time hybrid role for an AI Software Engineer at SHAKE. As an AI Software Engineer, you will be responsible for the day-to-day tasks associated with pattern recognition, computer science, neural networks, software development, and natural language processing (NLP). This role is remote work.\n Qualifications Strong knowledge and experience in pattern recognition, computer science, and neural networksProficiency in software development, with a focus on AI technologiesExperience in natural language processing (NLP)Ability to work independently and remotelyExcellent problem-solving and analytical skillsStrong communication and collaboration skillsMaster's or Ph.D. in Computer Science, AI, or related fieldsRelevant industry certifications (e.g., TensorFlow, PyTorch) are a plus," 'Senior Staff Data Scientist (Remote) - Company Description\n\nVericast is a big data company. We receive on average over 100 billion intent signals daily, which assist in generating a deep understanding of a person’s interest and in-market signals across 1,300 interest topics. This is coupled with strong geographic targeting, as over 30 billion location signals are collected daily from over one million retail stores and over 120 million households.\n\nData Science plays a crucial role in delivering our solutions today and will play a more prominent role in our future. A typical data science project has a solid mathematical foundation, an exploratory dimension, and a data-driven workflow. This is also true at Vericast. 
Our data science projects have strong foundations on machine learning, data engineering, and modeling. We are building a privacy-centric future of digital advertising by focusing on web content. We are connecting web content to consumer interest and action, ultimately driving which ads are shown on a webpage.\n\nTo continue our journey, we are seeking data science experts who are passionate about using cutting edge technology and conceiving innovative methods to solve unique and complex problems. As a Senior Staff Data Scientist at Vericast, your contributions will help us stay at the forefront of the AdTech industry.\n\nJob Description\n\nA Senior Staff Data Scientist is a hands-on expert who is passionate about all aspects of data science and can contribute by designing, conducting, and incorporating analyses of large-scale data from a wide variety of sources. This involves converting ambiguous requirements to concrete solutions for exploring data, designing and/or applying appropriate algorithms, documenting the findings, and incorporating the analysis into end-to-end solutions, systems, and platforms. Effective communication with other job disciplines is required. Contributions are expected at a level of results above and beyond entry-level and mid-level Data Scientists.\n\nKey Duties & Responsibilities\n\nHave a wider impact by providing insights and effective leadership into data science, digital media, and data engineering. 
This individual will have the hands-on skills to be an individual contributor and the experience for mentoring and leading other data scientists (25%)Act often as a technical lead, determining approach, objectives, requirements, features, milestones, implementation tasks, and tradeoffs of end-to-end large scale data science projects, platforms, and systems (25%)Act as a subject matter expert in data science (ML/AI) algorithms and underlying technologies (programming languages and systems) (15%)Design, conduct, and incorporate analyses of large-scale data from a wide variety of sources (15%)Work within the scrum practices in team projects (10%)Contribute to hiring process by screening higher level candidates, team interviews, manager candidates, i.e., act as a "Bar Raiser" (10%)\n\nQualifications\n\nEducation\n\nBachelor's Degree in a quantitative discipline (Computer Science, Mathematics, Engineering, Statistics) (Required)Master's Degree in a quantitative discipline (Computer Science, Mathematics, Engineering, Statistics) (Desired)Doctorate Degree (Preferred)In lieu of the above education requirements, a combination of experience and education will be considered.\n\nExperience\n\n8 - 10 years Relevant Experience (Required)\n\nKnowledge/Skills/Abilities\n\nStrong analytical skills, with expertise and solid understanding of multiple statistical/analytical machine learning techniques applied at large scale.Technical proficiency in ML algorithms, scalable ML platforms, languages, and tools (Python, Spark, ML/Ops) in a corporate setting is highly desirable.Ability to communicate effectively across multi-disciplinary teams (e.g., data science, engineering and product management, org leadership).Prior experience in applying Data Science in Digital Marketing Technology, Graph Theory, Privacy and Geolocation Data is a plus.\n\nAdditional Information\n\nSalary:$160,000-175,000\n\nThe ultimate compensation offered for the position will depend upon several factors such as 
skill level, cost of living, experience, and responsibilities.\n\nVericast offers a generous total rewards benefits package that includes medical, dental and vision coverage, 401K and flexible PTO. A wide variety of additional benefits like life insurance, employee assistance and pet insurance are also available, not to mention smart and friendly coworkers!\n\nAt Vericast, we don’t just accept differences - we celebrate them, we support them, and we thrive on them for the benefit of our employees, our clients, and our community.\u202fAs an Equal Opportunity employer, Vericast considers applicants for all positions without regard to race, color, creed, religion, national origin or ancestry, sex, sexual orientation, gender identity, age, disability, genetic information, veteran status, or any other classifications protected by law. Applicants who have disabilities may request that accommodations be made in order to complete the selection process by contacting our Talent Acquisition team at talentacquisition@vericast.com. EEO is the law. To review your rights under Equal Employment Opportunity please visit: www.dol.gov/ofccp/regs/compliance/posters/pdf/eeopost.pdf.\n\n,'

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("m-newhauser/setfit-ml-jobs")
# Run inference
# The posting spans several lines, so it is passed as a triple-quoted string
preds = model("""Data Scientist - AI Investment - Are you interested in revolutionising the future of AI investment?
My client is looking for a data scientist to tackle intricate business challenges through advanced analytics and machine learning techniques.
You will take charge of both technical prowess, overseeing the creation, implementation, and upkeep of sophisticated machine learning models and algorithms, including extensive language models.
This role offers an exceptional chance to make a substantial impact and establish yourself as a visionary in the realms of data science and AI.
Responsibilities:You'll spearhead the development and implementation of groundbreaking AI and data science solutions.Steering the strategic path of the data science community, remaining at the forefront of applied AI and AI research.Effectively communicating with stakeholders and influencing decision-making.Overseeing project delivery from inception to deployment, ensuring alignment with business goals.Identifying and integrating state-of-the-art technologies, tools, and methodologies to drive value through cost reduction, revenue generation, or enhanced customer experience.
Requirements:Proven AI research in finance industry. Ideally published with multiple citations. Ph.D./Masters/Bachelor's degree in computer science, mathematics, statistics, engineering, or relevant field from a top 10 university in the US or equivalent. Proficiency in key data science tools and methodologies, including Python, PyTorch, TensorFlow, Jax, Numpy, Scikit-learn, time-series forecasting, classification, regression, large-language models, and experiment design.A commitment to staying abreast of the latest advancements in AI research and a drive to continuously push boundaries.Extensive relevant work experience, encompassing a solid grasp of statistical data analysis, machine learning algorithms, and deep learning frameworks.
Join my client on this thrilling journey and contribute to shaping the future of data science and AI in the investment sector.,""")

Training Details

Training Set Metrics

Training set	Min	Median	Max
Word count	116	700.0417	2183

Label	Training Sample Count
0	10
1	14

Training Hyperparameters

batch_size: (8, 8)
num_epochs: (4, 4)
max_steps: -1
sampling_strategy: oversampling
body_learning_rate: (2e-05, 1e-05)
head_learning_rate: 0.01
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: False
warmup_proportion: 0.1
seed: 42
eval_max_steps: -1
load_best_model_at_end: False
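
The `loss: CosineSimilarityLoss` setting above means the embedding body is trained to make the cosine similarity of each text pair match a target (1.0 for same-label pairs, 0.0 for different-label pairs), scored with mean squared error. The following is a plain-Python sketch of that objective, not the Sentence Transformers implementation. The 2-D vectors are toy values invented for illustration, and the helper names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(pairs):
    """Mean squared error between each pair's cosine similarity and its target."""
    errors = [(cosine_similarity(u, v) - target) ** 2 for u, v, target in pairs]
    return sum(errors) / len(errors)


# Toy 2-D "embeddings" (invented): (vector_a, vector_b, target)
well_separated = [
    ([1.0, 0.0], [2.0, 0.0], 1.0),  # same class, same direction
    ([1.0, 0.0], [0.0, 3.0], 0.0),  # different class, orthogonal
]
poorly_separated = [
    ([1.0, 0.0], [0.0, 1.0], 1.0),  # same class, but orthogonal
    ([1.0, 0.0], [1.0, 0.0], 0.0),  # different class, but identical
]

loss_good = cosine_similarity_loss(well_separated)
loss_bad = cosine_similarity_loss(poorly_separated)
print(loss_good, loss_bad)
```

The loss is zero when same-label embeddings point in the same direction and different-label embeddings are orthogonal, which is the geometry the logistic regression head then separates.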

Training Results

Epoch	Step	Training Loss	Validation Loss
0.025	1	0.1975	-
1.25	50	0.0018	-
2.5	100	0.0002	-
3.75	150	0.0002	-
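
The fractional epochs in the table above can be reproduced from the sample counts (10 and 14) and the embedding batch size of 8. This is a back-of-the-envelope check, not the library's code. It rests on two assumptions about the pair sampler that are not stated in this card: that an example paired with itself counts as a positive pair, and that `oversampling` repeats the smaller pair set until it matches the larger one.

```python
from math import comb

n_label_0, n_label_1 = 10, 14   # from the Training Sample Count table
embedding_batch_size = 8        # first entry of batch_size: (8, 8)
num_epochs = 4                  # first entry of num_epochs: (4, 4)

# Assumption: same-label pairs include each example paired with itself.
positive_pairs = (comb(n_label_0, 2) + n_label_0) + (comb(n_label_1, 2) + n_label_1)
negative_pairs = n_label_0 * n_label_1

# Assumption: oversampling balances both sets at the size of the larger one.
pairs_per_epoch = 2 * max(positive_pairs, negative_pairs)
steps_per_epoch = pairs_per_epoch // embedding_batch_size
total_steps = steps_per_epoch * num_epochs

# Epoch fraction at each logged step
for step in (1, 50, 100, 150):
    print(step, step / steps_per_epoch)
```

Under these assumptions there are 160 positive and 140 negative pairs, giving 320 pairs and 40 steps per epoch, so steps 1, 50, 100, and 150 fall at epochs 0.025, 1.25, 2.5, and 3.75, matching the logged values.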

Framework Versions

Python: 3.10.12
SetFit: 1.0.3
Sentence Transformers: 3.0.0
Transformers: 4.39.0
PyTorch: 2.3.0+cu121
Datasets: 2.19.1
Tokenizers: 0.15.2

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
