import streamlit as st


def developer_guide():
    st.title("For Developers: Create Your Own Bot that Knows Your CV with Hugging Face and RAG")
    st.subheader("Build Intelligent CV-Aware Assistants with Retrieval-Augmented Generation")

    st.markdown("""
### Introduction
This guide is for developers who want to go beyond CV enhancement and create personalized bots that can interact with and understand their CV. By combining Hugging Face models with Retrieval-Augmented Generation (RAG), you can build bots that fetch relevant information from your CV and generate context-aware responses.

If you're ready to dive into the technical side of AI-driven CV assistants, this step-by-step guide will help you get started.
""")

    st.markdown("""
### What is Retrieval-Augmented Generation (RAG)?
RAG is a technique that combines traditional information retrieval with generation models, such as GPT-style language models. It allows a model to "retrieve" relevant information from a predefined dataset (like your CV) before generating a response. Because the bot can reference specific details from your CV, its answers are more personalized and accurate.

With RAG, your bot won't just generate generic answers; it will craft responses based on your own career history and experience.
""")

    st.markdown("""
### Step-by-Step Guide to Building Your RAG-based CV Assistant

**Step 1: Prepare Your CV Data**

Convert your CV into a structured format such as JSON, CSV, or even plain text. Break it down into logical sections like work experience, education, skills, and achievements. This data will be used as the retrieval base for the RAG model.
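
If your CV starts out as plain text, the breakdown into sections can be sketched as below. This is a minimal illustration rather than a full parser: the `SECTION_HEADINGS` set, the `group_cv_lines` helper, and the sample lines are all hypothetical, and it assumes each section begins with a heading on its own line.

```python
import json

# Assumed section headings; adjust to match the headings in your own CV.
SECTION_HEADINGS = {"experience", "education", "skills"}


def group_cv_lines(lines):
    # Group CV lines under the most recent section heading (hypothetical helper).
    sections = {}
    current = None
    for raw_line in lines:
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.lower() in SECTION_HEADINGS:
            current = line.lower()
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
        # lines before the first heading (for example, a name) are ignored
    return sections


# Sample lines for illustration only.
cv_lines = [
    "Experience",
    "Software Engineer, TechCorp, 2019-2022",
    "Data Scientist, DataWorks, 2017-2019",
    "",
    "Education",
    "MSc in Computer Science, XYZ University, 2015-2017",
    "",
    "Skills",
    "Python",
    "Machine Learning",
    "NLP",
]

sections = group_cv_lines(cv_lines)
print(json.dumps(sections, indent=2))
```

Each list of lines can then be saved as JSON or turned into one retrieval passage per section.
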

Example structure:
```json
{
  "experience": [
    {"title": "Software Engineer", "company": "TechCorp", "years": "2019-2022", "details": "Worked on AI solutions..."},
    {"title": "Data Scientist", "company": "DataWorks", "years": "2017-2019", "details": "Led data projects..."}
  ],
  "education": [
    {"degree": "MSc in Computer Science", "institution": "XYZ University", "years": "2015-2017"}
  ],
  "skills": ["Python", "Machine Learning", "NLP"]
}
```

**Step 2: Use Hugging Face's Transformers for RAG**

Use Hugging Face's `transformers` library to load a pre-trained RAG model. This model handles both the retrieval of relevant sections of your CV and the generation of a coherent response based on the user’s input.

Example setup:
```python
from transformers import RagTokenizer, RagRetriever, RagTokenForGeneration

# Load the tokenizer
tokenizer = RagTokenizer.from_pretrained('facebook/rag-token-base')

# Load the retriever with a custom index built from your CV passages.
# Both paths are placeholders: passages_path should point to a saved
# `datasets` dataset with title, text, and embeddings columns, and
# index_path to the FAISS index built over its embeddings.
retriever = RagRetriever.from_pretrained(
    'facebook/rag-token-base',
    index_name='custom',
    passages_path='path/to/cv_passages',
    index_path='path/to/cv_index.faiss',
)
model = RagTokenForGeneration.from_pretrained('facebook/rag-token-base', retriever=retriever)

# Tokenize user input
inputs = tokenizer("Tell me about my work at TechCorp", return_tensors="pt")

# Generate a response using RAG
generated = model.generate(input_ids=inputs['input_ids'])
response = tokenizer.batch_decode(generated, skip_special_tokens=True)
print(response[0])
```

**Step 3: Customize the Retrieval**

You'll need to define how your bot retrieves the most relevant sections of your CV. This can involve tweaking the retriever’s configuration or pre-processing your CV data to improve retrieval accuracy. You can use embeddings and similarity search to match the user's queries with specific sections of the CV.

**Step 4: Deploy Your Bot**

Once your bot is working, you can deploy it using Hugging Face's Inference API or host it as a service. You can integrate it into your website or app to interact with your users.
Consider embedding the bot in your personal website’s career page, or creating a chatbot to handle CV-related questions for job applications.

**Step 5: Improve with Feedback**

Continuously improve your bot by adding more CV data, refining the retrieval process, or fine-tuning a RAG model to better understand the language of your career domain.
""")

    st.markdown("""
### Ready to Build?
Follow these steps, and you'll have a personalized CV assistant bot powered by Hugging Face and RAG. Start building now, and see how AI can change the way you present your career!

For more details on RAG and Hugging Face models, check the [official Hugging Face documentation](https://huggingface.co/docs/transformers/model_doc/rag).
""")


# Call the function to display the developer guide page
# developer_guide()

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_together import TogetherEmbeddings
from langchain_community.llms import Together
import PyPDF2


# Function to read text from PDF
def read_pdf(file):
    pdf_reader = PyPDF2.PdfReader(file)
    text = ""
    for page in pdf_reader.pages:
        # Image-only pages may yield no text, so fall back to an empty string
        text += page.extract_text() or ""
    return text


# Load and split resume data
def load_and_split_resume(text):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    # create_documents wraps the raw string in Document objects and splits it
    docs = text_splitter.create_documents([text])
    return docs


# Create vector store and retriever
def setup_vector_store(docs):
    # The Together clients read the API key from the TOGETHER_API_KEY environment variable
    vectorstore = FAISS.from_documents(
        docs, TogetherEmbeddings(model="togethercomputer/m2-bert-80M-8k-retrieval")
    )
    retriever = vectorstore.as_retriever()
    return retriever


# Set up language model
def setup_model():
    model = Together(
        model="mistralai/Mixtral-8x7B-Instruct-v0.1",
        temperature=0.0,
        max_tokens=500,
        top_k=0,
    )
    return model


# Generate answer based on context and question
def generate_answer(question, retriever, model, name="The candidate"):
    context_instruction = (
        f"You are {name}, and your professional experience is outlined in the following resume. "
        "Answer the question as if you are the candidate, providing details from the resume where relevant."
    )

    # Retrieve relevant documents
    context_docs = retriever.invoke(question)
    context = " ".join(doc.page_content for doc in context_docs)

    # Prepare the prompt; a plain PromptTemplate keeps the [INST] tags intact
    # for the completion-style Together model
    template = """[INST] {instruction}
Answer from the context only, as if the person is responding (use 'I' instead of 'you' in the response).
Always keep answers short. If greeted, greet back.

{context}

Question: {question} [/INST]"""
    prompt = PromptTemplate.from_template(template)

    # Create the chain with the prompt, model, and output parser
    chain = prompt | model | StrOutputParser()

    # The retrieved context and the question are passed in as prompt variables
    answer = chain.invoke(
        {"instruction": context_instruction, "context": context, "question": question}
    )
    return answer


# Streamlit app UI
st.title("Resume-based Q&A Bot (Streamlit with Together)")
st.write("Upload your resume and ask questions about your professional experience!")

# File uploader for the resume
uploaded_file = st.file_uploader("Upload your resume (PDF format)", type=["pdf"])

if uploaded_file is not None:
    resume_text = read_pdf(uploaded_file)

    # Load and process the resume
    docs = load_and_split_resume(resume_text)
    retriever = setup_vector_store(docs)
    model = setup_model()

    st.write("Resume successfully uploaded and processed!")

    # Text input for questions
    question = st.text_input("Ask a question about the resume")

    # Name input for the person in the resume
    candidate_name = st.text_input("Enter the candidate's name (optional)", "The candidate")

    # Generate and display the answer when the button is clicked
    if st.button("Generate Answer"):
        if question:
            answer = generate_answer(question, retriever, model, candidate_name)
            st.write("Answer:")
            st.write(answer)
        else:
            st.write("Please enter a question.")
else:
    st.write("Please upload a PDF resume to get started.")