# Run with: streamlit run app.py

import streamlit as st

st.set_page_config(
    page_title="JobFair: Fairness Benchmark",
    page_icon="👋",
)

st.title("JobFair: A Benchmark for Fairness in LLM Employment Decision-Making")
st.write("Welcome to JobFair! This benchmark is designed to evaluate the fairness of language models in employment decision-making. Our goal is to provide a comprehensive tool for analyzing potential biases in how language models score resumes and make hiring recommendations.")

st.markdown(
    """
    ## About JobFair

    The JobFair benchmark enables users to:
    - **Upload and process** resumes to be evaluated by language models.
    - **Analyze fairness** through various statistical tests, correlations, and divergences.
    - **Download detailed evaluation results** for further review and reporting.

    ### Key Features

    - **Fairness Analysis**: Perform a variety of statistical tests to uncover potential biases in language model evaluations.
    - **Comprehensive Reporting**: Generate detailed reports on the fairness of LLMs, including visualizations and downloadable data.
    - **User-Friendly Interface**: Easily upload data, run analyses, and download results through an intuitive web interface.

    ### How to Use

    1. **Upload Data**: Start by uploading a CSV file containing the resumes and their respective scores.
    2. **Run Evaluations**: Use the provided tools to perform statistical analyses and visualize the results.
    3. **Download Results**: Export the analysis results for further examination and reporting.

    We hope JobFair helps you make better-informed and fairer employment decisions when using language models.
    """
)

# Sidebar content
st.sidebar.title("Demos")

st.sidebar.subheader("Injection Demo")
st.sidebar.markdown(
    """
    In this demo, you can upload a dataset of resumes and use our language models to process and score them based on various parameters.

    - **Model Settings**: Configure the model by selecting the agent type (GPTAgent or AzureAgent) and specifying the API key, endpoint URL, model name, temperature, and max tokens.
    - **Data Upload**: Choose to upload your own CSV file or use an example dataset.
    - **Process Data**: Enter the relevant details such as occupation, group name, privilege label, and protect label. Specify the number of runs and process the data to get the model's scores.
    - **Download Results**: After processing, download the generated results as a CSV file.
    """
)

st.sidebar.subheader("Evaluation Demo")
st.sidebar.markdown(
    """
    In this demo, you can evaluate the fairness of the scores generated by the language models.

    - **Upload Results**: Upload the CSV file containing the processed results from the injection demo.
    - **Statistical Tests**: Perform a variety of statistical tests to evaluate potential biases in the scores.
    - **Correlations and Divergences**: Calculate correlations and divergences to further analyze the fairness of the results.
    - **Download Evaluation**: Download the comprehensive evaluation results for further analysis.
    """
)