---
title: Auto Evaluator
emoji: :brain:
colorFrom: blue
colorTo: yellow
sdk: streamlit
sdk_version: 1.19.0
app_file: app.py
pinned: false
license: mit
---
# `Auto-evaluator` :brain: :memo:
This is a lightweight evaluation tool for question-answering that uses `Langchain` to:

- Ask the user to input a set of documents of interest
- Apply an LLM (`GPT-3.5-turbo`) to auto-generate `question`-`answer` pairs from these docs
- Generate a question-answering chain with a specified set of UI-chosen configurations
- Use the chain to generate a response to each `question`
- Use an LLM (`GPT-3.5-turbo`) to score the response relative to the `answer`
- Explore scoring across various chain configurations (a sketch of this pipeline follows the list)
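For orientation, the end-to-end loop looks roughly like this. This is a minimal sketch against the classic Langchain (0.0.x) API; the input file, chain choices, and parameter values are illustrative assumptions, not the app's exact code:

```python
# Minimal sketch of the eval loop using the classic Langchain (0.0.x) API.
# File name, chain choices, and parameter values are illustrative assumptions,
# not the app's exact code.
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import QAGenerationChain, RetrievalQA
from langchain.evaluation.qa import QAEvalChain

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# 1. Load and split the user-supplied document.
docs = TextLoader("my_doc.txt").load()  # hypothetical input file
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100).split_documents(docs)

# 2. Auto-generate question-answer eval pairs from the raw text.
qa_gen = QAGenerationChain.from_llm(llm)
eval_set = qa_gen.run(chunks[0].page_content)  # list of {"question", "answer"} dicts

# 3. Build a retrieval QA chain over the chunks.
retriever = FAISS.from_documents(chunks, OpenAIEmbeddings()).as_retriever(
    search_kwargs={"k": 3})
qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever)

# 4. Answer each generated question, then 5. grade each answer with an LLM.
predictions = [{"question": q["question"], "answer": q["answer"],
                "result": qa_chain.run(q["question"])} for q in eval_set]
grades = QAEvalChain.from_llm(llm).evaluate(
    eval_set, predictions, question_key="question", prediction_key="result")
```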
**Run as Streamlit app**

`pip install -r requirements.txt`

`streamlit run auto-evaluator.py`
**Inputs**

- `num_eval_questions` - Number of questions to auto-generate (if the user does not supply an eval set)
- `split_method` - Method for text splitting
- `chunk_chars` - Chunk size for text splitting
- `overlap` - Chunk overlap for text splitting
- `embeddings` - Embedding method for chunks
- `retriever_type` - Chunk retrieval method (see the sketch after this list)
- `num_neighbors` - Neighbors for retrieval
- `model` - LLM for summarization of retrieved chunks
- `grade_prompt` - Prompt choice for model self-grading
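As a rough illustration, the retriever inputs might be wired up as follows. This is a hedged sketch against the classic Langchain retriever classes; the option strings and the app's actual wiring may differ:

```python
# Hedged sketch of how retriever_type / num_neighbors could select a retriever.
# Option strings are assumptions; the app's actual wiring may differ.
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers import SVMRetriever, TFIDFRetriever
from langchain.vectorstores import FAISS

def make_retriever(retriever_type: str, texts: list, num_neighbors: int):
    embeddings = OpenAIEmbeddings()
    if retriever_type == "similarity-search":
        # Dense k-NN lookup over an in-memory FAISS index.
        return FAISS.from_texts(texts, embeddings).as_retriever(
            search_kwargs={"k": num_neighbors})
    if retriever_type == "SVM":
        # Ranks chunks with an SVM fit over the embeddings (needs scikit-learn).
        return SVMRetriever.from_texts(texts, embeddings)
    # Sparse lexical fallback; no embedding model required.
    return TFIDFRetriever.from_texts(texts)
```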
**Blog**

https://blog.langchain.dev/auto-eval-of-question-answering-tasks/
**UI**

![image](https://user-images.githubusercontent.com/122662504/233218347-de10cf41-6230-47a7-aa9e-8ab01673b87a.png)
**Hosted app**

See the code:

https://github.com/langchain-ai/auto-evaluator

And the hosted app:

https://autoevaluator.langchain.com/
**Disclaimer**

You will need an OpenAI API key with access to `GPT-4` and an Anthropic API key to take advantage of all of the default dashboard model settings. However, additional models (e.g., from Hugging Face) can be easily added to the app.
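For example, a Hugging Face Hub model could be swapped in as the QA LLM along these lines (a hedged sketch; the `repo_id` and kwargs are example choices, not app defaults, and a `HUGGINGFACEHUB_API_TOKEN` must be set in the environment):

```python
# Hedged sketch: using a Hugging Face Hub model as an alternative LLM.
# repo_id and model_kwargs are example choices, not app defaults;
# requires HUGGINGFACEHUB_API_TOKEN in the environment.
from langchain.llms import HuggingFaceHub

llm = HuggingFaceHub(
    repo_id="google/flan-t5-xl",
    model_kwargs={"temperature": 0.1, "max_length": 256},
)
```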