metadata
title: Who killed Laura Palmer?
emoji: 🗻🗻
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.2.0
app_file: app.py
pinned: false
license: Apache-2.0
Who killed Laura Palmer?
🗻🗻 Twin Peaks Question Answering system
WKLP is a simple Question Answering system based on data crawled from the Twin Peaks Wiki. It is built using Haystack, an awesome open-source framework for building search systems that work intelligently over large document collections.
Project architecture 🧱
- Crawler: implemented using Scrapy and fandom-py
- Question Answering pipelines: created with Haystack
- Web app: developed with Streamlit
- Free hosting: Hugging Face Spaces
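To give a feel for what a Question Answering pipeline does, here is a minimal, standard-library-only sketch of the retrieve-then-read pattern. It is an assumption about the general shape of such a pipeline, not the Haystack API and not the code used in this project: `tokenize`, `retrieve`, and `read` are hypothetical helpers, and the two documents are made-up placeholders standing in for crawled wiki pages.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question, documents, top_k=1):
    """Toy retriever: rank documents by how often they contain question tokens."""
    q_tokens = set(tokenize(question))
    scored = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        scored.append((sum(counts[t] for t in q_tokens), doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def read(question, passages):
    """Toy reader: return the sentence sharing the most tokens with the question."""
    q_tokens = set(tokenize(question))
    best, best_score = None, 0
    for passage in passages:
        for sentence in re.split(r"(?<=[.!?])\s+", passage):
            score = len(q_tokens & set(tokenize(sentence)))
            if score > best_score:
                best, best_score = sentence, score
    return best


# Placeholder documents (illustrative only, not taken from the data folder).
docs = [
    "Dale Cooper is an FBI special agent. He investigates the murder of Laura Palmer.",
    "The Double R Diner is known for its cherry pie.",
]

question = "Who is Dale Cooper?"
passages = retrieve(question, docs, top_k=1)
answer = read(question, passages)
print(answer)
```

In the real system, the retriever and the reader are Haystack components and the reader is a neural model that extracts an answer span, rather than the token-overlap heuristics used here.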
What can I learn from this project?
- How to quickly build a modern Question Answering system using Haystack
- How to generate questions based on your documents
- How to build a nice Streamlit web app to show your QA system
- How to optimize the web app to deploy in 🤗 Spaces
Repository structure
- app.py: Streamlit web app
- app_utils folder: Python modules used in the web app
- crawler folder: Twin Peaks crawler, developed with Scrapy and fandom-py
- notebooks folder: Jupyter/Colab notebooks to create the Search pipeline and generate questions (using Haystack)
- data folder: all necessary data
Within each folder, you can find more in-depth explanations.
Possible improvements ✨
- The reader model (deepset/roberta-base-squad2) is a good compromise between speed and accuracy, running on CPU. There are certainly better (and more computationally expensive) models, as you can read in the Haystack documentation.
- You can also think about preparing a Twin Peaks QA dataset and fine-tuning the reader model to get better accuracy, as explained in the Haystack tutorial.
- ...
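If you want to try preparing a Twin Peaks QA dataset, the sketch below shows one possible way to build a single annotated example in SQuAD 2.0-style JSON, the layout commonly used for fine-tuning extractive readers. It is only an illustration under assumptions: `make_squad_example` is a hypothetical helper, the context sentence and the `wklp-0001` id are made-up placeholders, and you should check the Haystack tutorial for the exact input it expects.

```python
import json


def make_squad_example(context, question, answer_text, example_id):
    """Build one SQuAD-style paragraph entry, computing the answer offset."""
    start = context.find(answer_text)
    if start == -1:
        # Extractive QA needs the answer to appear verbatim in the context.
        raise ValueError("answer_text must appear verbatim in context")
    return {
        "context": context,
        "qas": [
            {
                "id": example_id,
                "question": question,
                "is_impossible": False,
                "answers": [{"text": answer_text, "answer_start": start}],
            }
        ],
    }


# Placeholder annotation (illustrative only).
context = (
    "Dale Cooper is an FBI special agent who investigates "
    "the murder of Laura Palmer."
)
paragraph = make_squad_example(
    context,
    "Who investigates the murder of Laura Palmer?",
    "Dale Cooper",
    "wklp-0001",
)

dataset = {
    "version": "v2.0",
    "data": [{"title": "Dale Cooper", "paragraphs": [paragraph]}],
}

# Serialize to JSON text; write this string to a file to use it for training.
squad_json = json.dumps(dataset, indent=2)
print(squad_json)
```

Computing `answer_start` from the context, instead of typing it by hand, avoids the off-by-one offsets that are a common source of broken training examples.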