---
title: Who killed Laura Palmer?
emoji: 🗻🗻
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.2.0
app_file: app.py
pinned: false
license: apache-2.0
---
# Who killed Laura Palmer?   [![Generic badge](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue.svg)](https://huggingface.co/spaces/anakin87/who-killed-laura-palmer) [![Generic badge](https://img.shields.io/github/stars/anakin87/who-killed-laura-palmer?label=Github&style=social)](https://github.com/anakin87/who-killed-laura-palmer)
## 🗻🗻 Twin Peaks Question Answering system
WKLP is a simple Question Answering system, based on data crawled from the [Twin Peaks Wiki](https://twinpeaks.fandom.com/wiki/Twin_Peaks_Wiki). It is built using [🔍 Haystack](https://github.com/deepset-ai/haystack), an awesome open-source framework for building search systems that work intelligently over large document collections.
---
## Project architecture 🧱
![Project architecture](./data/readme_images/project_architecture.png)
* Crawler: implemented using [Scrapy](https://github.com/scrapy/scrapy) and [fandom-py](https://github.com/NikolajDanger/fandom-py)
* Question Answering pipelines: created with [Haystack](https://github.com/deepset-ai/haystack)
* Web app: developed with [Streamlit](https://github.com/streamlit/streamlit)
* Free hosting: [Hugging Face Spaces](https://huggingface.co/spaces)
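At the core of the architecture is a retriever–reader flow: the retriever narrows the crawled wiki down to a few candidate documents, and the reader extracts an answer span from them. The idea can be illustrated with a deliberately naive, dependency-free sketch (word-overlap scoring stands in for real BM25/embedding retrieval and for the transformer reader; all names and data below are illustrative only, not this repo's actual code):

```python
def retrieve(query, documents, top_k=2):
    # Stand-in for the retriever: rank documents by word overlap with the query.
    q = set(query.lower().split())
    ranked = sorted(documents, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:top_k]

def read(query, documents):
    # Stand-in for the reader model: pick the sentence that best matches the query.
    q = set(query.lower().split())
    sentences = [s.strip() for d in documents for s in d.split(".") if s.strip()]
    return max(sentences, key=lambda s: len(q & set(s.lower().split())))

docs = [
    "Laura Palmer is the homecoming queen of Twin Peaks. Laura Palmer was killed by Leland Palmer.",
    "Twin Peaks is a fictional town in Washington. Agent Cooper investigates the murder.",
]
top_docs = retrieve("who killed laura palmer", docs)
answer = read("who killed laura palmer", top_docs)
print(answer)  # Laura Palmer was killed by Leland Palmer
```

In the real system, the retriever and reader are Haystack nodes wired into a pipeline, and the documents come from the crawler output.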
---
## What can I learn from this project? πŸ“š
- How to quickly ⌚ build a modern Question Answering system using [🔍 Haystack](https://github.com/deepset-ai/haystack)
- How to generate questions based on your documents
- How to build a nice [Streamlit](https://github.com/streamlit/streamlit) web app to show your QA system
- How to optimize the web app for 🚀 deployment in [🤗 Spaces](https://huggingface.co/spaces)
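To give a taste of the question-generation idea, here is a dependency-free cloze-style toy (the notebooks in this repo rely on Haystack's transformer-based question generation instead; the function and example data below are purely illustrative):

```python
import re

def cloze_questions(text, entities):
    # Naive question generation: blank out a known entity in each sentence.
    # A toy stand-in for transformer-based question generators.
    questions = []
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        for entity in entities:
            if entity in sentence:
                questions.append(sentence.replace(entity, "_____").rstrip("."))
    return questions

doc = "Laura Palmer was found dead in Twin Peaks. Agent Cooper leads the investigation."
print(cloze_questions(doc, ["Laura Palmer", "Agent Cooper"]))
# ['_____ was found dead in Twin Peaks', '_____ leads the investigation']
```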
## Repository structure πŸ“
- [app.py](./app.py): Streamlit web app
- [app_utils folder](./app_utils/): Python modules used in the web app
- [crawler folder](./crawler/): Twin Peaks crawler, developed with Scrapy and fandom-py
- [notebooks folder](./notebooks/): Jupyter/Colab notebooks to create the Search pipeline and generate questions (using Haystack)
- [data folder](./data/): all necessary data
Within each folder, you can find more in-depth explanations.
## Possible improvements ✨
- The reader model (`deepset/roberta-base-squad2`) is a good compromise between speed and accuracy when running on CPU. There are certainly better (and more computationally expensive) models, as described in the [Haystack documentation](https://haystack.deepset.ai/pipeline_nodes/reader).
- You could also prepare a Twin Peaks QA dataset and fine-tune the reader model on it for better accuracy, as explained in the [Haystack tutorial](https://haystack.deepset.ai/tutorials/fine-tuning-a-model).
- ...
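For instance, with Haystack v1 pipelines the reader can be swapped by changing a single parameter. A sketch of the relevant YAML fragment (component names here are assumptions for illustration, not this repo's actual config):

```yaml
components:
  - name: Reader
    type: FARMReader
    params:
      # Larger model: typically more accurate on SQuAD-style QA, but slower on CPU
      model_name_or_path: deepset/roberta-large-squad2
```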