Commit 261cff9 by Stefano Fiorucci
Parent: 1bde059
Improved README
README.md
CHANGED
@@ -11,6 +11,7 @@ license: Apache-2.0
 ---
 
 # Who killed Laura Palmer? [![Generic badge](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue.svg)](https://huggingface.co/spaces/anakin87/who-killed-laura-palmer) [![Generic badge](https://img.shields.io/github/stars/anakin87/who-killed-laura-palmer?label=Github&style=social)](https://github.com/anakin87/who-killed-laura-palmer)
+
 ## 🗻🗻 Twin Peaks Question Answering system
 
 WKLP is a simple Question Answering system, based on data crawled from [Twin Peaks Wiki](https://twinpeaks.fandom.com/wiki/Twin_Peaks_Wiki). It is built using [Haystack](https://github.com/deepset-ai/haystack), an awesome open-source framework for building search systems that work intelligently over large document collections.
@@ -29,3 +30,23 @@ WKLP is a simple Question Answering system, based on data crawled from [Twin Pea
 ---
 
 ## What can I learn from this project?
+- How to quickly build a modern Question Answering system using [Haystack](https://github.com/deepset-ai/haystack)
+- How to generate questions based on your documents
+- How to build a nice [Streamlit](https://github.com/streamlit/streamlit) web app to show your QA system
+- How to optimize the web app to deploy in [🤗 Spaces](https://huggingface.co/spaces)
+
+## Repository structure
+- [app.py](./app.py): Streamlit web app
+- [app_utils folder](./app_utils/): Python modules used in the web app
+- [crawler folder](./crawler/): Twin Peaks crawler, developed with Scrapy and fandom-py
+- [notebooks folder](./notebooks/): Jupyter/Colab notebooks to create the Search pipeline and generate questions (using Haystack)
+- [data folder](./data/): all necessary data
+
+Within each folder, you can find more in-depth explanations.
+
+## Possible improvements ✨
+- The reader model (`deepset/roberta-base-squad2`) is a good compromise between speed and accuracy when running on CPU. There are certainly more accurate (and more computationally expensive) models, as you can read in the [Haystack documentation](https://haystack.deepset.ai/pipeline_nodes/reader).
+- You can also think about preparing a Twin Peaks QA dataset and fine-tuning the reader model to get better accuracy, as explained in the [Haystack tutorial](https://haystack.deepset.ai/tutorials/fine-tuning-a-model).
+- ...
+
+