davidmezzetti committed
Commit 7fca0dc
1 Parent(s): 986d2dd
Update README
README.md CHANGED
@@ -53,7 +53,7 @@ See this [article](https://neuml.hashnode.dev/embeddings-in-the-cloud) for addit

 ## Evaluation Results

-Performance was evaluated using the [NDCG@10](https://en.wikipedia.org/wiki/Discounted_cumulative_gain) score with a [custom question-answer evaluation set](https://github.com/neuml/

 | Model | NDCG@10 | MAP@10 |
 | ---------------------------------------------------------- | ---------- | --------- |
@@ -69,14 +69,14 @@ The following steps show how to build this index. These scripts are using the la

 - Install required build dependencies
 ```bash
-pip install
 ```

 - Download and build pageviews database
 ```bash
 mkdir -p pageviews/data
 wget -P pageviews/data https://dumps.wikimedia.org/other/pageview_complete/monthly/2024/2024-08/pageviews-202408-user.bz2
-python -m
 ```

 - Build Wikipedia dataset
@@ -94,7 +94,7 @@ ds.save_to_disk(f"wikipedia-{date}")

 - Build txtai-wikipedia index
 ```bash
-python -m
 -d wikipedia-20240901 \
 -o txtai-wikipedia \
 -v pageviews/pageviews.sqlite
@@ -53,7 +53,7 @@

 ## Evaluation Results

+Performance was evaluated using the [NDCG@10](https://en.wikipedia.org/wiki/Discounted_cumulative_gain) score with a [custom question-answer evaluation set](https://github.com/neuml/ragdata/tree/master/datasets/wikipedia). Results are shown below.

 | Model | NDCG@10 | MAP@10 |
 | ---------------------------------------------------------- | ---------- | --------- |
@@ -69,14 +69,14 @@

 - Install required build dependencies
 ```bash
+pip install ragdata mwparserfromhell
 ```

 - Download and build pageviews database
 ```bash
 mkdir -p pageviews/data
 wget -P pageviews/data https://dumps.wikimedia.org/other/pageview_complete/monthly/2024/2024-08/pageviews-202408-user.bz2
+python -m ragdata.wikipedia.views -p en.wikipedia -v pageviews
 ```

 - Build Wikipedia dataset
@@ -94,7 +94,7 @@

 - Build txtai-wikipedia index
 ```bash
+python -m ragdata.wikipedia.index \
 -d wikipedia-20240901 \
 -o txtai-wikipedia \
 -v pageviews/pageviews.sqlite
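Reviewer note: the evaluation paragraph touched by this commit reports NDCG@10 and MAP@10. As a reading aid, the following is a minimal sketch of how those two metrics are commonly computed with binary relevance. It is not the ragdata evaluation code; the function names (`ndcg_at_k`, `ap_at_k`, `map_at_k`) and the sample ids are hypothetical, and it assumes each ranked list contains no duplicate ids.

```python
import math


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """NDCG@k with binary relevance: DCG of the ranking divided by the ideal DCG."""
    relevant = set(relevant_ids)
    # Each relevant hit at 0-based position `rank` contributes 1 / log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant
    )
    # The ideal ranking places every relevant id first (capped at k).
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


def ap_at_k(ranked_ids, relevant_ids, k=10):
    """Average precision@k: mean of precision values taken at each relevant hit."""
    relevant = set(relevant_ids)
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


def map_at_k(runs, k=10):
    """MAP@k: average of AP@k over (ranked_ids, relevant_ids) pairs, one per query."""
    scores = [ap_at_k(ranked, relevant, k) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical sample: one query answered at rank 1, one answered at rank 2.
runs = [(["a", "b", "c"], ["a"]), (["x", "a"], ["a"])]
ndcg_scores = [ndcg_at_k(ranked, relevant) for ranked, relevant in runs]
map_score = map_at_k(runs)
```

A question-answer set with one correct article per question fits this binary form: a correct answer at rank 1 scores 1.0 on both metrics, and the score decays as the answer moves down the top 10.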