# Wikipedia txtai embeddings index

This is a [txtai](https://github.com/neuml/txtai) embeddings index for the [Swedish edition of Wikipedia](https://sv.wikipedia.org/).

Embeddings are the engine that delivers semantic search: data is transformed into vectors, and similar concepts produce similar vectors.
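
As a quick illustration of that idea, here is a minimal sketch (the model name is an assumption for illustration, not the model behind this index):

```python
from txtai.embeddings import Embeddings

# Assumed multilingual model, for illustration only
embeddings = Embeddings(path="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Similar concepts score close together: "Stockholm" ranks above "banan"
print(embeddings.similarity("Sveriges huvudstad", ["Stockholm", "banan"]))
```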

An embeddings index generated by txtai is a fully encapsulated index format. It doesn't require a database server.
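
Because the index is a single self-contained artifact, it round-trips through a plain file. A minimal sketch, assuming any small sentence-transformers model:

```python
from txtai.embeddings import Embeddings

# Build a tiny index and save it to a single file; no server process involved
embeddings = Embeddings(path="sentence-transformers/all-MiniLM-L6-v2", content=True)
embeddings.index([(0, "txtai builds fully encapsulated embeddings indexes", None)])
embeddings.save("index.tar.gz")

# Reload it anywhere Python and txtai are installed
loaded = Embeddings()
loaded.load("index.tar.gz")
print(loaded.search("encapsulated index", 1))
```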

This index is built from the [Wikipedia February 2024 dataset](https://huggingface.co/datasets/burgerbee/wikipedia-sv-20240220).

Only the first two paragraphs of each article are included. The Wikipedia index works well as a fact-based context source for retrieval augmented generation (RAG).
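
For reference, an index like this could be rebuilt from the dataset along these lines (a sketch: the column names and model are assumptions, so check the dataset schema before running):

```python
from datasets import load_dataset
from txtai.embeddings import Embeddings

# Load the source dataset (assumed "id" and "text" columns)
dataset = load_dataset("burgerbee/wikipedia-sv-20240220", split="train")

# Assumed multilingual model; content=True stores text for SQL queries
embeddings = Embeddings(path="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2", content=True)
embeddings.index((row["id"], row["text"], None) for row in dataset)
embeddings.save("wikipedia-sv")
```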

It also uses [Wikipedia Page Views](https://dumps.wikimedia.org/other/pageviews/readme.html) data to add a `percentile` field. The `percentile` field can be used to only match commonly visited pages.

For example, the query below returns only matches from the most visited pages (the container name and the `similar()` text are illustrative placeholders):

```python
import json

from txtai.embeddings import Embeddings

# Load the index from the Hugging Face Hub (placeholder: use this repository's id)
embeddings = Embeddings()
embeddings.load(provider="huggingface-hub", container="burgerbee/wikipedia-sv")

# SQL search restricted to the top 1% most visited pages
for x in embeddings.search("SELECT id, text, score, percentile FROM txtai WHERE similar('Stockholm') AND percentile >= 0.99"):
    print(json.dumps(x, indent=2))
```

# Source

https://dumps.wikimedia.org/svwiki/20240220/dumpstatus.json