Upload README.md
README.md
CHANGED
@@ -71,7 +71,7 @@ inference: false
|
|
71 |
|
72 |
Yoruba fine-tuned LLM using [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2).
|
73 |
|
74 |
-
|
75 |
|
76 |
### Usage (Sentence-Transformers)
|
77 |
|
@@ -81,7 +81,7 @@ Using this model becomes easy when you have [sentence-transformers](https://www.
|
|
81 |
pip install -U sentence-transformers
|
82 |
```
|
83 |
|
84 |
-
|
85 |
|
86 |
```python
|
87 |
from sentence_transformers import SentenceTransformer
|
@@ -92,10 +92,61 @@ embeddings = model.encode(sentences)
|
|
92 |
print(embeddings)
|
93 |
```
|
94 |
|
95 |
### License
|
96 |
|
97 |
This project is licensed under the [MIT License](./LICENSE).
|
98 |
|
99 |
### Copyright
|
100 |
|
101 |
-
(c) 2024 [Finbarrs Oketunji](https://finbarrs.eu).
|
|
|
71 |
|
72 |
Yoruba fine-tuned LLM using [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2).
|
73 |
|
74 |
+
> Yoruba words combine vowels and consonants in consistent patterns. The language has a rich phonetic structure, including eighteen consonants and seven vowels. Words vary in length and complexity, but they generally follow regular patterns of syllable structure and pronunciation. Yoruba words also carry diacritical marks: accents indicate tone, and underdots distinguish vowel quality (and the consonant ṣ). Both are essential to the language's phonology and meaning.
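
Because tone marks and underdots can be typed either as precomposed characters or as base letters followed by combining marks, two Yoruba strings that look identical may not compare equal. The sketch below uses only Python's standard `unicodedata` module to show one way of normalizing text before encoding it. The helper names `normalize_yoruba` and `strip_diacritics` are illustrative, and the model card does not say whether the model expects any particular normalization, so treat this as an assumption to verify rather than a requirement.

```python
import unicodedata


def normalize_yoruba(text: str) -> str:
    """Return the text in NFC form so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text)


def strip_diacritics(text: str) -> str:
    """Remove tone marks and underdots. This is lossy and changes meaning."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The same word written two ways (hypothetical input, for illustration)
composed = "jul\u1ecd"       # precomposed o-with-dot-below
decomposed = "julo\u0323"    # plain o followed by a combining dot below

print(composed == decomposed)
print(normalize_yoruba(composed) == normalize_yoruba(decomposed))
print(strip_diacritics(composed))
```

Since the diacritics carry meaning, NFC normalization is the safer default. Stripping marks is shown only because the example sentences in this README are partly written without them.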
|
75 |
|
76 |
### Usage (Sentence-Transformers)
|
77 |
|
|
|
81 |
pip install -U sentence-transformers
|
82 |
```
|
83 |
|
84 |
+
### Embeddings
|
85 |
|
86 |
```python
|
87 |
from sentence_transformers import SentenceTransformer
|
|
|
92 |
print(embeddings)
|
93 |
```
|
94 |
|
95 |
+
### Advanced Usage

```python
from sentence_transformers import SentenceTransformer, util
import torch

# Define sentences
sentences = [
    "Kini olu ilu England",
    "Kini eranko ti o gbona julọ ni agbaye?",
    "Bawo ni o se le kọ ede Yoruba?",
    "Kini ounje to gbajumo julọ ni Naijiria?",
    "Iru aso wo ni a maa n wọ fun ijo Yoruba?"
]

# Load the model
model = SentenceTransformer('0xnu/pmmlv2-fine-tuned-yoruba')

# Compute embeddings
embeddings = model.encode(sentences, convert_to_tensor=True)

# Function to find the closest sentence
def find_closest_sentence(query_embedding, sentence_embeddings, sentences):
    # Compute cosine similarities
    cosine_scores = util.pytorch_cos_sim(query_embedding, sentence_embeddings)[0]

    # Find the position of the highest score
    best_match_index = torch.argmax(cosine_scores).item()

    return sentences[best_match_index], cosine_scores[best_match_index].item()

query = "Kini olu ilu England"
query_embedding = model.encode(query, convert_to_tensor=True)

closest_sentence, similarity_score = find_closest_sentence(query_embedding, embeddings, sentences)

print(f"Query: {query}")
print(f"Closest matching sentence: {closest_sentence}")
print(f"Similarity score: {similarity_score:.4f}")

# You can also try with a new sentence not in the original list
new_query = "Kini oruko oba to wa ni ilu Oyo?"
new_query_embedding = model.encode(new_query, convert_to_tensor=True)

closest_sentence, similarity_score = find_closest_sentence(new_query_embedding, embeddings, sentences)

print(f"\nNew Query: {new_query}")
print(f"Closest matching sentence: {closest_sentence}")
print(f"Similarity score: {similarity_score:.4f}")
```
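
The example above returns only the single best match. A common extension is to rank the top few candidates by cosine similarity. The following is a minimal sketch in plain Python with toy three-dimensional vectors, so it runs without downloading the model. The function names `cosine_similarity` and `top_k_matches` and the toy data are hypothetical and not part of this repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_matches(query_vec, corpus_vecs, sentences, k=3):
    """Return the k sentences most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_vec, vec), sentence)
        for vec, sentence in zip(corpus_vecs, sentences)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(sentence, score) for score, sentence in scored[:k]]


# Toy vectors standing in for real sentence embeddings (assumption)
toy_sentences = ["sentence A", "sentence B", "sentence C"]
toy_embeddings = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
toy_query = [1.0, 0.1, 0.0]

results = top_k_matches(toy_query, toy_embeddings, toy_sentences, k=2)
for sentence, score in results:
    print(f"{score:.4f}  {sentence}")
```

With real embeddings, the toy lists would be replaced by the output of `model.encode(...)`, for example converted with `.tolist()`. For larger corpora a vectorized routine would likely be preferable to this pure-Python loop.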
|
145 |
+
|
146 |
### License
|
147 |
|
148 |
This project is licensed under the [MIT License](./LICENSE).
|
149 |
|
150 |
### Copyright
|
151 |
|
152 |
+
(c) 2024 [Finbarrs Oketunji](https://finbarrs.eu).
|