0xnu committed on
Commit edbbc61 · verified · 1 Parent(s): 5ef7eed

Upload README.md

Files changed (1)
  1. README.md +54 -3
README.md CHANGED
@@ -71,7 +71,7 @@ inference: false
 
  Yoruba fine-tuned LLM using [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2).
 
- [Yoruba](https://en.wikipedia.org/wiki/Yoruba_language) words typically consist of various combinations of vowels and consonants. The Yoruba language has a rich phonetic structure, including eighteen consonants and seven vowels. Words in Yoruba can vary in length and complexity, but they generally follow consistent patterns of syllable structure and pronunciation. Additionally, Yoruba words may include diacritical marks such as accents and underdots to indicate tone and vowel length; they are essential to the language's phonology and meaning.
 
  ### Usage (Sentence-Transformers)
 
@@ -81,7 +81,7 @@ Using this model becomes easy when you have [sentence-transformers](https://www.
  pip install -U sentence-transformers
  ```
 
- Then you can use the model like this:
 
  ```python
  from sentence_transformers import SentenceTransformer
@@ -92,10 +92,61 @@ embeddings = model.encode(sentences)
  print(embeddings)
  ```
 
  ### License
 
  This project is licensed under the [MIT License](./LICENSE).
 
  ### Copyright
 
- (c) 2024 [Finbarrs Oketunji](https://finbarrs.eu).
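The paragraph on Yoruba diacritics in the README text above mentions accents and underdots. A minimal sketch using only Python's standard `unicodedata` module shows that these marks can be stored either precomposed or as separate combining code points, so the same written word can arrive as two different strings. Normalizing input to one form (NFC here, an assumption; the README does not prescribe any normalization) before calling `model.encode` is a cheap precaution, since the README does not say whether the model's tokenizer normalizes internally. The helper names `normalize_yoruba`, `list_marks` and `strip_marks` are hypothetical.

```python
import unicodedata

# Combining marks used in Yoruba orthography (assumed set for this sketch):
# U+0301 acute accent (high tone), U+0300 grave accent (low tone),
# U+0323 combining dot below (the underdot in ẹ, ọ and ṣ).
YORUBA_MARKS = {
    "\u0301": "acute (high tone)",
    "\u0300": "grave (low tone)",
    "\u0323": "dot below",
}


def normalize_yoruba(text: str) -> str:
    """Return the NFC (composed) form so visually identical words compare equal."""
    return unicodedata.normalize("NFC", text)


def list_marks(text: str) -> list[str]:
    """Decompose to NFD and name each Yoruba combining mark, in canonical order."""
    decomposed = unicodedata.normalize("NFD", text)
    return [YORUBA_MARKS[ch] for ch in decomposed if ch in YORUBA_MARKS]


def strip_marks(text: str) -> str:
    """Drop every combining mark. Useful for debugging only: it destroys meaning."""
    decomposed = unicodedata.normalize("NFD", text)
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", base_only)


if __name__ == "__main__":
    composed = "k\u1ecd"     # "kọ" typed with the precomposed letter ọ
    decomposed = "ko\u0323"  # "kọ" typed as o + combining dot below
    print(composed == decomposed)                                      # False
    print(normalize_yoruba(composed) == normalize_yoruba(decomposed))  # True
    print(list_marks("Yor\u00f9b\u00e1"))  # ['grave (low tone)', 'acute (high tone)']
```

NFD also puts stacked marks into canonical order (dot below before a tone accent), so `list_marks` reports them consistently regardless of the order in which they were typed.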
 
 
  Yoruba fine-tuned LLM using [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2).
 
+ > Yoruba words typically consist of various combinations of vowels and consonants. The Yoruba language has a rich phonetic structure, including eighteen consonants and seven vowels. Words in Yoruba can vary in length and complexity, but they generally follow consistent patterns of syllable structure and pronunciation. Additionally, Yoruba words may include diacritical marks: accents (acute and grave) indicate tone, and underdots distinguish the vowels ẹ and ọ and the consonant ṣ. These marks are essential to the language's phonology and meaning.
 
  ### Usage (Sentence-Transformers)
 
 
  pip install -U sentence-transformers
  ```
 
+ ### Embeddings
 
  ```python
  from sentence_transformers import SentenceTransformer
 
  print(embeddings)
  ```
 
+ ### Advanced Usage
+
+ ```python
+ from sentence_transformers import SentenceTransformer, util
+ import torch
+
+ # Define sentences
+ sentences = [
+     "Kini olu ilu England",
+     "Kini eranko ti o gbona julọ ni agbaye?",
+     "Bawo ni o se le kọ ede Yoruba?",
+     "Kini ounje to gbajumo julọ ni Naijiria?",
+     "Iru aso wo ni a maa n wọ fun ijo Yoruba?"
+ ]
+
+ # Load the model
+ model = SentenceTransformer('0xnu/pmmlv2-fine-tuned-yoruba')
+
+ # Compute embeddings
+ embeddings = model.encode(sentences, convert_to_tensor=True)
+
+ # Function to find the closest sentence
+ def find_closest_sentence(query_embedding, sentence_embeddings, sentences):
+     # Compute cosine similarities
+     cosine_scores = util.pytorch_cos_sim(query_embedding, sentence_embeddings)[0]
+
+     # Find the position of the highest score
+     best_match_index = torch.argmax(cosine_scores).item()
+
+     return sentences[best_match_index], cosine_scores[best_match_index].item()
+
+ query = "Kini olu ilu England"
+ query_embedding = model.encode(query, convert_to_tensor=True)
+
+ closest_sentence, similarity_score = find_closest_sentence(query_embedding, embeddings, sentences)
+
+ print(f"Query: {query}")
+ print(f"Closest matching sentence: {closest_sentence}")
+ print(f"Similarity score: {similarity_score:.4f}")
+
+ # You can also try with a new sentence not in the original list
+ new_query = "Kini oruko oba to wa ni ilu Oyo?"
+ new_query_embedding = model.encode(new_query, convert_to_tensor=True)
+
+ closest_sentence, similarity_score = find_closest_sentence(new_query_embedding, embeddings, sentences)
+
+ print(f"\nNew Query: {new_query}")
+ print(f"Closest matching sentence: {closest_sentence}")
+ print(f"Similarity score: {similarity_score:.4f}")
+ ```
+
  ### License
 
  This project is licensed under the [MIT License](./LICENSE).
 
  ### Copyright
 
+ (c) 2024 [Finbarrs Oketunji](https://finbarrs.eu).
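The Advanced Usage example added in this commit ranks stored sentences by cosine similarity to a query and keeps the `torch.argmax`. A dependency-free sketch of that ranking step, generalized to the top k matches, is below. The vectors are made-up three-dimensional stand-ins, not outputs of `0xnu/pmmlv2-fine-tuned-yoruba`, and `cosine_similarity` and `top_k_cosine` are hypothetical helpers written for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k_cosine(
    query: Sequence[float],
    corpus: Sequence[Sequence[float]],
    sentences: Sequence[str],
    k: int = 1,
) -> list[tuple[str, float]]:
    """Return the k (sentence, score) pairs most similar to query, best first."""
    if len(corpus) != len(sentences):
        raise ValueError("corpus and sentences must be the same length")
    scored = [(sentences[i], cosine_similarity(query, vec)) for i, vec in enumerate(corpus)]
    # Python's sort is stable even with reverse=True, so tied scores keep corpus
    # order; with k=1 this matches torch.argmax, which returns the first maximum.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy vectors standing in for model.encode(...) output (made up for this sketch).
    sentences = [
        "Kini olu ilu England",
        "Kini ounje to gbajumo julọ ni Naijiria?",
        "Bawo ni o se le kọ ede Yoruba?",
    ]
    corpus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for sentence, score in top_k_cosine([0.9, 0.1, 0.0], corpus, sentences, k=2):
        print(f"{score:.4f}  {sentence}")
```

With the real library, `util.semantic_search(query_embedding, embeddings, top_k=2)` performs the same ranking on tensors and returns, for each query, a list of dicts with `corpus_id` and `score` keys.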