khyat committed on
Commit
83fc9bb
1 Parent(s): 7a1fc4a

Update README.md

  1. README.md +0 -10
README.md CHANGED
@@ -23,16 +23,6 @@ Unveiling the debut of Gujju-Llama 7B Base model, offering researchers and devel
  - **Training Precision** float16
  - **License:** GNU General Public License v3.0
 
- ### Gujarati Tokenization
-
- - Prior to pre-training, the base Llama-2 model lacked the ability to recognize Gujarati characters. As illustrated in the tokenization process below:
- - Sample English sentence:- ISRO created history.
- - Sample Gujarati sentence:- ઈસરોએ ઈતિહાસ રચ્યો.
-
- **Base Llama-2 Tokenization:** ['▁', '<0xE0>', '<0xAA>', '<0x88>', '<0xE0>', '<0xAA>', '<0xB8>', '<0xE0>', '<0xAA>', '<0xB0>', '<0xE0>', '<0xAB>', '<0x8B>', '<0xE0>', '<0xAA>', '<0x8F>', '▁', '<0xE0>', '<0xAA>', '<0x88>', '<0xE0>', '<0xAA>', '<0xA4>', '<0xE0>', '<0xAA>', '<0xBF>', '<0xE0>', '<0xAA>', '<0xB9>', 'ા', '<0xE0>', '<0xAA>', '<0xB8>', '▁', '<0xE0>', '<0xAA>', '<0xB0>', '<0xE0>', '<0xAA>', '<0x9A>', '<0xE0>', '<0xAB>', '<0x8D>', '<0xE0>', '<0xAA>', '<0xAF>', '<0xE0>', '<0xAB>', '<0x8B>', '.']
-
- **Gujju-Llama Tokenization:** ['ઈસ', 'રો', 'એ', '▁ઈ', 'તિ', 'હા', 'સ', '▁રચ્યો', '.']
-
  ## Usage Note
 
  These models possess impressive linguistic skills, but it's important to remember they haven't been specifically optimized to avoid potentially harmful or offensive content. To mitigate this risk, we advise users to:
 