sampoorna42/gujju-llama-base-v1.0


khyat committed on Mar 19

Commit 83fc9bb (1 parent: 7a1fc4a)

Update README.md

Files changed (1)

README.md +0 -10


@@ -23,16 +23,6 @@ Unveiling the debut of Gujju-Llama 7B Base model, offering researchers and devel
 - **Training Precision** float16
 - **License:** GNU General Public License v3.0
-### Gujarati Tokenization
-- Prior to pre-training, the base Llama-2 model lacked the ability to recognize Gujarati characters. As illustrated in the tokenization process below:
-- Sample English sentence:- ISRO created history.
-- Sample Gujarati sentence:- ઈસરોએ ઈતિહાસ રચ્યો.
-**Base Llama-2 Tokenization:** ['▁', '<0xE0>', '<0xAA>', '<0x88>', '<0xE0>', '<0xAA>', '<0xB8>', '<0xE0>', '<0xAA>', '<0xB0>', '<0xE0>', '<0xAB>', '<0x8B>', '<0xE0>', '<0xAA>', '<0x8F>', '▁', '<0xE0>', '<0xAA>', '<0x88>', '<0xE0>', '<0xAA>', '<0xA4>', '<0xE0>', '<0xAA>', '<0xBF>', '<0xE0>', '<0xAA>', '<0xB9>', 'ા', '<0xE0>', '<0xAA>', '<0xB8>', '▁', '<0xE0>', '<0xAA>', '<0xB0>', '<0xE0>', '<0xAA>', '<0x9A>', '<0xE0>', '<0xAB>', '<0x8D>', '<0xE0>', '<0xAA>', '<0xAF>', '<0xE0>', '<0xAB>', '<0x8B>', '.']
-**Gujju-Llama Tokenization:** ['ઈસ', 'રો', 'એ', '▁ઈ', 'તિ', 'હા', 'સ', '▁રચ્યો', '.']
 ## Usage Note
 These models possess impressive linguistic skills, but it's important to remember they haven't been specifically optimized to avoid potentially harmful or offensive content. To mitigate this risk, we advise users to:
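
Reviewer note on the removed "Gujarati Tokenization" lines: the `<0xNN>` entries in the Base Llama-2 output look like UTF-8 byte fallback, where a character missing from the vocabulary is emitted as one token per byte. The sketch below only mimics that rendering in plain Python; it does not load either tokenizer, and the helper name `byte_fallback_tokens` is hypothetical. It assumes every character falls back to bytes, which is not quite true of the real output (the removed line shows 'ા' surviving as a single token).

```python
def byte_fallback_tokens(text: str) -> list[str]:
    """Render every UTF-8 byte of `text` as a '<0xNN>' token.

    Hypothetical helper for illustration only; a real tokenizer falls
    back to bytes just for pieces that are missing from its vocabulary.
    """
    return [f"<0x{byte:02X}>" for byte in text.encode("utf-8")]


# First word of the sample Gujarati sentence, written as code point
# escapes (U+0A88 U+0AB8 U+0AB0 U+0ACB U+0A8F) to avoid copy/paste issues.
first_word = "\u0a88\u0ab8\u0ab0\u0acb\u0a8f"

tokens = byte_fallback_tokens(first_word)
print(tokens)
print(f"{len(first_word)} code points -> {len(tokens)} byte tokens")
```

Each Gujarati code point in this word occupies three UTF-8 bytes, so the five code points become fifteen byte tokens, matching the first fifteen `<0xNN>` entries in the removed Base Llama-2 line. The removed Gujju-Llama line covers the same word with three tokens ('ઈસ', 'રો', 'એ').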
