Corianas committed on
Commit 0b412d8 (verified) · 1 parent: 332bcb4

Update README.md

Files changed (1): README.md (+29 -0)
---
license: cc-by-nc-4.0
---
A llama2.c model, based on Karpathy's llama2.c project: https://github.com/karpathy/llama2.c
It uses a vocabulary of 4096 tokens and was trained on TinyStories and my custom littlestories dataset (currently unreleased).

The model uses ↨ as a shift key instead of capital letters: each uppercase character is written as ↨ followed by its lowercase form, so "The" becomes "↨the". This simplifies the tokenizer, because it no longer needs uppercase duplicates of tokens.

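To illustrate why the shift key shrinks the vocabulary, here is a rough sketch that counts distinct word forms in a sentence with and without the ↨ encoding. It is a deliberate simplification: `word_forms` is a hypothetical word-level stand-in, not the model's actual 4096-token tokenizer, and `encode_shift` is a hypothetical name that applies the same mapping as the README's `add_caseifer`.

```python
import re

SHIFT = "↨"  # the shift marker described above


def encode_shift(text: str) -> str:
    # Same mapping as add_caseifer: each uppercase character becomes
    # SHIFT followed by its lowercase form; everything else is unchanged.
    return "".join(SHIFT + c.lower() if c.isupper() else c for c in text)


def word_forms(text: str) -> set[str]:
    # Crude stand-in for a tokenizer vocabulary: the distinct maximal runs of
    # letters. SHIFT is a symbol, not a letter, so it is split off the word.
    return set(re.findall(r"[^\W\d_]+", text))


sample = "The cat saw the Cat. The end."
plain = word_forms(sample)                  # {'The', 'the', 'Cat', 'cat', 'saw', 'end'}
shifted = word_forms(encode_shift(sample))  # {'the', 'cat', 'saw', 'end'}
print(len(plain), len(shifted))             # 6 4
```

The trade-off is sequence length: every uppercase letter costs an extra ↨, so an all-caps word such as "THE" becomes "↨t↨h↨e".
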
To convert normal text into this format, I use:
```python
def add_caseifer(text):
    # Using list comprehension for more efficient concatenation
    return ''.join(['↨' + char.lower() if char.isupper() else char for char in text])
```

To convert the text back to normal, human-readable form, I use:
```python
def remove_caseifer(text):
    new_text = ""
    i = 0
    while i < len(text):
        if text[i] == "↨":
            if i + 1 < len(text):
                # Uppercase the character after the shift key, then skip over it.
                new_text += text[i + 1].upper()
                i += 1
            else:
                pass  # trailing ↨ with nothing after it: drop it
        else:
            new_text += text[i]
        i += 1
    return new_text
```
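
When output is decoded piece by piece (for example, printing each token as it is sampled), a ↨ can arrive as the last character of one piece while the letter it modifies arrives in the next; calling `remove_caseifer` on each piece separately would then drop the shift. Below is a minimal sketch of a stateful decoder that carries a pending shift across pieces. The class name `StreamingCaseDecoder` and the sample pieces are hypothetical, and it assumes each piece is a plain string already decoded from token ids.

```python
SHIFT = "↨"  # shift marker used by the model (see above)


class StreamingCaseDecoder:
    """Hypothetical helper: undoes the ↨ shift-key encoding on text that
    arrives in pieces. A ↨ at the end of one piece is remembered and applied
    to the first character of the next piece."""

    def __init__(self) -> None:
        self._pending_shift = False

    def feed(self, piece: str) -> str:
        out = []
        for ch in piece:
            if self._pending_shift:
                # Character right after ↨: emit it uppercased, as remove_caseifer does.
                out.append(ch.upper())
                self._pending_shift = False
            elif ch == SHIFT:
                self._pending_shift = True
            else:
                out.append(ch)
        return "".join(out)

    def flush(self) -> str:
        # Like remove_caseifer, a trailing ↨ with nothing after it is dropped.
        self._pending_shift = False
        return ""


decoder = StreamingCaseDecoder()
pieces = ["↨once upon a time, ↨", "tim saw ↨", "lily."]
text = "".join(decoder.feed(p) for p in pieces) + decoder.flush()
print(text)  # Once upon a time, Tim saw Lily.
```

This mirrors the edge-case behavior of `remove_caseifer`: a trailing ↨ is dropped, and "↨↨" yields a literal ↨. Note that the encoding only round-trips exactly when the original text contains no literal ↨ characters.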