nisten committed
Commit ee1406d
1 Parent(s): 61c0c24

Update README.md

Files changed (1)
  1. README.md +12 -3
README.md CHANGED
@@ -4,10 +4,19 @@ datasets:
  - LDJnr/Capybara
---

- ###EVEN SMALLER Frankenstein of smolLm-0.13b upped to 0.15b
+ ### EVEN SMALLER Frankenstein of smolLm-0.13b upped to 0.18b
Use this frankenbase for training.
+ Sorry for the mislabelling, the model is 0.18b (181M parameters), not 0.15b.
+ I did not expect this repo to blow up, and now all the training scripts depend on it.

- If you're here from twitter and imatient, get the trained checkpoint file.
+ * ACKNOWLEDGE THE HF PAGE IN YOUR FUTURE PAPERS OR I WILL DRAG YOUR ORG ON TWITTER LIKE I DID WITH COHERE LOL
+
+ > [!TIP] 🐧 If you're here from twitter and impatient, get the trained checkpoint file that runs on 1 CPU core:
+ >
+ > Make sure to install the latest llama.cpp first; it's easy on Linux & Mac:
+ > git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && make -j
+
+ Now for the magic trained finetune that runs at insane speeds:

```verilog
wget https://huggingface.co/nisten/Biggie-SmoLlm-0.15B-Base/resolve/main/biggie_groked_int8_q8_0.gguf
@@ -31,7 +40,7 @@ We're talking about a convergence of whole bunch of stuff, more papers will be w
2. **BitNet Integration**:
4. **Experimental GrokAdamW Optimizer**:

- ## Acknodledgements
+ ## Prior work, from last week

Credits for optimizer go to [@cognitivecompai](https://github.com/cognitivecomputations/grokadamw) for laying the groundwork with the original GrokAdamW optimizer.
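
The diff above only shows the download step of the README's shell snippet. As a minimal sketch of the full fetch-and-run flow it describes: build llama.cpp, grab the GGUF checkpoint, and run it on one CPU core. This assumes a recent llama.cpp build where the CLI binary is named `llama-cli` (older builds produce `./main` instead), and the prompt, token count, and `-t 1` thread pinning are illustrative choices, not values from the commit.

```bash
# Sketch only: build llama.cpp (binary name llama-cli assumed; older builds ship ./main)
git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && make -j

# Download the trained checkpoint referenced in the README
wget https://huggingface.co/nisten/Biggie-SmoLlm-0.15B-Base/resolve/main/biggie_groked_int8_q8_0.gguf

# Run inference pinned to a single thread (-t 1), generating 128 tokens from a placeholder prompt
./llama-cli -m biggie_groked_int8_q8_0.gguf -t 1 -n 128 -p "Once upon a time"
```

The `-t 1` flag matches the README's one-CPU-core claim; raise it if you want to use more threads.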