Update README.md
README.md
CHANGED
@@ -4,10 +4,19 @@ datasets:
 - LDJnr/Capybara
 ---
 
-###EVEN SMALLER Frankenstein of smolLm-0.13b upped to 0.
+### EVEN SMALLER Frankenstein of smolLm-0.13b upped to 0.18b
 Use this frankenbase for training.
-
+Sorry for the mislabelling: the model is 0.18b (181M parameters), not 0.15b.
+I did not expect this repo to blow up, and now all the training scripts depend on it.
 
+* ACKNOWLEDGE THE HF PAGE IN YOUR FUTURE PAPERS OR I WILL DRAG YOUR ORG ON TWITTER LIKE I DID WITH COHERE LOL
+
+> [!TIP]
+> 🐧 If you're here from twitter and impatient, get the trained checkpoint file that runs on 1 cpu core:
+>
+> Make sure to install the latest llama.cpp first; it's easy on linux & mac:
+> git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && make -j
+
+Now for the magic trained finetune that runs at insane speeds:
 
 ```bash
 wget https://huggingface.co/nisten/Biggie-SmoLlm-0.15B-Base/resolve/main/biggie_groked_int8_q8_0.gguf
 ```
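The tip above covers the llama.cpp CLI route. As a minimal sketch of the same single-core run from Python, here is the optional llama-cpp-python binding; the README itself only shows the CLI, so treat the package choice and parameters as assumptions:

```python
# Minimal sketch: load the GGUF fetched by the wget above with the
# llama-cpp-python bindings and generate on a single CPU thread.
# Assumes `pip install llama-cpp-python`; the README only documents the CLI.
from llama_cpp import Llama

llm = Llama(
    model_path="biggie_groked_int8_q8_0.gguf",  # file from the wget above
    n_threads=1,  # mirrors the "runs on 1 cpu core" claim
    n_ctx=2048,
)
out = llm("Once upon a time", max_tokens=64)
print(out["choices"][0]["text"])
```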
@@ -31,7 +40,7 @@ We're talking about a convergence of whole bunch of stuff, more papers will be written
 2. **BitNet Integration**:
 4. **Experimental GrokAdamW Optimizer**:
 
-##
+## Prior work, from last week
 
 Credits for optimizer go to [@cognitivecompai](https://github.com/cognitivecomputations/grokadamw) for laying the groundwork with the original GrokAdamW optimizer.
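Since the GrokAdamW credit above is what most people land here for, here is a minimal sketch of swapping it into a standard PyTorch loop. The import path and constructor arguments are assumptions based on the linked repo; its grokking-signal options are omitted, so check the repo README for the exact API:

```python
# Minimal sketch: GrokAdamW as a drop-in torch.optim-style optimizer.
# Assumption: `from grokadamw import GrokAdamW` per the linked repo;
# grokking-signal functions, decay rate, and gradient clipping are
# deliberately not shown here.
import torch
import torch.nn as nn
from grokadamw import GrokAdamW  # assumed import path

model = nn.Linear(768, 768)  # stand-in for the 0.18b frankenbase
optimizer = GrokAdamW(model.parameters(), lr=1e-3)

x = torch.randn(8, 768)
loss = model(x).pow(2).mean()  # dummy loss for the sketch
loss.backward()
optimizer.step()
optimizer.zero_grad()
```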
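As for the BitNet item in the list above: that name usually refers to the absmean ternary quantization from the BitNet b1.58 paper, sketched below for illustration; this is not this repo's actual integration code:

```python
# Illustrative BitNet b1.58-style quantization: weights -> {-1, 0, +1}
# with a per-tensor "absmean" scale. Not the repo's actual code.
import torch

def absmean_ternary(w: torch.Tensor, eps: float = 1e-8):
    gamma = w.abs().mean().clamp(min=eps)   # per-tensor scale
    w_q = (w / gamma).round().clamp(-1, 1)  # ternary weights
    return w_q, gamma                       # dequantize as w_q * gamma

w_q, gamma = absmean_ternary(torch.randn(4, 4))
print(w_q, gamma)
```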