Update README.md
README.md (changed)
<img src="elm-turbo-starfruit.png" width="256"/>
</div>

ELM is designed to be a modular and customizable family of neural networks that are highly efficient and performant. Today we are sharing the second version in this series: **ELM Turbo** models (named _Starfruit_).

_Model:_ ELM Turbo introduces a more _adaptable_, _decomposable LLM architecture_, yielding flexibility in (de)composing LLM models into smaller stand-alone slices. Compared to our previous version, the new architecture allows more powerful model slices to be learned during training (yielding better quality and higher generative capacity) and gives finer-grained control over LLM efficiency: slices can be cut to produce varying model sizes, depending on user/task needs and deployment criteria (e.g., cloud or edge-device constraints).
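
Concretely, each released slice is a stand-alone checkpoint, so trading capacity for efficiency amounts to swapping checkpoints. A minimal sketch, assuming the Hugging Face checkpoints from the release below (only the 0.125 slice is named in this README, and the loading arguments are assumptions):

```python
from transformers import AutoModelForCausalLM

# Pick a slice to match the deployment budget: larger slices for cloud GPUs,
# smaller ones for edge devices. Only "slicexai/elm-turbo-0.125-instruct" is
# named in this README; other slice ids would be chosen the same way.
slice_id = "slicexai/elm-turbo-0.125-instruct"

# Each slice loads as an ordinary causal LM; no composition step is needed.
# trust_remote_code is an assumption, mirroring the Phi-3 base model.
model = AutoModelForCausalLM.from_pretrained(slice_id, trust_remote_code=True)
```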
- **HuggingFace** (access ELM Turbo Models in HF): 👉 [here](https://huggingface.co/collections/slicexai/elm-turbo-66945032f3626024aa066fde)

## ELM Turbo Model Release
In this version, we applied our new, improved decomposable ELM techniques to a widely used open-source LLM, `microsoft/Phi-3-mini-128k-instruct` (3.82B params; see the [phi3-license](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/resolve/main/LICENSE) for usage terms). After training, we generated three smaller slices with parameter counts ranging from 1.33 billion to 2.01 billion. Furthermore, we seamlessly integrated these slices into NVIDIA's [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM), providing trtllm engines compatible with both A100 and H100 GPUs (an illustrative invocation sketch follows the section list below).

- [Section 1.](https://github.com/slicex-ai/elm-turbo/blob/main/README.md#1-run-elm-turbo-models-with-huggingface-transformers-library) 👉 instructions to run ELM Turbo with the Hugging Face Transformers library :hugs:.
- [Section 2.](https://github.com/slicex-ai/elm-turbo/blob/main/README.md#2-running-elm-turbo-via-nvidias-tensorrt-llm) 👉 instructions to run ELM Turbo engines powered by NVIDIA's TensorRT-LLM.
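
The repo's actual build/run steps for these engines are in Section 2; purely as a hedged sketch, running a prebuilt engine with TensorRT-LLM's high-level Python API could look like the following (the engine directory name is hypothetical, and the `LLM` API assumes a recent TensorRT-LLM release):

```python
from tensorrt_llm import LLM, SamplingParams

# Hypothetical path to a prebuilt ELM Turbo trtllm engine; see Section 2
# for the repository's actual instructions.
llm = LLM(model="./elm-turbo-0.125-instruct-trtllm-engine")

prompts = ["Explain decomposable LLMs in one sentence."]
params = SamplingParams(max_tokens=128)  # illustrative generation settings

for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```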

Example - To run the `slicexai/elm-turbo-0.125-instruct`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

elm_turbo_model = "slicexai/elm-turbo-0.125-instruct"
model = AutoModelForCausalLM.from_pretrained(
    elm_turbo_model,
    device_map="cuda",
    torch_dtype="auto",       # assumption: dtype left to the checkpoint config
    trust_remote_code=True,   # assumption: Phi-3-derived checkpoints typically require this
)
```
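
Once the model is loaded, generation can follow the standard Phi-3-style chat pipeline. A minimal sketch (the prompt and generation settings are illustrative, and the chat-template behavior is assumed to mirror the Phi-3 base model):

```python
tokenizer = AutoTokenizer.from_pretrained(elm_turbo_model, trust_remote_code=True)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {"role": "user", "content": "Explain decomposable LLMs in two sentences."},
]
output = pipe(
    messages,
    max_new_tokens=128,      # illustrative settings
    do_sample=False,
    return_full_text=False,  # return only the newly generated text
)
print(output[0]["generated_text"])
```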