Update README.md
README.md (changed)
<img src="elm-turbo-starfruit.png" width="256"/>
</div>

ELM is designed to be a modular and customizable family of neural networks that are highly efficient and performant. Today we are sharing the second version in this series: **ELM Turbo** models (named _Starfruit_).

_Model:_ ELM Turbo introduces a more _adaptable_, _decomposable LLM architecture_, yielding flexibility in (de)composing LLM models into smaller stand-alone slices. Compared to our previous version, the new architecture allows more powerful model slices to be learned during training (yielding better quality and higher generative capacity) and gives finer-grained control over LLM efficiency: slices can be cut to produce varying model sizes, depending on user/task needs and deployment criteria (e.g., cloud or edge-device constraints).
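
Concretely, each released slice is a stand-alone checkpoint, so trading capacity for efficiency amounts to swapping checkpoints. A minimal sketch, assuming the Hugging Face checkpoints from the release below (only the 0.125 slice is named in this README, and the loading arguments are assumptions):

```python
from transformers import AutoModelForCausalLM

# Pick a slice to match the deployment budget: larger slices for cloud GPUs,
# smaller ones for edge devices. Only "slicexai/elm-turbo-0.125-instruct" is
# named in this README; other slice ids would be chosen the same way.
slice_id = "slicexai/elm-turbo-0.125-instruct"

# Each slice loads as an ordinary causal LM; no composition step is needed.
# trust_remote_code is an assumption, mirroring the Phi-3 base model.
model = AutoModelForCausalLM.from_pretrained(slice_id, trust_remote_code=True)
```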
- **HuggingFace** (access ELM Turbo Models in HF): 👉 [here](https://huggingface.co/collections/slicexai/elm-turbo-66945032f3626024aa066fde)

## ELM Turbo Model Release
In this version, we applied our new, improved decomposable ELM techniques to a widely used open-source LLM, `microsoft/Phi-3-mini-128k-instruct` (3.82B params; see the [phi3-license](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/resolve/main/LICENSE) for usage terms). After training, we generated three smaller slices with parameter counts ranging from 1.33 billion to 2.01 billion. Furthermore, we seamlessly integrated these slices into NVIDIA's [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM), providing trtllm engines compatible with both A100 and H100 GPUs (an illustrative invocation sketch follows the section list below).

- [Section 1.](https://github.com/slicex-ai/elm-turbo/blob/main/README.md#1-run-elm-turbo-models-with-huggingface-transformers-library) 👉 instructions to run ELM Turbo with the Hugging Face Transformers library :hugs:.
- [Section 2.](https://github.com/slicex-ai/elm-turbo/blob/main/README.md#2-running-elm-turbo-via-nvidias-tensorrt-llm) 👉 instructions to run ELM Turbo engines powered by NVIDIA's TensorRT-LLM.
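
The repo's actual build/run steps for these engines are in Section 2; purely as a hedged sketch, running a prebuilt engine with TensorRT-LLM's high-level Python API could look like the following (the engine directory name is hypothetical, and the `LLM` API assumes a recent TensorRT-LLM release):

```python
from tensorrt_llm import LLM, SamplingParams

# Hypothetical path to a prebuilt ELM Turbo trtllm engine; see Section 2
# for the repository's actual instructions.
llm = LLM(model="./elm-turbo-0.125-instruct-trtllm-engine")

prompts = ["Explain decomposable LLMs in one sentence."]
params = SamplingParams(max_tokens=128)  # illustrative generation settings

for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```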

Example - To run the `slicexai/elm-turbo-0.125-instruct`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

elm_turbo_model = "slicexai/elm-turbo-0.125-instruct"
model = AutoModelForCausalLM.from_pretrained(
    elm_turbo_model,
    device_map="cuda",
    torch_dtype="auto",       # assumption: dtype left to the checkpoint config
    trust_remote_code=True,   # assumption: Phi-3-derived checkpoints typically require this
)
```
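
Once the model is loaded, generation can follow the standard Phi-3-style chat pipeline. A minimal sketch (the prompt and generation settings are illustrative, and the chat-template behavior is assumed to mirror the Phi-3 base model):

```python
tokenizer = AutoTokenizer.from_pretrained(elm_turbo_model, trust_remote_code=True)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {"role": "user", "content": "Explain decomposable LLMs in two sentences."},
]
output = pipe(
    messages,
    max_new_tokens=128,      # illustrative settings
    do_sample=False,
    return_full_text=False,  # return only the newly generated text
)
print(output[0]["generated_text"])
```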