---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb
language:
- en
library_name: transformers
tags:
- IoT
- sensor
- embedded
---

# TinyLLM

## Overview

This repository hosts a small language model developed as part of the TinyLLM framework ([arxiv link]). These models are designed and fine-tuned with sensor data to support embedded sensing applications, enabling locally hosted language models on low-compute devices such as single-board computers. The models are based on the GPT-2 architecture and were trained on NVIDIA H100 GPUs. This repository provides base models that can be further fine-tuned for specific downstream tasks in embedded sensing.
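
As a rough illustration of that fine-tuning step, the sketch below continues training the base model on a toy text corpus with Hugging Face's `Trainer`. The repository id matches the Usage example; the example text, output directory, and hyperparameters are placeholders rather than recommended settings.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

path = "tinyllm/124M-0.2"
tokenizer = AutoTokenizer.from_pretrained(path)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizers define no pad token
model = AutoModelForCausalLM.from_pretrained(path)

# Placeholder corpus: replace with text derived from your own sensor data.
texts = ["Accelerometer reading: x=0.02, y=9.81, z=0.15 while the user is standing still."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tinyllm-finetuned",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```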

## Model Information

- **Parameters:** 124M (hidden size = 768)
- **Architecture:** Decoder-only transformer
- **Training Data:** Up to 10B tokens from the [SHL](http://www.shl-dataset.org/) and [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) datasets, combined in a 2:8 ratio
- **Input and Output Modality:** Text
- **Context Length:** 1024 tokens
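
Because the model follows the GPT-2 architecture, the figures above should be recoverable from the model configuration. The snippet below is a quick sanity check; it assumes the repository id from the Usage section and the standard GPT-2 config attribute names.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("tinyllm/124M-0.2")
print(config.n_embd)       # hidden size, expected 768
print(config.n_positions)  # maximum context length, expected 1024
print(config.n_layer)      # number of transformer layers
```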

## Acknowledgements

We gratefully acknowledge the open-source frameworks [llm.c](https://github.com/karpathy/llm.c) and [llama.cpp](https://github.com/ggerganov/llama.cpp), as well as the sensor dataset provided by SHL, which were instrumental in training and testing these models.

## Usage

The model can be used in two primary ways:

1. **With Hugging Face’s Transformers library**

```python
from transformers import pipeline
import torch

path = "tinyllm/124M-0.2"
prompt = "The sea is blue but it's his red sea"

generator = pipeline(
    "text-generation",
    model=path,
    max_new_tokens=30,
    repetition_penalty=1.3,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
print(generator(prompt)[0]["generated_text"])
```
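
If you need finer control over decoding, the same checkpoint can also be loaded explicitly with `AutoTokenizer` and `AutoModelForCausalLM`; the sketch below simply mirrors the pipeline settings above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

path = "tinyllm/124M-0.2"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16)

inputs = tokenizer("The sea is blue but it's his red sea", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, repetition_penalty=1.3)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```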

2. **With llama.cpp**

Generate a GGUF model file using the [convert_hf_to_gguf.py](https://github.com/ggerganov/llama.cpp/blob/master/convert_hf_to_gguf.py) script and use the resulting GGUF file for inference.

```bash
python3 convert_hf_to_gguf.py models/mymodel/
```
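
Once converted, the model can be run with the `llama-cli` binary from llama.cpp. The GGUF filename below is illustrative; substitute the file produced by the conversion step above.

```bash
./llama-cli -m models/mymodel/ggml-model-f16.gguf -p "The sea is blue but it's his red sea" -n 30
```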

## Disclaimer

This model is intended solely for research purposes.