metadata
license: mit
tags:
- text-generation-inference
- Transformer
- large-language-model
- generative AI
- on-device-computing
- edge-computing
QuicktypeGPT is an on-device large language model (LLM), written in C, that helps you type faster and carry on meaningful conversations.
The model has only 15M parameters (dim = 288, 6 layers, 6 heads, and 6 KV heads) and is 27 MB in size. It was pre-trained on a single A40 GPU and runs inference through a pure C program on a laptop CPU (e.g. AMD or Intel) with decent quality and speed. This project aims to demonstrate that:
- We do not need to train a very sophisticated LLM to achieve satisfactory performance, provided the LLM focuses on a small, dedicated domain or task.
- We can deploy small LLMs on edge devices (e.g. desktop, laptop, tablet, or phone) to perform inference without relying on cloud servers.
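To make the hyperparameters quoted above concrete, here is a minimal C sketch that stores them in a struct and derives two quantities an attention implementation typically needs. The struct `ModelConfig`, its field names, and the helper functions are hypothetical and chosen for illustration only; they are not taken from the QuicktypeGPT source. Only the four numbers (dim = 288, 6 layers, 6 heads, 6 KV heads) come from this card.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical config layout (an assumption for illustration,
   not the actual QuicktypeGPT data structure). */
typedef struct {
    int dim;        /* embedding width */
    int n_layers;   /* number of transformer blocks */
    int n_heads;    /* number of query heads */
    int n_kv_heads; /* number of key/value heads */
} ModelConfig;

/* Values as stated in the model card. */
const ModelConfig QUICKTYPE_CONFIG = {288, 6, 6, 6};

/* Width of one attention head: dim must split evenly across heads. */
int head_size(const ModelConfig *c) {
    assert(c->n_heads > 0 && c->dim % c->n_heads == 0);
    return c->dim / c->n_heads;
}

/* How many query heads share one key/value head.
   A result of 1 means every query head has its own key/value head. */
int queries_per_kv_head(const ModelConfig *c) {
    assert(c->n_kv_heads > 0 && c->n_heads % c->n_kv_heads == 0);
    return c->n_heads / c->n_kv_heads;
}

/* Print the stated and derived values on one line. */
void print_summary(const ModelConfig *c) {
    printf("dim=%d layers=%d heads=%d kv_heads=%d head_size=%d q_per_kv=%d\n",
           c->dim, c->n_layers, c->n_heads, c->n_kv_heads,
           head_size(c), queries_per_kv_head(c));
}
```

Calling `print_summary(&QUICKTYPE_CONFIG)` from a `main` function reports a head size of 48 (288 / 6) and one query head per key/value head, since the number of heads equals the number of KV heads.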
For more details, please refer to the QuicktypeGPT GitHub project.