---
title: README
emoji: 📈
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
---
# About Deeplite
Deeplite provides an AI inference optimization platform for the millions of developers struggling to deploy edge AI in products for smart cities, 5G IoT, autonomous vehicles, and the things we use every day, by making AI smaller, faster, and more energy efficient. We call it AI for everyday life.
## PRODUCT
Deeplite’s Neutrino platform builds on the PyTorch ecosystem, natively supporting pre-trained multi-layer perceptron (MLP) and convolutional neural network (CNN) models with our optimization and quantization methods. Models can come from the Neutrino Zoo or be a user’s own custom model.

Model optimization takes a model, a dataset, and a constraint from the user. The user specifies a target accuracy, and the optimization shrinks and/or speeds up model inference as much as possible within that target. Uniquely, the model stays in full precision (no quantization), so the optimized model can be deployed to any target hardware, including AI accelerators/NPUs. While hardware-agnostic at full precision, users can also leverage Neutrino’s Hardware-Aware optimization to tune a model for a specific target, improving performance on that hardware and replacing any unsupported operations with supported ones.

Model quantization uses a training-aware approach (QAT) that converts a full-precision PyTorch model into a mixed-precision model with 1-, 2-, 3-, 4-, 8-, and 32-bit layers (weights and activations). The quantization method maximizes accuracy retention while achieving 2-15x memory reduction and 2-5x inference speedup when combined with the Deeplite compiler and runtime for on-device inference.

Neutrino is deployed on premises, so no sensitive data or models need to leave the user’s own development environment. We have also created DeepliteRT, a runtime engine that runs sub-8-bit quantized models natively on Arm CPUs.
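The quantization-aware training workflow described above can be illustrated, at a much smaller scale, with PyTorch’s built-in eager-mode quantization utilities. This is a generic PyTorch sketch, not Deeplite’s API; the toy model, layer sizes, and the single forward pass standing in for a fine-tuning loop are made up for illustration:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (
    QuantStub, DeQuantStub, get_default_qat_qconfig, prepare_qat, convert,
)

class TinyCNN(nn.Module):
    """Toy CNN with quant/dequant stubs marking the quantized region."""
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # float -> quantized at the model input
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.fc = nn.Linear(8 * 30 * 30, 10)
        self.dequant = DeQuantStub()  # quantized -> float at the model output

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        x = self.fc(x.flatten(1))
        return self.dequant(x)

model = TinyCNN()
model.qconfig = get_default_qat_qconfig("fbgemm")  # x86 server backend
model.train()

# Insert fake-quantization modules that simulate quantization during training.
qat_model = prepare_qat(model)

# A real QAT run would fine-tune here; one forward pass stands in for it,
# letting the observers record activation ranges.
qat_model(torch.randn(1, 3, 32, 32))

# Convert the fake-quantized model into a true int8 model for inference.
qat_model.eval()
int8_model = convert(qat_model)
```

Deeplite’s method extends this idea to mixed sub-8-bit precision (down to 1-bit layers), which eager-mode PyTorch does not provide out of the box.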
## SOLUTIONS
We are excited to release YOLOBench, a latency-accuracy benchmark of over 550 YOLO-based object detectors on four datasets (COCO, PASCAL VOC, WIDERFACE, and SKU-110K) and an initial set of five embedded hardware platforms, with more to come.