A collection of models that can be run with onnxruntime-genai and served through the embeddedllm library.
EmbeddedLLM
About EmbeddedLLM
EmbeddedLLM is an open-source company dedicated to advancing the field of Large Language Models (LLMs) through innovative backend solutions and hardware optimizations. Our mission is to make powerful generative models work on all platforms, from edge to private cloud, ensuring accessibility and efficiency for a wide range of applications.
Highlighted Repositories
JamAI Base
- Description: JamAI Base is an open-source RAG (Retrieval-Augmented Generation) backend platform that integrates an embedded database (SQLite) and an embedded vector database (LanceDB) with managed memory and RAG capabilities. It features built-in LLM, vector embedding, and reranker orchestration and management, all accessible through a convenient, intuitive, spreadsheet-like UI and a simple REST API (a request sketch follows this list).
- Key Features:
  - Embedded database (SQLite) and vector database (LanceDB)
  - Managed memory and RAG capabilities
  - Built-in LLM, vector embedding, and reranker orchestration
  - Intuitive spreadsheet-like UI
  - Simple REST API
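
To illustrate the REST API, here is a minimal sketch of adding a row to a table over HTTP. The base URL, endpoint path, table id, and payload fields are assumptions for illustration, not the documented JamAI Base API; consult the repository for the actual schema.

```python
import requests

# Minimal sketch only: the URL, endpoint path, and payload shape below are
# assumptions for illustration, not the documented JamAI Base API.
BASE_URL = "http://localhost:6969/api/v1"   # assumed local deployment address

payload = {
    "table_id": "support-kb",               # hypothetical table name
    "data": [{"question": "How do I reset my password?"}],
}

resp = requests.post(f"{BASE_URL}/gen_tables/chat/rows/add", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```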
vLLM for AMD GPUs (ROCm)
- Description: This repository is a port of vLLM for AMD GPUs, providing a high-throughput and memory-efficient inference and serving engine for LLMs optimized for ROCm (a serving sketch follows this list).
- Key Features:
  - Vision Language Model support
  - New features not yet available upstream
  - Optimized for AMD GPUs with ROCm support
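
Like upstream vLLM, the port exposes an OpenAI-compatible server that can be queried with the standard OpenAI client. The launch command below follows upstream vLLM syntax and is assumed to carry over to this port unchanged; the model id and port are placeholders.

```python
# Launch the server first, using the upstream vLLM entry point (assumed to
# apply to this ROCm port as well); the model id is a placeholder:
#   python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.1-8B-Instruct
from openai import OpenAI

# Point the standard OpenAI client at the locally running server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Summarize ROCm in one sentence."}],
)
print(completion.choices[0].message.content)
```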
EmbeddedLLM Engine
- Description: An AI-PC embedded LLM engine that unifies CPU, iGPU, and GPU backends and provides a stable way to run LLMs fast on each. It supports launching an OpenAI-API-compatible API server powered by the engine (a client sketch follows this list).
- Key Features:
  - Supported hardware: CPU (ONNX), AMD iGPU (ONNX-DirectML), Intel iGPU (IPEX-LLM, OpenVINO), Intel XPU (IPEX-LLM, OpenVINO), NVIDIA GPU (ONNX-CUDA)
  - Prebuilt, ready-to-run Windows 11 executable
  - Vision Language Model support (CPU)
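
Once the engine's OpenAI-API-compatible server is running, any plain HTTP client can talk to it. The sketch below posts to the standard /v1/chat/completions route; the host, port, and model name are placeholders, so substitute whatever your deployment exposes.

```python
import requests

# Query an already-running OpenAI-API-compatible server started by the engine.
# Host, port, and model id are placeholders for illustration.
url = "http://localhost:6979/v1/chat/completions"

payload = {
    "model": "phi-3-mini-directml",  # placeholder model id
    "messages": [{"role": "user", "content": "Hello from an AI PC!"}],
    "max_tokens": 128,
}

resp = requests.post(url, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```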
Join Us
We invite you to explore our repositories and models, contribute to our projects, and join us in pushing the boundaries of what's possible with LLMs.
Collections (7)

Model Powered by Onnxruntime DirectML GenAI
- EmbeddedLLM/Phi-3-mini-4k-instruct-onnx-directml (Text Generation)
- EmbeddedLLM/Phi-3-mini-128k-instruct-onnx-directml (Text Generation)
- EmbeddedLLM/Phi-3-medium-4k-instruct-onnx-directml (Text Generation)
- EmbeddedLLM/Phi-3-medium-128k-instruct-onnx-directml (Text Generation)
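
The DirectML builds in this collection are meant to be loaded with onnxruntime-genai. The sketch below shows a typical generation loop with a placeholder local model folder; the exact API (for example, how prompt tokens are fed to the generator) has shifted between onnxruntime-genai releases, so treat it as an outline rather than definitive usage.

```python
import onnxruntime_genai as og

# Load a locally downloaded DirectML model folder (placeholder path).
model = og.Model("./Phi-3-mini-4k-instruct-onnx-directml")
tokenizer = og.Tokenizer(model)

# Phi-3 chat prompt template.
prompt = "<|user|>\nWhat is DirectML?<|end|>\n<|assistant|>\n"
input_tokens = tokenizer.encode(prompt)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(input_tokens)  # newer releases; older ones assign params.input_ids instead

# Generate token by token until the model emits an end-of-sequence token.
while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```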
Models (90)
Recently updated models include:
- EmbeddedLLM/Nexusflow_Athena-V2-Agent-OCP-FP8-Quark
- EmbeddedLLM/Nexusflow_Athena-V2-Chat-OCP-FP8-Quark
- EmbeddedLLM/Qwen2.5-72B-Instruct-OCP-FP8-Quark
- EmbeddedLLM/ELLM_Star
- EmbeddedLLM/bge-m3-int4-sym-ov
- EmbeddedLLM/bge-m3-int4-ov
- EmbeddedLLM/Qwen2.5-32B-Instruct-int4-sym-ov
- EmbeddedLLM/Qwen2.5-14B-Instruct-int4-sym-ov
- EmbeddedLLM/vLLM-AMD-flash-attn-debug
- EmbeddedLLM/Llama-Guard-3-1B-int4-sym-ov
Datasets
None public yet.