---
title: LLMServer
emoji: πΉ
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: false
---
# LLM Server
This repository contains a FastAPI-based server that serves open-source Large Language Models from Hugging Face.
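The routing and application wiring live in `main/routes.py` and `main/app.py` (see the project structure below). As a rough, hypothetical sketch of the shape of such a server (not the project's actual code; the handler body here is a stand-in):

```python
from fastapi import APIRouter, FastAPI

# Hypothetical sketch only -- the real implementation lives in main/app.py
# and main/routes.py. The endpoint path matches one seen in the server logs.
router = APIRouter(prefix="/api/v1")

@router.post("/model/download")
def download_model(model_name: str):
    # The real server downloads the model weights from Hugging Face here.
    return {"status": "downloaded", "model": model_name}

app = FastAPI(title="LLM Server")
app.include_router(router)
```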
## Getting Started
These instructions will help you set up and run the project on your local machine.
### Prerequisites

- Python 3.10 or higher (note: model initialization currently fails on Python 3.13; see the known error at the end of this README)
- Git
### Cloning the Repository

Choose one of the following methods to clone the repository:

**HTTPS**

```bash
git clone https://huggingface.co/spaces/TeamGenKI/LLMServer
cd LLMServer
```

**SSH**

```bash
git clone git@hf.co:spaces/TeamGenKI/LLMServer
cd LLMServer
```
### Setting Up the Virtual Environment

**Windows**

```bash
# Create virtual environment
python -m venv myenv

# Activate virtual environment
myenv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

**Linux**

```bash
# Create virtual environment
python -m venv myenv

# Activate virtual environment
source myenv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

**macOS**

```bash
# Create virtual environment
python3 -m venv myenv

# Activate virtual environment
source myenv/bin/activate

# Install dependencies
pip3 install -r requirements.txt
```
## Running the Application

Once you have set up your environment and installed the dependencies, you can start the FastAPI application:

```bash
uvicorn main.app:app --reload
```

The API will be available at http://localhost:8000 (uvicorn's default port).
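A quick way to confirm the server is up is to request the OpenAPI schema that FastAPI serves at `/openapi.json` by default (a sketch using the third-party `requests` package, which you may need to install separately):

```python
import requests

# FastAPI exposes its OpenAPI schema at /openapi.json by default.
resp = requests.get("http://localhost:8000/openapi.json", timeout=5)
resp.raise_for_status()
print(resp.json()["info"]["title"])  # prints the API title if the server is up
```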
## API Documentation

Once the application is running, you can access:

- Interactive API documentation (Swagger UI) at http://localhost:8000/docs
- Alternative API documentation (ReDoc) at http://localhost:8000/redoc
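The server log at the end of this README shows two model-management endpoints, `POST /api/v1/model/download` and `POST /api/v1/model/initialize`, each taking a `model_name` query parameter. Below is a client sketch based only on what those logs show; the response bodies are not documented, so only the status codes are printed:

```python
import requests

BASE_URL = "http://localhost:8000/api/v1"
MODEL = "microsoft/Phi-3.5-mini-instruct"

# Download the weights first, then initialize the model for generation.
for step in ("download", "initialize"):
    resp = requests.post(f"{BASE_URL}/model/{step}", params={"model_name": MODEL})
    print(f"{step}: HTTP {resp.status_code}")
```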
## Deactivating the Virtual Environment

When you're done working on the project, you can deactivate the virtual environment:

```bash
deactivate
```
## Contributing

[Add contributing guidelines here]

## License

[Add license information here]
## Project Structure

```
.
├── Dockerfile
├── main
│   ├── api.py
│   ├── app.py
│   ├── config.yaml
│   ├── env_template
│   ├── __init__.py
│   ├── logs
│   │   └── llm_api.log
│   ├── models
│   ├── __pycache__
│   │   ├── api.cpython-39.pyc
│   │   ├── app.cpython-39.pyc
│   │   ├── __init__.cpython-39.pyc
│   │   └── routes.cpython-39.pyc
│   ├── routes.py
│   ├── test_locally.py
│   └── utils
│       ├── errors.py
│       ├── helpers.py
│       ├── __init__.py
│       ├── logging.py
│       ├── __pycache__
│       │   ├── helpers.cpython-39.pyc
│       │   ├── __init__.cpython-39.pyc
│       │   ├── logging.cpython-39.pyc
│       │   └── validation.cpython-39.pyc
│       └── validation.py
├── README.md
└── requirements.txt
```
## Known Error

Initializing microsoft/Phi-3.5-mini-instruct fails when the server runs under Python 3.13: transformers' bitsandbytes integration cannot be imported because Dynamo does not support Python 3.13+, and the initialize request returns a 500. The full server log:
```
INFO:     127.0.0.1:60874 - "POST /api/v1/model/download?model_name=microsoft%2FPhi-3.5-mini-instruct HTTP/1.1" 200 OK
2025-01-13 16:18:45,409 - api_routes - INFO - Received request to initialize model: microsoft/Phi-3.5-mini-instruct
2025-01-13 16:18:45,409 - llm_api - INFO - Initializing generation model: microsoft/Phi-3.5-mini-instruct
2025-01-13 16:18:45,412 - llm_api - INFO - Loading model from local path: main/models/Phi-3.5-mini-instruct
The load_in_4bit and load_in_8bit arguments are deprecated and will be removed in the future versions. Please, pass a BitsAndBytesConfig object in quantization_config argument instead.
Could not find the bitsandbytes CUDA binary at PosixPath('/home/aurelio/Desktop/Projects/LLMServer/myenv/lib/python3.13/site-packages/bitsandbytes/libbitsandbytes_cuda124.so')
g++ (GCC) 14.2.1 20240910
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
2025-01-13 16:18:45,982 - llm_api - ERROR - Failed to initialize generation model microsoft/Phi-3.5-mini-instruct: Failed to import transformers.integrations.bitsandbytes because of the following error (look up to see its traceback): Dynamo is not supported on Python 3.13+
2025-01-13 16:18:45,982 - api_routes - ERROR - Error initializing model: Failed to import transformers.integrations.bitsandbytes because of the following error (look up to see its traceback): Dynamo is not supported on Python 3.13+
INFO:     127.0.0.1:38330 - "POST /api/v1/model/initialize?model_name=microsoft%2FPhi-3.5-mini-instruct HTTP/1.1" 500 Internal Server Error
```
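Two things stand out in this log. First, the deprecation warning already names the fix for the quantization arguments: pass a `BitsAndBytesConfig` object via `quantization_config` instead of the bare `load_in_4bit`/`load_in_8bit` flags. A minimal sketch of that change (whether the server loads models exactly this way is an assumption; the local path is taken from the log):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Replaces the deprecated load_in_4bit=True argument, as the warning advises.
bnb_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "main/models/Phi-3.5-mini-instruct",  # local path seen in the log above
    quantization_config=bnb_config,
)
```

Second, the 500 error itself comes from running under Python 3.13 (note `myenv/lib/python3.13` in the log): Dynamo does not support Python 3.13+, so recreating the virtual environment with Python 3.10-3.12 is the likely workaround.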