---
title: LLMServer
emoji: 👹
colorFrom: indigo
colorTo: purple
sdk: docker
pinned: false
---

# LLM Server

This repository contains a FastAPI-based server that serves open-source Large Language Models from Hugging Face.

## Getting Started

These instructions will help you set up and run the project on your local machine.

### Prerequisites

- Python 3.10 – 3.12 (Python 3.13 is not yet supported; see Known Issues below)
- Git

### Cloning the Repository

Choose one of the following methods to clone the repository:

#### HTTPS

```bash
git clone https://huggingface.co/spaces/TeamGenKI/LLMServer
cd LLMServer
```

#### SSH

```bash
git clone git@hf.co:spaces/TeamGenKI/LLMServer
cd LLMServer
```

### Setting Up the Virtual Environment

#### Windows

```bash
# Create virtual environment
python -m venv myenv

# Activate virtual environment
myenv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

#### Linux

```bash
# Create virtual environment
python -m venv myenv

# Activate virtual environment
source myenv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

#### macOS

```bash
# Create virtual environment
python3 -m venv myenv

# Activate virtual environment
source myenv/bin/activate

# Install dependencies
pip3 install -r requirements.txt
```

### Running the Application

Once you have set up your environment and installed the dependencies, you can start the FastAPI application:

```bash
uvicorn main.app:app --reload
```

The API will be available at `http://localhost:8000` (uvicorn's default port).

### API Documentation

Once the application is running, you can access:

- Interactive API documentation (Swagger UI) at `http://localhost:8000/docs`
- Alternative API documentation (ReDoc) at `http://localhost:8000/redoc`
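### Example Requests

The model-management routes are defined in `main/routes.py`. Below is a minimal client sketch, assuming the server is running locally on port 8000; the two endpoint paths are taken from the request log in the Known Issues section, while everything else (the `requests` client, the error handling) is illustrative:

```python
import requests

BASE_URL = "http://localhost:8000"  # assumption: local uvicorn default port
MODEL = "microsoft/Phi-3.5-mini-instruct"

# Download the model weights into main/models/
r = requests.post(f"{BASE_URL}/api/v1/model/download", params={"model_name": MODEL})
r.raise_for_status()

# Initialize (load) the downloaded model for generation
r = requests.post(f"{BASE_URL}/api/v1/model/initialize", params={"model_name": MODEL})
r.raise_for_status()
```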
### Deactivating the Virtual Environment

When you're done working on the project, you can deactivate the virtual environment:

```bash
deactivate
```

## Contributing

[Add contributing guidelines here]

## License

[Add license information here]

## Project Structure

```
.
├── Dockerfile
├── main
│   ├── api.py
│   ├── app.py
│   ├── config.yaml
│   ├── env_template
│   ├── __init__.py
│   ├── logs
│   │   └── llm_api.log
│   ├── models
│   ├── routes.py
│   ├── test_locally.py
│   └── utils
│       ├── errors.py
│       ├── helpers.py
│       ├── __init__.py
│       ├── logging.py
│       └── validation.py
├── README.md
└── requirements.txt
```

## Known Issues

Initializing a quantized model currently fails on Python 3.13. A representative log:

```
INFO:     127.0.0.1:60874 - "POST /api/v1/model/download?model_name=microsoft%2FPhi-3.5-mini-instruct HTTP/1.1" 200 OK
2025-01-13 16:18:45,409 - api_routes - INFO - Received request to initialize model: microsoft/Phi-3.5-mini-instruct
2025-01-13 16:18:45,409 - llm_api - INFO - Initializing generation model: microsoft/Phi-3.5-mini-instruct
2025-01-13 16:18:45,412 - llm_api - INFO - Loading model from local path: main/models/Phi-3.5-mini-instruct
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
Could not find the bitsandbytes CUDA binary at PosixPath('/home/aurelio/Desktop/Projects/LLMServer/myenv/lib/python3.13/site-packages/bitsandbytes/libbitsandbytes_cuda124.so')
g++ (GCC) 14.2.1 20240910
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
2025-01-13 16:18:45,982 - llm_api - ERROR - Failed to initialize generation model microsoft/Phi-3.5-mini-instruct: Failed to import transformers.integrations.bitsandbytes because of the following error (look up to see its traceback):
Dynamo is not supported on Python 3.13+
2025-01-13 16:18:45,982 - api_routes - ERROR - Error initializing model: Failed to import transformers.integrations.bitsandbytes because of the following error (look up to see its traceback):
Dynamo is not supported on Python 3.13+
INFO:     127.0.0.1:38330 - "POST /api/v1/model/initialize?model_name=microsoft%2FPhi-3.5-mini-instruct HTTP/1.1" 500 Internal Server Error
```

The log shows two distinct problems:

- The fatal one: `transformers.integrations.bitsandbytes` fails to import because torch Dynamo does not support Python 3.13+. Running the server under Python 3.10 – 3.12 avoids this.
- A deprecation warning: the `load_in_4bit`/`load_in_8bit` arguments should be replaced with a `BitsAndBytesConfig` passed via `quantization_config`, as in the sketch below.

Note also that bitsandbytes could not find its CUDA binary in this environment, so GPU quantization would not work here even on a supported Python version.
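The deprecation warning can be addressed in the model-loading code. A minimal sketch of the replacement pattern, where the local path is taken from the log above and the quantization settings are assumptions, not this repository's actual configuration:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Pass quantization settings through BitsAndBytesConfig instead of the
# deprecated load_in_4bit=/load_in_8bit= keyword arguments.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # assumption: 4-bit quantization
    bnb_4bit_compute_dtype=torch.float16,  # assumption: fp16 compute dtype
)

model = AutoModelForCausalLM.from_pretrained(
    "main/models/Phi-3.5-mini-instruct",   # local path from the log above
    quantization_config=bnb_config,
)
```

This only resolves the deprecation warning; the 500 error itself comes from the Python 3.13 / Dynamo incompatibility and requires running on an older interpreter.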