# Smart Retrieval Backend
The backend is built using Python & [FastAPI](https://fastapi.tiangolo.com/) bootstrapped with [`create-llama`](https://github.com/run-llama/LlamaIndexTS/tree/main/packages/create-llama).
To get started, first install the dependencies listed in the `Requirements` section below, then follow the steps in the `Getting Started` section.
## Requirements
1. Python >= 3.11
2. Miniconda (To manage Python versions)
- [Windows](https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe)
- [Linux](https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh)
- [MacOS](https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.pkg)
- Then create the project environment: ```conda create -n SmartRetrieval python=3.11```
3. Pipx (To manage Python packages)
- ```pip install pipx``` (skip this step if pipx is already installed)
4. CUDA > 12.1 (if you have an NVIDIA GPU)
- Download the installer for your OS from the [CUDA downloads page](https://developer.nvidia.com/cuda-downloads)
5. Poetry (To manage dependencies)
- ```pipx install poetry```
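After installing everything above, you can sanity-check that each tool is visible on your `PATH`. A minimal sketch using only the Python standard library (`nvcc` is only expected on machines with the CUDA toolkit installed):

```python
# Sanity check: report where each required tool resolves on PATH.
# nvcc will show NOT FOUND on CPU-only machines, which is fine.
import shutil

for tool in ("conda", "pipx", "poetry", "nvcc"):
    location = shutil.which(tool)
    print(f"{tool}: {location or 'NOT FOUND'}")
```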
## Getting Started
First, if you want to use the CUDA version of PyTorch, ensure you have the correct CUDA version (`> 12.1`) installed. You can check this by running `nvcc --version` or `nvidia-smi` in your terminal; `nvcc --version` should show whether CUDA is installed properly. If you do not have CUDA installed, you can install it from [here](https://developer.nvidia.com/cuda-downloads).
- You may need to add CUDA to your `PATH`; platform-specific instructions are readily available online.
Ensure you have followed the steps in the `Requirements` section above.
- If on Windows, make sure you are running the commands in PowerShell.
- Add conda to your path; instructions can be found [here](https://stackoverflow.com/questions/64149680/how-can-i-activate-a-conda-environment-from-powershell)
Then activate the conda environment:
```bash
conda activate SmartRetrieval
```
Second, set up the environment:
```powershell
# Only choose one of the options below, depending on whether you have a CUDA-enabled GPU:
# If running on Windows, make sure you are running the commands in PowerShell.
# -----------------------------------------------
# Install dependencies and torch (CPU version)
# Go to the backend directory and edit the pyproject.toml file to uncomment the `torch-cpu` poetry section
# -----------------------------------------------
# Windows: set env for llama-cpp-python with OpenBLAS support on CPU
$env:CMAKE_ARGS = "-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"
# Linux: set env for llama-cpp-python with OpenBLAS support on CPU (exported so poetry inherits it)
export CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"
# Then:
poetry install --with torch-cpu
# -----------------------------------------------
# Install dependencies and torch (CUDA version)
# Installing torch with CUDA support also works on a system without a CUDA-capable GPU.
# -----------------------------------------------
# Windows: set env for llama-cpp-python with CUDA support on GPU
$env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
# Linux: set env for llama-cpp-python with CUDA support on GPU (exported so poetry inherits it)
export CMAKE_ARGS="-DLLAMA_CUBLAS=on"
# Then:
poetry install --with torch-cuda
```
```bash
# Enter poetry shell
poetry shell
```
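Inside the shell, you can confirm which torch build Poetry resolved; a quick check (CUDA availability should be `True` only after the `torch-cuda` install path on a machine with a working GPU):

```python
# Verify the installed torch build and whether it can see a GPU.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```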
Third, run the development server:
```bash
python run.py
```
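In `create-llama`-style projects, `run.py` is typically a thin wrapper around uvicorn. A minimal sketch of that shape, assuming the FastAPI app object is exported as `main:app` (the actual file in this repo may differ):

```python
# Illustrative only: a typical run.py entry point for a FastAPI backend.
import uvicorn

if __name__ == "__main__":
    # reload=True gives the auto-update-on-save behavior described below.
    uvicorn.run("main:app", host="0.0.0.0", port=8000, reload=True)
```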
Then call the API endpoint `/api/chat` to see the result:
```bash
curl --location 'localhost:8000/api/chat' \
--header 'Content-Type: application/json' \
--data '{ "messages": [{ "role": "user", "content": "Hello" }] }'
```
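If you prefer Python over curl, the same request can be issued with `requests` (assumes the dev server above is running; `requests` is not part of this project's dependencies):

```python
# POST the same chat payload the curl example sends.
import requests

response = requests.post(
    "http://localhost:8000/api/chat",
    json={"messages": [{"role": "user", "content": "Hello"}]},  # json= also sets the Content-Type header
)
print(response.status_code, response.text)
```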
You can start editing the API by modifying `app/api/routers/chat.py`. The endpoint auto-updates as you save the file.
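For orientation, a router of the kind that lives in `app/api/routers/chat.py` generally has the shape below. This is a stripped-down, hypothetical sketch; the real file forwards the messages to LlamaIndex instead of echoing them:

```python
# Hypothetical, simplified shape of a chat router; not the repo's actual code.
from fastapi import APIRouter
from pydantic import BaseModel

chat_router = APIRouter()

class _Message(BaseModel):
    role: str
    content: str

class _ChatData(BaseModel):
    messages: list[_Message]

@chat_router.post("/chat")
async def chat(data: _ChatData) -> dict:
    # Echo stub; the real endpoint sends data.messages to an LLM.
    return {"role": "assistant", "content": f"Echo: {data.messages[-1].content}"}
```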
Open [http://localhost:8000/docs](http://localhost:8000/docs) with your browser to see the Swagger UI of the API.
The API allows CORS for all origins to simplify development. You can change this behavior by setting the `ENVIRONMENT` environment variable to `prod`:
```bash
ENVIRONMENT=prod uvicorn main:app
```
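Under the hood, this kind of dev/prod switch typically boils down to conditionally adding CORS middleware. A minimal sketch of the pattern (the exact wiring in this repo may differ):

```python
# Illustrative pattern: allow all origins only when not running in prod.
import os

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

if os.getenv("ENVIRONMENT", "dev") == "dev":
    app.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],
        allow_methods=["*"],
        allow_headers=["*"],
    )
```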
## Learn More
To learn more about LlamaIndex, take a look at the following resources:
- [LlamaIndex Documentation](https://docs.llamaindex.ai) - learn about LlamaIndex.
- [LlamaIndexTS Documentation](https://ts.llamaindex.ai) - learn about LlamaIndexTS (TypeScript features).
- [FastAPI Documentation](https://fastapi.tiangolo.com/) - learn about FastAPI.