Update README.md
README.md CHANGED
@@ -1,326 +1,8 @@

- **Reuse Your LLM**: Once downloaded, reuse your LLM without the need for repeated downloads.
- **Chat History**: Remembers your previous conversations (in a session).
- **API**: LocalGPT has an API that you can use for building RAG applications.
- **Graphical Interface**: LocalGPT comes with two GUIs: one uses the API and the other is standalone (based on Streamlit).
- **GPU, CPU & MPS Support**: Supports multiple platforms out of the box. Chat with your data using `CUDA`, `CPU`, or `MPS`, and more!

## Dive Deeper with Our Videos 🎥

- [Detailed code-walkthrough](https://youtu.be/MlyoObdIHyo)
- [Llama-2 with LocalGPT](https://youtu.be/lbFmceo4D5E)
- [Adding Chat History](https://youtu.be/d7otIM_MCZs)
- [LocalGPT - Updated (09/17/2023)](https://youtu.be/G_prHSKX9d4)

## Technical Details 🛠️

By selecting the right local models and the power of `LangChain`, you can run the entire RAG pipeline locally, without any data leaving your environment, and with reasonable performance.

- `ingest.py` uses `LangChain` tools to parse the documents and create embeddings locally using `InstructorEmbeddings`. It then stores the result in a local vector database using the `Chroma` vector store.
- `run_localGPT.py` uses a local LLM to understand questions and create answers. The context for the answers is extracted from the local vector store using a similarity search to locate the right pieces of context from the docs; a minimal sketch of this retrieval step follows this list.
- You can replace this local LLM with any other LLM from HuggingFace. Make sure whatever LLM you select is in the HF format.
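
The retrieval half of this pipeline can be sketched in a few lines. The snippet below is illustrative rather than the project's actual code: it assumes the vector store was persisted to the `DB` folder (as `ingest.py` does) and that `hkunlp/instructor-large` is the embedding model; adjust both to match your `constants.py`.

```python
# Minimal retrieval sketch (illustrative; not the project's actual code).
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma

# Assumption: the same embedding model and persist directory that ingest.py used.
embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
db = Chroma(persist_directory="DB", embedding_function=embeddings)

# A similarity search returns the document chunks most relevant to the question;
# run_localGPT.py feeds chunks like these to the local LLM as context.
docs = db.similarity_search("What is the first amendment?", k=4)
for doc in docs:
    print(doc.metadata.get("source"), doc.page_content[:120])
```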

This project was inspired by the original [privateGPT](https://github.com/imartinez/privateGPT).

## Built Using 🧩

- [LangChain](https://github.com/hwchase17/langchain)
- [HuggingFace LLMs](https://huggingface.co/models)
- [InstructorEmbeddings](https://instructor-embedding.github.io/)
- [LLAMACPP](https://github.com/abetlen/llama-cpp-python)
- [ChromaDB](https://www.trychroma.com/)
- [Streamlit](https://streamlit.io/)

# Environment Setup 🌍

1. 📥 Clone the repo using git:

```shell
git clone https://github.com/PromtEngineer/localGPT.git
```

2. 🐍 Install [conda](https://www.anaconda.com/download) for virtual environment management. Create and activate a new virtual environment.

```shell
conda create -n localGPT python=3.10.0
conda activate localGPT
```

3. 🛠️ Install the dependencies using pip.

To set up your environment to run the code, first install all requirements:

```shell
pip install -r requirements.txt
```

***Installing LLAMA-CPP:***

LocalGPT uses [LlamaCpp-Python](https://github.com/abetlen/llama-cpp-python) for GGML (you will need llama-cpp-python <=0.1.76) and GGUF (llama-cpp-python >=0.1.83) models.

If you want to use BLAS or Metal with [llama-cpp](https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast--metal), you can set the appropriate flags.

For `NVIDIA` GPU support, use `cuBLAS`:

```shell
# Example: cuBLAS
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.83 --no-cache-dir
```

For Apple Metal (`M1/M2`) support, use:

```shell
# Example: METAL
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.83 --no-cache-dir
```

For more details, please refer to [llama-cpp](https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast--metal).

## Docker 🐳

Installing the required packages for GPU inference on NVIDIA GPUs, such as gcc 11 and CUDA 11, may cause conflicts with other packages on your system.
As an alternative to Conda, you can use Docker with the provided Dockerfile.
The image includes CUDA; your system only needs Docker, BuildKit, your NVIDIA GPU driver, and the NVIDIA container toolkit.
Build with `docker build . -t localgpt` (requires BuildKit).
Docker BuildKit does not currently support GPU access during *docker build*, only during *docker run*.
Run with `docker run -it --mount src="$HOME/.cache",target=/root/.cache,type=bind --gpus=all localgpt`.

## Test dataset

For testing, this repository comes with the [Constitution of the USA](https://constitutioncenter.org/media/files/constitution.pdf) as an example file to use.

## Ingesting your OWN Data

Put your files in the `SOURCE_DOCUMENTS` folder. You can put multiple folders within the `SOURCE_DOCUMENTS` folder and the code will recursively read your files.

### Supported file formats

LocalGPT currently supports the following file formats. LocalGPT uses `LangChain` for loading these file formats. The code in `constants.py` uses a `DOCUMENT_MAP` dictionary to map a file format to the corresponding loader. To add support for another file format, simply add an entry to this dictionary with the file format and the corresponding loader from [LangChain](https://python.langchain.com/docs/modules/data_connection/document_loaders/); a sketch of such an addition follows the map below.

```python
DOCUMENT_MAP = {
    ".txt": TextLoader,
    ".md": TextLoader,
    ".py": TextLoader,
    ".pdf": PDFMinerLoader,
    ".csv": CSVLoader,
    ".xls": UnstructuredExcelLoader,
    ".xlsx": UnstructuredExcelLoader,
    ".docx": Docx2txtLoader,
    ".doc": Docx2txtLoader,
}
```
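
For example, support for HTML files could be added with LangChain's `UnstructuredHTMLLoader`. The entry below is a sketch of the kind of addition described above, not code that already exists in `constants.py`, and it assumes the `unstructured` package is installed:

```python
# Sketch: extending DOCUMENT_MAP in constants.py with an HTML loader.
from langchain.document_loaders import TextLoader, UnstructuredHTMLLoader

DOCUMENT_MAP = {
    ".txt": TextLoader,
    # ... existing entries from constants.py ...
    ".html": UnstructuredHTMLLoader,  # new file format -> its LangChain loader
}
```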

### Ingest

Run the following command to ingest all the data (by default it will use `cuda` if it is set up on your system):

```shell
python ingest.py
```

You will see an output like this:

<img width="1110" alt="Screenshot 2023-09-14 at 3 36 27 PM" src="https://github.com/PromtEngineer/localGPT/assets/134474669/c9274e9a-842c-49b9-8d95-606c3d80011f">

Use the device type argument to specify a given device.
To run on `cpu`:

```sh
python ingest.py --device_type cpu
```

To run on `M1/M2`:

```sh
python ingest.py --device_type mps
```

Use help for a full list of supported devices:

```sh
python ingest.py --help
```

This will create a new folder called `DB` and use it for the newly created vector store. You can ingest as many documents as you want, and all will be accumulated in the local embeddings database.
If you want to start from an empty database, delete the `DB` folder and re-ingest your documents.

Note: When you run this for the first time, it will need internet access to download the embedding model (default: `Instructor Embedding`). In subsequent runs, no data will leave your local environment and you can ingest data without an internet connection.

## Ask questions to your documents, locally!

In order to chat with your documents, run the following command (by default, it will run on `cuda`):

```shell
python run_localGPT.py
```

You can also specify the device type, just like with `ingest.py`:

```shell
python run_localGPT.py --device_type mps # to run on Apple silicon
```

This will load the ingested vector store and the embedding model. You will be presented with a prompt:

```shell
> Enter a query:
```

After typing your question, hit enter. LocalGPT will take some time depending on your hardware. You will get a response like the one below.

<img width="1312" alt="Screenshot 2023-09-14 at 3 33 19 PM" src="https://github.com/PromtEngineer/localGPT/assets/134474669/a7268de9-ade0-420b-a00b-ed12207dbe41">

Once the answer is generated, you can ask another question without re-running the script; just wait for the prompt again.

***Note:*** When you run this for the first time, it will need an internet connection to download the LLM (default: `TheBloke/Llama-2-7b-Chat-GGUF`). After that you can turn off your internet connection and inference will still work. No data leaves your local environment.

Type `exit` to finish the script.

### Extra Options with run_localGPT.py

You can use the `--show_sources` flag with `run_localGPT.py` to show which chunks were retrieved by the embedding model. By default, it will show 4 different sources/chunks; you can change the number of sources/chunks to retrieve.

```shell
python run_localGPT.py --show_sources
```

Another option is to enable chat history. ***Note***: This is disabled by default and can be enabled with the `--use_history` flag. The context window is limited, so keep in mind that enabling history consumes part of it and might cause it to overflow.

```shell
python run_localGPT.py --use_history
```

# Run the Graphical User Interface

1. Open `constants.py` in an editor of your choice and, depending on your choice, add the LLM you want to use. By default, the following model will be used:

```python
MODEL_ID = "TheBloke/Llama-2-7b-Chat-GGUF"
MODEL_BASENAME = "llama-2-7b-chat.Q4_K_M.gguf"
```

2. Open up a terminal and activate your Python environment that contains the dependencies installed from requirements.txt.

3. Navigate to the `/LOCALGPT` directory.

4. Run the following command: `python run_localGPT_API.py`. The API should begin to run.

5. Wait until everything has loaded. You should see something like `INFO:werkzeug:Press CTRL+C to quit`.

6. Open up a second terminal and activate the same Python environment.

7. Navigate to the `/LOCALGPT/localGPTUI` directory.

8. Run the command `python localGPTUI.py`.

9. Open up a web browser and go to the address `http://localhost:5111/`.

# How to select different LLM models?

To change the models you will need to set both `MODEL_ID` and `MODEL_BASENAME` (a combined example follows the steps below).

1. Open up `constants.py` in the editor of your choice.
2. Change the `MODEL_ID` and `MODEL_BASENAME`. If you are using a quantized model (`GGML`, `GPTQ`, `GGUF`), you will need to provide `MODEL_BASENAME`. For unquantized models, set `MODEL_BASENAME` to `NONE`.
3. There are a number of example models from HuggingFace that have already been tested, both original trained models (ending with HF or with a .bin file in "Files and versions") and quantized models (ending with GPTQ or with .no-act-order or .safetensors files in "Files and versions").
4. For models that end with HF or have a .bin file inside their "Files and versions" on their HuggingFace page:

   - Make sure you have a `MODEL_ID` selected. For example -> `MODEL_ID = "TheBloke/guanaco-7B-HF"`
   - Go to the [HuggingFace Repo](https://huggingface.co/TheBloke/guanaco-7B-HF)

5. For models that contain GPTQ in their name and/or have a .no-act-order or .safetensors extension inside their "Files and versions" on their HuggingFace page:

   - Make sure you have a `MODEL_ID` selected. For example -> `MODEL_ID = "TheBloke/wizardLM-7B-GPTQ"`
   - Go to the corresponding [HuggingFace Repo](https://huggingface.co/TheBloke/wizardLM-7B-GPTQ) and select "Files and versions".
   - Pick one of the model names and set it as `MODEL_BASENAME`. For example -> `MODEL_BASENAME = "wizardLM-7B-GPTQ-4bit.compat.no-act-order.safetensors"`

6. Follow the same steps for `GGUF` and `GGML` models.
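
Putting the steps above together, the relevant lines in `constants.py` might look like the following sketch. The values are taken from the examples above; whether unquantized models take Python `None` or the string `"NONE"` depends on your version of `constants.py`.

```python
# Illustrative constants.py settings (not the shipped defaults).

# Unquantized HF model: no MODEL_BASENAME needed.
# MODEL_ID = "TheBloke/guanaco-7B-HF"
# MODEL_BASENAME = None

# Quantized GPTQ model: MODEL_BASENAME points at a file from "Files and versions".
MODEL_ID = "TheBloke/wizardLM-7B-GPTQ"
MODEL_BASENAME = "wizardLM-7B-GPTQ-4bit.compat.no-act-order.safetensors"
```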

# GPU and VRAM Requirements

Below are the VRAM requirements for different models depending on their size (billions of parameters). The estimates in the table do not include the VRAM used by the embedding models, which use an additional 2 GB - 7 GB of VRAM depending on the model; a back-of-the-envelope sketch follows the table.

| Model Size (B) | float32  | float16  | GPTQ 8bit       | GPTQ 4bit        |
| -------------- | -------- | -------- | --------------- | ---------------- |
| 7B             | 28 GB    | 14 GB    | 7 GB - 9 GB     | 3.5 GB - 5 GB    |
| 13B            | 52 GB    | 26 GB    | 13 GB - 15 GB   | 6.5 GB - 8 GB    |
| 32B            | 130 GB   | 65 GB    | 32.5 GB - 35 GB | 16.25 GB - 19 GB |
| 65B            | 260.8 GB | 130.4 GB | 65.2 GB - 67 GB | 32.6 GB - 35 GB  |
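
The float32 and float16 columns follow directly from bytes per parameter (4 and 2 bytes per weight); the quantized columns are shown as ranges because implementations add overhead. A rough, weights-only estimate, ignoring activations, the KV cache, and the embedding model, can be sketched as:

```python
def weights_vram_gb(params_billion: float, bits_per_param: int) -> float:
    """Rough weights-only VRAM estimate: parameters x bytes per parameter, in GB."""
    return params_billion * (bits_per_param / 8)

# 7B model: 28 GB (float32), 14 GB (float16), ~7 GB (8-bit), ~3.5 GB (4-bit),
# matching the lower bounds of the table above.
for bits, label in [(32, "float32"), (16, "float16"), (8, "GPTQ 8bit"), (4, "GPTQ 4bit")]:
    print(f"7B @ {label}: ~{weights_vram_gb(7, bits):.1f} GB")
```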

# System Requirements

## Python Version

To use this software, you must have Python 3.10 or later installed. Earlier versions of Python are not supported.

## C++ Compiler

If you encounter an error while building a wheel during the `pip install` process, you may need to install a C++ compiler on your computer.

### For Windows 10/11

To install a C++ compiler on Windows 10/11, follow these steps:

1. Install Visual Studio 2022.
2. Make sure the following components are selected:
   - Universal Windows Platform development
   - C++ CMake tools for Windows
3. Download the MinGW installer from the [MinGW website](https://sourceforge.net/projects/mingw/).
4. Run the installer and select the "gcc" component.

### NVIDIA Driver Issues

Follow this [page](https://linuxconfig.org/how-to-install-the-nvidia-drivers-on-ubuntu-22-04) to install NVIDIA drivers.

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=PromtEngineer/localGPT&type=Date)](https://star-history.com/#PromtEngineer/localGPT&Date)

# Disclaimer

This is a test project to validate the feasibility of a fully local solution for question answering using LLMs and vector embeddings. It is not production ready, and it is not meant to be used in production. Vicuna-7B is based on the Llama model, so it is subject to the original Llama license.

# Common Errors

- [Torch not compatible with CUDA enabled](https://github.com/pytorch/pytorch/issues/30664)
  - Get your CUDA version:

    ```shell
    nvcc --version
    ```
    ```shell
    nvidia-smi
    ```

  - Try installing PyTorch for your CUDA version:

    ```shell
    conda install -c pytorch torchvision cudatoolkit=10.1 pytorch
    ```

  - If it doesn't work, try reinstalling:

    ```shell
    pip uninstall torch
    pip cache purge
    pip install torch -f https://download.pytorch.org/whl/torch_stable.html
    ```

- [ERROR: pip's dependency resolver does not currently take into account all the packages that are installed](https://stackoverflow.com/questions/72672196/error-pips-dependency-resolver-does-not-currently-take-into-account-all-the-pa/76604141#76604141)

  ```shell
  pip install h5py
  pip install typing-extensions
  pip install wheel
  ```

- [Failed to import transformers](https://github.com/huggingface/transformers/issues/11262)
  - Try reinstalling:

    ```shell
    conda uninstall tokenizers transformers
    pip install transformers
    ```

- [ERROR: "If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation..."](https://pytorch.org/docs/stable/notes/cuda.html#memory-management)

  ```shell
  export PYTORCH_NO_CUDA_MEMORY_CACHING=1
  ```

---
license: mit
title: katara
sdk: gradio
emoji: π
colorFrom: yellow
colorTo: yellow
---