zwgao committed
Commit daf67f0 • 1 Parent(s): 650ac8d

update README.md

Files changed (2)
  1. README.md +205 -1
  2. app.py +1 -1
README.md CHANGED
@@ -10,4 +10,208 @@ pinned: false
 license: apache-2.0
 ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ <img src="https://github.com/liu-zhy/graph-of-thought/assets/26198430/4e8b2511-ce69-4c1a-95a1-5aed4d432a82" width=10% align="left" />
+
+ # ControlLLM
+
+ ControlLLM: Augmenting Large Language Models with Tools by Searching on Graphs [[Paper](https://arxiv.org/abs/2310.17796)]
+
+ We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving complex real-world tasks. Despite the remarkable performance of LLMs, they still struggle with tool invocation due to ambiguous user prompts, inaccurate tool selection and parameterization, and inefficient tool scheduling. To overcome these challenges, our framework comprises three key components: (1) a $\textit{task decomposer}$ that breaks down a complex task into clear subtasks with well-defined inputs and outputs; (2) a $\textit{Thoughts-on-Graph (ToG) paradigm}$ that searches the optimal solution path on a pre-built tool graph, which specifies the parameter and dependency relations among different tools; and (3) an $\textit{execution engine with a rich toolbox}$ that interprets the solution path and runs the tools efficiently on different computational devices. We evaluate our framework on diverse tasks involving image, audio, and video processing, demonstrating its superior accuracy, efficiency, and versatility compared to existing methods.
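+
+ To make the ToG idea concrete, the sketch below runs a depth-first search over a tiny hand-written tool graph, looking for tool chains that turn a "text" resource into a "video" resource. The tool names, resource types, and search strategy here are illustrative assumptions, not the actual ControlLLM implementation:
+
+ ```python
+ # Toy Thoughts-on-Graph search. Each tool maps a set of input resource
+ # types to one output resource type; we enumerate tool chains that produce
+ # the target resource type. All names below are made up for illustration.
+ TOOL_GRAPH = {
+     "text2image":    ({"text"}, "image"),
+     "image_caption": ({"image"}, "text"),
+     "image2video":   ({"image"}, "video"),
+     "text2audio":    ({"text"}, "audio"),
+ }
+
+ def search_paths(available, target, path=(), seen=frozenset()):
+     """Yield every tool chain whose outputs eventually include `target`."""
+     if target in available:
+         yield list(path)
+         return
+     for tool, (inputs, output) in TOOL_GRAPH.items():
+         # Skip tools already used, tools whose output we already have,
+         # and tools whose inputs are not yet available.
+         if tool in seen or output in available or not inputs <= available:
+             continue
+         yield from search_paths(available | {output}, target,
+                                 path + (tool,), seen | {tool})
+
+ # The generator enumerates several candidate chains (some with detours);
+ # picking the shortest mirrors selecting an optimal solution path.
+ shortest = min(search_paths({"text"}, "video"), key=len)
+ print(" -> ".join(shortest))  # text2image -> image2video
+ ```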
+
+
+ ## 🤖 Video Demo
+
+ <!-- <table>
+ <tr>
+ <td><img width="450" src="https://github.com/liu-zhy/graph-of-thought/assets/26198430/7fe7d1ec-e37e-4ea8-8201-dc639c82ba66" alt="Image 1"></td>
+ <td><img width="450" src="https://github.com/liu-zhy/graph-of-thought/assets/26198430/a8bc6644-368b-42e3-844a-9962fdc9bd01" alt="Image 2"></td>
+ </tr>
+ </table>
+ -->
+
+ https://github.com/OpenGVLab/ControlLLM/assets/13723743/cf72861e-0e7b-4c15-89ee-7fa1d838d00f
+
+ ## 🏠 System Overview
+
+ ![arch](https://github.com/liu-zhy/graph-of-thought/assets/95175307/ad3db5c1-f1c7-4e1f-be48-81ed5228f2b0#center)
+
+ ## 🎁 Major Features
+ - Image Perception
+ - Image Editing
+ - Image Generation
+ - Video Perception
+ - Video Editing
+ - Video Generation
+ - Audio Perception
+ - Audio Generation
+ - Multi-Solution
+ - Pointing Inputs
+ - Resource Type Awareness
+
+ ## 🗓️ Schedule
+
+ - [ ] Launch online demo
+
+ ## 🛠️ Installation
+
+ ### Basic requirements
+
+ * Linux
+ * Python 3.10+
+ * PyTorch 2.0+
+ * CUDA 11.8+
+
+ ### Clone project
+
+ Execute the following command in your working directory to clone the project:
+
+ ```bash
+ git clone https://github.com/OpenGVLab/ControlLLM.git
+ ```
+
+ ### Install dependencies
+
+ Set up the environment:
+
+ ```bash
+ conda create -n cllm python=3.10
+ conda activate cllm
+ conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
+ ```
+
+ Install [LLaVA](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file):
+
+ ```bash
+ pip install git+https://github.com/haotian-liu/LLaVA.git
+ ```
+
+ Then install the remaining dependencies:
+
+ ```bash
+ cd controlllm
+ pip install -r requirements.txt
+ ```
+
+ ## 👨‍🏫 Get Started
+
+ ### Launch tool services
+
+ Put your personal OpenAI API key and [Weather API key](https://www.visualcrossing.com/weather-api) into the corresponding environment variables, then launch the services:
+
+ ```bash
+ cd ./controlllm
+ # OpenAI API key
+ export OPENAI_API_KEY="..."
+ # OpenAI API base URL
+ export OPENAI_BASE_URL="..."
+ # weather API key
+ export WEATHER_API_KEY="..."
+
+ python -m cllm.services.launch --port 10011 --host 0.0.0.0
+ ```
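+
+ To confirm the services came up, you can probe the port with a plain TCP connection. This is a generic Python check, not part of ControlLLM:
+
+ ```python
+ import socket
+
+ # Raises ConnectionRefusedError if nothing is listening on the
+ # tool-services port chosen above.
+ with socket.create_connection(("127.0.0.1", 10011), timeout=5):
+     print("tool services are listening on port 10011")
+ ```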
+
+ ### Launch ToG service
+
+ `TOG_SERVICES_PORT` must match the port that the tool services were launched on above:
+
+ ```bash
+ cd ./controlllm
+
+ export TOG_SERVICES_PORT=10011
+ export OPENAI_BASE_URL="..."
+ export OPENAI_API_KEY="..."
+ python -m cllm.services.tog.launch --port 10012 --host 0.0.0.0
+ ```
+
+ ### Launch gradio demo
+
+ Use `openssl` to generate a self-signed certificate so that the demo can be served over HTTPS:
+
+ ```shell
+ mkdir certificate
+
+ openssl req -x509 -newkey rsa:4096 -keyout certificate/key.pem -out certificate/cert.pem -sha256 -days 365 -nodes
+ ```
+ Launch gradio demo:
+
+ ```bash
+ cd ./controlllm
+
+ export TOG_PORT=10012
+ export TOG_SERVICES_PORT=10011
+ export RESOURCE_ROOT="./client_resources"
+ export GRADIO_TEMP_DIR="$HOME/.tmp"
+ export OPENAI_BASE_URL="..."
+ export OPENAI_API_KEY="..."
+
+ python -m cllm.app.gradio --controller "cllm.agents.tog.Controller" --server_port 10024
+ ```
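+
+ Internally, the certificate pair is what lets Gradio serve the demo over HTTPS. The snippet below is a sketch using Gradio's standard SSL options with a placeholder app; it is not the actual `cllm.app.gradio` code:
+
+ ```python
+ import gradio as gr
+
+ # Placeholder UI standing in for the real ControlLLM demo.
+ with gr.Blocks() as demo:
+     gr.Markdown("ControlLLM demo placeholder")
+
+ # Gradio serves HTTPS when given the key/cert pair generated above;
+ # ssl_verify=False is required for a self-signed certificate.
+ demo.launch(
+     server_port=10024,
+     ssl_certfile="certificate/cert.pem",
+     ssl_keyfile="certificate/key.pem",
+     ssl_verify=False,
+ )
+ ```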
+
+ ### Tools as Services
+
+ Taking image generation as an example, we first launch the service:
+
+ ```bash
+ python -m cllm.services.image_generation.launch --port 10011 --host 0.0.0.0
+ ```
+
+ Then, we call the service via the Python API:
+
+ ```python
+ from cllm.services.image_generation.api import *
+
+ # Point the client at the running service, then request an image.
+ setup(port=10011)
+ text2image('A horse')
+ ```
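+
+ A hypothetical follow-up for persisting the result, assuming `text2image` returns the encoded image as raw bytes (check `cllm.services.image_generation.api` for the actual return type):
+
+ ```python
+ image = text2image('A horse')
+ # Assumption: the service hands back encoded image bytes.
+ if isinstance(image, (bytes, bytearray)):
+     with open('horse.png', 'wb') as f:
+         f.write(image)
+ ```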
+
+ 😬 Alternatively, launch all services in one endpoint:
+
+ ```bash
+ python -m cllm.services.launch --port 10011 --host 0.0.0.0
+ ```
+
+ ## 🛠️ Supported Tools
+
+ See [Tools](TOOL.md)
+
+ ## 🎫 License
+
+ This project is released under the [Apache 2.0 license](LICENSE).
+
+ ## 🖊️ Citation
+
+ If you find this project useful in your research, please cite our paper:
+
+ ```BibTeX
+ @article{2023controlllm,
+   title={ControlLLM: Augmenting Large Language Models with Tools by Searching on Graphs},
+   author={Liu, Zhaoyang and Lai, Zeqiang and Gao, Zhangwei and Cui, Erfei and Li, Zhiheng and Zhu, Xizhou and Lu, Lewei and Chen, Qifeng and Qiao, Yu and Dai, Jifeng and Wang, Wenhai},
+   journal={arXiv preprint arXiv:2310.17796},
+   year={2023}
+ }
+ ```
+
+ ## 🤝 Acknowledgement
+ - Thanks to the following open-source projects:
+ [Hugging Face](https://github.com/huggingface) &#8194;
+ [LangChain](https://github.com/hwchase17/langchain) &#8194;
+ [SAM](https://github.com/facebookresearch/segment-anything) &#8194;
+ [Stable Diffusion](https://github.com/CompVis/stable-diffusion) &#8194;
+ [ControlNet](https://github.com/lllyasviel/ControlNet) &#8194;
+ [InstructPix2Pix](https://github.com/timothybrooks/instruct-pix2pix) &#8194;
+ [EasyOCR](https://github.com/JaidedAI/EasyOCR) &#8194;
+ [ImageBind](https://github.com/facebookresearch/ImageBind) &#8194;
+ [PixArt-alpha](https://github.com/PixArt-alpha/PixArt-alpha) &#8194;
+ [LLaVA](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file) &#8194;
+ [Modelscope](https://modelscope.cn/my/overview) &#8194;
+ [AudioCraft](https://github.com/facebookresearch/audiocraft) &#8194;
+ [Whisper](https://github.com/openai/whisper) &#8194;
+ [Llama 2](https://github.com/facebookresearch/llama) &#8194;
+ [LLaMA](https://github.com/facebookresearch/llama/tree/llama_v1)
+
+ ---
+ If you want to join our WeChat group, please scan the QR code below to add our assistant as a WeChat friend:
+ <p align="center"><img width="300" alt="image" src="https://github.com/OpenGVLab/DragGAN/assets/26198430/e3f0807f-956a-474e-8fd2-1f7c22d73997"></p>
app.py CHANGED
@@ -687,4 +687,4 @@ def app(controller="cllm.agents.tog.Controller", https=False, **kwargs):
 
 if __name__ == "__main__":
     os.makedirs(RESOURCE_ROOT, exist_ok=True)
-    app(controller="cllm.agents.tog.Controller", server_port=10024)
+    app(controller="cllm.agents.tog.Controller")