---
title: thread-gpt
app_file: app.py
sdk: gradio
sdk_version: 4.4.1
---
<h1 align="center">ThreadGPT</h1>
<p align="center">
  <img src="images/logo.png" alt="ThreadGPT Logo" style="height: 150px">
</p>

Struggling to keep up with the latest AI research papers? **ThreadGPT** is here to help. It seamlessly transforms complex academic papers into concise, easy-to-understand threads. Not only does it summarize the text, but it also embeds relevant figures, tables, and visuals from the papers directly in the threads. 🧵✨📄

<p align="center">
  <img src="./images/gradio.png" alt="Gradio UI" width="800">
  <br>
  <i>Gradio App UI</i>
</p>

<p align="center">
  <img src="./images/examples.png" alt="Example Threads" width="1200">
  <br>
  <i>Examples of threads generated by ThreadGPT (<a href="https://twitter.com/paper_threadoor">@paper_threadoor</a>)</i>
</p>

## 🛠️ Installation

### Clone the repo

```bash
git clone https://github.com/wiskojo/thread-gpt
```

### Install dependencies

```bash
# Install PyTorch, torchvision, and torchaudio
# Please refer to the official PyTorch website (https://pytorch.org) for the installation command that matches your system. Example:
pip install torch==2.0.0 torchvision==0.15.1

# Install all other dependencies
pip install -r requirements.txt
```

### Configure environment variables

Copy the `.env.template` file and fill in your `OPENAI_API_KEY`.

```bash
cp .env.template .env
```
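
To verify that the key is picked up correctly, a minimal check like the sketch below can help. It assumes the project loads its configuration with `python-dotenv` (an assumption; check `requirements.txt`):

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

# Load variables from .env into the process environment.
load_dotenv()

# Fail early if the key is missing, rather than at the first OpenAI API call.
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; check your .env file")
print(f"OPENAI_API_KEY loaded ({len(api_key)} characters)")
```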

## 🚀 Getting Started

Before proceeding, please ensure that all the installation steps have been successfully completed.

### 🚨 Cost Warning

Please be aware that using GPT-4 through the Assistants API can incur significant costs. Monitor your usage and review OpenAI's pricing details before proceeding.

### Gradio

```bash
python app.py
```

### CLI

#### 🧵 Create Thread

To create a thread, provide either a URL or a local path to a PDF file. Use one of the following commands:

```bash
# For a URL
python thread.py <URL_TO_PDF>

# For a local file
python thread.py <LOCAL_PATH_TO_PDF>
```

By default, all outputs are written to `./data/<PDF_NAME>`, which has the following structure:

```
./data/<PDF_NAME>/
β”œβ”€β”€ figures/
β”‚   β”œβ”€β”€ <figure_1_name>.jpg
β”‚   β”œβ”€β”€ <figure_2_name>.png
β”‚   └── ...
β”œβ”€β”€ <PDF_NAME>.pdf
β”œβ”€β”€ results.json
β”œβ”€β”€ thread.json
β”œβ”€β”€ processed_thread.json
└── processed_thread.md
```

The final output for user consumption is located at `./data/<PDF_NAME>/processed_thread.md`. This file is formatted in Markdown and can be conveniently viewed using any Markdown editor.

#### All Contents

1. `figures/`: This directory contains all the figures, tables, and visuals that have been extracted from the paper.
2. `<PDF_NAME>.pdf`: This is the original PDF file.
3. `results.json`: This file contains the results of the layout parsing. It includes an index of all figures, their paths, and captions that were passed to OpenAI.
4. `thread.json`: This file contains the raw thread that was generated by OpenAI before any post-processing was done.
5. `processed_thread.json`: This file is a post-processed version of `thread.json`. The post-processing includes steps such as removing source annotations and duplicate figures.
6. `processed_thread.md`: This is a markdown version of `processed_thread.json`. It is the final output provided for user consumption.
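
As a quick sanity check, a short script like the sketch below can load and preview these outputs. It makes no assumptions about the JSON schemas and only pretty-prints them; replace `<PDF_NAME>` with your paper's directory name:

```python
import json
from pathlib import Path

# Output directory created by thread.py; replace <PDF_NAME> accordingly.
output_dir = Path("./data/<PDF_NAME>")

# Pretty-print each JSON artifact without assuming its schema.
for name in ("results.json", "thread.json", "processed_thread.json"):
    path = output_dir / name
    if path.exists():
        data = json.loads(path.read_text())
        print(f"--- {name} ---")
        print(json.dumps(data, indent=2)[:1000])  # preview the first 1,000 characters

# The final Markdown thread can be read directly as text.
print((output_dir / "processed_thread.md").read_text())
```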

#### 📨 Share Thread

To share the thread on X/Twitter, you need to set up credentials in the `.env` file. This requires creating a [developer account](https://developer.twitter.com/) and filling in your `CONSUMER_KEY`, `CONSUMER_SECRET`, `ACCESS_KEY`, and `ACCESS_SECRET`. Then run this command on the generated JSON file:

```bash
python tweet.py ./data/<PDF_NAME>/processed_thread.json
```
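
For reference, posting a reply chain with the X API looks roughly like the sketch below. This is only an illustration that assumes `tweepy` as the client library and a plain list of post texts; the actual logic lives in `tweet.py` and may differ:

```python
import os

import tweepy  # assumed client library; tweet.py may use a different one
from dotenv import load_dotenv

load_dotenv()

client = tweepy.Client(
    consumer_key=os.environ["CONSUMER_KEY"],
    consumer_secret=os.environ["CONSUMER_SECRET"],
    access_token=os.environ["ACCESS_KEY"],
    access_token_secret=os.environ["ACCESS_SECRET"],
)

# Hypothetical post texts; in practice they come from processed_thread.json.
posts = ["1/ Why this paper matters ...", "2/ Key results ..."]

previous_id = None
for text in posts:
    # Each post replies to the previous one, forming a thread.
    response = client.create_tweet(text=text, in_reply_to_tweet_id=previous_id)
    previous_id = response.data["id"]
```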

#### 🔧 Customize Assistant

ThreadGPT uses OpenAI's Assistants API. To customize the assistant's behavior, modify `create_assistant.py`, which defines defaults for the prompt, name, tools, and model (`gpt-4-1106-preview`). Adjust these parameters to your liking.
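
For orientation, creating such an assistant with the OpenAI Python SDK (v1.x) looks roughly like the sketch below. The name, instructions, and tools shown here are placeholders, not the defaults used by `create_assistant.py`:

```python
from openai import OpenAI  # assumes openai>=1.2, which exposes the beta Assistants API

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder values; adjust the prompt, name, tools, and model to your liking.
assistant = client.beta.assistants.create(
    name="ThreadGPT",
    instructions="Summarize research papers into concise, engaging threads.",
    tools=[{"type": "retrieval"}],  # lets the assistant read the uploaded PDF
    model="gpt-4-1106-preview",
)
print(f"Created assistant: {assistant.id}")
```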