---
license: apache-2.0
datasets:
- yuan-tian/chartgpt-dataset
language:
- en
metrics:
- rouge
pipeline_tag: text2text-generation
base_model:
- google/flan-t5-xl
new_version: yuan-tian/chartgpt-llama3
---
# Model Card for ChartGPT
## Model Details
### Model Description
This model generates charts from natural language utterances. For details, see the research paper listed below.
* **Model type**: Language model
* **Language(s) (NLP)**: English
* **License**: Apache 2.0
* **Finetuned from model**: [FLAN-T5-XL](https://huggingface.co/google/flan-t5-xl)
* **Research paper**: [ChartGPT: Leveraging LLMs to Generate Charts from Abstract Natural Language](https://ieeexplore.ieee.org/document/10443572)
### Model Input Format
The model input at step `x` has the following format, where `<...>` serves as a separation token.
```
{table name}
{column names}
{column types}
{data row 1} {data row 2}
{NL utterance}
{Step 1 prompt} {Answer 1}
...
{Step x-1 prompt} {Answer x-1}
{Step x prompt}
```
Given this input, the model should output the answer for step `x`.
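The layout above can be assembled programmatically. The following is a minimal sketch, not the authors' preprocessing code: `build_input` is a hypothetical helper, and it assumes fields are joined with single spaces and cells with commas, as in the movie example in this card. It leaves out the separation token, whose exact spelling is not shown here.

```python
def build_input(table_name, column_names, column_types, rows, utterance,
                step_prompts, answers):
    """Serialize a table, an utterance, and the step history into one string.

    Assumption: fields are joined with single spaces and cells with commas;
    no separation token is inserted.
    """
    if len(step_prompts) != len(answers) + 1:
        raise ValueError("need exactly one more prompt than answers")
    parts = [
        table_name,
        ",".join(column_names),
        ",".join(column_types),
        " ".join(",".join(str(cell) for cell in row) for row in rows),
        utterance,
    ]
    # Completed steps appear as "prompt answer"; the current step has only its prompt.
    for prompt, answer in zip(step_prompts, answers):
        parts.append(f"{prompt} {answer}")
    parts.append(step_prompts[-1])
    return " ".join(parts)


example = build_input(
    table_name="movies",
    column_names=["Title", "Major_Genre"],
    column_types=["nominal", "nominal"],
    rows=[["Broken Arrow", "Action"]],
    utterance="What kinds of movies are the most popular?",
    step_prompts=["Step 1. Select columns:"],
    answers=[],
)
print(example)
```

For step `x`, pass the first `x` prompts and the `x-1` answers obtained so far.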
The step 1-6 prompts are as follows:
```
Step 1. Select columns:
Step 2. Add filter:
Step 3. Add aggregations:
Step 4. Select chart type:
Step 5. Choose encoding:
Step 6. Add sort:
```
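Because each step's input contains the prompts and answers of all earlier steps, the six steps can be run as a loop that feeds every answer back into the next input. The sketch below illustrates that chaining under stated assumptions: `run_steps` and `answer_fn` are hypothetical names, pieces are joined with single spaces, and a stand-in function replaces the real model so the loop runs without downloading weights.

```python
STEP_PROMPTS = [
    "Step 1. Select columns:",
    "Step 2. Add filter:",
    "Step 3. Add aggregations:",
    "Step 4. Select chart type:",
    "Step 5. Choose encoding:",
    "Step 6. Add sort:",
]


def run_steps(context, answer_fn, prompts=STEP_PROMPTS):
    """Ask for one step at a time, feeding earlier answers back into the input.

    `context` is the serialized table plus utterance; `answer_fn` is a
    hypothetical callable that maps an input string to the model's answer.
    """
    text = context
    answers = []
    for prompt in prompts:
        text = f"{text} {prompt}"    # append the current step's prompt
        answer = answer_fn(text)     # query the model (or a stand-in)
        answers.append(answer)
        text = f"{text} {answer}"    # keep the answer for later steps
    return answers


# Stand-in for the model: reports how many step prompts the input contains.
def count_steps(text):
    return f"<answer {text.count('Step ')}>"


print(run_steps("movies What kinds of movies are the most popular?", count_steps))
```

In practice, `answer_fn` would wrap the tokenizer and `model.generate` calls shown in the getting-started example.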
## How to Get Started with the Model
### Running the Model on a GPU
The example below uses a movie dataset with the utterance "What kinds of movies are the most popular?".
The model should return the answer to step 1 (select columns).
Run the code to check that the model loads and generates output on your machine.
```python
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
)

tokenizer = AutoTokenizer.from_pretrained("yuan-tian/chartgpt")
# device_map="auto" needs the accelerate package and places the weights on the available device(s)
model = AutoModelForSeq2SeqLM.from_pretrained("yuan-tian/chartgpt", device_map="auto")

# Serialized table (name, columns, types, two data rows), then the utterance and the step 1 prompt
input_text = "movies Title,Worldwide_Gross,Production_Budget,Release_Year,Content_Rating,Running_Time,Major_Genre,Creative_Type,Rotten_Tomatoes_Rating,IMDB_Rating nominal,quantitative,quantitative,temporal,nominal,quantitative,nominal,nominal,quantitative,quantitative From Dusk Till Dawn,25728961,20000000,1996,R,107,Horror,Fantasy,63,7.1 Broken Arrow,148345997,65000000,1996,R,108,Action,Contemporary Fiction,55,5.8 What kinds of movies are the most popular? Step 1. Select the columns:"
# Move the inputs to the device that holds the model
inputs = tokenizer(input_text, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Details
### Training Data
This model is fine-tuned from [FLAN-T5-XL](https://huggingface.co/google/flan-t5-xl) on the [chartgpt-dataset](https://huggingface.co/datasets/yuan-tian/chartgpt-dataset).
### Training Procedure
The preprocessing and training procedure will be documented in a future update.
## Citation
**BibTeX:**
```
@article{tian2024chartgpt,
title={ChartGPT: Leveraging LLMs to Generate Charts from Abstract Natural Language},
author={Tian, Yuan and Cui, Weiwei and Deng, Dazhen and Yi, Xinjing and Yang, Yurun and Zhang, Haidong and Wu, Yingcai},
journal={IEEE Transactions on Visualization and Computer Graphics},
year={2024},
pages={1-15},
doi={10.1109/TVCG.2024.3368621}
}
```