# TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios

We present **T**able**LLM**, a powerful large language model designed to handle tabular data manipulation tasks efficiently, whether the tables are embedded in spreadsheets or documents, meeting the demands of real office scenarios. The TableLLM series comes in two scales, TableLLM-7B and TableLLM-13B, fine-tuned from CodeLlama-7B and CodeLlama-13B respectively. Given a table and a user request, TableLLM generates either a code solution or a direct text answer to handle the tabular data manipulation task.
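For concreteness, here is a minimal sketch of querying the model through Hugging Face `transformers`; the repo id, prompt wording and generation settings are illustrative assumptions, not details taken from this README.

```python
# Minimal sketch: query TableLLM as a standard causal LM checkpoint.
# The repo id below is a placeholder, not the model's actual location.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/TableLLM-7b"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# A toy table serialized as CSV plus a question; the actual prompt
# templates are introduced in the "Prompt Template" section below.
prompt = (
    "name,salary\n"
    "Alice,100\n"
    "Bob,200\n\n"
    "Question: What is the total salary?\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```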
 
## Evaluation Results
We evaluate the code-solution generation ability of TableLLM on three benchmarks: WikiSQL, Spider and a self-created table-operation benchmark. The text-answer generation ability is tested on four benchmarks: WikiTableQuestions (WikiTQ), TAT-QA, FeTaQA and OTTQA. The evaluation results are shown below:

| Model                | WikiTQ | TAT-QA | FeTaQA | OTTQA | WikiSQL | Spider | Self-created | Average |
| :------------------- | :----: | :----: | :----: | :---: | :-----: | :----: | :----------: | :-----: |
| TaPEX                | 38.5 | – | – | – | 83.9 | 15.0 | / | 45.8 |
| TaPas                | 31.5 | – | – | – | 74.2 | 23.1 | / | 42.9 |
| TableLlama           | 24.0 | 22.2 | 18.9 | 6.4 | 43.7 | 9.0 | / | 20.7 |
| GPT3.5               | 58.5 | <ins>72.1</ins> | 71.2 | 60.8 | 81.7 | 67.4 | 77.1 | 69.8 |
| GPT4                 | **74.1** | **77.1** | **78.4** | **69.5** | 84.0 | 69.5 | 77.8 | **75.8** |
| Llama2-Chat (13B)    | 48.8 | 49.6 | 67.7 | 61.5 | – | – | – | 56.9 |
| CodeLlama (13B)      | 43.4 | 47.2 | 57.2 | 49.7 | 38.3 | 21.9 | 47.6 | 43.6 |
| Deepseek-Coder (33B) | 6.5 | 11.0 | 7.1 | 7.4 | 72.5 | 58.4 | 73.9 | 33.8 |
| StructGPT (GPT3.5)   | 52.5 | 27.5 | 11.8 | 14.0 | 67.8 | **84.8** | / | 48.9 |
| Binder (GPT3.5)      | 61.6 | 12.8 | 6.8 | 5.1 | 78.6 | 52.6 | / | 42.5 |
| DATER (GPT3.5)       | 53.4 | 28.4 | 18.3 | 13.0 | 58.2 | 26.5 | / | 37.0 |
| TableLLM-7B (Ours)   | 58.8 | 66.9 | 72.6 | <ins>63.1</ins> | <ins>86.6</ins> | 82.6 | <ins>78.8</ins> | 72.8 |
| TableLLM-13B (Ours)  | <ins>62.4</ins> | 68.2 | <ins>74.5</ins> | 62.5 | **90.7** | <ins>83.4</ins> | **80.8** | <ins>74.7</ins> |

**Bold** marks the best result on each benchmark; <ins>underline</ins> marks the second best.
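The Average column is consistent with a plain arithmetic mean over only the benchmarks a model was actually evaluated on, skipping the cells marked "–" or "/". A quick sketch that reproduces two rows of the table:

```python
# Reproduce the Average column: mean over available scores only.
# None stands in for the "–" / "/" cells in the table above.
rows = {
    "TaPEX":       [38.5, None, None, None, 83.9, 15.0, None],
    "TableLLM-7B": [58.8, 66.9, 72.6, 63.1, 86.6, 82.6, 78.8],
}
for model, scores in rows.items():
    available = [s for s in scores if s is not None]
    print(f"{model}: {sum(available) / len(available):.1f}")
# Prints 45.8 and 72.8, matching the table.
```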

## Prompt Template
The prompts we used for generating code solutions and text answers are introduced below.
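As a rough illustration of how such a template can be filled in (the wording below is a hypothetical stand-in, not the actual TableLLM prompt), a spreadsheet-style table can be serialized to CSV and spliced together with the user's question:

```python
import io

import pandas as pd


def build_prompt(df: pd.DataFrame, question: str) -> str:
    """Serialize a table as CSV and wrap it with a question.

    The surrounding wording is a hypothetical stand-in for the real
    TableLLM templates introduced in this section.
    """
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    return (
        "Below is a table followed by a question about it.\n\n"
        f"[TABLE]\n{buf.getvalue()}\n"
        f"[QUESTION]\n{question}\n\n"
        "[ANSWER]\n"
    )


df = pd.DataFrame({"name": ["Alice", "Bob"], "salary": [100, 200]})
print(build_prompt(df, "What is the total salary?"))
```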