arxiv:2502.08127

Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance

Published on Feb 12 · Submitted by jiminHuang on Feb 13

#1 Paper of the day

Authors: Jimin Huang, Qianqian Xie

Abstract

Recent advancements in large language models (LLMs) have shown strong general reasoning abilities, yet their effectiveness in financial reasoning remains underexplored. In this study, we comprehensively evaluate 16 powerful reasoning and general LLMs on three complex financial tasks involving financial text, tabular data, and equations, assessing numerical reasoning, tabular interpretation, financial terminology comprehension, long-context processing, and equation-based problem solving. Our results show that while better datasets and pretraining improve financial reasoning, general enhancements like CoT fine-tuning do not always yield consistent gains. Moreover, all reasoning strategies face challenges in improving performance on long-context and multi-table tasks. To address these limitations, we develop a financial reasoning-enhanced model based on Llama-3.1-8B-Instruct, using CoT fine-tuning and reinforcement learning with domain-specific reasoning paths. Even with simple fine-tuning on a single financial dataset, our model achieves a consistent 10% performance improvement across tasks, surpassing all 8B models and even Llama3-70B-Instruct and Llama3.1-70B-Instruct on average. Our results highlight the need for domain-specific adaptations in financial tasks, emphasizing future directions such as multi-table reasoning, long-context processing, and financial terminology comprehension. All our datasets, models, and code are publicly available. Furthermore, we introduce a leaderboard for benchmarking future datasets and models.

Community

jiminHuang

Paper author Paper submitter 2 days ago

Check our model here: https://huggingface.co/TheFinAI/Fino1-8B

jiminHuang

Paper author Paper submitter 2 days ago

And also our dataset https://huggingface.co/datasets/TheFinAI/Fino1_Reasoning_Path_FinQA

mukaj

1 day ago

https://huggingface.co/mukaj/Llama-3.1-Hawkish-8B

This is a similarly sized model, also trained for financial reasoning. I just tested it on FinQA and it scored 60.94%; it would be good to include it in your leaderboard.

lfqian

1 day ago · edited about 17 hours ago

Thank you for the comment. I tested this model on the FinQA dataset. However, using our evaluation method from https://github.com/yale-nlp/DocMath-Eval, which uses GPT to extract the answers and compare them, we only get 46.85%. May I ask which evaluation method you are using?
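
For readers following the thread, here is a rough illustration of how the choice of scoring rule alone can change the reported accuracy for the same model outputs. It is only a sketch under stated assumptions: it is not the DocMath-Eval implementation (which, per the comment above, uses GPT for answer extraction), and the function names, the regex, the 1% relative tolerance, and the sample predictions are all hypothetical.

```python
import math
import re


def extract_last_number(text):
    """Return the last number found in `text` as a float, or None.

    Hypothetical helper: a plain regex stands in for the answer-extraction
    step, which a real pipeline might do very differently.
    """
    matches = re.findall(r"-?\d[\d,]*\.?\d*", text)
    if not matches:
        return None
    # Drop thousands separators (and any trailing comma) before converting.
    return float(matches[-1].replace(",", ""))


def strict_match(pred, gold):
    """Exact string match after trimming whitespace."""
    return pred.strip() == gold.strip()


def tolerant_match(pred, gold, rel_tol=0.01):
    """Numeric match within an assumed 1% relative tolerance."""
    p = extract_last_number(pred)
    g = extract_last_number(gold)
    if p is None or g is None:
        return False
    return math.isclose(p, g, rel_tol=rel_tol)


def accuracy(preds, golds, scorer):
    """Fraction of predictions that `scorer` accepts."""
    return sum(scorer(p, g) for p, g in zip(preds, golds)) / len(golds)


# Made-up predictions and gold answers, for illustration only.
preds = ["The answer is 12.5", "12.5", "about 1,200", "7.3"]
golds = ["12.5", "12.5", "1200", "8.1"]

print(accuracy(preds, golds, strict_match))    # 0.25
print(accuracy(preds, golds, tolerant_match))  # 0.75
```

On these toy inputs the two scorers disagree on two of the four items, so comparing numbers across evaluation setups is only meaningful once the extraction and matching rules are aligned.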