---
license: cc-by-nc-4.0
library_name: peft
tags:
- alignment-handbook
- generated_from_trainer
- trl
- sft
- geitje
- fingeitje
- dutch
- nl
- finance
base_model: BramVanroy/GEITje-7B-ultra
datasets:
- snoels/FinGEITje-sft
model-index:
- name: snoels/FinGEITje-7B-sft
results: []
language:
- nl
pipeline_tag: text-generation
inference: false
---
<p align="center" style="margin:0;padding:0">
<img src="https://huggingface.co/snoels/FinGEITje-7B-sft/resolve/main/fingeitje-banner.png" alt="FinGEITje Banner" width="1000"/>
</p>
<div style="margin:auto; text-align:center">
<h1 style="margin-bottom: 0; font-size: 2em;">🐐 FinGEITje 7B</h1>
  <em style="font-size: 1em;">A large open Dutch financial language model.</em>
</div>
This model is a fine-tuned version of [BramVanroy/GEITje-7B-ultra](https://huggingface.co/BramVanroy/GEITje-7B-ultra) on the [snoels/FinGEITje-sft](https://huggingface.co/datasets/snoels/FinGEITje-sft) dataset.
## πŸ“– Model Description
FinGEITje 7B is a large open Dutch financial language model with 7 billion parameters, based on Mistral 7B. It has been further trained on Dutch financial texts, enhancing its proficiency in the Dutch language and its knowledge of financial topics. As a result, FinGEITje provides more accurate and relevant responses in the domain of finance.
## πŸ“Š Training and Evaluation Data
### Training Data
FinGEITje 7B was fine-tuned on the [snoels/FinGEITje-sft](https://huggingface.co/datasets/snoels/FinGEITje-sft) dataset, which consists of translated and processed Dutch financial texts. This dataset includes a wide range of financial topics and instruction tuning data.
#### Data Processing Steps
1. **Translation**: Original instruction tuning datasets were translated into Dutch using a specialized translation service to maintain the integrity of financial terminology.
2. **Post-processing**: The translated data underwent post-processing to correct any translation inconsistencies and to format it according to the original dataset structure.
3. **Formatting**: The data was formatted to match the style and requirements of instruction tuning datasets, ensuring compatibility with the fine-tuning process.
4. **Filtering**: A Dutch language check and predefined validation checks were applied to filter out any low-quality or irrelevant data (a sketch of this step follows below).
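For illustration, a minimal sketch of the Dutch language check in step 4 might look like the following. The `langdetect` dependency and the record fields are assumptions made for this example, not the project's actual implementation, which lives in the project repository:

```python
# Illustrative sketch of the Dutch language check (step 4); the
# `langdetect` dependency and the record fields are assumptions.
from langdetect import LangDetectException, detect

records = [
    {"instruction": "Wat is een obligatie?", "response": "Een obligatie is een verhandelbaar schuldbewijs."},
    {"instruction": "What is a bond?", "response": "A bond is a tradable debt security."},
]

def is_dutch(record: dict) -> bool:
    """Keep only records whose instruction and response are both detected as Dutch."""
    try:
        return all(detect(record[key]) == "nl" for key in ("instruction", "response"))
    except LangDetectException:
        return False  # empty or undetectable text fails the check

filtered = [r for r in records if is_dutch(r)]
print(f"kept {len(filtered)} of {len(records)} records")
```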
### Evaluation Data
The model was evaluated using:
- **[snoels/FinDutchBench](https://huggingface.co/datasets/snoels/FinDutchBench)**: A Dutch financial benchmark dataset designed to assess the model's performance on various financial tasks (a loading sketch follows below).
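The benchmark can be pulled from the Hugging Face Hub with the `datasets` library. Note that any configuration or split names are left to the defaults here; check the dataset card for the exact layout:

```python
from datasets import load_dataset

# Load the Dutch financial benchmark from the Hub.
# Subset/split names may differ; see the dataset card for details.
bench = load_dataset("snoels/FinDutchBench")
print(bench)
```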
## βš™οΈ Training Procedure
FinGEITje was trained following the methodology described in the [Alignment Handbook](https://github.com/huggingface/alignment-handbook).
### Training Configuration
- The training configuration is based on the recipe outlined in the alignment handbook and can be found in the [config_qlora.yaml](https://github.com/snoels/fingeit/blob/master/src/training/sft/config_qlora.yaml) file.
- The model was further trained using **QLoRA** (Quantized LoRA) for efficient fine-tuning with reduced computational resources (see the sketch below).
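In outline, QLoRA quantizes the frozen base model to 4-bit and trains low-rank adapters on top of it. A minimal sketch of such a setup is shown below; the quantization and LoRA settings are illustrative defaults, not necessarily those used in `config_qlora.yaml`:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Quantize the frozen base model to 4-bit NF4 (illustrative settings).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "BramVanroy/GEITje-7B-ultra",
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach trainable low-rank adapters; rank and target modules are assumptions.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```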
### Training Hyperparameters
The following hyperparameters were used during training (a sketch mapping them onto `TrainingArguments` follows the list):
- **Learning Rate**: 0.0002
- **Train Batch Size**: 4
- **Evaluation Batch Size**: 8
- **Seed**: 42
- **Distributed Type**: Multi-GPU
- **Gradient Accumulation Steps**: 2
- **Total Train Batch Size**: 8
- **Optimizer**: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- **LR Scheduler Type**: Cosine
- **Warmup Ratio**: 0.1
- **Number of Epochs**: 1
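For reference, these values map onto `transformers.TrainingArguments` roughly as follows. This is a sketch; the authoritative configuration is the linked `config_qlora.yaml`:

```python
from transformers import TrainingArguments

# Rough mapping of the hyperparameters above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="fingeitje-7b-sft",
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # total train batch size: 4 x 2 = 8
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",             # Adam with betas=(0.9, 0.999), eps=1e-8
)
```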
### Training Results
| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 0.406 | 1.0 | 3922 | 0.3928 |
### Evaluation Package
The evaluation package defines a set of metrics per task, grouped per dataset, to evaluate the model's performance across different financial domains. The following evaluation notebooks are available:
- **[Evaluation in Dutch](https://github.com/snoels/fingeit/blob/master/notebooks/evaluation_nl.ipynb)**: Assesses the model's performance on the Dutch financial benchmark dataset.
- **[Evaluation in English](https://github.com/snoels/fingeit/blob/master/notebooks/evaluation_en.ipynb)**: Evaluates the model's performance on English financial benchmarks for comparison purposes.
### Framework Versions
- **PEFT**: 0.7.1
- **Transformers**: 4.39.0.dev0
- **PyTorch**: 2.1.2
- **Datasets**: 2.14.6
- **Tokenizers**: 0.15.2
## πŸ› οΈ How to Use
FinGEITje 7B can be used with the Hugging Face Transformers library together with PEFT to load the LoRA adapters efficiently.
### Installation
Ensure you have the necessary libraries installed:
```bash
pip install torch transformers peft accelerate
```
### Loading the Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("BramVanroy/GEITje-7B-ultra", use_fast=False)

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained("BramVanroy/GEITje-7B-ultra", device_map='auto')

# Load the FinGEITje model with PEFT adapters
model = PeftModel.from_pretrained(base_model, "snoels/FinGEITje-7B-sft", device_map='auto')
```
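Optionally, once the adapters are loaded they can be merged into the base weights with PEFT's `merge_and_unload`, which returns a plain `transformers` model and removes the adapter overhead at inference time:

```python
# Merge the LoRA adapters into the base weights for faster inference.
model = model.merge_and_unload()
```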
### Generating Text
```python
# Prepare the input
input_text = "Wat zijn de laatste trends in de Nederlandse banksector?"
input_ids = tokenizer.encode(input_text, return_tensors='pt').to(model.device)

# Generate a response (note: max_length counts the prompt tokens as well)
outputs = model.generate(input_ids, max_length=200, num_return_sequences=1)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
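Because the base model is a chat-tuned model, prompts will generally work better when wrapped in the tokenizer's chat template. A sketch, assuming the tokenizer ships such a template:

```python
# Format the prompt with the tokenizer's chat template
# (assumption: the tokenizer defines one, as chat-tuned models usually do).
messages = [{"role": "user", "content": "Wat zijn de laatste trends in de Nederlandse banksector?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=200)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```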
## 🚧 Limitations and Future Work
While FinGEITje 7B demonstrates significant improvements in understanding and generating Dutch financial content, certain limitations exist:
- **Data Cutoff**: The model's knowledge is limited to the data it was trained on and may not include the most recent developments in the financial sector.
- **Accuracy Concerns**: The model may generate incorrect or outdated information. Users should verify critical information with reliable sources.
- **Biases**: Potential biases in the training data may affect the neutrality and fairness of the model's responses.
- **Language Scope**: Primarily designed for Dutch; performance in other languages is not optimized.
- **Ethical Use**: Users should ensure that the model's outputs comply with ethical standards and do not promote misinformation or harmful content.
### Future Work
- **Data Updates**: Incorporate more recent and diverse financial datasets to keep the model up-to-date.
- **Bias Mitigation**: Implement techniques to identify and reduce biases in the model's outputs.
- **Performance Enhancement**: Fine-tune on more specialized financial topics and complex financial tasks.
- **Multilingual Expansion**: Extend support to other languages relevant to the financial sector in the Netherlands and Europe.
## πŸ™ Acknowledgements
We would like to thank:
- **Rijgersberg** ([GitHub](https://github.com/Rijgersberg)) for creating [GEITje](https://github.com/Rijgersberg/GEITje), one of the first Dutch foundation models, and for contributing significantly to the development of Dutch language models.
- **Bram Vanroy** ([GitHub](https://github.com/BramVanroy)) for creating [GEITje-7B-ultra](https://huggingface.co/BramVanroy/GEITje-7B-ultra), an open-source Dutch chat model, and for sharing training, translation, and evaluation resources.
- **Contributors of the [Alignment Handbook](https://github.com/huggingface/alignment-handbook)** for providing valuable resources that guided the development and training process of FinGEITje.
- **Silverfin** for their collaboration in this research. Silverfin, a Belgian scale-up focused on building an accountancy cloud service, provided valuable insights and resources that were instrumental in the development of FinGEITje. More about their work can be found at [Silverfin](https://silverfin.com/).
## πŸ“ Citation
If you use FinGEITje in your work, please cite [the paper](https://arxiv.org/abs/2410.12835):
```bibtex
@article{FinGEITje2024,
  title={A Dutch Financial Large Language Model},
  author={Noels, Sander and De Blaere, Jorne and De Bie, Tijl},
  journal={arXiv preprint arXiv:2410.12835},
  year={2024},
  url={https://arxiv.org/abs/2410.12835}
}
```
## πŸ“œ License
This model is licensed under the [Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/) license.
## πŸ“§ Contact
For any inquiries or questions, please contact [Sander Noels](mailto:sander.noels@ugent.be).