	---
	license: apache-2.0
	datasets:
	- gbharti/finance-alpaca
	language:
	- en
	library_name: transformers
	tags:
	- finance
	widget:
	- text: >-
	    user: Hypothetical, can taxes ever cause a net loss on otherwise-profitable stocks?

	    bot:
	  example_title: Hypothetical
	- text: >-
	    user: What are some signs that the stock market might crash?

	    bot:
	  example_title: Question 2
	- text: >-
	    user: Where should I be investing my money?

	    bot:
	  example_title: Question
	- text: >-
	    user: Is this headline positive or negative? Headline: Australian Tycoon
	    Forrest Shuts Nickel Mines After Prices Crash.

	    bot:
	  example_title: Sentiment analysis
	- text: >-
	    user: Aluminum price per KG is 50$. Forecast max: +1$ min:+0.3$. What should
	    be the current price of aluminum?

	    bot:
	  example_title: Forecast
	---

	# Fin-RWKV: Attention-Free Financial Expert (WIP)
	Fin-RWKV is an attention-free language model fine-tuned for financial analysis and prediction. Developed as part of a MindsDB Hackathon, it uses the simple and efficient RWKV architecture to answer finance questions, classify sentiment, and produce forecasts. It is aimed at professionals and enthusiasts in the finance sector who want to bring deep learning techniques into their financial analyses.

	## Use Cases
	- Sentiment analysis
	- Forecast
	- Product Pricing

	## Features
	- Attention-Free Architecture: Uses the RWKV (Receptance Weighted Key Value) architecture, which avoids the complexity of attention mechanisms while maintaining high performance.
	- Lower Costs: 10x to 100x+ lower inference cost and 2x to 10x lower training cost compared with attention-based transformers.
	- Tinyyyy: Lightweight enough to run on CPUs in real time without a GPU, so it can run on your laptop today.
	- Finance-Specific Training: Trained on the gbharti/finance-alpaca dataset, so the model is tuned for financial data analysis.
	- Transformers Library Integration: Built on the popular 'transformers' library, ensuring easy integration with existing ML pipelines and applications.
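
To make "attention-free" concrete, here is a minimal single-channel sketch of the WKV recurrence described in the RWKV paper (linked in the architecture table). It is an illustration under simplifying assumptions, not the code this model runs: the function names `wkv_recurrent` and `wkv_direct`, the scalar decay `w`, the bonus `u`, and the toy inputs are all made up for the example, and real implementations operate on vectors and add numerical stabilization.

```python
import math

def wkv_recurrent(keys, values, w, u):
    """Compute WKV outputs one token at a time with a constant-size state.

    keys, values: lists of floats for a single channel (toy inputs).
    w: non-negative decay rate; u: bonus applied to the current token.
    """
    num, den = 0.0, 0.0  # running weighted sum of values, and of weights
    outputs = []
    for k, v in zip(keys, values):
        current = math.exp(u + k)
        outputs.append((num + current * v) / (den + current))
        # Decay the past, then fold the current token into the state.
        num = math.exp(-w) * num + math.exp(k) * v
        den = math.exp(-w) * den + math.exp(k)
    return outputs

def wkv_direct(keys, values, w, u):
    """Same quantity written as an explicit sum over all earlier tokens."""
    outputs = []
    for t in range(len(keys)):
        num = math.exp(u + keys[t]) * values[t]
        den = math.exp(u + keys[t])
        for i in range(t):
            weight = math.exp(-(t - 1 - i) * w + keys[i])
            num += weight * values[i]
            den += weight
        outputs.append(num / den)
    return outputs

keys = [0.1, -0.3, 0.5, 0.0]
values = [1.0, 2.0, -1.0, 0.5]
recurrent = wkv_recurrent(keys, values, w=0.5, u=0.2)
direct = wkv_direct(keys, values, w=0.5, u=0.2)
print(all(abs(a - b) < 1e-9 for a, b in zip(recurrent, direct)))
```

The recurrent form touches each token once and keeps only two numbers of state, which is where the linear cost in sequence length comes from; the direct form revisits every earlier token and is shown only to check that both compute the same values.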

	## How to use
	```py
	from transformers import AutoTokenizer, AutoModelForCausalLM

	# Load the tokenizer and model weights from the Hugging Face Hub
	tokenizer = AutoTokenizer.from_pretrained("umuthopeyildirim/fin-rwkv-430m")
	model = AutoModelForCausalLM.from_pretrained("umuthopeyildirim/fin-rwkv-430m")

	prompt = "user: Is this headline positive or negative? Headline: Australian Tycoon Forrest Shuts Nickel Mines After Prices Crash\nbot:"

	# Tokenize the input
	input_ids = tokenizer.encode(prompt, return_tensors="pt")

	# Generate a response
	output = model.generate(input_ids, max_length=333, num_return_sequences=1)

	# Decode the output
	generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

	print(generated_text)
	```
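
The example above and the widget prompts share a `user: ...` / `bot:` layout, and the decoded output contains the prompt followed by the completion. A small pair of helpers can build that prompt and pull out just the reply. This is a sketch: `build_prompt` and `extract_reply` are hypothetical names, not part of `transformers` or this repository, and cutting the reply at the next `user:` marker is an assumption about how the model continues a conversation.

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the user/bot layout used by the example prompt."""
    return f"user: {question.strip()}\nbot:"

def extract_reply(generated_text: str, prompt: str) -> str:
    """Return the text after the prompt, cut at the next 'user:' turn if any."""
    if generated_text.startswith(prompt):
        reply = generated_text[len(prompt):]
    else:
        reply = generated_text
    # Assumption: a new "user:" marker means the model started another turn.
    reply = reply.split("\nuser:", 1)[0]
    return reply.strip()

prompt = build_prompt("Is this headline positive or negative? Headline: Prices Crash")
# Stand-in for tokenizer.decode(model.generate(...)); this is not real model output.
fake_output = prompt + " Negative\nuser: thanks"
print(extract_reply(fake_output, prompt))
```

With a real model, `generated_text` from the snippet above would take the place of `fake_output`.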

	## Competing Against
	| Name | Param Count | Training Cost | Inference Cost |
	|---------------|-------------|------|----------------|
	| Fin-RWKV | 430M | $3 | Free on HuggingFace 🤗 & Low-End CPU |
	| [BloombergGPT](https://www.bloomberg.com/company/press/bloomberggpt-50-billion-parameter-llm-tuned-finance/) | 50 Billion | $1.3 million | Enterprise GPUs |
	| [FinGPT](https://huggingface.co/FinGPT) | 7 Billion | $302.4 | Consumer GPUs |


	| Architecture | Status | Compute Efficiency | Largest Model | Trained Tokens | Link |
	|--------------|--------|--------------------|---------------|---------------|------|
	| (Fin)RWKV | In Production | O(N) | 14B | 500B++ (the pile+) | [Paper](https://arxiv.org/abs/2305.13048) |
	| Ret Net (Microsoft) | Research | O(N) | 6.7B | 100B (mixed) | [Paper](https://arxiv.org/abs/2307.08621) |
	| State Space (Stanford) | Prototype | O(log N) | 355M | 15B (the pile, subset) | [Paper](https://arxiv.org/abs/2302.10866) |
	| Liquid (MIT) | Research | - | <1M | - | [Paper](https://arxiv.org/abs/2302.10866) |
	| Transformer Architecture (included for contrasting reference) | In Production | O(N^2) | 800B (est) | 13T++ (est) | - |

	<img src="https://cdn-uploads.huggingface.co/production/uploads/631ea4247beada30465fa606/7vAOYsXH1vhTyh22o6jYB.png" width="500" alt="Inference computational cost vs. Number of tokens">
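
As a rough illustration of what O(N) versus O(N^2) in the table means as the number of tokens grows, the sketch below counts abstract units of work for a linear-time and a quadratic-time model. The function names and the one-unit-per-step cost are assumptions chosen for illustration; they are not measurements of Fin-RWKV or of any transformer.

```python
def linear_cost(num_tokens: int) -> int:
    """Work grows in proportion to the number of tokens: O(N)."""
    return num_tokens

def quadratic_cost(num_tokens: int) -> int:
    """Every token interacts with every token: O(N^2)."""
    return num_tokens * num_tokens

for n in (1_000, 2_000, 4_000):
    ratio = quadratic_cost(n) // linear_cost(n)
    print(f"{n} tokens: quadratic/linear work ratio = {ratio}")
```

Doubling the token count doubles the linear cost but quadruples the quadratic one, so the gap between the two widens as prompts get longer.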

	## Stats for nerds
	### Training Config
	- n_epoch: 100
	- epoch_save_frequency: 10
	- batch_size: 5
	- ctx_len: 2000
	- T_MAX: 384
	- RWKV_FLOAT_MODE: fp16
	- RWKV_DEEPSPEED: 0
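
For reference, the settings above can be collected into a single object. This is a sketch only: the `TrainingConfig` dataclass and its two helper methods are hypothetical and are not the training script used for this model. The field names and values are copied from the list above, and the checkpoint count assumes one save every `epoch_save_frequency` epochs.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class TrainingConfig:
    n_epoch: int = 100
    epoch_save_frequency: int = 10
    batch_size: int = 5
    ctx_len: int = 2000
    T_MAX: int = 384
    RWKV_FLOAT_MODE: str = "fp16"
    RWKV_DEEPSPEED: int = 0

    def tokens_per_batch(self) -> int:
        """Upper bound on tokens in one batch: batch_size * ctx_len."""
        return self.batch_size * self.ctx_len

    def num_checkpoints(self) -> int:
        """Saves written, assuming one every epoch_save_frequency epochs."""
        return self.n_epoch // self.epoch_save_frequency

config = TrainingConfig()
print(asdict(config))
print(config.tokens_per_batch(), config.num_checkpoints())
```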

	### Loss
	<img src="https://cdn-uploads.huggingface.co/production/uploads/631ea4247beada30465fa606/NvPKCBlbVhiVeeMpUAv2C.png" width="500" alt="Loss">

	_Note: Needs more data and training; for testing purposes only. Not recommended for production-level deployment._
	[Presentation](https://docs.google.com/presentation/d/1vNQ8Y5wwR0WXlO60fsXjkru5R9I0ZgykTmgag0B3Ato/edit?usp=sharing)