vibhorg/rl4llm_uofm_nlpo_super_t5_arxiv


	---
	license: apache-2.0
	datasets:
	- scientific_papers
	metrics:
	- bertscore
	- rouge
	tags:
	- text-generation-inference
	- rlhf
	- PPO
	language:
	- en
	---

	This model is fine-tuned with NLPO, a PPO-based RL algorithm, on the ccdv/arxiv-summarization dataset. The base model is a pre-tuned version of flan-t5-base.
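
The following is a minimal sketch of how the checkpoint might be loaded for summarization with the `transformers` library. It assumes the repository id `vibhorg/rl4llm_uofm_nlpo_super_t5_arxiv` (taken from the page path), that the repository ships tokenizer files, and that the model expects a T5-style `summarize: ` prefix. None of these are confirmed by this README. The helper names `build_prompt` and `summarize`, and the length limits, are illustrative choices.

```python
# Assumed repository id, taken from the page path rather than from the README text.
MODEL_ID = "vibhorg/rl4llm_uofm_nlpo_super_t5_arxiv"


def build_prompt(article: str, prefix: str = "summarize: ") -> str:
    """Collapse whitespace and prepend a task prefix.

    The "summarize: " prefix is an assumption based on common T5 usage;
    the README does not state which prompt format was used in training.
    """
    return prefix + " ".join(article.split())


def summarize(article: str, max_new_tokens: int = 128) -> str:
    """Generate a summary for one article (downloads the checkpoint on first call)."""
    # Imported lazily so that build_prompt can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Truncate long papers; 512 input tokens is an illustrative limit, not a documented one.
    inputs = tokenizer(
        build_prompt(article),
        return_tensors="pt",
        truncation=True,
        max_length=512,
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(text)` with the body of an arXiv paper should return a short abstract-like string. Generation settings such as beam search or sampling are left at their defaults here and may need tuning.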