license: apache-2.0
datasets:
- scientific_papers
metrics:
- bertscore
- rouge
tags:
- text-generation-inference
- rlhf
- PPO
language:
- en
This model is fine-tuned with NLPO, a PPO-based RL algorithm, on the ccdv/arxiv-summarization dataset. The base model is a pre-tuned version of the flan-t5-base model.
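
A minimal usage sketch follows, assuming the checkpoint loads like any other T5-style sequence-to-sequence model through the `transformers` library. This card does not state the model's repository ID or its prompt format, so `MODEL_ID` below is a stand-in pointing at the public `google/flan-t5-base` checkpoint, and the `summarize: ` prefix, the helper names, and the generation settings are illustrative assumptions rather than the settings used in training.

```python
# Assumption: stand-in checkpoint; replace with this model's actual repo ID.
MODEL_ID = "google/flan-t5-base"

# Assumption: T5-style task prefix; the card does not document the prompt.
PROMPT_PREFIX = "summarize: "


def build_prompt(article: str) -> str:
    """Collapse runs of whitespace and prepend the (assumed) task prefix."""
    return PROMPT_PREFIX + " ".join(article.split())


def summarize(article: str, model_id: str = MODEL_ID, max_new_tokens: int = 128) -> str:
    """Generate a summary of one article with a seq2seq checkpoint."""
    # Imported here so the prompt helper works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Long arXiv articles are truncated to an input window of 512 tokens.
    inputs = tokenizer(
        build_prompt(article),
        return_tensors="pt",
        truncation=True,
        max_length=512,
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(paper_text)` downloads the checkpoint on first use and returns the generated abstract as a string. Beam search with four beams is a common default for summarization and may not be the best choice for this model.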