vibhorg/rl4llm_uofm_nlpo_unsuper_t5_arxiv

license: apache-2.0
datasets:
  - scientific_papers
metrics:
  - bertscore
  - rouge
tags:
  - text-generation-inference
  - rlhf
  - PPO
language:
  - en

This model was fine-tuned with NLPO (Natural Language Policy Optimization), a PPO-based reinforcement learning algorithm, on the ccdv/arxiv-summarization dataset. The base model is flan-t5-base.
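
A minimal usage sketch follows, assuming the checkpoint loads as a standard T5-style sequence-to-sequence model through the `transformers` Auto classes and that the repository includes tokenizer files (if it does not, the flan-t5-base tokenizer may be a workable substitute). The `summarize: ` prefix, the 512-token input limit, and the generation settings are illustrative assumptions; this card does not specify the prompt format or decoding parameters used during training.

```python
# Repository name taken from this model card.
MODEL_ID = "vibhorg/rl4llm_uofm_nlpo_unsuper_t5_arxiv"


def build_prompt(article: str, prefix: str = "summarize: ") -> str:
    """Collapse whitespace and prepend an instruction prefix.

    The prefix is an assumption; the card does not state a prompt format.
    """
    return prefix + " ".join(article.split())


def summarize(article: str, max_new_tokens: int = 128) -> str:
    """Generate a summary for one article (downloads the model on first use)."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Truncate long papers; 512 tokens is an assumed limit, not a documented one.
    inputs = tokenizer(
        build_prompt(article),
        return_tensors="pt",
        truncation=True,
        max_length=512,
    )
    # Beam search with 4 beams is an illustrative choice.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(paper_text)` with the body of an arXiv paper should return a short abstract-style summary, with quality depending on how closely the prompt and length limits match the training setup.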