vibhorg/rl4llm_uofm_nlpo_super_t5_arxiv

Text2Text Generation

vibhorg committed on Mar 13

Commit e1542dd • 1 Parent(s): e58d3bf

Update README.md

Files changed (1)

README.md +3 -1

@@ -11,4 +11,6 @@ tags:
 - PPO
 language:
 - en
----
+---
+This model is fine-tuned using the PPO-based NLPO RL algorithm on the ccdv/arxiv-summarization dataset. The base model is a pre-tuned version of the flan-t5-base model.