vibhorg/rl4llm_uofm_nlpo_unsuper_t5_arxiv

vibhorg committed on Mar 20, 2024

Commit 5e268f8 (verified) · 1 Parent(s): a71d1c9

Update README.md

Files changed (1)

README.md +14 -0


@@ -1,3 +1,17 @@
 ---
 license: apache-2.0
 ---

+datasets:
+- scientific_papers
+metrics:
+- bertscore
+- rouge
+tags:
+- text-generation-inference
+- rlhf
+- PPO
+language:
+- en
+---
+This model is fine-tuned with NLPO, a PPO-based RL algorithm, on the ccdv/arxiv-summarization dataset. The base model is flan-t5-base.
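
The README describes a flan-t5-base summarizer, so a loading sketch may help readers try it. The snippet below is a minimal, unverified example using the standard `transformers` seq2seq classes. It assumes the repository ships both tokenizer and model weights, and that a `summarize: ` prompt prefix is appropriate; neither is confirmed by the README. The helper names `build_prompt` and `summarize` are hypothetical.

```python
# Minimal sketch: load the checkpoint named on this page and summarize a text.
# Assumptions (not confirmed by the README): the repo contains tokenizer files,
# and a "summarize: " prefix suits how the model was fine-tuned.

MODEL_ID = "vibhorg/rl4llm_uofm_nlpo_unsuper_t5_arxiv"  # repo name from this page


def build_prompt(article: str, prefix: str = "summarize: ") -> str:
    """Collapse whitespace and prepend the (assumed) task prefix."""
    return prefix + " ".join(article.split())


def summarize(article: str, max_new_tokens: int = 128) -> str:
    """Generate a summary; downloads the model on first call."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Truncate long papers to an assumed 512-token input budget.
    inputs = tokenizer(
        build_prompt(article),
        return_tensors="pt",
        truncation=True,
        max_length=512,
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(paper_text)` requires `transformers` and `torch` plus network access to the Hub; the input length and generation settings here are illustrative defaults, not values taken from the training setup.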