O->LED_para: document simplification system

This is a pretrained version of the document simplification model presented in the Findings of ACL 2023 paper "Context-Aware Document Simplification".

It is a system based on the Longformer encoder-decoder (LED) that operates at the paragraph level and is intended to be guided by a planning model.

Target reading levels (1-4) should be indicated by prepending a control token to each input sequence ("<RL_1>", "<RL_2>", "<RL_3>", "<RL_4>"). When using the terminal interface, this is handled automatically.
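For illustration, here is a minimal sketch of preparing an input sequence manually. The `add_reading_level` helper is hypothetical (not part of plan_simp); it only shows the control-token format described above.

```python
def add_reading_level(text: str, level: int) -> str:
    """Prepend the reading-level control token ("<RL_1>"..."<RL_4>") to an input sequence."""
    if level not in (1, 2, 3, 4):
        raise ValueError("target reading level must be between 1 and 4")
    return f"<RL_{level}> {text}"

# e.g. targeting reading level 2
model_input = add_reading_level("Photosynthesis converts light energy into chemical energy.", 2)
```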

How to use

It is recommended to use the plan_simp library to interface with the model.

Here is how to load this model in PyTorch:

# loading
from plan_simp.models.bart import load_simplifier
simplifier, tokenizer, hparams = load_simplifier("liamcripwell/o-ledpara")

# generation
from plan_simp.scripts.generate import Launcher
launcher = Launcher()
launcher.dynamic(model_ckpt="liamcripwell/o-ledpara", clf_model_ckpt="liamcripwell/pgdyn-plan", **params)

Plan-guided generation and evaluation can be run from the terminal (see the repo for more details).

python plan_simp/scripts/generate.py dynamic \
  --clf_model_ckpt=liamcripwell/pgdyn-plan \
  --model_ckpt=liamcripwell/o-ledpara \
  --test_file=<test_data> \
  --doc_id_col=pair_id \
  --context_dir=<context_dir> \
  --out_file=<output_csv> \
  --reading_lvl=s_level \
  --context_doc_id=pair_id \
  --para_lvl=True

python plan_simp/scripts/eval_simp.py \
  --input_data=newselaauto_docs_test.csv \
  --output_data=test_out_oledpara.csv \
  --x_col=complex_str \
  --r_col=simple_str \
  --y_col=pred \
  --doc_id_col=pair_id \
  --prepro=True \
  --sent_level=True