Rakib committed on
Commit
61ed0ae
1 Parent(s): 2adacb0

Update README.md

Files changed (1)
  1. README.md +23 -22
README.md CHANGED
@@ -1,6 +1,14 @@
1
  ---
2
  language:
3
  - en
4
  ---
5
  # Model Card for roberta-base-on-cuad
6
 
@@ -121,27 +129,20 @@ More information needed
121
 
122
  **BibTeX:**
123
  ```
124
- @article{DBLP:journals/corr/abs-1907-11692,
125
- author = {Yinhan Liu and
126
- Myle Ott and
127
- Naman Goyal and
128
- Jingfei Du and
129
- Mandar Joshi and
130
- Danqi Chen and
131
- Omer Levy and
132
- Mike Lewis and
133
- Luke Zettlemoyer and
134
- Veselin Stoyanov},
135
- title = {RoBERTa: {A} Robustly Optimized {BERT} Pretraining Approach},
136
- journal = {CoRR},
137
- volume = {abs/1907.11692},
138
- year = {2019},
139
- url = {http://arxiv.org/abs/1907.11692},
140
- archivePrefix = {arXiv},
141
- eprint = {1907.11692},
142
- timestamp = {Thu, 01 Aug 2019 08:59:33 +0200},
143
- biburl = {https://dblp.org/rec/journals/corr/abs-1907-11692.bib},
144
- bibsource = {dblp computer science bibliography, https://dblp.org}
145
  }
146
  ```
147
 
@@ -175,4 +176,4 @@ tokenizer = AutoTokenizer.from_pretrained("Rakib/roberta-base-on-cuad")
175
 
176
  model = AutoModelForQuestionAnswering.from_pretrained("Rakib/roberta-base-on-cuad")
177
  ```
178
- </details>
 
1
  ---
2
  language:
3
  - en
4
+ license: mit
5
+ datasets:
6
+ - cuad
7
+ pipeline_tag: question-answering
8
+ tags:
9
+ - legal-contract-review
10
+ - roberta
11
+ - cuad
12
  ---
13
  # Model Card for roberta-base-on-cuad
14
 
 
129
 
130
  **BibTeX:**
131
  ```
132
+ @inproceedings{nawar-etal-2022-open,
133
+ title = "An Open Source Contractual Language Understanding Application Using Machine Learning",
134
+ author = "Nawar, Afra and
135
+ Rakib, Mohammed and
136
+ Hai, Salma Abdul and
137
+ Haq, Sanaulla",
138
+ booktitle = "Proceedings of the First Workshop on Language Technology and Resources for a Fair, Inclusive, and Safe Society within the 13th Language Resources and Evaluation Conference",
139
+ month = jun,
140
+ year = "2022",
141
+ address = "Marseille, France",
142
+ publisher = "European Language Resources Association",
143
+ url = "https://aclanthology.org/2022.lateraisse-1.6",
144
+ pages = "42--50",
145
+ abstract = "Legal field is characterized by its exclusivity and non-transparency. Despite the frequency and relevance of legal dealings, legal documents like contracts remains elusive to non-legal professionals for the copious usage of legal jargon. There has been little advancement in making legal contracts more comprehensible. This paper presents how Machine Learning and NLP can be applied to solve this problem, further considering the challenges of applying ML to the high length of contract documents and training in a low resource environment. The largest open-source contract dataset so far, the Contract Understanding Atticus Dataset (CUAD) is utilized. Various pre-processing experiments and hyperparameter tuning have been carried out and we successfully managed to eclipse SOTA results presented for models in the CUAD dataset trained on RoBERTa-base. Our model, A-type-RoBERTa-base achieved an AUPR score of 46.6{\%} compared to 42.6{\%} on the original RoBERT-base. This model is utilized in our end to end contract understanding application which is able to take a contract and highlight the clauses a user is looking to find along with it{'}s descriptions to aid due diligence before signing. Alongside digital, i.e. searchable, contracts the system is capable of processing scanned, i.e. non-searchable, contracts using tesseract OCR. This application is aimed to not only make contract review a comprehensible process to non-legal professionals, but also to help lawyers and attorneys more efficiently review contracts.",
146
  }
147
  ```
148
 
 
176
 
177
  model = AutoModelForQuestionAnswering.from_pretrained("Rakib/roberta-base-on-cuad")
178
  ```
179
+ </details>