---
license: mit
language:
- en
pipeline_tag: fill-mask
---

# Model Details

Domain-specific BERT model for text mining in the Energy & Materials field.
## Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** Tong Xie, Yuwei Wan, Juntao Fang, Prof. Bram Hoex
- **Supported by:** University of New South Wales, National Computational Infrastructure Australia
- **Model type:** Transformer
- **Language(s) (NLP):** EN
- **License:** MIT

## Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** [Github](https://github.com/MasterAI-EAM/EnergyBERT)
- **Paper:** [Under Preparation]

# Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

## Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

Text mining in the Energy & Materials field.

## Downstream Use 

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

The EnergyBERT model is not limited to text classification: it can be fine-tuned to perform a variety of other downstream NLP tasks in the Energy & Materials domain, as sketched below.
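
Below is a minimal fine-tuning sketch for a downstream classification task, assuming the checkpoint is available under the placeholder identifier `EnergyBERT`; the label set and example texts are illustrative and not part of the released code.

```python
# A hedged sketch, not the authors' training code: fine-tuning EnergyBERT
# for a downstream sequence-classification task with the Hugging Face Trainer.
import torch
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "EnergyBERT"  # placeholder: use the released checkpoint path or Hub id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy labelled examples (relevant / not relevant to energy materials).
texts = [
    "Perovskite solar cells reached over 25% power conversion efficiency.",
    "The meeting was rescheduled to next Tuesday.",
]
labels = [1, 0]
encodings = tokenizer(texts, truncation=True, padding=True)


class ToyDataset(torch.utils.data.Dataset):
    """Wraps tokenized texts and labels for the Trainer."""

    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item


trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="energybert-finetuned", num_train_epochs=1),
    train_dataset=ToyDataset(encodings, labels),
)
trainer.train()
```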

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import pipeline

# 'EnergyBERT' is a placeholder; point it at the released checkpoint
# directory or Hugging Face Hub id for this model.
unmasker = pipeline('fill-mask', model='EnergyBERT')

# BERT-style tokenizers use [MASK] (not <mask>) as the mask token.
unmasker("Hello I'm a [MASK] model.")
```

# Training Details

## Training Data

<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

A corpus of 1.2M published full-text articles from 2000 to 2021.

## Training Procedure 

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

During pre-training, BERT is trained on two unsupervised tasks: masked language modeling (MLM) and next sentence prediction (NSP). In MLM, a fraction of the input tokens is masked at random and the model learns to predict the masked tokens from the surrounding context. In NSP, the model learns to predict whether the second of two sentences actually follows the first in the original text.
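
The MLM masking step can be illustrated with the standard `transformers` data collator. The sketch below is not the EnergyBERT pre-training script (which is not reproduced here); it uses `bert-base-uncased` as a stand-in tokenizer and assumes BERT's default masking probability of 15%.

```python
# Illustrative only: standard BERT-style MLM masking, not the actual
# EnergyBERT pre-training code.
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # stand-in tokenizer
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.15,  # BERT default: ~15% of tokens are masked
)

sentence = "Perovskite solar cells show high power conversion efficiency."
batch = collator([tokenizer(sentence)])

# 'labels' keeps the original ids at masked positions and -100 elsewhere,
# so the MLM loss is computed only on the masked tokens.
print(tokenizer.decode(batch["input_ids"][0]))
print(batch["labels"][0])
```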

### Training Hyperparameters

- **Training regime:**