Update README.md

metrics:
- mae
pipeline_tag: graph-ml
---

# AtomFormer base model

AtomFormer is a transformer-based model that leverages Gaussian pair-wise positional embeddings to train on atomistic graph data. It is part of a suite of datasets, models, and utilities in the AtomGen project that supports other methods for pre-training and fine-tuning models on atomistic graphs.

## Model description

AtomFormer is a transformer model with modifications for training on atomistic graphs. It builds primarily on the work of Uni-Mol+, adding pair-wise positional embeddings to the attention mask to leverage 3-D positional information. The model was pre-trained on a diverse set of aggregated atomistic datasets where the target tasks are per-atom force prediction and per-system energy prediction.

The model also incorporates metadata about the atomic species being modeled, including atomic radius, electronegativity, valency, etc. This metadata is normalized and projected so that it can be added to the atom embeddings in the model.
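
For intuition, the sketch below shows one way Gaussian pair-wise distance features can be turned into an attention bias. It is a minimal, self-contained illustration (the function name, kernel centers/widths, and shapes are assumptions), not the exact AtomFormer implementation.

```python
import torch

def gaussian_pairwise_bias(coords, num_kernels=16):
    """Expand pairwise inter-atomic distances in a Gaussian basis and
    project them to a scalar bias per atom pair (toy illustration)."""
    # coords: (batch, num_atoms, 3) -> pairwise Euclidean distances (batch, n, n)
    dist = torch.cdist(coords, coords)
    # Fixed Gaussian kernel centers/widths; a real model would learn these.
    centers = torch.linspace(0.0, 10.0, num_kernels)   # (K,)
    widths = torch.full((num_kernels,), 0.5)            # (K,)
    # Gaussian basis expansion of distances: (batch, n, n, K)
    feats = torch.exp(-((dist.unsqueeze(-1) - centers) ** 2) / (2 * widths ** 2))
    # Project to one bias value per atom pair (per attention head in practice).
    proj = torch.nn.Linear(num_kernels, 1)
    bias = proj(feats).squeeze(-1)                       # (batch, n, n)
    # This bias would be added to the attention logits before the softmax.
    return bias

coords = torch.randn(2, 8, 3)  # 2 systems with 8 atoms each
print(gaussian_pairwise_bias(coords).shape)  # torch.Size([2, 8, 8])
```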

## Intended uses & limitations

You can use the raw model for force and energy prediction, but it is mostly intended to be fine-tuned on a downstream task. The model's performance as a force and energy predictor has not been validated; force and energy prediction was used primarily as a pre-training task.

### How to use

You can use this model directly by loading it via the Structure2EnergyandForces task:

```python
>>> from transformers import AutoModel
>>> # "<model-repo-id>" is a placeholder for this checkpoint's repository id;
>>> # trust_remote_code assumes the checkpoint ships custom model code.
>>> model = AutoModel.from_pretrained("<model-repo-id>", trust_remote_code=True)
```

Here is how to use this model to get the features of a given atomistic graph in PyTorch:

```python
from transformers import AutoModel

# "<model-repo-id>" is a placeholder for this checkpoint's repository id.
model = AutoModel.from_pretrained("<model-repo-id>", trust_remote_code=True)
```
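
A fuller sketch of feature extraction follows. The keyword names (`input_ids` for tokenized atomic species, `coords` for 3-D positions) are taken from the preprocessing description below, while the checkpoint id, the token values, and the output attribute are assumptions; check the AtomGen documentation for the exact API.

```python
import torch
from transformers import AutoModel

# Placeholder repository id; names below are assumptions, not the verified API.
model = AutoModel.from_pretrained("<model-repo-id>", trust_remote_code=True)
model.eval()

# Toy 3-atom system: token ids for the atomic species and their 3D coordinates.
input_ids = torch.tensor([[8, 1, 1]])            # (batch, num_atoms)
coords = torch.tensor([[[0.00, 0.00, 0.00],
                        [0.96, 0.00, 0.00],
                        [-0.24, 0.93, 0.00]]])   # (batch, num_atoms, 3)

with torch.no_grad():
    outputs = model(input_ids=input_ids, coords=coords)

# Per-atom features, assuming a standard transformers-style output object.
print(outputs.last_hidden_state.shape)
```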

## Training data

AtomFormer is trained on an aggregated S2EF (structure-to-energy-and-forces) dataset drawn from multiple sources, such as OC20, OC22, ODAC23, MPtrj, and SPICE, with structures and energies/forces for pre-training. The pre-training data includes both total energies and formation energies, but training uses the formation energy (which is not available for OC22, as indicated by the "has_formation_energy" column).
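
For example, selecting only the systems that carry a formation-energy label might look like the sketch below; the dataset id is a placeholder, and only the `has_formation_energy` column name comes from the description above.

```python
from datasets import load_dataset

# "<aggregated-s2ef-dataset>" is a placeholder for the AtomGen pre-training dataset id.
ds = load_dataset("<aggregated-s2ef-dataset>", split="train")

# Keep only systems with a formation-energy label (e.g. OC22 entries do not have one).
ds = ds.filter(lambda example: example["has_formation_energy"])
```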

## Training procedure

### Preprocessing

The model expects input in the form of tokenized atomic symbols, represented as `input_ids`, and 3-D coordinates, represented as `coords`. For the pre-training task it also expects labels for the `forces` and the `formation_energy`.

The `DataCollatorForAtomModeling` utility in the AtomGen library can perform dynamic padding to batch the data together. It also offers the option to flatten the data and provide a `batch` column for GNN-style training.
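
For intuition, here is a standalone sketch of what dynamic padding over `input_ids` and `coords` amounts to. It illustrates the behavior only; it is not the actual `DataCollatorForAtomModeling` implementation, and the padding conventions are assumptions.

```python
import torch

def pad_batch(examples, pad_token_id=0):
    """Illustrative dynamic padding: pad every system in the batch
    to the length of the longest one."""
    max_len = max(len(ex["input_ids"]) for ex in examples)
    input_ids, coords, mask = [], [], []
    for ex in examples:
        n = len(ex["input_ids"])
        pad = max_len - n
        input_ids.append(ex["input_ids"] + [pad_token_id] * pad)
        coords.append(ex["coords"] + [[0.0, 0.0, 0.0]] * pad)
        mask.append([1] * n + [0] * pad)
    return {
        "input_ids": torch.tensor(input_ids),   # (batch, max_atoms)
        "coords": torch.tensor(coords),         # (batch, max_atoms, 3)
        "attention_mask": torch.tensor(mask),   # (batch, max_atoms)
    }

examples = [
    {"input_ids": [8, 1, 1], "coords": [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]},
    {"input_ids": [6, 8],    "coords": [[0.0, 0.0, 0.0], [1.13, 0.0, 0.0]]},
]
print({k: v.shape for k, v in pad_batch(examples).items()})
```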

### Pretraining

The model was trained on a single node with 4x A40 (48 GB) GPUs for 10 epochs (~2 weeks). See the [training code](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) for all hyperparameter details.

## Evaluation results

We use the Atom3D dataset to evaluate the model's performance on downstream tasks.

When fine-tuned on downstream tasks, this model achieves the following results:

| Task | SMP | PIP | RES | MSP | LBA | LEP | PSR | RSR |
|:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|      | TBD | TBD | TBD | TBD | TBD | TBD | TBD | TBD |