Noelia Ferruz committed
Commit a230244 · 1 Parent(s): 006ad59
Update README.md

README.md CHANGED
@@ -53,8 +53,7 @@ python run_clm.py --model_name_or_path nferruz/ProtGPT2 --train_file training.txt
 The HuggingFace script run_clm.py can be found here: https://github.com/huggingface/transformers/blob/master/examples/pytorch/language-modeling/run_clm.py
 
 ### **How to select the best sequences**
-We've observed that perplexity values correlate with AlphaFold2's plddt.
-
+We've observed that perplexity values correlate with AlphaFold2's plddt.
 We recommend to compute perplexity for each sequence with the HuggingFace evaluate method `perplexity`:
 
 ```
@@ -64,8 +63,7 @@ results = perplexity.compute(predictions=predictions, model_id='nferruz/ProtGPT2
 ```
 
 Where `predictions` is a list containing the generated sequences.
-
-
+We do not yet have a threshold as of what perplexity value gives a 'good' or 'bad' sequence, but given the fast inference times, the best is to sample many sequences, order them by perplexity, and select those with the lower values (the lower the better).
 
 
 ### **Training specs**
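For reference, the first hunk header truncates the fine-tuning command that precedes this section of the README. A sketch of a full run_clm.py invocation under that pattern (the tokenizer flag, output directory, and learning rate here are illustrative assumptions, not taken from the commit):

```
python run_clm.py --model_name_or_path nferruz/ProtGPT2 \
  --train_file training.txt \
  --tokenizer_name nferruz/ProtGPT2 \
  --do_train \
  --output_dir output \
  --learning_rate 1e-06
```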
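The code block elided between the fences in the second hunk calls the `perplexity` metric from the HuggingFace `evaluate` library. A minimal, self-contained sketch of that computation, assuming the standard `evaluate` API (the two sequences are toy placeholders):

```
import evaluate

# Load the perplexity metric from the HuggingFace evaluate library.
perplexity = evaluate.load("perplexity", module_type="metric")

# `predictions` is a list of generated protein sequences (placeholders here).
predictions = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", "MGSSHHHHHHSSGLVPRGSH"]

# Score each sequence using ProtGPT2 itself as the scoring model.
results = perplexity.compute(predictions=predictions, model_id='nferruz/ProtGPT2')

print(results["perplexities"])     # one perplexity value per sequence
print(results["mean_perplexity"])  # mean over all sequences
```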
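Finally, the sentence added in this commit recommends sampling many sequences, ordering them by perplexity, and keeping the lowest-scoring ones. A small helper sketching that selection step (the function name and cutoff are illustrative, not from the README):

```
def select_best(predictions, perplexities, n=10):
    """Return the n sequences with the lowest perplexity (lower is better)."""
    ranked = sorted(zip(perplexities, predictions))
    return [seq for _, seq in ranked[:n]]

# Continuing from the sketch above:
# best = select_best(predictions, results["perplexities"])
```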