MassimoGregorioTotaro committed on
Commit
475d75f
1 Parent(s): fba8f5e

checkbox fix, instructions update

Files changed (3)
  1. LICENSE +1 -1
  2. app.py +1 -1
  3. instructions.md +8 -5
LICENSE CHANGED
@@ -1,4 +1,4 @@
- Copyright (c) 2021, Massimo G. Totaro All rights reserved.
+ Copyright (c) 2023, Massimo G. Totaro All rights reserved.
 
  Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
 
app.py CHANGED
@@ -46,7 +46,7 @@ with open("instructions.md", "r", encoding="utf-8") as md,\
  value=""
  )
  model_name = Dropdown(MODELS, label="Model", value="facebook/esm2_t30_150M_UR50D")
- scoring_strategy = Checkbox(value=True, label="Use masked-marginals scoring")
+ scoring_strategy = Checkbox(value=True, label="Use higher accuracy scoring", interactive=True)
  btn = Button(value="Run")
  out = HTML()
  bto = File(
instructions.md CHANGED
@@ -1,8 +1,12 @@
  # **ESM-Scan**
  Calculate the <u>fitness of single amino acid substitutions</u> on proteins, using a [zero-shot](https://doi.org/10.1101/2021.07.09.450648) [language model predictor](https://github.com/facebookresearch/esm)
 
- <details>
- <summary> <b> USAGE INSTRUCTIONS </b> </summary>
+ If you use this tool in your research, please cite:
+ - Totaro, M.G. (2023). “ESM-Scan - a tool to guide amino acid substitutions.” bioRxiv. [doi.org/10.1101/2023.12.12.571273](https://doi.org/10.1101/2023.12.12.571273)
+ - Meier, J. (2021). “Language Models Enable Zero-Shot Prediction of the Effects of Mutations on Protein Function.” bioRxiv (Cold Spring Harbor Laboratory), July. [doi.org/10.1101/2021.07.09.450648](https://doi.org/10.1101/2021.07.09.450648)
+
+ <details>
+ <summary> <b> USAGE INSTRUCTIONS </b> </summary>
 
  ### **Setup**
  No setup is required, just fill the input boxes with the required data and click on the `Run` button.
@@ -21,11 +25,10 @@ Running a calculation resumes the tool from standby, the first run might take lo
  + any other *different input*: a deep mutational scan of the full sequence will be performed
  - the ESM model to use for the calculations can be chosen among those that are available on Hugging Face Model Hub;
  `esm2_t33_650M_UR50D` offers the best expense-accuracy tradeoff[*](https://doi.org/10.1126/science.ade2574)
- - the `masked-marginals` scoring strategy considers sequence context at inference time, being slower but more accurate;
- in case of long runtimes, you can tick the box off to speed the calculations up significantly, sacrificing accuracy
+ - the more accurate `masked-marginals` scoring strategy considers sequence context during inferences, increasing the runtime significantly; if the wait is too long, you can tick the box off to speed the calculations, sacrificing accuracy
  - when running a deep mutational scan, it is recommended to use smaller models (8M, 35M, 150M parameters), since the runtime is significant, especially for longer sequences and the server might be overloaded;
  over 30 min might be necessary for calculating a 300-residue-long sequence with larger models
- in general, accuracy is influenced significantly by the scoring strategy and less so by the model size, so it is suggested to reduce the latter first when optimising for runtime;
+ in general, accuracy is influenced more by the scoring strategy and less so by the model size, so it is suggested to reduce the latter first when optimising for runtime;
  the scoring strategy computational cost scales with the number of substitutions tested, while the model’s with the wild-type sequence length
  - it is possible to calculate the effect of multiple concurrent substitutions, but this has to be done manually, by changing the input sequence and running the calculation again
 
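Note on the scoring checkbox: the runtime tradeoff described in the `masked-marginals` bullet of instructions.md can be illustrated with a minimal sketch. It assumes the usual definition from the cited zero-shot paper, where a substitution is scored as the log-probability of the mutant residue minus that of the wild-type residue at the same position, either with that position masked (slower) or on the unmasked wild-type sequence (faster). Everything here is hypothetical and for illustration only: `toy_model`, the `MASK` symbol and `scan` are not part of app.py or of the ESM library, and the toy model is a stand-in rule, not a protein language model.

```python
import math
from typing import Callable, Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "#"  # hypothetical mask symbol used only by this toy example

LogProbs = List[Dict[str, float]]


def toy_model(seq: str) -> LogProbs:
    """Stand-in for a language model (NOT ESM): per-position log-probabilities.

    Toy rule: a residue gets +2 if it is the visible token at that position
    and +1 if it equals the left neighbour; a masked token gives no +2 bonus.
    """
    out = []
    for i, tok in enumerate(seq):
        logits = {}
        for aa in AMINO_ACIDS:
            logit = 0.0
            if tok == aa:
                logit += 2.0
            if i > 0 and seq[i - 1] == aa:
                logit += 1.0
            logits[aa] = logit
        # log-softmax normalisation
        z = math.log(sum(math.exp(v) for v in logits.values()))
        out.append({aa: v - z for aa, v in logits.items()})
    return out


def scan(seq: str, model: Callable[[str], LogProbs],
         masked: bool) -> Tuple[Dict[str, float], int]:
    """Score every single substitution; also return the number of model calls."""
    scores: Dict[str, float] = {}
    calls = 0
    if not masked:
        # Unmasked strategy: one forward pass on the wild-type sequence.
        wt_logps = model(seq)
        calls = 1
    for pos, wt in enumerate(seq):
        if masked:
            # Masked strategy: one forward pass per mutated position.
            logps = model(seq[:pos] + MASK + seq[pos + 1:])[pos]
            calls += 1
        else:
            logps = wt_logps[pos]
        for mt in AMINO_ACIDS:
            if mt != wt:
                # e.g. "K2M": wild-type K at position 2 replaced by M
                scores[f"{wt}{pos + 1}{mt}"] = logps[mt] - logps[wt]
    return scores, calls
```

For a 3-residue toy sequence such as `"MKT"`, `scan(..., masked=False)` makes one model call while `scan(..., masked=True)` makes three, which is the reason unticking the box shortens the runtime. The two strategies can also disagree on a score, since the unmasked pass sees the wild-type residue it is asked to judge.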