Spaces: thaidaev/zsp
MassimoGregorioTotaro committed on Dec 13, 2023

Commit 475d75f • 1 Parent(s): fba8f5e

checkbox fix, instructions update

Files changed (3)

LICENSE +1 -1
app.py +1 -1
instructions.md +8 -5

LICENSE CHANGED

@@ -1,4 +1,4 @@
-Copyright (c) 2021, Massimo G. Totaro All rights reserved.
+Copyright (c) 2023, Massimo G. Totaro All rights reserved.
 Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

app.py CHANGED

@@ -46,7 +46,7 @@ with open("instructions.md", "r", encoding="utf-8") as md,\
         value=""
     )
     model_name = Dropdown(MODELS, label="Model", value="facebook/esm2_t30_150M_UR50D")
-    scoring_strategy = Checkbox(value=True, label="Use masked-marginals scoring")
+    scoring_strategy = Checkbox(value=True, label="Use higher accuracy scoring", interactive=True)
     btn = Button(value="Run")
     out = HTML()
     bto = File(
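
The `scoring_strategy` checkbox toggles masked-marginals scoring. As a rough illustration of why that strategy costs more than scoring from the unmasked wild-type sequence, the sketch below counts model calls for both. It is not the code from `app.py`: `toy_model`, `CountingModel`, `wt_marginals` and `masked_marginals` are hypothetical names, and the toy model is a stand-in that returns fixed probabilities instead of running an ESM network. Here one call is made per distinct masked position, so the cost grows with the positions tested; the real tool may batch or count differently.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"

# Hypothetical model interface: tokens in, per-position log-probabilities out.
Model = Callable[[Sequence[str]], List[Dict[str, float]]]


def toy_model(tokens: Sequence[str]) -> List[Dict[str, float]]:
    """Stand-in for a protein language model (assumption, not ESM).

    Masked positions get a uniform distribution; visible positions put
    probability 0.81 on the visible residue and 0.01 on each other one.
    """
    out = []
    for tok in tokens:
        if tok == MASK:
            probs = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
        else:
            probs = {aa: (0.81 if aa == tok else 0.01) for aa in AMINO_ACIDS}
        out.append({aa: math.log(p) for aa, p in probs.items()})
    return out


class CountingModel:
    """Wraps a model and counts how many forward passes were requested."""

    def __init__(self, model: Model) -> None:
        self.model = model
        self.calls = 0

    def __call__(self, tokens: Sequence[str]) -> List[Dict[str, float]]:
        self.calls += 1
        return self.model(tokens)


def wt_marginals(model: Model, sequence: str,
                 subs: List[Tuple[int, str]]) -> List[float]:
    """Score substitutions from a single pass over the unmasked sequence."""
    log_probs = model(list(sequence))
    return [log_probs[pos][mt] - log_probs[pos][sequence[pos]]
            for pos, mt in subs]


def masked_marginals(model: Model, sequence: str,
                     subs: List[Tuple[int, str]]) -> List[float]:
    """Score substitutions with the target position masked (one pass each)."""
    cache: Dict[int, Dict[str, float]] = {}
    scores = []
    for pos, mt in subs:
        if pos not in cache:
            tokens = list(sequence)
            tokens[pos] = MASK
            cache[pos] = model(tokens)[pos]
        scores.append(cache[pos][mt] - cache[pos][sequence[pos]])
    return scores
```

In both functions positions are 0-based and each score is the log-probability of the mutant residue minus that of the wild-type residue at the same position.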

instructions.md CHANGED

@@ -1,8 +1,12 @@
 # **ESM-Scan**
 Calculate the <u>fitness of single amino acid substitutions</u> on proteins, using a [zero-shot](https://doi.org/10.1101/2021.07.09.450648) [language model predictor](https://github.com/facebookresearch/esm)
-  <details>
-    <summary> <b> USAGE INSTRUCTIONS </b> </summary>
+If you use this tool in your research, please cite:
+- Totaro, M.G. (2023). “ESM-Scan - a tool to guide amino acid substitutions.” bioRxiv. [doi.org/10.1101/2023.12.12.571273](https://doi.org/10.1101/2023.12.12.571273)
+- Meier, J. (2021). “Language Models Enable Zero-Shot Prediction of the Effects of Mutations on Protein Function.” bioRxiv (Cold Spring Harbor Laboratory), July. [doi.org/10.1101/2021.07.09.450648](https://doi.org/10.1101/2021.07.09.450648)
+<details>
+  <summary> <b> USAGE INSTRUCTIONS </b> </summary>
 ### **Setup**
 No setup is required, just fill the input boxes with the required data and click on the `Run` button.
@@ -21,11 +25,10 @@ Running a calculation resumes the tool from standby, the first run might take lo
   + any other *different input*: a deep mutational scan of the full sequence will be performed
 - the ESM model to use for the calculations can be chosen among those that are available on Hugging Face Model Hub;
   `esm2_t33_650M_UR50D` offers the best expense-accuracy tradeoff[*](https://doi.org/10.1126/science.ade2574)
-- the `masked-marginals` scoring strategy considers sequence context at inference time, being slower but more accurate;
-  in case of long runtimes, you can tick the box off to speed the calculations up significantly, sacrificing accuracy
+- the more accurate `masked-marginals` scoring strategy considers sequence context during inferences, increasing the runtime significantly; if the wait is too long, you can tick the box off to speed the calculations, sacrificing accuracy
 - when running a deep mutational scan, it is recommended to use smaller models (8M, 35M, 150M parameters), since the runtime is significant, especially for longer sequences and the server might be overloaded;
   over 30 min might be necessary for calculating a 300-residue-long sequence with larger models
-  in general, accuracy is influenced significantly by the scoring strategy and less so by the model size, so it is suggested to reduce the latter first when optimising for runtime;
+  in general, accuracy is influenced more by the scoring strategy and less so by the model size, so it is suggested to reduce the latter first when optimising for runtime;
   the scoring strategy computational cost scales with the number of substitutions tested, while the model’s with the wild-type sequence length
 - it is possible to calculate the effect of multiple concurrent substitutions, but this has to be done manually, by changing the input sequence and running the calculation again
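
The last bullet asks the user to edit the input sequence by hand before rerunning. A minimal sketch of that step is shown below, assuming substitutions are written in the common `A12G` form (wild-type residue, 1-based position, new residue). The helper `apply_substitutions` is hypothetical and not part of this repository. It checks that each stated wild-type residue matches the sequence before replacing it.

```python
import re
from typing import Iterable

# Assumed notation: wild-type letter, 1-based position, mutant letter (e.g. "A12G").
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence: str, substitutions: Iterable[str]) -> str:
    """Return a copy of `sequence` with every substitution applied."""
    residues = list(sequence)
    for sub in substitutions:
        match = _SUBSTITUTION.match(sub.strip().upper())
        if match is None:
            raise ValueError(f"cannot parse substitution {sub!r}")
        wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{sub!r} expects {wild_type} at position {position}, "
                f"found {residues[position - 1]}"
            )
        residues[position - 1] = mutant
    return "".join(residues)


if __name__ == "__main__":
    # Build the doubly substituted sequence to paste into the input box.
    print(apply_substitutions("MKTAYIAK", ["K2R", "A4G"]))
```

The returned string can then be used as the new input sequence for a further run, as the instructions describe.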