syedislamuddin commited on
Commit
07b9363
·
1 Parent(s): b200c9d

Upload app.py

Files changed (1)
  1. app.py +83 -42
app.py CHANGED
@@ -198,9 +198,9 @@ def Chopchop(method,select_method):
198
  CHOPCHOP accepts **input** in one of the following forms:
199
  - Gene name
200
  - Genomic coordinates
201
- - DNA sequence
202
  - **In batch mode, we used** a text file containing one chr:start-end region per line for each SNP. Ex: chr1:152220450-152220451.
203
- Based on the input provided, chopchop retrieves sequence (corresponding to gene name/coordinates) and scan it for all potential target (off-target) sites (based on search requirement selected).
 
204
  **Each sgRNA** is then ranked according to:
205
  - Number of off-targets in the genome
206
  - Number of mismatches that lie within the off-targets.
@@ -208,8 +208,10 @@ def Chopchop(method,select_method):
208
  - GC-content
209
  - Presence of a guanine (G) at position 20 in the sgRNA target site
210
  - Any target sites with the same score are then sorted by their position in the gene (with preference to 5′ positions).
211
- **Output:** A tab separated text file
212
- - **Columns of interest**: Target sequence and Efficiency (**higher the better**)
 
 
213
  Please note that not all options have Efficiency defined [Ref](https://chopchop.cbu.uib.no/instructions)
214
  """
215
  )
@@ -611,22 +613,35 @@ def ecrisp(select_method):
611
  st.header("E-CRISP")
612
  expander = st.expander("Summary")
613
  #st.markdown("**Summary**")
614
- expander.markdown("E-CRISP is used to design gRNA sequences **(supports 12 organisms)**. It can also be used to reevaluate CRISPR constructs for on- or off-target sites and targeted genomic loci. It identifies target sequences complementary to the gRNA ending in a 3ʹ protospacer-adjacent motif (PAM), **N(G or A)G** and uses a fast indexing approach to find binding sites and a binary interval tree for rapid annotation of putative gRNA target sites.")
615
  expander.markdown("**Off-target** effects and target-site homology are evaluated using Bowtie2 aligner. Designs are **shown** in the output if the number of **off-targets does not exceed a user-specified threshold**. **More than one** design targeting a desired locus are **ranked** according to on-target specificity and number of off-targets.")
616
  expander1 = st.expander("How it works")
617
- expander1.markdown(' it identifies target sequences ending with a PAM motif 5′-NGG/NAG-3′ and uses them to propose guide RNAs. It also uses a fast indexing approach to locate binding sites and the alignment program Bowtie 2 to identify off-target effects. It outputs the successful designs, ranked according to target specificity and efficiency. It also assesses the genomic context (e.g. exons, transcripts, CpG islands) of putative designs and provides an option to re-evaluate given gRNAs for efficiency and specificity. ')
 
 
 
 
 
 
618
  expander1.write(
619
  """
620
- - Input: Multiple lines provided in the Input fasta sequence edit box in the webapp **[here](http://www.e-crisp.org/E-CRISP/index.html)** in the following format
 
621
  - Line1: rs12726330
622
  - Line2: CGGGACATGGAAGAGGTCTGGACCAGGGTACTGGGAAGGCGCTCGGAGGA
623
  - Line3: rs76763715
624
  - Line4: CCAGCCGACCACATGGTACAGGAGGTTCTAGGGTAAGGACAAAGGCAAAG
625
  - and so on
626
  """
627
  )
628
- expander1.markdown("- Output: A tab separated .tab file")
629
- expander1.markdown("- **Columns of interest**: Efficiency Score (E Score, **Higher the better**) [Ref](https://www.nature.com/articles/nbt.3026) and Specificity Score (S Score, **Higher the better** (max = 100))")
630
 
631
  expander2 = st.expander("References")
632
  expander2.write("[E-CRISP Web App](http://www.e-crisp.org/E-CRISP/)")
@@ -634,42 +649,63 @@ def ecrisp(select_method):
634
  expander3 = st.expander("Tool Options: All you can do with this tool")
635
  expander3.write(
636
  """
637
- - This tool offers single or paired sgRNA and:
638
- - **Options for PAM:**
639
- - **Relaxed**
640
- - **Medium:**
641
- - **Strict**
642
- - **Options for Design:**
643
- - knockdown.
644
- - knockin.
645
- - N/C terminal tagging.
646
- - CRISPRi.
647
- - CRISPRa.
648
- - **Other filtering options.**
 
649
  """
650
  )
651
 
652
  expander4 = st.expander("Scoring")
653
- expander4.markdown('E-CRISP utilises its own SAE (Specificity, Annotation, Efficacy) score to determine the quality of each sgRNA, while Rule Set 1 [24] and SSC [112] are also included in E-CRISP')
654
- expander4.markdown('**E-CRISP utilises its own SAE (Specificity, Annotation, Efficacy) score to determine the quality of each sgRNA, while Rule Set 1 [24] and SSC [112] are also included in E-CRISP**')
655
- expander4.markdown('on-target and off-target predictions, it utilises its own ‘SAE (Specificity, Annotation, Efficacy) Score’ to determine the quality of each gRNA, while Rule Set 1 ( predictive model for sgRNA activity by training a logistic regression classifier to discriminate the highest-activity) [Doench](https://www.nature.com/articles/nbt.3026) and Spacer Scoring for CRISPR (identified sequence features that contribute to sgRNA efficiency by calculating log odds ratio of nucleotide frequency between DNA sequences targeted by efficient and inefficient sgRNAs) [Xu](https://genome.cshlp.org/content/25/8/1147) are also included in its results.')
656
- expander4.markdown('**Doench Score:** sgRNA score. A guide necessarily only has a subset of all the features, indicated via one-hot encoding as binary variables. Let the model weights for the features i for a particular guide sj be wij, the intercept int. Then the sgRNA score f (sj) is given via logistic regression as:')
657
-
658
- latext = r'''
659
- $$
660
- f(s_j) = \frac{1}{1+exp(-g(s_j))} \\
661
- g(s_j) = int + \sum_{i} w_{ij} \\
662
- where f(s_j) \epsilon \ [0,1] \\
663
- '''
664
- expander4.markdown(latext1)
665
  expander4.markdown(
666
  """
667
- Here, features used for prediction are:
668
- - Individual nucleotides and all pairs of adjacent nucleotides indexed by position in the 30 mer target site.
669
- - Count of Gs and Cs in the 20 nt of the sgRNA .
670
- - Two GC-count features for deviations below ten and above ten.
671
  """
672
  )
673
 
674
 
675
  st.markdown(tips,unsafe_allow_html=True)
@@ -871,7 +907,7 @@ st.sidebar.image("logo-card-white.png", use_column_width=True)
871
  #Calc = st.sidebar.radio('Selection Menu')
872
  Calc = st.sidebar.radio(
873
  "",
874
- ('ReadME', 'Selection Menu'))
875
 
876
  #if Calc:
877
 
@@ -882,13 +918,13 @@ if Calc == 'ReadME':
882
 
883
  expander = st.expander("How to use this app")
884
  #st.header('How to use this app')
885
- expander.markdown('Please note that all tools were run using Human Genome **(hg38)**. Each tool require **specific input format** (described for each tool selected from the sidebar when **Selection Menue is enabled**) and **output results** in different formats **(with different columns based on method selected as described under each tool)**. Some of these tools also allow selection of various **endonucleases and related options**, their **reulsts are provided as radio controls** in the sidebar of this app under each tool.')
886
  expander.markdown('**Requirements:** 1) Python3.4 or higher and 2) streamlit 1.13')
887
  expander.markdown('To start this app, **unzip** the base_editor_app.zip in a folder of your choice')
888
  expander.markdown('Open shell terminal and **cd to base_editor_app folder**')
889
  expander.markdown('Type: **streamlit run baserditorsV3.py**, It will launch baseeditor app in the default browser')
890
  expander.markdown('**By default** README radio button is enabled to describe general information about the App and How to use it.')
891
- expander.markdown("- Please enable **Selection Menu** radio control in the sidebar **to enable variant, tool and endonuclease options**")
892
  expander.markdown("- Select Desired Variant from the dropdown list")
893
  expander.markdown("- Select a Tool")
894
  expander.markdown("- Select one of the options **(if available)**")
@@ -896,7 +932,11 @@ if Calc == 'ReadME':
896
 
897
  expander1 = st.expander('Introduction')
898
 
899
- expander1.markdown('**TL;DR** This app **reviewes** popular single base quality estimators for a **[list](https://drive.google.com/file/d/1Sxb-Cc-epbs6vujQaX9wa5acqus0RW3q/view?usp=sharing) of rsIDs** per disease of interest based on CARD’s cross-NDD efforts. We filtered our candidate list of **base edit predictors** for those that are at least **semi-automated and reproducible** (no copy and pasting IDs or sequences one at a time).')
 
 
 
 
900
  expander1.markdown('Clustered Regularly Interspaced Short Palindromic Repeat CRISPR/CRISPR-associated (Cas) systems, such as **Cas9 (type II endonuclease which recognises the 5"'"-NGG-3"'" PAM)** and **Cas12a (type V endonuclease which recognises the 5"'"-TTTV-3"'" PAM)** (also called Cpf1), are the primary tools used for genome editing. CRISPR/Cas9 based gene editing uses sequence-specific nucleases (Cas9 etc) and a sgRNA for precise gene knock-out/in whereas catalytically inactive Cas9 (dCas9) provides gene expression regulation via activation/inhibition (CRISPRa/i) and Cas9 nickase (nCas9) + sgRNA, by incorporating deaminases, enables single base editing. Finally nCas9 + prime editing gRNA (pegRNA) enables editing of all 12 possible base edits')
901
  expander1.markdown('**A CRISPR/Cas9 system** requires a custom single guide RNA (sgRNA) that contains a crRNA (a 20 nt sequence homologous to the region of interest that directs the Cas9 (or dCas9 or Cas9 nickase) nuclease to the region of interest) and a Cas9 nuclease-recruiting sequence (tracrRNA). An ideal gRNA should maximize on-target activity **(cleavage efficiency)** while also minimizing potential off-target effects **(specificity)**.')
902
  expander1.markdown('**Current sgRNA design tools** (including HDR based and deaminase based) fall under three major categories:')
@@ -915,7 +955,7 @@ if Calc == 'ReadME':
915
  expander1.markdown('In this app we also tested a **prime editor** and an **RNA editor for gene knockdown** for these targets.')
916
 
917
  expander2 = st.expander('How the CRISPR-Cas9 (and base editing) system works')
918
- expander2.markdown('**CRISPR-Cas9** system consists of two key components (accomplishing three steps: Recognition, Cleavage, and Repair):')
919
  expander2.markdown("- **Recognition:** A single guide RNA (sgRNA, which is composed of a target-specific CRISPR RNA (crRNA) and an auxiliary trans-activating crRNA (tracrRNA) joined by a linker loop) targeting Cas9 to a specific DNA locus")
920
  #expander2.markdown('- **Recognition:** A guide RNA (gRNA) that consists of a small piece of pre-designed RNA sequence (usually 20 bases complimentary to the target DNA sequence in the genome) and **guides** Cas9 to the right part of the genome.')
921
  expander2.markdown('- **Cleavage, and Repair**: A Cas9 enzyme (has six domains, REC I (responsible for binding guide RNA), REC II, Bridge Helix, PAM Interacting (confers PAM specificity and is responsible for initiating binding to target DNA), HNH and RuvC (each cut single-stranded DNA after 3rd base upstream of PAM)) that acts as a pair of ‘molecular scissors’ that **cut** the two strands of DNA at a specific location in the genome **so that bits of DNA can then be added or removed** using either non-homologous end joining **(NHEJ)** or homology-directed repair **(HDR)**.')
@@ -1092,6 +1132,7 @@ if Calc == 'ReadME':
1092
  - [SNP_CRISPR](https://www.flyrnai.org/tools/snp_crispr/web/)
1093
  - This tool offers guides for NGG and NAG PAM sequences, which are reported in this app:
1094
  - NGG, NAG
 
1095
  """
1096
  )
1097
 
 
198
  CHOPCHOP accepts **input** in one of the following forms:
199
  - Gene name
200
  - Genomic coordinates
 
201
  - **In batch mode, we used** a text file containing one chr:start-end region per line for each SNP. Ex: chr1:152220450-152220451.
202
+ - DNA sequence
203
+ Based on the input provided, CHOPCHOP retrieves the sequence (corresponding to the gene name/coordinates) and scans it for all potential target (and off-target) sites (based on the search requirements selected).
204
  **Each sgRNA** is then ranked according to:
205
  - Number of off-targets in the genome
206
  - Number of mismatches that lie within the off-targets.
 
208
  - GC-content
209
  - Presence of a guanine (G) at position 20 in the sgRNA target site
210
  - Any target sites with the same score are then sorted by their position in the gene (with preference to 5′ positions).
211
+ **Output:** A tab separated text file
212
+ - **Columns of interest**:
213
+ - Target sequence
214
+ - Efficiency (**higher the better**)
215
  Please note that not all options have Efficiency defined [Ref](https://chopchop.cbu.uib.no/instructions)
216
  """
217
  )
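
The batch input described in this hunk (one `chr:start-end` region per line, one line per SNP) can be checked before it is uploaded. The following is a minimal sketch and is not part of `app.py`; the names `REGION_RE`, `parse_region` and `parse_batch` are hypothetical, and the accepted chromosome naming (`chr` followed by letters, digits or underscores) is an assumption.

```python
import re

# Hypothetical pattern: "chr<name>:<start>-<end>", e.g. chr1:152220450-152220451
REGION_RE = re.compile(r"^(chr[0-9A-Za-z_]+):(\d+)-(\d+)$")


def parse_region(line):
    """Split one 'chr:start-end' line into (chrom, start, end)."""
    match = REGION_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a chr:start-end region: {line!r}")
    chrom = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start is after end: {line!r}")
    return chrom, start, end


def parse_batch(text):
    """Parse the body of a batch file, skipping blank lines."""
    return [parse_region(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    print(parse_batch("chr1:152220450-152220451\n"))
```

Rejecting malformed lines up front keeps a single bad entry from silently dropping a SNP from the batch.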
 
613
  st.header("E-CRISP")
614
  expander = st.expander("Summary")
615
  #st.markdown("**Summary**")
616
+ expander.markdown("E-CRISP is used to design gRNA sequences **(supports 12 organisms)** and can also reevaluate CRISPR constructs for on- or off-target sites and targeted genomic loci. It identifies target sequences complementary to the gRNA ending in a 3ʹ protospacer-adjacent motif (PAM), **N(G or A)G** and uses a fast indexing approach to find binding sites and a binary interval tree for rapid annotation of putative gRNA target sites.")
617
  expander.markdown("**Off-target** effects and target-site homology are evaluated using Bowtie2 aligner. Designs are **shown** in the output if the number of **off-targets does not exceed a user-specified threshold**. **More than one** design targeting a desired locus are **ranked** according to on-target specificity and number of off-targets.")
618
  expander1 = st.expander("How it works")
619
+ expander1.markdown(
620
+ """
621
+ - E-CRISP identifies target sequences ending with a PAM motif 5′-NGG/NAG-3′ and uses them to propose guide RNAs.
622
+ - It uses a fast indexing approach to locate binding sites and the alignment program Bowtie 2 to identify off-target effects.
623
+ - Designed sgRNAs are assessed based on genomic context (e.g. exons, transcripts, CpG islands) and ranked according to target specificity and efficiency.
624
+ """
625
+ )
626
  expander1.write(
627
  """
628
+ **Input:**
629
+ - Multiple lines provided in the Input fasta sequence edit box in the webapp **[here](http://www.e-crisp.org/E-CRISP/index.html)** in the following format
630
  - Line1: rs12726330
631
  - Line2: CGGGACATGGAAGAGGTCTGGACCAGGGTACTGGGAAGGCGCTCGGAGGA
632
  - Line3: rs76763715
633
  - Line4: CCAGCCGACCACATGGTACAGGAGGTTCTAGGGTAAGGACAAAGGCAAAG
634
  - and so on
635
+ **Output:**
636
+ - A tab separated .tab file
637
+ - **Columns of interest**:
638
+ - sgRNA Length
639
+ - Efficiency Score (E Score, **Higher the better**) [Ref](https://www.nature.com/articles/nbt.3026)
640
+ - Specificity Score (S Score, **Higher the better** (max = 100))
641
+ - Doench and Xu Score
642
+ - Nucleotide sequence (A, C, G, T) compositions in %
643
  """
644
  )
 
 
645
 
646
  expander2 = st.expander("References")
647
  expander2.write("[E-CRISP Web App](http://www.e-crisp.org/E-CRISP/)")
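
The E-CRISP input listed in this hunk alternates an rsID line with a sequence line. A minimal sketch of building that text from (rsID, sequence) pairs is given below; it is not part of `app.py`, the function name `build_ecrisp_input` is hypothetical, and restricting sequences to the characters A, C, G, T and N is an assumption.

```python
def build_ecrisp_input(records):
    """Return alternating ID / sequence lines for pasting into the input box.

    records: iterable of (snp_id, sequence) pairs.
    """
    lines = []
    for snp_id, sequence in records:
        seq = sequence.strip().upper()
        # Assumption: only A, C, G, T and N are valid sequence characters.
        if not seq or set(seq) - set("ACGTN"):
            raise ValueError(f"unexpected characters in sequence for {snp_id}")
        lines.append(snp_id)
        lines.append(seq)
    return "\n".join(lines)


if __name__ == "__main__":
    example = [
        ("rs12726330", "CGGGACATGGAAGAGGTCTGGACCAGGGTACTGGGAAGGCGCTCGGAGGA"),
        ("rs76763715", "CCAGCCGACCACATGGTACAGGAGGTTCTAGGGTAAGGACAAAGGCAAAG"),
    ]
    print(build_ecrisp_input(example))
```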
 
649
  expander3 = st.expander("Tool Options: All you can do with this tool")
650
  expander3.write(
651
  """
652
+ This tool offers single or paired sgRNA and:
653
+ - **Options for PAM:**
654
+ - **Relaxed**
655
+ - **Medium:**
656
+ - **Strict**
657
+ - **Options for Design:**
658
+ - knockdown.
659
+ - knockin.
660
+ - N/C terminal tagging.
661
+ - CRISPRi.
662
+ - CRISPRa.
663
+ - **Other filtering options.**
664
+ - gRNA length, allowed % of G, C, A and T, 3' and 5' flanking sequence length, off-targets evaluation etc
665
  """
666
  )
667
 
668
  expander4 = st.expander("Scoring")
669
  expander4.markdown(
670
  """
671
+ E-CRISP utilises its own **SAE (Specificity, Annotation, Efficacy) score** to determine the quality of each sgRNA in addition to Rule Set 1 [Doench et al](https://www.nature.com/articles/nbt.3026) and [Xu et al](https://genome.cshlp.org/content/25/8/1147). Please see Scoring and Quality Matrices in README tab of this app for details.
672
+ - Specificity Score (S-score):
673
+ - Start with 100.
674
+ - For every off-target, subtract (20-mismatches)/iteration.
675
+ - Annotation Score (A-score):
676
+ - Start with zero
677
+ - For every hit exon add 5/exon count
678
+ - For every hit CpG Island subtract 1
679
+ - For every start codon hit add 1
680
+ - For every stop codon hit add 1
681
+ - For every CDS hit add 5/CDS count
682
+ - For every gene hit add 1
683
+ - Efficacy Score (E-score):
684
+ - Add 1 if last 6 bp have a GC content higher than 70 %
685
+ - Subtract 1 if the entire sequence has GC content > 80 %
686
+ - Add 1 if sequence is preceded by a G
687
+ - Add 1 if there are GG in front of the target sequence (opposite the PAM)
688
+ - Add micro-homology score (is higher when sequence tends to give out of frame deletions)
689
  """
690
  )
691
+ #expander4.markdown('on-target and off-target predictions, it utilises its own ‘SAE (Specificity, Annotation, Efficacy) Score’ to determine the quality of each gRNA, while Rule Set 1 ( predictive model for sgRNA activity by training a logistic regression classifier to discriminate the highest-activity) [Doench](https://www.nature.com/articles/nbt.3026) and Spacer Scoring for CRISPR (identified sequence features that contribute to sgRNA efficiency by calculating log odds ratio of nucleotide frequency between DNA sequences targeted by efficient and inefficient sgRNAs) [Xu](https://genome.cshlp.org/content/25/8/1147) are also included in its results.')
692
+ #expander4.markdown('**Doench Score:** sgRNA score. A guide necessarily only has a subset of all the features, indicated via one-hot encoding as binary variables. Let the model weights for the features i for a particular guide sj be wij, the intercept int. Then the sgRNA score f (sj) is given via logistic regression as:')
693
+
694
+ # latext = r'''
695
+ # $$
696
+ # f(s_j) = \frac{1}{1+\exp(-g(s_j))} \\
697
+ # g(s_j) = \mathrm{int} + \sum_{i} w_{ij} \\
698
+ # \text{where } f(s_j) \in [0,1] $$
699
+ # '''
700
+ # expander4.markdown(latext)
701
+ # expander4.markdown(
702
+ # """
703
+ # Here, features used for prediction are:
704
+ # - Individual nucleotides and all pairs of adjacent nucleotides indexed by position in the 30 mer target site.
705
+ # - Count of Gs and Cs in the 20 nt of the sgRNA .
706
+ # - Two GC-count features for deviations below ten and above ten.
707
+ # """
708
+ # )
709
 
710
 
711
  st.markdown(tips,unsafe_allow_html=True)
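
The Annotation Score rules and the two GC-content rules of the Efficacy Score listed in the Scoring text of this hunk translate fairly directly into code. The sketch below is an illustration only, not E-CRISP's implementation and not part of `app.py`: all function and parameter names are hypothetical, "last 6 bp" is assumed to mean the last six characters of the target string as given, and the remaining E-score terms (preceding G, GG, micro-homology) and the S-score are left out because the text does not define them precisely enough.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def annotation_score(exon_hits, exon_count, cds_hits, cds_count,
                     cpg_hits=0, start_codon_hits=0, stop_codon_hits=0,
                     gene_hits=0):
    """A-score sketch: start at zero and apply the listed rules."""
    score = 0.0
    if exon_count:
        score += exon_hits * 5 / exon_count   # add 5/exon count per exon hit
    if cds_count:
        score += cds_hits * 5 / cds_count     # add 5/CDS count per CDS hit
    score -= cpg_hits                          # subtract 1 per CpG island hit
    score += start_codon_hits + stop_codon_hits + gene_hits  # add 1 per hit
    return score


def gc_efficacy_terms(target):
    """Only the two GC-content terms of the E-score; other terms are omitted."""
    score = 0
    if gc_fraction(target[-6:]) > 0.70:   # last 6 bp above 70 % GC
        score += 1
    if gc_fraction(target) > 0.80:        # whole sequence above 80 % GC
        score -= 1
    return score


if __name__ == "__main__":
    print(annotation_score(exon_hits=2, exon_count=10, cds_hits=1, cds_count=5))
    print(gc_efficacy_terms("ATATATATATATATGCGCGC"))
```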
 
907
  #Calc = st.sidebar.radio('Selection Menu')
908
  Calc = st.sidebar.radio(
909
  "",
910
+ ('ReadME', 'Tools Selection Menu'))
911
 
912
  #if Calc:
913
 
 
918
 
919
  expander = st.expander("How to use this app")
920
  #st.header('How to use this app')
921
+ expander.markdown('Please note that all tools were run using the Human Genome **(hg38)**. Each tool requires a **specific input format** (described for each tool selected from the sidebar when **Tools Selection Menu is enabled**) and **outputs results** in different formats **(with different columns based on the method selected, as described under each tool)**. Some of these tools also allow selection of various **endonucleases and related options**; their **results are provided as radio controls** in the sidebar of this app under each tool.')
922
  expander.markdown('**Requirements:** 1) Python3.4 or higher and 2) streamlit 1.13')
923
  expander.markdown('To start this app, **unzip** the base_editor_app.zip in a folder of your choice')
924
  expander.markdown('Open shell terminal and **cd to base_editor_app folder**')
925
  expander.markdown('Type: **streamlit run baserditorsV3.py**, It will launch baseeditor app in the default browser')
926
  expander.markdown('**By default** README radio button is enabled to describe general information about the App and How to use it.')
927
+ expander.markdown("- Please enable **Tools Selection Menu** radio control in the sidebar **to enable variant, tool and endonuclease options**")
928
  expander.markdown("- Select Desired Variant from the dropdown list")
929
  expander.markdown("- Select a Tool")
930
  expander.markdown("- Select one of the options **(if available)**")
 
932
 
933
  expander1 = st.expander('Introduction')
934
 
935
+ expander1.markdown(
936
+ """**TLDR**
937
+ This app **reviews** popular single base quality estimators for a **[list](https://drive.google.com/file/d/1Sxb-Cc-epbs6vujQaX9wa5acqus0RW3q/view?usp=sharing) of rsIDs** per disease of interest based on CARD’s cross-NDD efforts. We filtered our candidate list of **base edit predictors** for those that are at least **semi-automated and reproducible** (no copy and pasting IDs or sequences one at a time).
938
+ """
939
+ )
940
  expander1.markdown('Clustered Regularly Interspaced Short Palindromic Repeat CRISPR/CRISPR-associated (Cas) systems, such as **Cas9 (type II endonuclease which recognises the 5"'"-NGG-3"'" PAM)** and **Cas12a (type V endonuclease which recognises the 5"'"-TTTV-3"'" PAM)** (also called Cpf1), are the primary tools used for genome editing. CRISPR/Cas9 based gene editing uses sequence-specific nucleases (Cas9 etc) and a sgRNA for precise gene knock-out/in whereas catalytically inactive Cas9 (dCas9) provides gene expression regulation via activation/inhibition (CRISPRa/i) and Cas9 nickase (nCas9) + sgRNA, by incorporating deaminases, enables single base editing. Finally nCas9 + prime editing gRNA (pegRNA) enables editing of all 12 possible base edits')
941
  expander1.markdown('**A CRISPR/Cas9 system** requires a custom single guide RNA (sgRNA) that contains a crRNA (a 20 nt sequence homologous to the region of interest that directs the Cas9 (or dCas9 or Cas9 nickase) nuclease to the region of interest) and a Cas9 nuclease-recruiting sequence (tracrRNA). An ideal gRNA should maximize on-target activity **(cleavage efficiency)** while also minimizing potential off-target effects **(specificity)**.')
942
  expander1.markdown('**Current sgRNA design tools** (including HDR based and deaminase based) fall under three major categories:')
 
955
  expander1.markdown('In this app we also tested a **prime editor** and an **RNA editor for gene knockdown** for these targets.')
956
 
957
  expander2 = st.expander('How the CRISPR-Cas9 (and base editing) system works')
958
+ expander2.markdown('**CRISPR-Cas9** system consists of **two** key components (accomplishing three steps: Recognition, Cleavage, and Repair):')
959
  expander2.markdown("- **Recognition:** A single guide RNA (sgRNA, which is composed of a target-specific CRISPR RNA (crRNA) and an auxiliary trans-activating crRNA (tracrRNA) joined by a linker loop) targeting Cas9 to a specific DNA locus")
960
  #expander2.markdown('- **Recognition:** A guide RNA (gRNA) that consists of a small piece of pre-designed RNA sequence (usually 20 bases complimentary to the target DNA sequence in the genome) and **guides** Cas9 to the right part of the genome.')
961
  expander2.markdown('- **Cleavage, and Repair**: A Cas9 enzyme (has six domains, REC I (responsible for binding guide RNA), REC II, Bridge Helix, PAM Interacting (confers PAM specificity and is responsible for initiating binding to target DNA), HNH and RuvC (each cut single-stranded DNA after 3rd base upstream of PAM)) that acts as a pair of ‘molecular scissors’ that **cut** the two strands of DNA at a specific location in the genome **so that bits of DNA can then be added or removed** using either non-homologous end joining **(NHEJ)** or homology-directed repair **(HDR)**.')
 
1132
  - [SNP_CRISPR](https://www.flyrnai.org/tools/snp_crispr/web/)
1133
  - This tool offers guides for NGG and NAG PAM sequences, which are reported in this app:
1134
  - NGG, NAG
1135
+ **For more details on each tool, please select it from the sidebar menu under Tools Selection Menu**
1136
  """
1137
  )
1138