Spaces:

syedislamuddin
/

base_editors1


syedislamuddin committed on Nov 17, 2022

Commit

07b9363

1 Parent(s): b200c9d

Upload app.py

Files changed (1)

app.py +83 -42

app.py CHANGED

@@ -198,9 +198,9 @@ def Chopchop(method,select_method):
     CHOPCHOP accepts **input** in one of the following forms:
     -   Gene name
     -   Genomic coordinates
-    -   DNA sequence
         -   **In batch mode, we used** a text file containing one chr:start-end region per line for each SNP, e.g. chr1:152220450-152220451.
-    Based on the input provided, chopchop retrieves sequence (corresponding to gene name/coordinates) and scan it for all potential target (off-target) sites (based on search requirement selected).
     **Each sgRNA** is then ranked according to:
     -   Number of off-targets in the genome
     -   Number of mismatches within the off-targets.
@@ -208,8 +208,10 @@ def Chopchop(method,select_method):
         -   GC-content
         -   Presence of a guanine (G) at position 20 in the sgRNA target site
         -   Any target sites with the same score are then sorted by their position in the gene (with preference to 5′ positions).
-    **Output:** A tab separated text file
-        - **Columns of interest**: Target sequence and Efficiency (**higher the better**)
     Please note that not all options have Efficiency defined [Ref](https://chopchop.cbu.uib.no/instructions)
         """
     )
@@ -611,22 +613,35 @@ def ecrisp(select_method):
     st.header("E-CRISP")
     expander = st.expander("Summary")
     #st.markdown("**Summary**")
-    expander.markdown("E-CRISP is used to design gRNA sequences **(supports 12 organisms)**. It can also be used to reevaluate CRISPR constructs for on- or off-target sites and targeted genomic loci. It identifies target sequences complementary to the gRNA ending in a 3ʹ protospacer-adjacent motif (PAM), **N(G or A)G** and uses a fast indexing approach to find binding sites and a binary interval tree for rapid annotation of putative gRNA target sites.")
     expander.markdown("**Off-target** effects and target-site homology are evaluated using Bowtie2 aligner. Designs are **shown** in the output if the number of **off-targets does not exceed a user-specified threshold**. **More than one** design targeting a desired locus are **ranked** according to on-target specificity and number of off-targets.")
     expander1 = st.expander("How it works")
-    expander1.markdown(' it identifies target sequences ending with a PAM motif 5′-NGG/NAG-3′ and uses them to propose guide RNAs. It also uses a fast indexing approach to locate binding sites and the alignment program Bowtie 2 to identify off-target effects. It outputs the successful designs, ranked according to target specificity and efficiency. It also assesses the genomic context (e.g. exons, transcripts, CpG islands) of putative designs and provides an option to re-evaluate given gRNAs for efficiency and specificity. ')
     expander1.write(
     """
-    - Input: Multiple lines provided in the Input fasta sequence edit box in the webapp **[here](http://www.e-crisp.org/E-CRISP/index.html)** in the following format
         - Line1: rs12726330
         - Line2: CGGGACATGGAAGAGGTCTGGACCAGGGTACTGGGAAGGCGCTCGGAGGA
         - Line3: rs76763715
         - Line4: CCAGCCGACCACATGGTACAGGAGGTTCTAGGGTAAGGACAAAGGCAAAG
         - and so on
     """
     )
-    expander1.markdown("- Output: A tab separated .tab file")
-    expander1.markdown("- **Columns of interest**: Efficiency Score (E Score, **Higher the better**) [Ref](https://www.nature.com/articles/nbt.3026) and Specificity Score (S Score, **Higher the better** (max = 100))")
     expander2 = st.expander("References")
     expander2.write("[E-CRISP Web App](http://www.e-crisp.org/E-CRISP/)")
@@ -634,42 +649,63 @@ def ecrisp(select_method):
     expander3 = st.expander("Tool Options: All you can do with this tool")
     expander3.write(
     """
-    - This tool offers single or paired sgRNA and:
-        - **Options for PAM:**
-            - **Relaxed**
-            - **Medium:**
-            - **Strict**
-        - **Options for Design:**
-            - knockdown.
-            - knockin.
-            - N/C terminal tagging.
-            - CRISPRi.
-            - CRISPRa.
-        - **Other filtering options.**
     """
     )
     expander4 = st.expander("Scoring")
-    expander4.markdown('E-CRISP utilises its own SAE (Specificity, Annotation, Efficacy) score to determine the quality of each sgRNA, while Rule Set 1 [24] and SSC [112] are also included in E-CRISP')
-    expander4.markdown('**E-CRISP utilises its own SAE (Specificity, Annotation, Efficacy) score to determine the quality of each sgRNA, while Rule Set 1 [24] and SSC [112] are also included in E-CRISP**')
-    expander4.markdown('on-target and off-target predictions, it utilises its own ‘SAE (Specificity, Annotation, Efficacy) Score’ to determine the quality of each gRNA, while Rule Set 1 ( predictive model for sgRNA activity by training a logistic regression classifier to discriminate the highest-activity) [Doench](https://www.nature.com/articles/nbt.3026) and Spacer Scoring for CRISPR (identified sequence features that contribute to sgRNA efficiency by calculating log odds ratio of nucleotide frequency between DNA sequences targeted by efficient and inefficient sgRNAs) [Xu](https://genome.cshlp.org/content/25/8/1147) are also included in its results.')
-    expander4.markdown('**Doench Score:** sgRNA score. A guide necessarily only has a subset of all the features, indicated via one-hot encoding as binary variables. Let the model weights for the features i for a particular guide sj be wij, the intercept int. Then the sgRNA score f (sj) is given via logistic regression as:')
-    latext = r'''
-    $$
-    f(s_j) = \frac{1}{1+exp(-g(s_j))} \\
-    g(s_j) = int + \sum_{i} w_{ij} \\
-    where f(s_j) \epsilon \ [0,1] \\
-    '''
-    expander4.markdown(latext1)
     expander4.markdown(
         """
-        Here, features used for prediction are:
-        -   Individual nucleotides and all pairs of adjacent nucleotides indexed by position in the 30 mer target site.
-        -   Count of Gs and Cs in the 20 nt of the sgRNA .
-        -   Two GC-count features for deviations below ten and above ten.
         """
     )
     st.markdown(tips,unsafe_allow_html=True)
@@ -871,7 +907,7 @@ st.sidebar.image("logo-card-white.png", use_column_width=True)
 #Calc = st.sidebar.radio('Selection Menu')
 Calc = st.sidebar.radio(
     "",
-    ('ReadME', 'Selection Menu'))
 #if Calc:
@@ -882,13 +918,13 @@ if Calc == 'ReadME':
     expander = st.expander("How to use this app")
     #st.header('How to use this app')
-    expander.markdown('Please note that all tools were run using Human Genome **(hg38)**. Each tool require **specific input format** (described for each tool selected from the sidebar when **Selection Menue is enabled**) and **output results** in different formats **(with different columns based on method selected as described under each tool)**. Some of these tools also allow selection of various **endonucleases and related options**, their **reulsts are provided as radio controls** in the sidebar of this app under each tool.')
     expander.markdown('**Requirements:** 1) Python3.4 or higher and 2) streamlit 1.13')
     expander.markdown('To start this app, **unzip** the base_editor_app.zip in a folder of your choice')
     expander.markdown('Open shell terminal and **cd to base_editor_app folder**')
     expander.markdown('Type: **streamlit run baserditorsV3.py**, It will launch baseeditor app in the default browser')
     expander.markdown('**By default** README radio button is enabled to describe general information about the App and How to use it.')
-    expander.markdown("- Please enable **Selection Menu** radio control in the sidebar **to enable variant, tool and endonuclease options**")
     expander.markdown("- Select Desired Variant from the dropdown list")
     expander.markdown("- Select a Tool")
     expander.markdown("- Select one of the options **(if available)**")
@@ -896,7 +932,11 @@ if Calc == 'ReadME':
     expander1 = st.expander('Introduction')
-    expander1.markdown('**TL;DR** This app **reviewes** popular single base quality estimators for a **[list](https://drive.google.com/file/d/1Sxb-Cc-epbs6vujQaX9wa5acqus0RW3q/view?usp=sharing) of rsIDs** per disease of interest based on CARD’s cross-NDD efforts. We filtered our candidate list of **base edit predictors** for those that are at least **semi-automated and reproducible** (no copy and pasting IDs or sequences one at a time).')
     expander1.markdown('Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)/CRISPR-associated (Cas) systems, such as **Cas9 (type II endonuclease which recognises the 5′-NGG-3′ PAM)** and **Cas12a (type V endonuclease which recognises the 5′-TTTV-3′ PAM)** (also called Cpf1), are the primary tools used for genome editing. CRISPR/Cas9 based gene editing uses sequence-specific nucleases (Cas9 etc.) and an sgRNA for precise gene knock-out/in, whereas catalytically inactive Cas9 (dCas9) provides gene expression regulation via activation/inhibition (CRISPRa/i), and Cas9 nickase (nCas9) + sgRNA, by incorporating deaminases, enables single base editing. Finally, nCas9 + prime editing gRNA (pegRNA) enables all 12 possible base edits.')
     expander1.markdown('**A CRISPR/Cas9 system** requires a custom single guide RNA (sgRNA) that contains a crRNA (a 20 nt sequence homologous to the region of interest that directs the Cas9 (or dCas9 or Cas9 nickase) nuclease to that region) and a Cas9 nuclease-recruiting sequence (tracrRNA). An ideal gRNA should maximize on-target activity **(cleavage efficiency)** while also minimizing potential off-target effects **(specificity)**.')
     expander1.markdown('**Current sgRNA design tools** (including HDR based and deaminase based) fall under three major categories:')
@@ -915,7 +955,7 @@ if Calc == 'ReadME':
     expander1.markdown('In this app we also tested a **prime editor** and an **RNA editor for gene knockdown** for these targets.')
     expander2 = st.expander('How the CRISPR-Cas9 (and base editing) system works')
-    expander2.markdown('**CRISPR-Cas9** system consists of two key components (accomplishing three steps: Recognition, Cleavage, and Repair):')
     expander2.markdown("- **Recognition:** A single guide RNA (sgRNA, which is composed of a target-specific CRISPR RNA (crRNA) and an auxiliary trans-activating crRNA (tracrRNA) joined by a linker loop) targeting Cas9 to a specific DNA locus")
     #expander2.markdown('- **Recognition:** A guide RNA (gRNA) that consists of a small piece of pre-designed RNA sequence (usually 20 bases complimentary to the target DNA sequence in the genome) and **guides** Cas9 to the right part of the genome.')
     expander2.markdown('- **Cleavage, and Repair**: A Cas9 enzyme (has six domains, REC I (responsible for binding guide RNA), REC II, Bridge Helix, PAM Interacting (confers PAM specificity and is responsible for initiating binding to target DNA), HNH and RuvC (each cut single-stranded DNA after 3rd base upstream of PAM)) that acts as a pair of ‘molecular scissors’ that **cut** the two strands of DNA at a specific location in the genome **so that bits of DNA can then be added or removed** using either non-homologous end joining **(NHEJ)** or homology-directed repair **(HDR)**.')
@@ -1092,6 +1132,7 @@ if Calc == 'ReadME':
     - [SNP_CRISPR](https://www.flyrnai.org/tools/snp_crispr/web/)
         - This tool offers guides for NGG and NAG PAM sequences, which are reported in this app:
         - NGG, NAG
     """
     )

     CHOPCHOP accepts **input** in one of the following forms:
     -   Gene name
     -   Genomic coordinates
         -   **In batch mode, we used** a text file containing one chr:start-end region per line for each SNP, e.g. chr1:152220450-152220451.
+    -   DNA sequence
+    Based on the input provided, CHOPCHOP retrieves the sequence (corresponding to the gene name/coordinates) and scans it for all potential target (and off-target) sites (based on the selected search requirements).
     **Each sgRNA** is then ranked according to:
     -   Number of off-targets in the genome
     -   Number of mismatches within the off-targets.
         -   GC-content
         -   Presence of a guanine (G) at position 20 in the sgRNA target site
         -   Any target sites with the same score are then sorted by their position in the gene (with preference to 5′ positions).
+    **Output:** A tab separated text file
+    - **Columns of interest**:
+        -   Target sequence
+        -   Efficiency (**higher the better**)
     Please note that not all options have Efficiency defined [Ref](https://chopchop.cbu.uib.no/instructions)
         """
     )
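The batch-mode input described above (one `chr:start-end` region per line) can be checked before it is handed to CHOPCHOP. The following is only a sketch: `parse_regions` and `REGION_RE` are hypothetical names that do not appear in app.py, and the pattern assumes chromosome names that start with `chr`.

```python
import re

# Hypothetical pattern for lines such as "chr1:152220450-152220451".
REGION_RE = re.compile(r"^(chr[0-9A-Za-z_]+):(\d+)-(\d+)$")


def parse_regions(lines):
    """Parse chr:start-end lines into (chrom, start, end) tuples."""
    regions = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        match = REGION_RE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: expected chr:start-end, got {line!r}")
        chrom = match.group(1)
        start, end = int(match.group(2)), int(match.group(3))
        if start > end:
            raise ValueError(f"line {lineno}: start {start} is greater than end {end}")
        regions.append((chrom, start, end))
    return regions


print(parse_regions(["chr1:152220450-152220451"]))
```

Rejecting malformed lines up front makes it easier to trace a failed batch run back to a specific SNP.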
     st.header("E-CRISP")
     expander = st.expander("Summary")
     #st.markdown("**Summary**")
+    expander.markdown("E-CRISP is used to design gRNA sequences **(supports 12 organisms)** and can also reevaluate CRISPR constructs for on- or off-target sites and targeted genomic loci. It identifies target sequences complementary to the gRNA ending in a 3ʹ protospacer-adjacent motif (PAM), **N(G or A)G** and uses a fast indexing approach to find binding sites and a binary interval tree for rapid annotation of putative gRNA target sites.")
     expander.markdown("**Off-target** effects and target-site homology are evaluated using Bowtie2 aligner. Designs are **shown** in the output if the number of **off-targets does not exceed a user-specified threshold**. **More than one** design targeting a desired locus are **ranked** according to on-target specificity and number of off-targets.")
     expander1 = st.expander("How it works")
+    expander1.markdown(
+        """
+        -   E-CRISP identifies target sequences ending with a PAM motif 5′-NGG/NAG-3′ and uses them to propose guide RNAs.
+        -   It uses a fast indexing approach to locate binding sites and the alignment program Bowtie 2 to identify off-target effects.
+        -   Designed sgRNAs are assessed based on genomic context (e.g. exons, transcripts, CpG islands) and ranked according to target specificity and efficiency.
+        """
+    )
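To make the PAM rule above concrete, here is a small sketch that lists 20 nt candidate targets followed by a 3′ N(G or A)G motif on the forward strand only. It is an illustration, not E-CRISP's code: `find_forward_sites` is a hypothetical name, the 20 nt length is an assumption, and the indexing, reverse-strand search and Bowtie 2 off-target steps are left out.

```python
import re

# A lookahead is used so that overlapping candidate sites are all reported.
# Group 1: assumed 20 nt protospacer; group 2: PAM of the form N, then G or A, then G.
SITE_RE = re.compile(r"(?=([ACGT]{20})([ACGT][GA]G))")


def find_forward_sites(sequence):
    """Return (start, protospacer, pam) tuples for the forward strand."""
    seq = sequence.upper()
    return [(m.start(), m.group(1), m.group(2)) for m in SITE_RE.finditer(seq)]


# Example using the first sequence from the E-CRISP input description.
for site in find_forward_sites("CGGGACATGGAAGAGGTCTGGACCAGGGTACTGGGAAGGCGCTCGGAGGA"):
    print(site)
```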
     expander1.write(
     """
+    **Input:**
+    -   Multiple lines provided in the Input fasta sequence edit box in the webapp **[here](http://www.e-crisp.org/E-CRISP/index.html)** in the following format
         - Line1: rs12726330
         - Line2: CGGGACATGGAAGAGGTCTGGACCAGGGTACTGGGAAGGCGCTCGGAGGA
         - Line3: rs76763715
         - Line4: CCAGCCGACCACATGGTACAGGAGGTTCTAGGGTAAGGACAAAGGCAAAG
         - and so on
+    **Output:**
+    -   A tab separated .tab file
+        - **Columns of interest**:
+            -   sgRNA Length
+            -   Efficiency Score (E Score, **Higher the better**) [Ref](https://www.nature.com/articles/nbt.3026)
+            -   Specificity Score (S Score, **Higher the better** (max = 100))
+            -   Doench and Xu Score
+            -   Nucleotide sequence (A, C, G, T) compositions in %
     """
     )
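The alternating ID/sequence layout described above can be generated from a list of variants rather than typed by hand. This is a sketch under assumptions: `build_ecrisp_input` is a hypothetical helper, and the allowed alphabet (A, C, G, T, N) is a guess at what the web form accepts.

```python
def build_ecrisp_input(records):
    """Build text with an rsID line followed by its sequence line for each record."""
    lines = []
    for rsid, sequence in records:
        seq = sequence.strip().upper()
        # Assumed alphabet; adjust if the web form accepts other symbols.
        if not seq or set(seq) - set("ACGTN"):
            raise ValueError(f"{rsid}: sequence is empty or contains non-ACGTN characters")
        lines.append(rsid)
        lines.append(seq)
    return "\n".join(lines)


records = [
    ("rs12726330", "CGGGACATGGAAGAGGTCTGGACCAGGGTACTGGGAAGGCGCTCGGAGGA"),
    ("rs76763715", "CCAGCCGACCACATGGTACAGGAGGTTCTAGGGTAAGGACAAAGGCAAAG"),
]
print(build_ecrisp_input(records))
```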
     expander2 = st.expander("References")
     expander2.write("[E-CRISP Web App](http://www.e-crisp.org/E-CRISP/)")
     expander3 = st.expander("Tool Options: All you can do with this tool")
     expander3.write(
     """
+    This tool offers single or paired sgRNA and:
+    - **Options for PAM:**
+        - **Relaxed**
+        - **Medium**
+        - **Strict**
+    - **Options for Design:**
+        - knockdown.
+        - knockin.
+        - N/C terminal tagging.
+        - CRISPRi.
+        - CRISPRa.
+    - **Other filtering options.**
+        -   gRNA length, allowed % of G, C, A and T, 3' and 5' flanking sequence length, off-target evaluation, etc.
     """
     )
     expander4 = st.expander("Scoring")
     expander4.markdown(
         """
+        E-CRISP utilises its own **SAE (Specificity, Annotation, Efficacy) score** to determine the quality of each sgRNA in addition to Rule Set 1 [Doench et al](https://www.nature.com/articles/nbt.3026) and [Xu et al](https://genome.cshlp.org/content/25/8/1147). Please see Scoring and Quality Matrices in README tab of this app for details.
+        -   Specificity Score (S-score):
+            -   Start with 100.
+            -   For every off-target, subtract (20-mismatches)/iteration.
+        -   Annotation Score (A-score):
+            -   Start with zero
+            -   For every hit exon add 5/exon count
+            -   For every hit CpG Island subtract 1
+            -   For every start codon hit add 1
+            -   For every stop codon hit add 1
+            -   For every CDS hit add 5/CDS count
+            -   For every gene hit add 1
+        -   Efficacy Score (E-score):
+            -   Add 1 if last 6 bp have a GC content higher than 70 %
+            -   Subtract 1 if the entire sequence has GC content > 80 %
+            -   Add 1 if sequence is preceded by a G
+            -   Add 1 if there are GG in front of the target sequence (opposite the PAM)
+            -   Add micro-homology score (is higher when sequence tends to give out of frame deletions)
         """
     )
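The E-score rules listed above that are stated explicitly can be sketched as follows. This is not E-CRISP's implementation, and several points are assumptions: `partial_e_score` and `gc_fraction` are hypothetical names, "last 6 bp" is taken to mean the last six characters of the target string as passed in, `upstream` is the flanking sequence immediately before the target, and the micro-homology term is omitted because the list does not define it.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a non-empty sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def partial_e_score(target, upstream=""):
    """Sum the explicitly stated E-score rules (micro-homology term omitted)."""
    target = target.upper()
    upstream = upstream.upper()
    score = 0
    if gc_fraction(target[-6:]) > 0.70:   # last 6 bp GC content above 70 %
        score += 1
    if gc_fraction(target) > 0.80:        # whole sequence GC content above 80 %
        score -= 1
    if upstream.endswith("G"):            # assumed reading of "preceded by a G"
        score += 1
    if upstream.endswith("GG"):           # assumed reading of "GG in front of the target"
        score += 1
    return score


print(partial_e_score("ATATATATATATATGCGCGC", upstream="AGG"))
```

The S-score and A-score are left out because the list does not define terms such as "iteration" or the exon/CDS counts.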
+    #expander4.markdown('on-target and off-target predictions, it utilises its own ‘SAE (Specificity, Annotation, Efficacy) Score’ to determine the quality of each gRNA, while Rule Set 1 ( predictive model for sgRNA activity by training a logistic regression classifier to discriminate the highest-activity) [Doench](https://www.nature.com/articles/nbt.3026) and Spacer Scoring for CRISPR (identified sequence features that contribute to sgRNA efficiency by calculating log odds ratio of nucleotide frequency between DNA sequences targeted by efficient and inefficient sgRNAs) [Xu](https://genome.cshlp.org/content/25/8/1147) are also included in its results.')
+    #expander4.markdown('**Doench Score:** sgRNA score. A guide necessarily only has a subset of all the features, indicated via one-hot encoding as binary variables. Let the model weights for the features i for a particular guide sj be wij, the intercept int. Then the sgRNA score f (sj) is given via logistic regression as:')
+    # latext = r'''
+    # $$
+    # f(s_j) = \frac{1}{1+\exp(-g(s_j))} \\
+    # g(s_j) = int + \sum_{i} w_{ij} \\
+    # \text{where } f(s_j) \in [0,1]
+    # $$
+    # '''
+    # expander4.markdown(latext)
+    # expander4.markdown(
+    #     """
+    #     Here, features used for prediction are:
+    #     -   Individual nucleotides and all pairs of adjacent nucleotides indexed by position in the 30 mer target site.
+    #     -   Count of Gs and Cs in the 20 nt of the sgRNA .
+    #     -   Two GC-count features for deviations below ten and above ten.
+    #     """
+    # )
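The commented-out Doench description above defines the sgRNA score as a logistic function of an intercept plus the weights of the features present in a guide. A minimal numeric sketch of that formula follows; `logistic_score` is a hypothetical name, and the weights in the example are made-up numbers, not the published Rule Set 1 parameters.

```python
import math


def logistic_score(intercept, active_weights):
    """Compute f = 1 / (1 + exp(-g)) with g = intercept + sum of active feature weights."""
    g = intercept + sum(active_weights)
    # Branch on the sign of g so math.exp never receives a large positive argument.
    if g >= 0:
        return 1.0 / (1.0 + math.exp(-g))
    e = math.exp(g)
    return e / (1.0 + e)


# Illustrative values only.
print(logistic_score(0.6, [0.2, -0.1, 0.05]))
```

Because features are one-hot encoded, only the weights of the features that a guide actually has enter the sum, which is why the function takes a list of active weights rather than a full weight vector.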
     st.markdown(tips,unsafe_allow_html=True)
 #Calc = st.sidebar.radio('Selection Menu')
 Calc = st.sidebar.radio(
     "",
+    ('ReadME', 'Tools Selection Menu'))
 #if Calc:
     expander = st.expander("How to use this app")
     #st.header('How to use this app')
+    expander.markdown('Please note that all tools were run using Human Genome **(hg38)**. Each tool requires a **specific input format** (described for each tool selected from the sidebar when the **Tools Selection Menu is enabled**) and **outputs results** in different formats **(with different columns based on the method selected, as described under each tool)**. Some of these tools also allow selection of various **endonucleases and related options**; their **results are provided as radio controls** in the sidebar of this app under each tool.')
     expander.markdown('**Requirements:** 1) Python3.4 or higher and 2) streamlit 1.13')
     expander.markdown('To start this app, **unzip** the base_editor_app.zip in a folder of your choice')
     expander.markdown('Open shell terminal and **cd to base_editor_app folder**')
     expander.markdown('Type: **streamlit run baserditorsV3.py**, It will launch baseeditor app in the default browser')
     expander.markdown('**By default** README radio button is enabled to describe general information about the App and How to use it.')
+    expander.markdown("- Please enable **Tools Selection Menu** radio control in the sidebar **to enable variant, tool and endonuclease options**")
     expander.markdown("- Select Desired Variant from the dropdown list")
     expander.markdown("- Select a Tool")
     expander.markdown("- Select one of the options **(if available)**")
     expander1 = st.expander('Introduction')
+    expander1.markdown(
+        """**TLDR**
+            This app **reviews** popular single base quality estimators for a **[list](https://drive.google.com/file/d/1Sxb-Cc-epbs6vujQaX9wa5acqus0RW3q/view?usp=sharing) of rsIDs** per disease of interest based on CARD’s cross-NDD efforts. We filtered our candidate list of **base edit predictors** for those that are at least **semi-automated and reproducible** (no copy and pasting IDs or sequences one at a time).
+            """
+            )
     expander1.markdown('Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)/CRISPR-associated (Cas) systems, such as **Cas9 (type II endonuclease which recognises the 5′-NGG-3′ PAM)** and **Cas12a (type V endonuclease which recognises the 5′-TTTV-3′ PAM)** (also called Cpf1), are the primary tools used for genome editing. CRISPR/Cas9 based gene editing uses sequence-specific nucleases (Cas9 etc.) and an sgRNA for precise gene knock-out/in, whereas catalytically inactive Cas9 (dCas9) provides gene expression regulation via activation/inhibition (CRISPRa/i), and Cas9 nickase (nCas9) + sgRNA, by incorporating deaminases, enables single base editing. Finally, nCas9 + prime editing gRNA (pegRNA) enables all 12 possible base edits.')
     expander1.markdown('**A CRISPR/Cas9 system** requires a custom single guide RNA (sgRNA) that contains a crRNA (a 20 nt sequence homologous to the region of interest that directs the Cas9 (or dCas9 or Cas9 nickase) nuclease to that region) and a Cas9 nuclease-recruiting sequence (tracrRNA). An ideal gRNA should maximize on-target activity **(cleavage efficiency)** while also minimizing potential off-target effects **(specificity)**.')
     expander1.markdown('**Current sgRNA design tools** (including HDR based and deaminase based) fall under three major categories:')
     expander1.markdown('In this app we also tested a **prime editor** and an **RNA editor for gene knockdown** for these targets.')
     expander2 = st.expander('How the CRISPR-Cas9 (and base editing) system works')
+    expander2.markdown('**CRISPR-Cas9** system consists of **two** key components (accomplishing three steps: Recognition, Cleavage, and Repair):')
     expander2.markdown("- **Recognition:** A single guide RNA (sgRNA, which is composed of a target-specific CRISPR RNA (crRNA) and an auxiliary trans-activating crRNA (tracrRNA) joined by a linker loop) targeting Cas9 to a specific DNA locus")
     #expander2.markdown('- **Recognition:** A guide RNA (gRNA) that consists of a small piece of pre-designed RNA sequence (usually 20 bases complimentary to the target DNA sequence in the genome) and **guides** Cas9 to the right part of the genome.')
     expander2.markdown('- **Cleavage, and Repair**: A Cas9 enzyme (has six domains, REC I (responsible for binding guide RNA), REC II, Bridge Helix, PAM Interacting (confers PAM specificity and is responsible for initiating binding to target DNA), HNH and RuvC (each cut single-stranded DNA after 3rd base upstream of PAM)) that acts as a pair of ‘molecular scissors’ that **cut** the two strands of DNA at a specific location in the genome **so that bits of DNA can then be added or removed** using either non-homologous end joining **(NHEJ)** or homology-directed repair **(HDR)**.')
     - [SNP_CRISPR](https://www.flyrnai.org/tools/snp_crispr/web/)
         - This tool offers guides for NGG and NAG PAM sequences, which are reported in this app:
         - NGG, NAG
+    **For more details on each tool, please select it from the sidebar menu under Tools Selection Menu**
     """
     )