Commit ecb9c6d by Erva Ulusoy (1 parent: 6c337e0)
about and user guide pages
- pages/About.py +42 -0
- pages/User_Guide.py +89 -0
pages/About.py
ADDED
@@ -0,0 +1,42 @@
import streamlit as st


st.markdown('''
# Mutual Annotation-Based Prediction of Protein Domain Functions with Domain2GO
''')

st.markdown(
    """

[![biorxiv](https://img.shields.io/badge/bioRxiv-2022.11.03.514980-b31b1b.svg)](https://www.biorxiv.org/content/10.1101/2022.11.03.514980v1) [![github-repository](https://img.shields.io/badge/GitHub-black?logo=github)](https://github.com/HUBioDataLab/Domain2GO)

""")

st.markdown('<p style="font-size:18px; font-weight:bold">Developers: Erva Ulusoy & Tunca Dogan</p>', unsafe_allow_html=True)

st.markdown(
    '<p style="font-size:25px; font-weight:bold">How it works</p>', unsafe_allow_html=True)

st.markdown(
    """
This tool predicts the functions of queried proteins by propagating previously generated domain-function associations (the Domain2GO mapping set).

Domain2GO was developed to identify unknown protein functions by associating domains with Gene Ontology (GO) terms, thus framing the problem as domain function prediction. Domain2GO mappings are generated from existing domain and GO annotation data. To obtain highly reliable associations, we employed statistical resampling and analyzed the co-occurrence patterns of domains and GO terms on the same proteins.

We applied Domain2GO to predict protein functions by propagating domain-associated GO terms to proteins that are annotated with those domains. To evaluate protein function prediction performance and compare it against other methods, we employed the CAFA3 challenge datasets. The results demonstrated the potential of Domain2GO, especially when predicting molecular function and biological process terms, as it performed better than baseline predictors and curated associations (Fmax = 0.48 and 0.36 for MFO and BPO, respectively).

For more information on the construction of Domain2GO mappings, statistical analysis of the mappings, and protein function prediction performance evaluation, please refer to our preprint:

Ulusoy, E., & Dogan, T. (2022). Mutual Annotation-Based Prediction of Protein Domain Functions with Domain2GO. *bioRxiv*, 514980v1. [Link](https://www.biorxiv.org/content/10.1101/2022.11.03.514980v1)


The overall workflow of Domain2GO is shown below.

""")

st.image('figures/full_methodology.png', width=700)

st.markdown(
    '<p style="text-align:center"><em><strong>Schematic representation of the proposed method. (A)</strong> The source datasets were downloaded and organized; <strong>(B)</strong> initial mappings between the InterPro domains and GO terms were obtained, and the mapping parameters were calculated; <strong>(C)</strong> randomized annotation and mapping sets were constructed; <strong>(D)</strong> co-occurrence similarity distributions were plotted, and thresholds were selected based on statistical resampling; <strong>(E)</strong> an ablation study was conducted by calculating the enrichment of top predictions ranked by different statistical measures, and the finalized Domain2GO mappings were generated by filtering the initial mappings; <strong>(F)</strong> protein function predictions were generated by propagating Domain2GO mappings to target proteins.</em></p>',
    unsafe_allow_html=True)

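The propagation step described in the About text (transferring domain-associated GO terms to proteins annotated with those domains) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Domain2GO implementation: the function name `propagate_go_terms`, the `EXAMPLE_MAPPINGS` dictionary, and every accession, GO ID, and score in it are hypothetical placeholders. Only the output keys mirror column names from the results table in the user guide.

```python
# Hypothetical mapping set: domain accession -> list of (GO ID, score).
# All identifiers and scores are placeholders, not real Domain2GO data.
EXAMPLE_MAPPINGS = {
    "DOMAIN_A": [("GO:EXAMPLE_1", 0.91), ("GO:EXAMPLE_2", 0.75)],
    "DOMAIN_B": [("GO:EXAMPLE_1", 0.80)],
}


def propagate_go_terms(domain_hits, mappings):
    """Transfer GO terms from the domains found in a protein to the protein.

    domain_hits: dict of domain accession -> list of (start, end) locations.
    mappings:    dict of domain accession -> list of (GO ID, score).
    Returns one row per (domain, GO term) pair, highest score first.
    """
    rows = []
    for accession, locations in domain_hits.items():
        # Domains absent from the mapping set contribute no predictions.
        for go_id, score in mappings.get(accession, []):
            rows.append({
                "domain_accession": accession,
                "domain_locations": locations,
                "GO_ID": go_id,
                "probability": score,
            })
    rows.sort(key=lambda row: row["probability"], reverse=True)
    return rows


if __name__ == "__main__":
    hits = {"DOMAIN_A": [(10, 90)], "DOMAIN_UNMAPPED": [(100, 150)]}
    for row in propagate_go_terms(hits, EXAMPLE_MAPPINGS):
        print(row)
```

A protein whose domains are all missing from the mapping set yields an empty list, which corresponds to the 'No predictions made for domains found in sequence.' case in the troubleshooting table.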
pages/User_Guide.py
ADDED
@@ -0,0 +1,89 @@
import streamlit as st


st.markdown('''
# Domain2GO User Guide
''')

# st.markdown('<p style="font-size:18px; font-weight:bold"></p>', unsafe_allow_html=True)

st.markdown('<p style="font-size:25px; font-weight:bold">How to use Domain2GO</p>', unsafe_allow_html=True)
st.markdown('<p style="font-size:20px; font-weight:bold">1. Submit your protein sequence</p>', unsafe_allow_html=True)


st.markdown(
    '''
You can submit your protein sequence by pasting it into the text box or uploading a FASTA file.

Domain2GO accepts only a single protein sequence at a time because of the extended runtime of InterProScan. If you need predictions for multiple UniProtKB/Swiss-Prot proteins, we recommend using our comprehensive protein function prediction dataset, available in our [GitHub repository](https://github.com/HUBioDataLab/Domain2GO).

To use an example query sequence, click the "Use example sequence" button below the input text box.

This sequence is also given below:
```
>sp|O18783|PLMN_NOTEU
MEYGKVIFLFLLFLKSGQGESLENYIKTEGASLSNSQKKQFVASSTEECEALCEKETEFVCRSFEHYNKEQKCVIMSENSKTSSVERKRDVVLFEKRIYLSDCKSGNGRNYRGTLSKTKSGITCQKWSDLSPHVPNYAPSKYPDAGLEKNYCRNPDDDVKGPWCYTTNPDIRYEYCDVPECEDECMHCSGENYRGTISKTESGIECQPWDSQEPHSHEYIPSKFPSKDLKENYCRNPDGEPRPWCFTSNPEKRWEFCNIPRCSSPPPPPGPMLQCLKGRGENYRGKIAVTKSGHTCQRWNKQTPHKHNRTPENFPCRGLDENYCRNPDGELEPWCYTTNPDVRQEYCAIPSCGTSSPHTDRVEQSPVIQECYEGKGENYRGTTSTTISGKKCQAWSSMTPHQHKKTPDNFPNADLIRNYCRNPDGDKSPWCYTMDPTVRWEFCNLEKCSGTGSTVLNAQTTRVPSVDTTSHPESDCMYGSGKDYRGKRSTTVTGTLCQAWTAQEPHRHTIFTPDTYPRAGLEENYCRNPDGDPNGPWCYTTNPKKLFDYCDIPQCVSPSSFDCGKPRVEPQKCPGRIVGGCYAQPHSWPWQISLRTRFGEHFCGGTLIAPQWVLTAAHCLERSQWPGAYKVILGLHREVNPESYSQEIGVSRLFKGPLAADIALLKLNRPAAINDKVIPACLPSQDFMVPDRTLCHVTGWGDTQGTSPRGLLKQASLPVIDNRVCNRHEYLNGRVKSTELCAGHLVGRGDSCQGDSGGPLICFEDDKYVLQGVTSWGLGCARPNKPGVYVRVSRYISWIEDVMKNN
```

If you choose to upload a FASTA file, please make sure that the contents of the file also follow the format shown above.

Please enter your email address in the text box below the sequence input box. InterProScan requests your email to notify you when your job is done. Your email will not be used for any other purpose.

''')

st.markdown('<p style="font-size:20px; font-weight:bold">2. Wait for your results</p>', unsafe_allow_html=True)

st.markdown(
    '''
After you submit your protein sequence, Domain2GO will run InterProScan to find domains in your protein. This step may take a few minutes to complete.
''')

st.markdown('<p style="font-size:20px; font-weight:bold">3. View your results</p>', unsafe_allow_html=True)

st.markdown(
    '''
After InterProScan is complete, you can view the predicted functions by clicking the "Show function predictions" button. The results will be displayed in a table with the following columns:

| Column name | Description |
| ------------- | ------------- |
| protein_name | Protein name you provided in the input FASTA. |
| GO_ID | Gene Ontology term ID. |
| GO_term | Gene Ontology term name. |
| GO_aspect | Gene Ontology term aspect. |
| domain_locations | List of locations of the domain in the protein sequence. |
| probability | Probability of the domain being associated with the GO term. You can find more information about how this score is calculated in our [preprint](https://www.biorxiv.org/content/10.1101/2022.11.03.514980v1). |
| domain_accession | InterPro domain accession. |
| domain_name | InterPro domain name. |
''')

st.markdown(
    '''
''')

st.markdown(
    '''
You can download the results as a CSV file by clicking the "Download function predictions as CSV" button.
''')

# write another section for the warning messages that can be displayed on the main page
st.markdown('<p style="font-size:20px; font-weight:bold">4. Troubleshooting</p>', unsafe_allow_html=True)

st.markdown(
    '''
The following table lists the warning and error messages that can be displayed on the main page, together with their descriptions.
| Warning message | Description |
| ------------- | ------------- |
| 'No domains found.' | InterProScan did not find any domains in your protein sequence. If you are sure that your protein has domains, please check that your protein sequence is in a valid FASTA format. |
| Errors about InterProScan | The InterProScan job failed. Your InterProScan job ID is displayed on the main page together with the error message returned by InterProScan. Please check this message, or query the status of your InterProScan job by inserting the job ID into the following URL: https://www.ebi.ac.uk/Tools/services/rest/iprscan5/status/{job_id} |
| 'No predictions made for domains found in sequence.' | The domains in your protein sequence are not associated with any GO terms in our mapping set. |
''')

st.markdown(
    '''
''')

st.markdown(
    '''
If you have any questions or encounter any problems, please create an issue in our [GitHub repository](https://github.com/HUBioDataLab/Domain2GO/issues) or open a discussion in our [Hugging Face Space](https://huggingface.co/spaces/HUBioDataLab/Domain2GO/discussions).
''')

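The user guide states two input constraints (a single sequence, in FASTA format) and gives a status URL template for failed InterProScan jobs. A pre-submission check and a URL helper along those lines might look like the sketch below. This is an assumption-laden illustration, not code from the Domain2GO app: `parse_single_fasta` and `interproscan_status_url` are hypothetical helper names and the job ID in the usage example is a made-up placeholder. Only the URL template is taken from the troubleshooting table.

```python
# URL template quoted in the troubleshooting table of the user guide.
STATUS_URL_TEMPLATE = "https://www.ebi.ac.uk/Tools/services/rest/iprscan5/status/{job_id}"


def parse_single_fasta(text):
    """Return (header, sequence) for FASTA text holding exactly one record.

    Raises ValueError if the header line is missing, if more than one
    record is present, or if the sequence is empty or contains non-letters.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    if sum(line.startswith(">") for line in lines) != 1:
        raise ValueError("exactly one sequence is expected")
    header = lines[0][1:]
    # Sequence lines may be wrapped, so join them into one string.
    sequence = "".join(lines[1:]).upper()
    if not sequence or not sequence.isalpha():
        raise ValueError("sequence must be non-empty and contain only letters")
    return header, sequence


def interproscan_status_url(job_id):
    """Fill the job ID into the status URL template."""
    return STATUS_URL_TEMPLATE.format(job_id=job_id)


if __name__ == "__main__":
    header, sequence = parse_single_fasta(">sp|O18783|PLMN_NOTEU\nMEYGKVIFLF\nLLFLKSGQGE\n")
    print(header, len(sequence))
    # "example-job-id" is a placeholder, not a real InterProScan job ID.
    print(interproscan_status_url("example-job-id"))
```

Rejecting multi-record input before submission avoids spending an InterProScan run on a query the tool would not accept anyway.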