Search S2D2

Welcome to S2D2 (Sherbrooke Spliced Domain Detector), a tool designed to extract data from our database, which contains data taken from the Ensembl database and from the literature,
and to display a clear and easily understandable representation of the desired gene with all its known transcripts listed.
Depending on the selected options, additional information can be shown as tables and a figure after the form is submitted, including whether each domain and/or localization signal is "constitutive".

The Search input can be one of:
-a gene name (Ex: IPMK, RBFOX2, FGFR3),
-a specific ENSG identifier of a gene (Ex: ENSG0000151151, ENSG000006648) or
-a specific ENST identifier of a transcript (Ex: ENST00000481110, ENST00000369059).
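
The three kinds of input listed above can be told apart by their prefix. The following is a minimal sketch of such a check, not the tool's actual code: the function name classify_query and the returned labels are hypothetical, and the patterns are deliberately lenient about the number of digits after the prefix.

```python
import re

# Lenient patterns (assumption for illustration): the prefix followed by
# one or more digits, without enforcing an exact digit count.
ENSG_RE = re.compile(r"^ENSG\d+$", re.IGNORECASE)
ENST_RE = re.compile(r"^ENST\d+$", re.IGNORECASE)


def classify_query(query: str) -> str:
    """Return 'gene_id', 'transcript_id' or 'gene_name' for a search string."""
    q = query.strip()
    if not q:
        raise ValueError("empty search input")
    if ENSG_RE.match(q):
        return "gene_id"
    if ENST_RE.match(q):
        return "transcript_id"
    # Anything else is treated as a gene name (hypothetical fallback).
    return "gene_name"
```

With this sketch, "IPMK" is classified as a gene name and "ENST00000481110" as a transcript identifier.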

Search Filter

If the "Domains" checkbox is checked, the search returns the protein domains that have an InterPro identifier.
You can also search for localization and/or transport signals:

-NLS: Nuclear localization signal

-NES: Nuclear export signal

-NoLS: Nucleolar localization signal

-SP: Signal peptide predicted by Phobius and SignalP

Protein Domains Database

The selection of protein family databases determines which predicted domains are shown, following the InterPro annotation, with links to the corresponding entries (note: duplicates of a protein domain may appear if multiple protein domain databases are checked).

List of available databases:

Max e_value: Any input entered here in one of the accepted formats (0.XXXXX or Xe-XXXXX, where each X is a digit) restricts the data fetched from the database to hits with an e_value lower than the input. The input can be left empty, in which case all available data are selected.
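
The Max e_value rule can be illustrated with a short sketch. This is an assumption about how such a filter might be written, not the tool's implementation: the names parse_max_evalue and filter_hits and the "e_value" dictionary key are hypothetical. An empty input means no restriction, and only hits strictly below the threshold are kept.

```python
import re
from typing import Optional

# Accepted formats per the description: 0.XXXXX or Xe-XXXXX (X = digits).
EVALUE_RE = re.compile(r"^(0\.\d+|\d+[eE]-\d+)$")


def parse_max_evalue(text: str) -> Optional[float]:
    """Parse the Max e_value field; return None when the field is empty."""
    t = text.strip()
    if not t:
        return None  # no threshold: every hit is selected
    if not EVALUE_RE.match(t):
        raise ValueError(f"unsupported e_value format: {text!r}")
    return float(t)


def filter_hits(hits, max_evalue: Optional[float]):
    """Keep hits whose e_value is strictly lower than the threshold."""
    if max_evalue is None:
        return list(hits)
    return [h for h in hits if h["e_value"] < max_evalue]
```

For example, an input of 1e-5 keeps a hit with an e_value of 1e-10 and drops one with an e_value of 0.01.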

Number Of Matching Pattern

The "Number of matching Pattern (special_id/ transcript)" column links every protein domain and/or localization signal to its transcript in order to determine whether the motif is constitutive.
Every pair (ipro_acc and ENST_id for a domain, or typ and ENST_id for a localization signal) is given a special_id, which stays the same across transcripts as long as the corresponding regions overlap.

When multiple protein domain databases are chosen (ex: pfam, smart, gene3D, etc.),
duplicates of the same pair with the same name might appear for the same transcript (ex: 2 domains named IPRO00719 overlapping each other on the transcript ENST0000260795).
To resolve this conflict, we add the "Similar" tag to the special_id value of the overlapping pairs (ipro_acc/ ENST_id) of the same transcript (identical name), and we keep only one representative value for the pair (ipro_acc/ transcript) or (typ/ transcript) when determining whether the protein domains and/or localization signals are present in all transcripts.
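
The duplicate handling described above can be sketched as follows. This is an illustrative assumption rather than the tool's actual code: the function collapse_duplicates, the start/end coordinate fields and the "similar" flag are hypothetical names, and the representative kept for a group of overlapping hits is assumed here to be their merged span.

```python
from collections import defaultdict


def collapse_duplicates(hits):
    """Collapse overlapping hits sharing the same (ipro_acc, enst_id) pair.

    Each hit is a dict with keys ipro_acc, enst_id, start, end (hypothetical
    layout). One representative per overlapping group is returned, flagged
    with similar=True when it absorbed at least one duplicate.
    """
    groups = defaultdict(list)
    for h in hits:
        groups[(h["ipro_acc"], h["enst_id"])].append(h)

    kept = []
    for key in sorted(groups):
        current = None
        for h in sorted(groups[key], key=lambda x: (x["start"], x["end"])):
            if current is not None and h["start"] <= current["end"]:
                # Overlaps the current representative: merge and tag it.
                current["end"] = max(current["end"], h["end"])
                current["similar"] = True
            else:
                # New non-overlapping hit: copy it so the input is untouched.
                current = dict(h, similar=False)
                kept.append(current)
    return kept
```

Two overlapping hits with the same name on the same transcript collapse into a single entry tagged as similar, while a third non-overlapping hit stays separate.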

By comparing the "NumberOfMatchingPattern(Special_id/ transcript)" value with the total number of coding transcripts,
we can tell whether a particular domain or localization signal is constitutive. For example, if the value in the "NumberOfMatchingPattern(Special_id/ transcript)" column is 7, the domain or localization signal is found in 7 different transcripts. Furthermore, if the gene has a total of 7 coding transcripts, a "Yes" value is assigned in the "constitutive" column; otherwise a "No" value is assigned, meaning that some coding transcripts do not have the pattern.
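
The rule above amounts to counting, for each special_id, the distinct coding transcripts that carry it and comparing that count with the total number of coding transcripts. A minimal sketch under that reading follows; the function name constitutive_table and the input layout are hypothetical.

```python
def constitutive_table(matches, coding_transcripts):
    """Map each special_id to (number of matching transcripts, "Yes"/"No").

    matches: iterable of (special_id, enst_id) pairs (hypothetical layout).
    coding_transcripts: identifiers of all coding transcripts of the gene.
    """
    coding = set(coding_transcripts)
    seen = {}
    for special_id, enst_id in matches:
        if enst_id in coding:
            # A set ensures each transcript is counted once per special_id.
            seen.setdefault(special_id, set()).add(enst_id)
    return {
        sid: (len(found), "Yes" if len(found) == len(coding) else "No")
        for sid, found in seen.items()
    }
```

For a gene with 7 coding transcripts, a special_id found in all 7 is reported as (7, "Yes"), while one found in only 5 is reported as (5, "No").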

References/ Publications

Citations for InterPro:
Robert D. Finn, Teresa K. Attwood, Patricia C. Babbitt, Alex Bateman, Peer Bork, Alan J. Bridge, Hsin-Yu Chang, Zsuzsanna Dosztányi, Sara El-Gebali, Matthew Fraser, Julian Gough, David Haft, Gemma L. Holliday, Hongzhan Huang, Xiaosong Huang, Ivica Letunic, Rodrigo Lopez, Shennan Lu, Aron Marchler-Bauer, Huaiyu Mi, Jaina Mistry, Darren A. Natale, Marco Necci, Gift Nuka, Christine A. Orengo, Youngmi Park, Sebastien Pesseat, Damiano Piovesan, Simon C. Potter, Neil D. Rawlings, Nicole Redaschi, Lorna Richardson, Catherine Rivoire, Amaia Sangrador-Vegas, Christian Sigrist, Ian Sillitoe, Ben Smithers, Silvano Squizzato, Granger Sutton, Narmada Thanki, Paul D Thomas, Silvio C. E. Tosatto, Cathy H. Wu, Ioannis Xenarios, Lai-Su Yeh, Siew-Yit Young and Alex L. Mitchell (2017). InterPro in 2017 — beyond protein family and domain annotations. Nucleic Acids Research, Jan 2017; doi: 10.1093/nar/gkw1107
Philip Jones, David Binns, Hsin-Yu Chang, Matthew Fraser, Weizhong Li, Craig McAnulla, Hamish McWilliam, John Maslen, Alex Mitchell, Gift Nuka, Sebastien Pesseat, Antony F. Quinn, Amaia Sangrador-Vegas, Maxim Scheremetjew, Siew-Yit Yong, Rodrigo Lopez, and Sarah Hunter (2014). InterProScan 5: genome-scale protein function classification. Bioinformatics, Jan 2014; doi:10.1093/bioinformatics/btu031


Citations for Pfam:
R.D. Finn, P. Coggill, R.Y. Eberhardt, S.R. Eddy, J. Mistry, A.L. Mitchell, S.C. Potter, M. Punta, M. Qureshi, A. Sangrador-Vegas, G.A. Salazar, J. Tate, A. Bateman. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Research (2016) Database Issue 44:D279-D285

Citations for PANTHER:
Huaiyu Mi, Xiaosong Huang, Anushya Muruganujan, Haiming Tang, Caitlin Mills, Diane Kang, and Paul D. Thomas. PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements. Nucl. Acids Res. (2016) doi: 10.1093/nar/gkw1138
Paul D. Thomas, Anish Kejariwal, Nan Guo, Huaiyu Mi, Michael J. Campbell, Anushya Muruganujan and Betty Lazareva-Ulitsky. Applications for protein sequence-function evolution data: mRNA/protein expression analysis and coding SNP scoring tools. Nucl. Acids Res. (1 July 2006) 34 (suppl 2): W645-W650.

Citations for Gene3D:
Gene3D: expanding the utility of domain assignments. Lam SD, Dawson NL, Das S, Sillitoe I, Ashford P, Lee D, Lehtinen S, Orengo CA, Lees JG. Nucleic Acids Res. 2016 Jan doi: 10.1093/nar/gkv1231

Citations for SMART:
Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. Gough J, Karplus K, Hughey R, Chothia C. J Mol Biol. 2001 Nov 2;313(4):903-19.

Citations for Prosite:
De Castro E, Sigrist CJA, Gattiker A, Bulliard V, Langendijk-Genevaux PS, Gasteiger E, Bairoch A, Hulo N. ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W362-5. PubMed:16845026