Welcome to S2D2 (Sherbrooke Spliced Domain Detector), a tool designed to extract data from our database, which contains data taken from the Ensembl database and from the literature,
and to display a clear and easily understandable representation of the desired gene with all known transcripts listed.
Depending on the selected options, additional information can be shown as tables and a figure after the form is submitted, including validation of the "constitutive" property of the domains and/or localization signals.
The Search input can be one of:
-a generic gene name (Ex: IPMK, RBFOX2, FGFR3),
-a specific ENSG identifier of a gene (Ex: ENSG0000151151, ENSG000006648), or
-a specific ENST identifier of a transcript (Ex: ENST00000481110, ENST00000369059).
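As a rough illustration, the three kinds of Search input could be told apart by their prefix. This is only a sketch under stated assumptions, not the S2D2 implementation: the function name classify_query and the returned labels are hypothetical, and the check is deliberately limited to an ENSG or ENST prefix followed by digits.

```python
import re

def classify_query(query: str) -> str:
    """Classify a search string as a gene ID, a transcript ID, or a gene name.

    The labels returned here are hypothetical and chosen for illustration.
    """
    q = query.strip().upper()
    if re.fullmatch(r"ENSG\d+", q):
        return "gene_id"        # Ensembl gene identifier
    if re.fullmatch(r"ENST\d+", q):
        return "transcript_id"  # Ensembl transcript identifier
    return "gene_name"          # anything else is treated as a gene name


for example in ("IPMK", "ENSG0000151151", "ENST00000481110"):
    print(example, classify_query(example))
```

Anything that does not look like an ENSG or ENST identifier falls through to the gene-name case, so a real form would still need to check the name against the database.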
The checkboxes allow you to search for specific protein domains that have an InterPro identifier when the "Domains" option is checked.
Alternatively, you can search for localization and/or transport signals:
-NLS: Nuclear localization signal
-NES: Nuclear export signal
-NoLS: Nucleolar localization signal
The "Number of matching Pattern (special_id/ transcript)" column links every protein domain and/or localization signal to its specific transcripts, in an attempt to determine whether the motif is constitutive.
Every pair of (ipro_acc and ENST_id, in the case of domains) or (typ and ENST_id, in the case of localization signals) is given a special_id, which stays the same across the different transcripts as long as the matches overlap.
When multiple protein domain databases are chosen (ex: pfam, smart, gene3D, etc.),
duplicates of the same pair, having the same name, might appear for the same transcript (ex: 2 domains named IPRO00719 overlapping each other for the transcript ENST0000260795).
To resolve this conflict, we apply the "Similar" tag to the special_id value of the overlapping pairs (ipro_acc/ ENST_id) of the same transcript (identical name), and we keep only one representative value per pair (ipro_acc/ transcript) or (typ/ transcript) when determining whether the protein domains and/or localization signals are present in all transcripts.
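The handling of overlapping duplicates described above can be sketched as follows. This is an illustrative sketch, not the code used by S2D2: the function name collapse_overlaps, the start/end coordinate fields and the boolean "similar" flag are assumptions, and keeping the union of the overlapping spans as the representative is just one possible choice.

```python
from collections import defaultdict

def collapse_overlaps(matches):
    """Collapse overlapping matches that share the same (name, transcript) pair.

    `matches` is a list of (name, transcript, start, end) tuples; the
    coordinates are an assumption made for this illustration.
    Returns (name, transcript, start, end, similar) tuples, where `similar`
    is True when several overlapping matches were merged into one.
    """
    by_pair = defaultdict(list)
    for name, transcript, start, end in matches:
        by_pair[(name, transcript)].append((start, end))

    collapsed = []
    for (name, transcript), spans in by_pair.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        merged = 1
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlaps the current representative: extend it.
                cur_end = max(cur_end, end)
                merged += 1
            else:
                # No overlap: emit the representative and start a new one.
                collapsed.append((name, transcript, cur_start, cur_end, merged > 1))
                cur_start, cur_end, merged = start, end, 1
        collapsed.append((name, transcript, cur_start, cur_end, merged > 1))
    return collapsed
```

With this sketch, two overlapping hits of the same domain on one transcript (for instance reported by two different databases) yield a single entry flagged as similar, so the transcript is counted only once for that pattern.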
When the "NumberOfMatchingPattern(Special_id/ transcript)" value equals the total number of coding transcripts of the gene,
the domain or localization signal is constitutive. For example, if the value in the "NumberOfMatchingPattern(Special_id/ transcript)" column is 7, the domain or localization signal recurs in 7 different transcripts. Furthermore, if the gene has 7 coding transcripts in total, a "Yes" value is assigned in the "constitutive" column; otherwise, a "No" value is assigned, meaning that some coding transcripts do not have the pattern.
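The comparison described above can be sketched in a few lines. This is a minimal sketch, not the S2D2 code: the function name constitutive_flags and the input layout (a mapping from each pattern to the transcripts in which it was found) are assumptions made for illustration.

```python
def constitutive_flags(transcripts_by_pattern, coding_transcripts):
    """Map each pattern to (number of matching coding transcripts, "Yes"/"No").

    `transcripts_by_pattern` maps a pattern label to the transcripts in which
    it was found; `coding_transcripts` lists all coding transcripts of the gene.
    Both layouts are hypothetical.
    """
    coding = set(coding_transcripts)
    result = {}
    for pattern, transcripts in transcripts_by_pattern.items():
        # Count each coding transcript once, even if the pattern is listed twice.
        matching = len(set(transcripts) & coding)
        is_constitutive = bool(coding) and matching == len(coding)
        result[pattern] = (matching, "Yes" if is_constitutive else "No")
    return result


coding = ["T1", "T2", "T3", "T4", "T5", "T6", "T7"]
found = {"domain_A": coding + ["T1"], "NLS": ["T1", "T2", "T3"]}
print(constitutive_flags(found, coding))
```

Here domain_A is found in all 7 coding transcripts and gets "Yes", while the NLS is found in only 3 of them and gets "No".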
Citations for InterPro:
Robert D. Finn, Teresa K. Attwood, Patricia C. Babbitt, Alex Bateman, Peer Bork, Alan J. Bridge, Hsin-Yu Chang, Zsuzsanna Dosztányi, Sara El-Gebali, Matthew Fraser, Julian Gough, David Haft, Gemma L. Holliday, Hongzhan Huang, Xiaosong Huang, Ivica Letunic, Rodrigo Lopez, Shennan Lu, Aron Marchler-Bauer, Huaiyu Mi, Jaina Mistry, Darren A. Natale, Marco Necci, Gift Nuka, Christine A. Orengo, Youngmi Park, Sebastien Pesseat, Damiano Piovesan, Simon C. Potter, Neil D. Rawlings, Nicole Redaschi, Lorna Richardson, Catherine Rivoire, Amaia Sangrador-Vegas, Christian Sigrist, Ian Sillitoe, Ben Smithers, Silvano Squizzato, Granger Sutton, Narmada Thanki, Paul D Thomas, Silvio C. E. Tosatto, Cathy H. Wu, Ioannis Xenarios, Lai-Su Yeh, Siew-Yit Young and Alex L. Mitchell (2017). InterPro in 2017 — beyond protein family and domain annotations. Nucleic Acids Research, Jan 2017; doi: 10.1093/nar/gkw1107
Philip Jones, David Binns, Hsin-Yu Chang, Matthew Fraser, Weizhong Li, Craig McAnulla, Hamish McWilliam, John Maslen, Alex Mitchell, Gift Nuka, Sebastien Pesseat, Antony F. Quinn, Amaia Sangrador-Vegas, Maxim Scheremetjew, Siew-Yit Yong, Rodrigo Lopez, and Sarah Hunter (2014). InterProScan 5: genome-scale protein function classification. Bioinformatics, Jan 2014; doi:10.1093/bioinformatics/btu031
Citations for Pfam:
The Pfam protein families database: towards a more sustainable future: R.D. Finn, P. Coggill, R.Y. Eberhardt, S.R. Eddy, J. Mistry, A.L. Mitchell, S.C. Potter, M. Punta, M. Qureshi, A. Sangrador-Vegas, G.A. Salazar, J. Tate, A. Bateman Nucleic Acids Research (2016) Database Issue 44:D279-D285
Citations for PANTHER:
PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements. Huaiyu Mi, Xiaosong Huang, Anushya Muruganujan, Haiming Tang, Caitlin Mills, Diane Kang, and Paul D. Thomas Nucl. Acids Res. (2016) doi: 10.1093/nar/gkw1138
Applications for protein sequence-function evolution data: mRNA/protein expression analysis and coding SNP scoring tools Paul D. Thomas, Anish Kejariwal, Nan Guo, Huaiyu Mi, Michael J. Campbell, Anushya Muruganujan and Betty Lazareva-Ulitsky Nucl. Acids Res. (1 July 2006) 34 (suppl 2): W645-W650.
Citations for Gene3D:
Gene3D: expanding the utility of domain assignments. Lam SD, Dawson NL, Das S, Sillitoe I, Ashford P, Lee D, Lehtinen S, Orengo CA, Lees JG. Nucleic Acids Res. 2016 Jan doi: 10.1093/nar/gkv1231
Citations for SMART:
Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. Gough J, Karplus K, Hughey R, Chothia C. J Mol Biol. 2001 Nov 2;313(4):903-19.
Citations for Prosite:
De Castro E, Sigrist CJA, Gattiker A, Bulliard V, Langendijk-Genevaux PS, Gasteiger E, Bairoch A, Hulo N. ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W362-5. PubMed:16845026