G4RNA

G4RNA Tutorial


G4RNA is a database designed to house data on sequences tested for G-quadruplex (G4) folding. The browsing tool queries G4RNA through the form exemplified below. This tutorial is intended to show different ways of formulating queries. Users are advised to replicate the examples presented here to get accustomed to the form.

Search engines

Queries are formulated using one or both search engines to describe the characteristics shared by the sequences shown in the results section. Let's take a look at the output of different searches using the default display fields (display fields are described later).


Keyword driven search

1- Search for gene symbol NRAS:

All sequences associated with the gene NRAS will be displayed in the results.


2- Search for multiple genes (NRAS, KRAS and VEGFA) using "Exact" search type:

The results include the sequences from the previous search for NRAS as well as those for the genes KRAS and VEGFA. Note that multiple terms can be supplied with the "Exact" search type; they must be separated by a space, a new line, or one of the following characters: , ; : - _
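The separator rule can be illustrated with a short Python sketch. It is not the code behind the G4RNA form; the function name `split_terms` is hypothetical, and the sketch only assumes that runs of separators count as a single break.

```python
import re

# Separators listed in the tutorial: whitespace (space, new line) and , ; : - _
# Treating a run of separators as one break is an assumption of this sketch.
_SEPARATORS = re.compile(r"[\s,;:\-_]+")


def split_terms(raw):
    """Split a raw search string into individual terms, dropping empty pieces."""
    return [term for term in _SEPARATORS.split(raw) if term]


print(split_terms("NRAS, KRAS;VEGFA"))
```

Each resulting term would then be compared against the gene symbol as a whole.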


3- Search for the character string "RAS" using the "contains" search type:

This time the results will present sequences associated with genes containing "RAS" in their symbol, in this case NRAS and KRAS.
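The difference between the two search types can be sketched as follows. This is an illustration only: the function `matches` is hypothetical, and case handling is left out because the tutorial does not say how the form treats letter case.

```python
def matches(symbol, terms, search_type="exact"):
    """Return True when a gene symbol satisfies the search terms.

    "exact": the symbol must equal one of the terms.
    "contains": one of the terms must appear somewhere inside the symbol.
    """
    if search_type == "exact":
        return symbol in terms
    if search_type == "contains":
        return any(term in symbol for term in terms)
    raise ValueError(f"unknown search type: {search_type}")


symbols = ["NRAS", "KRAS", "VEGFA"]
exact_hits = [s for s in symbols if matches(s, ["RAS"], "exact")]
contains_hits = [s for s in symbols if matches(s, ["RAS"], "contains")]
print(exact_hits, contains_hits)
```

With the term "RAS", the exact comparison selects nothing, while the substring comparison selects NRAS and KRAS.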


4- Search for sequence "GGCGGCGGCGAAGG":

The sequence search screens the sequences in the database to find those containing the subsequence supplied as a search term.


5- Search for a sequence defined with a regular expression (L2 min 3 pyrimidines):

Search terms can be stated using regular expressions; this option gives users more flexibility when formulating their queries. Moreover, sequences can be stated using the IUPAC nucleotide ambiguity code (N, Y, R, W...). The sequences matched by the search term below must contain a subsequence forming a classical G4 of 3 quartets, with first and third loops of 1 to 7 nt and a second loop of at least 3 pyrimidines.
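To make the combination of regular expressions and ambiguity codes concrete, here is a minimal Python sketch. It rests on several assumptions: the pattern `G{3}N{1,7}G{3}Y{3,}G{3}N{1,7}G{3}` is only one possible way to write the term described above, the helper names are hypothetical, and both T and U are accepted because the alphabet used by the database is not stated here. How the G4RNA engine itself expands the codes is not documented in this tutorial.

```python
import re

# IUPAC ambiguity codes expanded to character classes.
# Accepting both T and U is an assumption of this sketch.
IUPAC = {
    "R": "[AG]", "Y": "[CTU]", "S": "[CG]", "W": "[ATU]",
    "K": "[GTU]", "M": "[AC]", "B": "[CGTU]", "D": "[AGTU]",
    "H": "[ACTU]", "V": "[ACG]", "N": "[ACGTU]",
}


def iupac_to_regex(term):
    """Expand ambiguity codes; plain bases and regex syntax pass through."""
    return "".join(IUPAC.get(char, char) for char in term)


def contains(sequence, term):
    """True when the sequence holds a subsequence matching the term."""
    return re.search(iupac_to_regex(term), sequence) is not None


# Assumed spelling of the query: 3 quartets, loops 1 and 3 of 1 to 7 nt,
# loop 2 of at least 3 pyrimidines.
G4_TERM = "G{3}N{1,7}G{3}Y{3,}G{3}N{1,7}G{3}"
G4_REGEX = iupac_to_regex(G4_TERM)

print(contains("GGGAGGGCCUGGGAGGG", G4_TERM))  # second loop is CCU
print(contains("GGGAGGGCAGGGAGGG", G4_TERM))   # second loop is CA
```

The first test sequence is accepted because its second loop holds three pyrimidines; the second is rejected because no run of three pyrimidines separates two G tracts.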


6- Search for sequences that underwent a probing experiment:

The logic here is the same as in the previous examples: the engine searches for a term within the descriptions of the experiment.


7- Search for sequences published in 1994:

It is possible to use the information within the complete reference to formulate queries. Below, the sequences returned are those whose complete reference contains the character string "1994".


8- Search for sequences published by Nature Publishing Group:

This search uses the digital object identifier (DOI), a unique alphanumeric string assigned to every publication. Since the prefix of the DOI is shared by all publications from the same organization, the DOI can be used to search for publications from a particular publishing group such as Nature's (DOI prefix 10.1038).
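The prefix logic can be sketched in a few lines of Python. The records and DOIs below are placeholders rather than entries from G4RNA, and `doi_prefix` is a hypothetical helper; the form itself may simply run a "contains" search on the DOI field.

```python
def doi_prefix(doi):
    """Return the part of a DOI before the first slash (the registrant prefix)."""
    return doi.split("/", 1)[0]


# Placeholder records for illustration only.
records = [
    {"gene": "GENE_A", "doi": "10.1038/placeholder-1"},
    {"gene": "GENE_B", "doi": "10.9999/placeholder-2"},
]

nature_hits = [r["gene"] for r in records if doi_prefix(r["doi"]) == "10.1038"]
print(nature_hits)
```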


Position driven search

Search for sequences on chromosome 11 between positions 10 Mbp and 70 Mbp:

The genomic position search engine is more intuitive: the user provides a search window within a chromosome. It can also be combined with the keyword driven engine to narrow a search.
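A window search of this kind can be sketched as below. The records are hypothetical, the `Record` class and `in_window` function are not part of G4RNA, and the sketch assumes that any overlap with the window counts as a hit; the form may instead require full containment.

```python
from dataclasses import dataclass


@dataclass
class Record:
    gene: str
    chromosome: str
    start: int  # genomic start position, in bp
    end: int    # genomic end position, in bp


def in_window(record, chromosome, window_start, window_end):
    """True when the record is on the chromosome and overlaps the window."""
    return (
        record.chromosome == chromosome
        and record.start <= window_end
        and record.end >= window_start
    )


# Hypothetical records for illustration only.
records = [
    Record("GENE_A", "11", 12_500_000, 12_500_050),
    Record("GENE_B", "11", 95_000_000, 95_000_030),
    Record("GENE_C", "1", 20_000_000, 20_000_040),
]

hits = [r.gene for r in records if in_window(r, "11", 10_000_000, 70_000_000)]
print(hits)
```

Combining both engines amounts to keeping only the records that pass the keyword filter and the window filter.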


Display fields

Each display field corresponds to a column that is displayed in the results table when selected. The default fields are: Gene symbol, location in mRNA, sequence length, Sequence, G4 folding and reference. Some display fields cause more rows to be displayed than the default fields do. In the example below, the "Experiment" field splits the row corresponding to LRP5's 50 nt long sequence into three rows, since it was tested using three different methods. This is also the case for the "Sequence identifier" field, which allows non wild-type sequences to be displayed.
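The row splitting can be pictured with a small Python sketch. The method names are placeholders, since the three methods are not named here, and `expand_rows` is a hypothetical helper rather than the tool's implementation.

```python
def expand_rows(record, field):
    """Return one row per value of a multi-valued display field."""
    values = record.get(field) or [None]
    return [{**record, field: value} for value in values]


# LRP5's 50 nt sequence with three placeholder method names.
lrp5 = {
    "gene": "LRP5",
    "length": 50,
    "experiment": ["method A", "method B", "method C"],
}

rows = expand_rows(lrp5, "experiment")
for row in rows:
    print(row["gene"], row["length"], row["experiment"])
```

When the "Experiment" field is not selected, the record stays on a single row.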


Sorting

The result table can be sorted using a list of selected display fields.
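Sorting on a list of fields can be sketched as follows, assuming that earlier fields in the list take priority over later ones. The lengths are invented for illustration and `sort_rows` is a hypothetical name.

```python
def sort_rows(rows, fields):
    """Sort rows by the listed fields, the first field having priority."""
    return sorted(rows, key=lambda row: tuple(row[f] for f in fields))


# Invented values for illustration only.
rows = [
    {"gene": "NRAS", "length": 40},
    {"gene": "KRAS", "length": 25},
    {"gene": "KRAS", "length": 18},
]

sorted_rows = sort_rows(rows, ["gene", "length"])
for row in sorted_rows:
    print(row["gene"], row["length"])
```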