Using hidden Markov models to investigate G-quadruplex motifs in genomic sequences
 Masato Yano^{1} and
 Yuki Kato^{2}
https://doi.org/10.1186/1471-2164-15-S9-S15
© Yano and Kato; licensee BioMed Central Ltd. 2014
 Published: 8 December 2014
Abstract
Background
G-quadruplexes are four-stranded structures formed in guanine-rich nucleotide sequences. Several functional roles of DNA G-quadruplexes have been investigated so far, and putative roles during DNA replication and transcription have been suggested. A necessary condition for G-quadruplex formation is the presence of four regions of tandem guanines, called G-runs, and three nucleotide subsequences, called loops, that connect the G-runs. A simple computational way to detect potential G-quadruplex regions in a given genomic sequence is pattern matching with a regular expression. Although many putative G-quadruplex motifs can be found in most genomes by the regular expression-based approach, the majority of these sequences are unlikely to form G-quadruplexes because they are unstable compared with canonical double helix structures.
Results
Here we present elaborate computational models for representing DNA G-quadruplex motifs using hidden Markov models (HMMs). The use of HMMs enables us to evaluate G-quadruplex motifs quantitatively by a probabilistic measure. In addition, the parameters of HMMs can be trained on experimentally verified data. We carried out computational experiments in discriminating between positive and negative G-quadruplex sequences and in reducing putative G-quadruplexes in the human genome. The results indicate that HMM-based models can discern bona fide G-quadruplex structures well and that one of them has the potential to reduce false positive G-quadruplexes predicted by existing regular expression-based methods. Furthermore, our results show that one of our models can be specialized to detect G-quadruplex sequences whose functional roles are expected to be involved in DNA transcription.
Conclusions
The HMM-based method, along with the conventional pattern matching approach, can contribute to reducing the costly and laborious wet-lab experiments needed to perform functional analysis on a given set of potential G-quadruplexes of interest. The C++ and Perl programs are available at http://tcs.cira.kyoto-u.ac.jp/~ykato/program/g4hmm/.
Keywords
 Regular Expression
 Hidden State
 Viterbi Algorithm
 Forward Algorithm
 Pattern Match Approach
Background
Deoxyribonucleic acids (DNAs) are macromolecules that hold genetic information in almost all organisms. The bulk of existing DNA molecules is assumed to form a right-handed double helical structure called B-DNA [1], in which the constituent bases A and C selectively pair with bases T and G, respectively, between two strands arranged in an antiparallel way. In contrast, several in vitro experiments have revealed the existence of non-B-DNA structures caused by particular sequence motifs and DNA-protein interactions. Well-investigated examples include the G-quadruplex (G4), Z-DNA, cruciform and triplex. Recent advances in providing in vitro evidence of these specific structures have led to the hypothesis that they have some functional roles in living cells [2].
Eukaryotic telomeric sequences include G-rich regions, which can form G4 structures in vitro. However, the question of how many such G-rich regions can actually form G4 structures in vivo has not been resolved. The potential to form G4 structures in telomeric sequences in vivo can be shown by in vitro DNA binding experiments with those sequences [2]. For example, telomere end-binding proteins in ciliates can control the formation of G4 DNA structures at telomeres [6]. Interestingly, however, a recent study suggests that endogenous G4 structures in human cells are present largely outside the telomeres [7]. Another work reports that protruding nucleotides in human telomeric sequences destabilize the G4 structure and that overhanging sequences influence the folding of the quadruplex [8]. Other examples of G-rich regions in genomes are transcriptional start sites and mitotic and meiotic double-strand break sites. Although G4 structures are stable at higher temperatures than canonical double helix structures, many functional regions in genomic sequences contain quite a few G-rich motifs [2], motivating us to investigate the functional roles of G4 structures further.
Since little is known about the functions of G4 structures, and genome-scale wet-lab experiments with nuclear magnetic resonance (NMR) spectroscopy for structural analysis [9] are not feasible, several computational efforts have been made to identify the locations of potential G4 sequences in genomic DNAs and to infer their functions by comparative sequence analysis using related genes with known functions [10, 11]. In principle, G4 motifs can be represented by the regular expression G^{+}N^{∗}G^{+}N^{∗}G^{+}N^{∗}G^{+}, where 'N' denotes an arbitrary base including G, '+' denotes one or more repeats of the preceding symbol and '∗' denotes zero or more repeats. Owing to this simple pattern, several in silico methods have been proposed to detect G4 sequences in genomes using pattern matching with regular expressions [12–16]. Moreover, regular expression-based methods that incorporate a simple scoring scheme have been proposed [17–19]. Another computational study focuses on the thermodynamic stability of G4 structures using Gaussian process regression [20]. Although the pattern matching approaches can detect many G4 motifs in genomic sequences quite fast, it has been pointed out that the majority of these motifs may be false positive G4 sequences [21, 22].
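As a minimal sketch of the pattern matching approach, the following Python snippet scans a sequence with a length-constrained form of the regular expression. The bounds (G-runs of three to five bases, loops of one to seven bases) are those that the Methods section attributes to most existing pattern matching methods. The function name `find_g4_motifs` and the example sequence are illustrative assumptions and are not taken from the authors' C++ and Perl programs.

```python
import re

# Four G-runs (3-5 Gs) joined by three loops (1-7 arbitrary bases, G allowed).
# Loops are matched lazily so that each reported motif is as compact as possible.
# These length bounds are an assumption borrowed from the Methods section.
G4_PATTERN = re.compile(r"G{3,5}(?:[ACGT]{1,7}?G{3,5}){3}")


def find_g4_motifs(sequence):
    """Return (start, end, motif) for non-overlapping matches, scanned left to right."""
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


# Four copies of the human telomeric repeat TTAGGG.
print(find_g4_motifs("TTAGGGTTAGGGTTAGGGTTAGGG"))
```

For the telomeric example, the scan reports a single motif, GGGTTAGGGTTAGGGTTAGGG, spanning positions 3 to 24 (0-based, end exclusive). A scan of this kind only says whether a motif is present; it gives no score by which candidates could be ranked.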
In this contribution, we present computational models that are more elaborate than regular expressions for representing G4 motifs, employing hidden Markov models (HMMs). HMMs are so flexible in modeling linear dependence that they are widely used in bioinformatics, including protein secondary structure prediction [23, 24] and sequence motif search [25]. To model G4 motifs, we provide four HMM-based models that differ in the number of hidden states describing G-runs and loops, and compare them with each other in three computational experiments. The first, preliminary experiment in predicting G-run regions in a set of 100 real G4 sequences from the literature [20] indicates that each HMM-based model can represent actual G-run regions well. The subsequent experiment in discriminating real and shuffled G4 sequences by using HMMs shows that the models considering detailed distributions of G-run and loop lengths can outperform the simple probabilistic extension of the regular expression. In the third test, with statistical analysis in discriminating highly likely G4 structures from putative G4 motifs in human pre-mRNA sequences [26], the results show that the HMM-based model that can represent an elaborate length distribution of G-run regions outperforms the other three models presented in this work. Moreover, this model can be specialized to detect G4 sequences whose functional roles are expected to be involved in DNA transcription. Finally, this model in conjunction with pattern search is applied to G4 screening in the whole human genome, producing a considerably smaller number of statistically significant G4 candidates than the number of G4 sequences predicted by pattern matching alone.
Here we would like to emphasize the significance of our research findings as follows:

Compared with the regular expression-based approach, our method can assess G4 motifs quantitatively by a probabilistic measure. In principle, G4 motifs could be detected first by the "discrete" regular expression-based method and then scored to judge their thermodynamic stability using energy parameters for G4 structures. However, to the best of our knowledge, elaborate energy parameters for G4 structures are not yet available. Under these circumstances, probabilistic models including HMMs are useful not only for evaluating predictions quantitatively but also for training the model parameters from experimentally verified data.

Our results show that HMM-based models are statistically reliable enough to detect a more specific motif among general G4 structures in genomic sequences, narrowing down the potential G4 sequences predicted by the existing pattern matching method. This means that the combination of the regular expression-based approach and our probabilistic method will reduce the expensive and laborious wet-lab experiments needed to exhaustively analyze a given set of G4 motifs of interest more than the regular expression method alone. We believe that our research findings can advance the understanding of the functional roles of G4 structures in genomes, as well as help to design therapeutic drugs that target specific G4 structures.
Results and discussion
We develop four HMMs to see how well the models can represent real G4 sequences and reduce false positive G4 sequences among putative ones. Put simply, each HMM has four sets of hidden states for G-runs linked by three sets of hidden states for loops (see Methods for details of the HMMs). In addition, the parameters of the HMMs were trained on experimentally verified data from the literature [20].
Predicting G-run regions
Stegle et al. [20] provide a dataset of 260 G4 structures, which were experimentally verified with varying salt concentrations. Note that the corresponding sequences are of the form G^{+}N^{∗}G^{+}N^{∗}G^{+}N^{∗}G^{+} in regular expression. In our test, we used 100 sequences out of 260 because the original dataset contains duplicate sequences with different salt concentrations.
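In the following table, TP, FP and FN denote the numbers of true positives, false positives and false negatives, and SEN, PPV and F denote sensitivity, positive predictive value and F-measure, respectively.
Results of predicting G-run regions in 100 real G4 sequences verified experimentally in [20].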
HMM  TP  FP  FN  SEN  PPV  F  Time (s) 

Model 1  1196  3  0  1.000  0.998  0.999  0.008 
Model 2  1113  2  83  0.931  0.998  0.963  0.014 
Model 3  886  1  310  0.741  0.999  0.851  0.017 
Model 4  1080  1  116  0.903  0.999  0.949  0.025 
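The SEN, PPV and F columns can be recomputed from the raw counts. The sketch below assumes the standard definitions SEN = TP/(TP+FN), PPV = TP/(TP+FP) and F = 2·SEN·PPV/(SEN+PPV); the helper name `prediction_metrics` is hypothetical and is used here only for illustration.

```python
def prediction_metrics(tp, fp, fn):
    """Sensitivity, positive predictive value and F-measure from raw counts.

    Uses the standard definitions, which are assumed rather than quoted
    from the article.
    """
    sen = tp / (tp + fn)
    ppv = tp / (tp + fp)
    f_measure = 2 * sen * ppv / (sen + ppv)
    return sen, ppv, f_measure


# Counts reported for model 2 in the table above.
sen, ppv, f_measure = prediction_metrics(tp=1113, fp=2, fn=83)
print(f"SEN={sen:.3f} PPV={ppv:.3f} F={f_measure:.3f}")
```

With the model 2 counts this gives SEN=0.931, PPV=0.998 and F=0.963, in agreement with the corresponding row of the table.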
Discriminating G4 sequences
We first investigate the discriminative performance of the four HMM-based models between real and shuffled G4 sequences. More specifically, we randomly split the set of 100 real G4 sequences in Stegle et al.'s dataset [20] into two sets of 50 positive sequences, one for training and the other for validation. Next, a set of 50 negative sequences for validation was created by trinucleotide shuffling [27] of the 50 positive sequences in the validation set. Note that the use of trinucleotide shuffling comes from the observation that G4 structures often have at least three consecutive Gs in each G-run to make their structures stable. In total, the validation set has 100 sequences, of which 50 are positive and 50 are negative.
Here $\overline{L}$ and $s$ denote the average and the standard deviation, respectively, over all validation sequences.
Reducing potential G4 sequences in database
Human genes that include putative non-overlapping G4 sequences used in our experiments.
Gene symbol  # putative G4s  Length (nt)  Gene symbol  # putative G4s  Length (nt) 

AHNAK  762  113317  MAWBP  204  50268 
ARS2  105  13551  MGC3207  72  9751 
BPGM  102  33014  MGC4707  815  227682 
C14orf138  22  7942  NGFRAP1  19  1734 
CCM2  373  76283  NT5C3  120  48668 
CNOT4  309  148303  PHF14  336  195730 
COMMD6  18  11871  PLEKHH1  295  56220 
DIP2A  471  109711  PPP1R9A  742  386048 
DKFZp761I2123  161  20951  RAB37  457  76205 
DNAJA5  76  29372  RAP1B  130  49723 
EGFR  695  188307  RGS6  2135  630822 
ERCC1  118  14306  SEMA5B  786  119412 
FLJ20097  232  126686  SF1  78  14164 
FMO3  57  26924  SLC37A3  303  64760 
FOXM1  86  19455  SP8  29  4605 
FPRL1  41  9327  SUNC1  109  41972 
FUS  82  11648  SYNJ1  256  99205 
GMFB  24  14536  TFEC  145  95597 
HTF9C  69  5371  TJP2  283  81032 
IFRD1  171  53022  TRAF7  273  22332 
IMPDH1  176  17976  UPP1  108  19976 
ITM2C  135  14343  USP42  26  56635 
KIAA2010  174  52859  ZAP70  222  26293 
KRIT1  105  47132  ZCCHC11  312  129797 
LOC285989  86  14703  ZNF32  25  5020 
Functional analysis of putative G4 sequences
The G4 density of a gene X is defined as $D(X) = G(X)/|X|$, where G(X) is the number of G4 sequences in the gene X and |X| denotes the length of X. For the original G4 candidates in the GRSDB2 database and their reduced G4 sequences computed by the HMM-based models 2 and 4 with the cutoff of the lower-tailed 5% point of the standard normal distribution, we calculated the G4 density of each gene and converted it into a Z-score in each case. Note that the Z-scores were calculated over all genes in each case. The point here is to make clear which genes can be considered to have significantly many G4 sequences.
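To make the density calculation concrete, the sketch below computes the G4 density of a gene as the number of putative G4 sequences divided by the gene length, and converts the densities into Z-scores. It uses only five genes from the table of human genes, so the resulting Z-scores will differ from the published values, which were computed over all 50 genes. The use of the population standard deviation and the function name `g4_density_z_scores` are assumptions.

```python
from statistics import mean, pstdev

# (number of putative G4 sequences, gene length in nt) for a few genes
# taken from the table of human genes used in the experiments.
GENES = {
    "HTF9C": (69, 5371),
    "TRAF7": (273, 22332),
    "EGFR": (695, 188307),
    "FUS": (82, 11648),
    "USP42": (26, 56635),
}


def g4_density_z_scores(table):
    """Map each gene to the Z-score of its G4 density (count / length)."""
    density = {gene: count / length for gene, (count, length) in table.items()}
    mu = mean(density.values())
    sigma = pstdev(density.values())  # population SD: an assumption
    return {gene: (d - mu) / sigma for gene, d in density.items()}


z_scores = g4_density_z_scores(GENES)
for gene, score in sorted(z_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{gene}\t{score:+.3f}")
```

Even on this small subset, HTF9C and TRAF7 rank highest and USP42 lowest, which matches their relative order in the published tables.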
G4 density of each gene computed from the results of the HMM-based model 2.
Gene symbol  Z(D_{ pred })  Z(D_{ ref })  Gene symbol  Z(D_{ pred })  Z(D_{ ref }) 

HTF9C  2.914  2.753  FPRL1  0.275  0.138 
TRAF7  2.601  2.540  MGC4707  0.407  0.417 
NGFRAP1  2.298  2.107  KIAA2010  0.438  0.515 
IMPDH1  1.453  1.708  TJP2  0.444  0.447 
ITM2C  1.395  1.578  EGFR  0.452  0.379 
ZAP70  1.365  1.247  IFRD1  0.517  0.538 
ERCC1  1.347  1.180  RGS6  0.537  0.484 
DKFZp761I2123  0.944  0.987  RAP1B  0.677  0.747 
ARS2  0.896  1.009  BPGM  0.681  0.585 
MGC3207  0.848  0.884  C14orf138  0.695  0.694 
SP8  0.733  0.513  DNAJA5  0.743  0.756 
AHNAK  0.700  0.659  SUNC1  0.753  0.753 
FUS  0.689  0.767  NT5C3  0.771  0.798 
SEMA5B  0.470  0.610  SYNJ1  0.775  0.759 
LOC285989  0.457  0.359  ZCCHC11  0.806  0.819 
RAB37  0.364  0.410  KRIT1  0.845  0.879 
UPP1  0.154  0.208  CNOT4  0.915  0.929 
SF1  0.143  0.242  FMO3  0.917  0.917 
PLEKHH1  0.142  0.154  PPP1R9A  0.991  0.984 
ZNF32  0.066  0.062  FLJ20097  1.012  1.015 
SLC37A3  0.025  0.041  PHF14  1.017  1.054 
CCM2  0.028  0.031  GMFB  1.076  1.077 
FOXM1  0.044  0.129  TFEC  1.113  1.123 
DIP2A  0.162  0.173  COMMD6  1.194  1.123 
MAWBP  0.221  0.253  USP42  1.448  1.484 
G4 density of each gene computed from the results of the HMM-based model 4.
Gene symbol  Z(D_{ pred })  Z(D_{ ref })  Gene symbol  Z(D_{ pred })  Z(D_{ ref }) 

HTF9C  2.732  2.753  MAWBP  0.272  0.253 
TRAF7  2.613  2.540  EGFR  0.393  0.379 
NGFRAP1  2.199  2.107  MGC4707  0.420  0.417 
IMPDH1  1.712  1.708  TJP2  0.452  0.447 
ITM2C  1.608  1.578  RGS6  0.501  0.484 
ZAP70  1.237  1.247  KIAA2010  0.505  0.515 
ERCC1  1.224  1.180  IFRD1  0.548  0.538 
ARS2  0.995  1.009  BPGM  0.616  0.585 
DKFZp761I2123  0.967  0.987  C14orf138  0.675  0.694 
MGC3207  0.801  0.884  RAP1B  0.744  0.747 
FUS  0.734  0.767  SUNC1  0.752  0.753 
AHNAK  0.679  0.659  DNAJA5  0.763  0.756 
SEMA5B  0.593  0.610  SYNJ1  0.770  0.759 
SP8  0.563  0.513  NT5C3  0.811  0.798 
RAB37  0.417  0.410  ZCCHC11  0.817  0.819 
LOC285989  0.382  0.359  KRIT1  0.888  0.879 
SF1  0.286  0.242  CNOT4  0.930  0.929 
UPP1  0.180  0.208  FMO3  0.930  0.917 
PLEKHH1  0.157  0.154  PPP1R9A  1.001  0.984 
ZNF32  0.101  0.062  FLJ20097  1.021  1.015 
CCM2  0.023  0.031  PHF14  1.063  1.054 
SLC37A3  0.032  0.041  GMFB  1.068  1.077 
FOXM1  0.114  0.129  TFEC  1.130  1.123 
FPRL1  0.142  0.138  COMMD6  1.174  1.123 
DIP2A  0.188  0.173  USP42  1.486  1.484 
Information on genes that have significantly many putative G4 structures.
Gene symbol  Length (nt)  Gene ontology 

HTF9C  5371  Function: RNA binding, methyltransferase activity, nucleotide binding. Process: metabolic process. 
TRAF7  22332  Function: ligase activity, metal ion binding, protein binding, ubiquitin-protein ligase activity, zinc ion binding. Process: activation of MAPKKK activity, positive regulation of MAPKKK cascade, protein ubiquitination, regulation of apoptosis, regulation of transcription, DNA-dependent, transcription. Component: ubiquitin ligase complex. 
NGFRAP1  1734  Function: metal ion binding, molecular function. Process: apoptosis, biological_process, multicellular organismal development. Component: cellular_component, nucleus. 
IMPDH1  17976  Function: IMP dehydrogenase activity, catalytic activity, metal ion binding, oxidoreductase activity, potassium ion binding. Process: GMP biosynthetic process, GTP biosynthetic process, lymphocyte proliferation, metabolic process, purine nucleotide biosynthetic process, response to stimulus, visual perception. 
Applying HMM to whole human genome
The third experiment stated above focuses only on pre-mRNA sequences in the human genome, leaving further potential G4 sequences over the whole genome unexamined. Thus, we demonstrate here how many potential G4 sequences the regular expression-based method can detect in the whole human genome and how many of them our method can eliminate.
Comparison of the number of G4 motifs in the human genome between the use of regular expression (RE) alone and that of HMM (model 2) together with RE.
Model  # G4 motifs  % Reduction  Time 

RE  100332  N/A  2m 5.296s 
RE+HMM with cutoff 1  94272  6.040  2m 5.296s + 22.245s 
RE+HMM with cutoff 2  3285  96.726  2m 5.296s + 22.245s 
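The reduction percentages in the table follow from the motif counts. The short sketch below assumes that the reduction is measured relative to the number of motifs found by RE alone; the helper name `percent_reduction` is illustrative.

```python
def percent_reduction(baseline, remaining):
    """Percentage of baseline motifs removed by the additional HMM filter."""
    return 100.0 * (baseline - remaining) / baseline


RE_MOTIFS = 100332  # motifs detected by the regular expression alone
for label, kept in [("cutoff 1", 94272), ("cutoff 2", 3285)]:
    reduction = percent_reduction(RE_MOTIFS, kept)
    print(f"RE+HMM with {label}: {reduction:.3f}% reduction")
```

This yields 6.040% and 96.726%, matching the table.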
Discussion
From our experimental results, the following two points on the constitution of HMMs become clear:

Increasing the number of hidden states for representing G-runs in an HMM can lead to small variance of the probability distribution over input sequences given the model.

Increasing the number of hidden states for describing loops can make the HMM more flexible.
The first point can be explained by Figures 3 and 4, while Figure 5 gives a good account of the second point.
Here we look closely at G-runs and loops in G4 sequences. Recall that a G-run is a region of consecutive Gs involved in G-quartets and a loop is a single strand of arbitrary bases that connects the G-runs before and after it. Since the HMM-based model 2, like model 4, is specialized to represent consecutive Gs, each G in the G-runs is strictly discriminated in the model, which sharpens the probability distribution over the set of input sequences. On the other hand, model 3 has more hidden states that can represent any base, and thus it can output an arbitrary sequence in a more flexible framework and show a multimodal probability distribution. Viewed in this light, we may say that model 4 has a broader distribution of Z-scores owing to the increase in hidden states for representing loops, and several groups of peaks because of the increase in hidden states for describing G-runs (see also Figure 6). Although the different peaks in the score distributions may tell us which potential G4 sequences actually form G4 structures in vitro and/or in vivo, experimental verification in wet labs is still awaited.
Conclusions
We presented HMM-based models for G4 motifs with the aim of reducing false positive G4 sequences in genomic DNAs detected by simple pattern matching with regular expressions. The discrimination test with the HMMs indicated the high discriminative power of the elaborate models between positive and negative G4 sequences. Our computational experiments with statistical analysis on potential G4 sequences in the human genome make it clear that the HMM-based model that considers the detailed distribution of G-run length can discriminate well between G4 sequences that match the model and those that do not. Moreover, other experimental results suggest that this HMM-based model can be specialized to detect genes whose functional roles are expected to be involved in transcription and which include significantly many G4 sequences. Furthermore, this model in conjunction with the use of regular expressions can detect a considerably smaller number of statistically significant G4 candidates in the whole human genome. Therefore, we may reasonably conclude that the HMM-based approach, together with the conventional pattern matching method, can contribute to reducing the costly and laborious wet-lab experiments needed to exhaustively analyze a given set of G4 motifs of interest.
In this work, we proposed HMM-based models in which each G-run has variable length. In contrast, applying HMMs that deal only with a specific fixed length of G-runs to genomic sequences may yield more accurate discrimination of G4 sequences. In addition, changing the training sequences, which should be verified experimentally, may have a certain effect on the prediction results. In this sense, collaboration between in silico, in vitro and in vivo experiments will be even more important to advance functional analysis of G4 structures in the genomes of various organisms.
Methods
A G4 sequence comprises alternating G-runs and loops, which can be described as G^{+}N^{∗}G^{+}N^{∗}G^{+}N^{∗}G^{+} in regular expression. In particular, the majority of existing pattern matching-based methods assume that the length of a G-run is between three and five and that of a loop is between one and seven [11]. To model the G4 motif by HMMs, we focus on which state, G-run or loop, each base in a given sequence is decoded into. The advantages of using HMMs can be summarized as follows:

The most likely state path that corresponds to structural elements in a sequence can be predicted by the Viterbi algorithm.

The probability of a sequence given the parameterized model can be calculated by the forward algorithm.

Optimal probability parameters of the model can be estimated on a set of example sequences by the Baum-Welch algorithm.
The Viterbi algorithm computes the most probable state path of an HMM for a given sequence in O(m^{2}n) time by dynamic programming, where m is the number of hidden states in the HMM and n is the sequence length. The forward algorithm and the backward algorithm, which is analogous to the forward algorithm except that its recursion starts at the end of the sequence, compute the probability of a sequence given an HMM by dynamic programming with the same running time as the Viterbi algorithm. Finally, the Baum-Welch algorithm estimates locally optimal parameters of an HMM from a set of training sequences, repeatedly applying the forward and backward algorithms until the change in the log likelihood of the sequences falls below some threshold. Details of the algorithms can be found in [25].
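The following Python sketch illustrates Viterbi decoding and the forward algorithm on a toy left-to-right HMM with four G-run states linked by three loop states. It is not one of the four models of this article: the single state per G-run and per loop, the self-transition probability `STAY`, the emission values and the implicit end state after G4 are all hypothetical choices made for illustration, and each loop must emit at least one base. Baum-Welch training is omitted.

```python
import math

STATES = ["G1", "L1", "G2", "L2", "G3", "L3", "G4"]
STAY = 0.6  # hypothetical self-transition probability, not a trained value


def emission(state, base):
    """G-run states strongly prefer G; loop states emit any base uniformly."""
    if state.startswith("G"):
        return 0.97 if base == "G" else 0.01
    return 0.25


def transition(i, j):
    """Left-to-right topology: stay in state i or advance to state i + 1.

    The last state keeps 1 - STAY for an implicit end state.
    """
    if j == i:
        return STAY
    if j == i + 1:
        return 1.0 - STAY
    return 0.0


def viterbi(seq):
    """Most probable state path (log space) that starts in G1 and ends in G4."""
    n, m = len(seq), len(STATES)
    neg_inf = float("-inf")
    score = [[neg_inf] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]
    score[0][0] = math.log(emission(STATES[0], seq[0]))
    for t in range(1, n):
        for j in range(m):
            # Only two predecessors have non-zero transition probability here;
            # a general HMM would loop over all m states, giving O(m^2 n).
            for i in (j - 1, j):
                if i < 0 or score[t - 1][i] == neg_inf:
                    continue
                cand = score[t - 1][i] + math.log(transition(i, j))
                if cand > score[t][j]:
                    score[t][j], back[t][j] = cand, i
            if score[t][j] > neg_inf:
                score[t][j] += math.log(emission(STATES[j], seq[t]))
    if score[n - 1][m - 1] == neg_inf:
        raise ValueError("sequence too short to visit all seven states")
    path = [m - 1]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    return [STATES[k] for k in reversed(path)]


def forward(seq):
    """Probability of seq summed over all paths from G1 to the end state.

    Plain probabilities are used for brevity, so long sequences would need
    scaling or log-space arithmetic to avoid underflow.
    """
    m = len(STATES)
    f = [0.0] * m
    f[0] = emission(STATES[0], seq[0])
    for base in seq[1:]:
        f = [
            sum(f[i] * transition(i, j) for i in range(m)) * emission(STATES[j], base)
            for j in range(m)
        ]
    return f[-1] * (1.0 - STAY)


seq = "GGGTTAGGGTTAGGGTTAGGG"
print(viterbi(seq))
print(forward(seq))
```

For this example, the decoded path assigns each GGG to a G-run state and each TTA to a loop state, which is how G-run regions can be read off from a Viterbi path. The forward probability gives a score that can be compared across sequences of similar length.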
Declarations
Publication charges were supported by JSPS KAKENHI [#24700296 to YK]
Acknowledgements
We would like to thank Prof. Shigehiko Kanaya at Nara Institute of Science and Technology for his helpful comments. This work was supported by JSPS KAKENHI [#24700296 to YK].
This article has been published as part of BMC Genomics Volume 15 Supplement 9, 2014: Thirteenth International Conference on Bioinformatics (InCoB2014): Computational Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/15/S9.
References
1. Watson JD, Crick FH: Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid. Nature. 1953, 171: 737-738. 10.1038/171737a0.
2. Bochman ML, Paeschke K, Zakian VA: DNA secondary structures: stability and function of G-quadruplex structures. Nat Rev Genet. 2012, 13: 770-780. 10.1038/nrg3296.
3. Huppert JL: Structure, location and interactions of G-quadruplexes. FEBS J. 2010, 277: 3452-3458. 10.1111/j.1742-4658.2010.07758.x.
4. Guédin A, Gros J, Alberti P, Mergny JL: How long is too long? Effects of loop size on G-quadruplex stability. Nucleic Acids Res. 2010, 38: 7858-7868. 10.1093/nar/gkq639.
5. Takahama K, Sugimoto C, Arai S, Kurokawa R, Oyoshi T: Loop lengths of G-quadruplex structures affect the G-quadruplex DNA binding selectivity of the RGG motif in Ewing's sarcoma. Biochemistry. 2011, 50: 5369-5378. 10.1021/bi2003857.
6. Paeschke K, Simonsson T, Postberg J, Rhodes D, Lipps HJ: Telomere end-binding proteins control the formation of G-quadruplex DNA structures in vivo. Nat Struct Mol Biol. 2005, 12: 847-854. 10.1038/nsmb982.
7. Biffi G, Tannahill D, McCafferty J, Balasubramanian S: Quantitative visualization of DNA G-quadruplex structures in human cells. Nat Chem. 2013, 5: 182-186. 10.1038/nchem.1548.
8. Viglasky V, Bauer L, Tluckova K, Javorsky P: Evaluation of human telomeric G-quadruplexes: the influence of overhanging sequences on quadruplex stability and folding. J Nucleic Acids. 2010, 2010:
9. Adrian M, Heddi B, Phan AT: NMR spectroscopy of G-quadruplexes. Methods. 2012, 57: 11-24. 10.1016/j.ymeth.2012.05.003.
10. Todd AK: Bioinformatics approaches to quadruplex sequence location. Methods. 2007, 43: 246-277. 10.1016/j.ymeth.2007.08.004.
11. Huppert JL: Hunting G-quadruplexes. Biochimie. 2008, 90: 1140-1148. 10.1016/j.biochi.2008.01.014.
12. Todd AK, Johnston M, Neidle S: Highly prevalent putative quadruplex sequence motifs in human DNA. Nucleic Acids Res. 2005, 33: 2901-2907. 10.1093/nar/gki553.
13. Huppert JL, Balasubramanian S: Prevalence of quadruplexes in the human genome. Nucleic Acids Res. 2005, 33: 2908-2916. 10.1093/nar/gki609.
14. Rawal P, Kummarasetti VB, Ravindran J, Kumar N, Halder K, Sharma R, Mukerji M, Das SK, Chowdhury S: Genome-wide prediction of G4 DNA as regulatory motifs: role in Escherichia coli global regulation. Genome Res. 2006, 16: 644-655. 10.1101/gr.4508806.
15. Huppert JL, Balasubramanian S: G-quadruplexes in promoters throughout the human genome. Nucleic Acids Res. 2007, 35: 406-413.
16. Cao K, Ryvkin P, Johnson FB: Computational detection and analysis of sequences with duplex-derived interstrand G-quadruplex forming potential. Methods. 2012, 57: 3-10. 10.1016/j.ymeth.2012.05.002.
17. D'Antonio L, Bagga P: Computational methods for predicting intramolecular G-quadruplexes in nucleotide sequences. Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference (CSB2004). 2004, Stanford, CA, 561-562. 16-19 August 2004.
18. Kikin O, D'Antonio L, Bagga PS: QGRS Mapper: a web-based server for predicting G-quadruplexes in nucleotide sequences. Nucleic Acids Res. 2006, 34: 676-682. 10.1093/nar/gkj467.
19. Beaudoin JD, Jodoin R, Perreault JP: New scoring system to identify RNA G-quadruplex folding. Nucleic Acids Res. 2014, 42: 1209-1223. 10.1093/nar/gkt904.
20. Stegle O, Payet L, Mergny JL, MacKay DJC, Huppert JL: Predicting and understanding the stability of G-quadruplexes. Bioinformatics. 2009, 25: 374-382. 10.1093/bioinformatics/btp210.
21. Beaudoin JD, Perreault JP: 5'-UTR G-quadruplex structures acting as translational repressors. Nucleic Acids Res. 2010, 38: 7022-7036. 10.1093/nar/gkq557.
22. Lorenz R, Bernhart SH, Externbrink F, Qin J, Siederdissen CH, Amman F, Hofacker IL, Stadler PF: RNA folding algorithms with G-quadruplexes. Lect Notes Bioinform. 2012, 7409: 49-60.
23. Asai K, Hayamizu S, Handa K: Prediction of protein secondary structure by the hidden Markov model. Comput Appl Biosci. 1993, 9: 141-146.
24. Krogh A, Brown M, Mian IS, Sjölander K, Haussler D: Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol. 1994, 235: 1501-1531. 10.1006/jmbi.1994.1104.
25. Durbin R, Eddy SR, Krogh A, Mitchison G: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. 1998, Cambridge University Press, Cambridge.
26. Kikin O, Zappala Z, D'Antonio L, Bagga PS: GRSDB2 and GRS_UTRdb: databases of quadruplex forming G-rich sequences in pre-mRNAs and mRNAs. Nucleic Acids Res. 2008, 36: 141-148. 10.1093/nar/gkn705.
27. Jiang M, Anderson J, Gillespie J, Mayne M: uShuffle: a useful tool for shuffling biological sequences while preserving the k-let counts. BMC Bioinform. 2008, 9: 192. 10.1186/1471-2105-9-192.
28. Karolchik D, Barber GP, Casper J, Clawson H, Cline MS, Diekhans M, Dreszer TR, Fujita PA, Guruvadoo L, Haeussler M, Harte RA, Heitner S, Hinrichs AS, Learned K, Lee BT, Li CH, Raney BJ, Rhead B, Rosenbloom KR, Sloan CA, Speir ML, Zweig AS, Haussler D, Kuhn RM, Kent WJ: The UCSC Genome Browser database: 2014 update. Nucleic Acids Res. 2014, 42: 764-770. 10.1093/nar/gkt946.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.