- Research article
- Open Access
Novel porcine repetitive elements
© Wiedmann et al; licensee BioMed Central Ltd. 2006
- Received: 01 September 2006
- Accepted: 01 December 2006
- Published: 01 December 2006
Repetitive elements comprise ~45% of mammalian genomes and are increasingly known to impact genomic function by contributing to the genomic architecture, by direct regulation of gene expression and by affecting genomic size, diversity and evolution. The ubiquity and increasingly understood importance of repetitive elements contribute to the need to identify and annotate them. We set out to identify previously uncharacterized repetitive DNA in the porcine genome. Once found, we characterized the prevalence of these repeats in other mammals.
We discovered 27 repetitive elements in 220 BACs covering 1% of the porcine genome (Comparative Vertebrate Sequencing Initiative; CVSI). These repeats varied in length from 55 to 1059 nucleotides. To estimate copy numbers, we went to an independent source of data, the BAC-end sequences (Wellcome Trust Sanger Institute), covering approximately 15% of the porcine genome. Copy numbers in BAC-ends were less than one hundred for 6 repeat elements, between 100 and 1000 for 16 and between 1,000 and 10,000 for 5. Several of the repeat elements were found in the bovine genome and we have identified two with orthologous sites, indicating that these elements were present in their common ancestor. None of the repeat elements were found in primate, rodent or dog genomes. We were unable to identify any of the replication machinery common to active transposable elements in these newly identified repeats.
The presence of both orthologous and non-orthologous sites indicates that some sites existed prior to speciation and some were generated later. The identification of low to moderate copy number repetitive DNA that is specific to artiodactyls will be critical in the assembly of livestock genomes and studies of comparative genomics.
- Repetitive Element
- Count Number
- Repeat Element
- Much Recent Common Ancestor
- Estimate Copy Number
Repetitive elements comprise ~45% of mammalian genomes and are increasingly known to impact genomic function by contributing to the genomic architecture, by direct regulation of gene expression [2, 3] and by affecting genomic size, diversity and evolution [4–8]. The ubiquity and increasingly understood importance of repetitive elements (REs) contribute to the need to identify and annotate REs. In recent years, several attempts have been made to automate the process of de novo identification and characterization of REs [10–16]. The algorithms take into account the likely evolutionary history of the REs – not only genetic drift, but also the processes that lead to the juxtaposition of REs. Because knowing the evolutionary history of each RE helps to define the type of RE, these algorithms are valuable not only in identifying repetitive sequence, but also in increasing our understanding of the evolutionary role of the identified RE. We first attempted to identify novel repetitive DNA with a program called RECON, which produced 14,067 families of REs, of which 249 had count numbers of 10 or more. We decided that a different approach was needed, one that would organize closely related elements in a parsimonious way. In this paper, we describe 27 novel porcine repetitive elements and estimate their prevalence in swine and other species.
We identified repetitive elements using a procedure similar to previously published methods [10, 11]. First, we used RepeatMasker on the BAC sequences to mask out previously characterized repeat elements. Second, we identified all pair-wise alignments among masked sequences using BLAST. Third, we identified multiple copy sequence segments with alignments to many sites (≥ 10). Fourth, we clustered sites linked by pair-wise alignments and constructed phylogenetic trees. Fifth, we divided any putative RE that showed excessive (2-fold) variation in copy number along its length, and merged REs that co-localized at many sites. Sixth, we examined flanking sequences of putative REs for clues about replication machinery or to consolidate REs that should be merged. Seventh, we estimated the prevalence of the REs in an independent set of porcine sequences as well as in the genomes of other species.
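The clustering in the fourth step can be pictured as finding connected components in a graph whose nodes are sites and whose edges are pair-wise alignments. The following is a minimal sketch under that assumption, not the scripts used in this study; the function name `cluster_sites` and the toy site labels are hypothetical.

```python
from collections import defaultdict


def cluster_sites(sites, links):
    """Group sites into clusters: two sites share a cluster if a chain of
    pair-wise alignments (links) connects them (union-find)."""
    parent = {s: s for s in sites}

    def find(s):
        # Follow parent pointers to the cluster representative,
        # shortening the path along the way.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(list)
    for s in sites:
        clusters[find(s)].append(s)
    # Largest clusters first, ties broken alphabetically.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Toy data (hypothetical site labels, not taken from the study).
sites = ["bacA:100-400", "bacB:50-350", "bacC:900-1200",
         "bacD:10-200", "bacE:500-690"]
links = [("bacA:100-400", "bacB:50-350"),
         ("bacB:50-350", "bacC:900-1200"),
         ("bacD:10-200", "bacE:500-690")]
families = cluster_sites(sites, links)
```

Here `families` holds one three-site cluster and one two-site cluster. Single-linkage grouping of this kind tends to over-merge, which is one reason a manual review such as Steps 5 and 6 remains useful.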
Our method compared to RECON
The bulk of the automated parts of our process, Steps 2 through 4, were very similar to RECON. RECON does not appear to have analogues for Steps 1 (RepeatMasker), 5, 6, or 7. We used Step 1 to steer us away from previously characterized repetitive elements. We used the manually intensive Steps 5–7 to achieve a more parsimonious classification (a smaller number of repeat families) than appeared to be possible with RECON alone. In this sense, we envision our method as a complement to RECON, not a replacement.
Steps 1 – 4
Thirty-six percent of the sequence was masked by RepeatMasker. Comparing all unmasked sequence fragments (≥ 50 bp) produced 1,334,953 pair-wise alignments. One thousand five hundred seventy-nine highly redundant sequences (totaling 1.07 Mb) were identified that had a minimum of 10 hits for at least 50 contiguous bases. Sixty putative repeat element families resulted from clustering the 1579 highly redundant sequences. The repeat element families were labeled MPRE1 – MPRE60 (for Meat Animal Research Center Porcine Repetitive Element). Their lengths ranged from 55 to 1059 bp and their copy numbers (across the 220 discovery BACs) ranged from 12 to 1102.
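The criterion of at least 10 hits over at least 50 contiguous bases can be illustrated with a per-base hit depth computed from alignment coordinates. This is a sketch under the assumption that each alignment is reduced to a 1-based inclusive (start, end) pair on the query fragment; the function names and toy coordinates are hypothetical and do not reproduce the original scripts.

```python
def hit_depth(seq_len, hits):
    """Per-base count of alignments covering each position.
    hits: 1-based inclusive (start, end) pairs on a sequence of seq_len bases."""
    diff = [0] * (seq_len + 2)
    for start, end in hits:
        diff[start] += 1      # coverage begins at start
        diff[end + 1] -= 1    # coverage stops after end
    depth, running = [], 0
    for pos in range(1, seq_len + 1):
        running += diff[pos]
        depth.append(running)
    return depth


def redundant_segments(depth, min_hits=10, min_len=50):
    """1-based inclusive runs where depth >= min_hits for >= min_len bases."""
    segments, run_start = [], None
    for i, d in enumerate(depth, start=1):
        if d >= min_hits:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                segments.append((run_start, i - 1))
            run_start = None
    # Close a run that extends to the end of the sequence.
    if run_start is not None and len(depth) - run_start + 1 >= min_len:
        segments.append((run_start, len(depth)))
    return segments


# Toy 300-bp fragment: 12 hits over bases 101-180, 4 hits over 150-250.
hits = [(101, 180)] * 12 + [(150, 250)] * 4
depth = hit_depth(300, hits)
segments = redundant_segments(depth)
```

In this toy case only bases 101 to 180 reach a depth of 10 or more for 50 or more contiguous positions, so `segments` contains that single run.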
Steps 5 – 6
The 60 original MPREs were consolidated into 31 because of overlap or co-localization at multiple sites. Twenty-nine MPREs were absorbed into the remaining 31, which kept the original identifiers of the longer sequences to maintain provenance. In addition, there were three combinations (MPRE20 and 57; MPRE15, 17, 19 and 26; MPRE44, 50 and 52) of repeats that frequently appeared together in the same order with some variation in their relative spacing. The most consistent group contained two elements – MPRE20 in reverse complement followed by a small gap, then MPRE57. All thirteen times that MPRE20 occurred, it occurred in this grouping. MPRE57 occurred 13 out of 14 times in this grouping. We therefore concluded that MPRE20 (600 bp) and MPRE57 (204 bp) were two parts of a longer RE that had a variable middle (100–250 bp range for all but one example). After examining the alignment in ClustalX, we could see that the middle was conserved except for an 84 bp deletion in one instance and a 67 bp insertion in another. Further review of the BACs showed that the 13 groups containing MPRE20 and MPRE57 sometimes occurred in overlapping regions between pairs of clones in the BAC collection, meaning that we had only 7 unique loci plus one distinct locus that had a PRE1a (Porcine Repeat Element 1a, as identified by RepeatMasker) inserted into the gap. There was no pattern to the gap in the other instances. We include this longer repeat element in our list of novel porcine repeat elements as MPRE61, which is more fully described in a later section.
The final alteration to the list of MPREs was the removal of MPRE48 due to its low copy number in the set of 275,595 porcine BAC-ends supplied by the Wellcome Trust Sanger Institute (hereafter shortened to "Sanger"). Surprisingly, MPRE48 appeared less frequently, only six times, in the BAC-ends (335.9 Mb) than in the much smaller portion of the genome spanned by the set of fully sequenced BACs (36.4 Mb) from which the MPREs were derived. That brings the final number of novel repeat elements reported here to 27, although we decided against removing MPRE48 from the FASTA file of MPREs (see Additional file 1).
Table 1: Count numbers for novel porcine repetitive elements. Columns: BLAST hits to SSGI (1); BLAST hits to BAC-ends (2); count number, regular (3); count number, irregular (4); BLAST hits to bovine genome (5).
The prevalence of these newly identified REs was compared to that of known REs. Three of the newly discovered porcine REs, MPRE11, 16 and 38, were more common than the LINE element L3 and one, MPRE42, was about as common as L3 (Table 1). The other 23 MPREs have lower count numbers. In the Sanger archive of 275,595 BAC-ends, the number of elements for all SINEs was 203,206, for all LINEs was 116,107 and for all LTRs (Long Terminal Repeats) was 25,066, based on RepeatMasker. Looking specifically at the LINEs, the most common by far was L1 with 94,325, followed by L2 with 18,720 and L3 with 2,358.
These newly discovered repeat elements did not appear to be duplicated genes, LINE elements or expressed sequence that was transposed by a LINE element. To address these questions, the MPREs were translated and compared (BLAST) to the GenBank nr database and only one strong hit was found. MPRE1 hit Sus scrofa interferon alpha-1 precursor with a bit score of 352, so it was eliminated from further consideration as a novel RE. For comparison, the highest bit score of MPREs reported here was less than 50. The repeats were also compared (BLAST) to vectors, mitochondrial DNA, and tRNAs. The middle of MPRE58 did have high similarity to tRNA-GLU; otherwise, there were no substantial high-scoring pairs.
Certain difficulties arise when defining repeat elements. One is that REs often are present as mosaics of smaller subsets of commonly occurring sequences [21, 22]. Another is that REs can often sustain considerable mutations, including large truncations and insertions. Two extreme examples of this are the truncation of the 5' end during retrotransposition, and the insertion of one RE into the middle of another. A third difficulty requiring resolution is that segmental duplication will create very long repeated sequences that do not retro-transpose together, and therefore should be broken up into their retro-transposable component parts. RECON, the software for identification of REs described by Bao and Eddy, handles all three of these difficulties.
Our approach was intentionally a bit more simplistic. We were able to create a much more parsimonious set of RE than what we were able to generate with RECON. Whereas RECON intends to recreate the full repeat elements in the way that will make for the best possible additions to the RepeatMasker database, as well as aid in the study of the evolutionary history of the repeat elements, our goal was to mask out the most commonly repeated regions of the porcine genome. The technique we found most useful in refining the definitions of the MPREs was to plot the frequency of BLAST hits as a function of position within the sequence of the putative repeat elements. From the criteria used to define them, the number of hits was at least 10 across the whole sequence – but many showed a much higher hit frequency along part of their lengths. For purposes of comparison, we applied RECON to our pair-wise alignments from Step 2. RECON divided the 1,334,953 BLAST hits into 29,631 potential repeat elements that were then grouped into 14,067 families. Only 249 of these families had 10 or more elements. Note that it is possible for a family containing only one element to correspond to many BLAST hits. Rather than continue with so many families, we found that our method yielded a more parsimonious classification of moderately repetitive elements. One difference between the two methods was that our method required a minimum copy number prior to the formation of families of repeat elements.
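The hit-frequency plot, read together with the 2-fold rule of Step 5, suggests a simple way to picture where a putative element might be divided. The sketch below is an assumption-laden illustration: in this study the refinement was done by inspecting plots, and the greedy rule, the function name `split_on_fold_change` and the toy profile are all hypothetical.

```python
def split_on_fold_change(depth, max_fold=2.0):
    """Greedy left-to-right segmentation of a hit-depth profile.
    A new segment starts when adding the next position would make the
    max/min depth ratio inside the current segment exceed max_fold.
    Returns 0-based half-open (start, end) intervals."""
    segments = []
    start = 0
    lo = hi = depth[0]
    for i in range(1, len(depth)):
        new_lo, new_hi = min(lo, depth[i]), max(hi, depth[i])
        if new_hi > max_fold * new_lo:
            segments.append((start, i))
            start = i
            lo = hi = depth[i]
        else:
            lo, hi = new_lo, new_hi
    segments.append((start, len(depth)))
    return segments


# Toy profile: 100 bases at depth 12, 150 at depth 40, 50 at depth 15.
profile = [12] * 100 + [40] * 150 + [15] * 50
parts = split_on_fold_change(profile)
```

For the toy profile, `parts` has three intervals with boundaries at positions 100 and 250. A greedy rule like this is sensitive to noise in real profiles, so smoothing or a manual check would likely be needed before acting on its output.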
The MPREs have no clear connections to known proteins. The NCBI BLASTX results for these sequences were typically a combination of description-less accessions and unrelated proteins in a variety of organisms. That remained true when the dataset was compared to the TIGR gene index for Sus scrofa.
The novel repeat elements were compared to known types of repeats – SINEs, LINEs and LTRs – and did not fit the definitions for those classes of repeat elements. Because RepeatMasker would mask out low-complexity regions, the methods used here would not initially find the tail ends of LTRs. Each MPRE was tested for nearby low-complexity regions and none were consistently found. One of the characteristics of SINEs is the presence of tRNA coding sequence in their 5' regions [23, 24]. Only MPRE58 had a region similar to tRNA, and that was in the middle of its sequence. LINEs are best characterized by their two ORFs – one coding for a reverse transcriptase and the other for a protein with RNA binding activity. All the MPREs were translated to potential proteins and compared to a comprehensive database (NCBI BLASTX). None of the results were similar to the possible translations of a LINE.
Counting repeat elements is challenging
Because of the degeneracy of repetitive elements it is difficult to arrive at an accurate count in the target genome. Another difficulty in the quantification of repeat elements is that REs are often composed of smaller repeat units that occur more frequently than the larger unit [21, 22].
To characterize the prevalence of MPREs, we went to an independent data set, the Sanger BAC-ends from the CHORI-242 library archived at Ensembl. Table 1 lists three different measures of prevalence of MPREs within these BAC-ends. The first measure (BLAST hits to BAC-ends) gives the number of hits that were at least half the length of the repeat element. An issue here is the typical size of the traces, an average of 1219 bp: the longer REs will tend to be under-counted due to edge effects in the trace archive. The next two measures of count number were calculated by plotting the number of BLAST hits as a function of position on the RE. Some of the resulting plots were smooth and flat across most of the RE with an expected drop-off near each end. For these "regular" plots the count number was the average value of the middle 90% of the plot amplitude. Other plots varied quite a bit in amplitude across the RE. This was likely due to sub-repeats that hit in areas of the genome that the whole repeat did not. For this measure of count number there was no lower limit to the size of the hit other than that needed to get the expectation value below 0.1. These plots were considered "irregular", and their count number was taken as the smallest value on the plot after ignoring the first and last 10% of the plot. A few plots were only mildly irregular, and for those both the regular and irregular algorithms were used, with both numbers reported in Table 1.
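The two plot-based count numbers can be written down compactly. The sketch below assumes that "the middle 90%" refers to the central 90% of positions along the element (5% trimmed from each end), which is one reading of the description; the function names and toy profiles are hypothetical.

```python
def regular_count(depth, trim=0.05):
    """Mean hit depth over the central 90% of positions
    (assumed reading: 5% of positions trimmed from each end)."""
    k = int(len(depth) * trim)
    core = depth[k:len(depth) - k]
    return sum(core) / len(core)


def irregular_count(depth, trim=0.10):
    """Smallest hit depth after ignoring the first and last 10% of positions."""
    k = int(len(depth) * trim)
    core = depth[k:len(depth) - k]
    return min(core)


# Toy "regular" profile: flat at 100 hits with a drop-off at each end.
flat = [2] * 5 + [100] * 90 + [2] * 5
# Toy "irregular" profile: a sub-repeat region with extra hits.
bumpy = [5] * 10 + [300] * 40 + [120] * 40 + [5] * 10

flat_count = regular_count(flat)
bumpy_count = irregular_count(bumpy)
```

Taking the minimum for irregular profiles is conservative: it reports the depth of the least-covered interior region, so hits that come only from a frequent sub-repeat do not inflate the count for the whole element.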
Comparing the novel repeat element content across genomes
The sequences of novel porcine repetitive elements listed here were compared (BLAST) to a recent build of the complete cow genome (AAFCO2 from ) as well as against the mouse and human genomes. In the case of mouse, there were no significant similarities found. The comparison to the human genome yielded only one significant hit – a 37 bp long section of MPRE17 (870 bp long) matched once in chromosome 9 thousands of bp away from any annotated features. The comparison to the cow genome yielded a variety of results. Five of the 27 MPREs did not hit at all (MPREs 6, 22, 28, 50 and 60), and three others (MPREs 44, 49 and 61) had ten or fewer hits (Table 1), despite the fact that the cow genome contains ten times more sequence than the collection of porcine BAC-ends tested. Fourteen of the 27 MPREs appeared frequently in cow as well as pig, as indicated by having at least 1000 BLAST hits to the cow genome.
Because the collection of BACs spans only 1% of the whole pig or cow genome, we cannot rule out the possibility that all of the MPREs have at least one orthologous location in both species. The fact that 12 MPREs did not have BLAST hits in any of the cow BACs makes it seem likely that those 12 are relatively recent evolutionary occurrences. Of the 10 MPREs that appear most frequently in the cow, only two, MPRE55 and MPRE59, were observed to appear in orthologous locations among the tested set of fully-sequenced BACs.
A closer look at MPRE61
Allelic differences or SNP can be identified from cases where MPRE61 sites coincide with overlaps among CVSI BACs. MPRE61 sites coincide with 3 pairs of overlapping BACs, 1 (AC145413 and AC144901), 2 (AC139879, AC140099) and 3 (AC146932 and AC087424). In addition, an MPRE61 site coincided with a group of 3 overlapping BACs, including AC138784, AC138788 and AC138786. Overlapping BAC pair 2 had two single base differences, and pair 3 had 3 single base differences and one 43 bp insertion/deletion. No sequence differences were observed within MPRE61 for pair 1 or the group of 3 overlapping BACs.
To put the apparent allelic diversity rates into context, we examined the genetic sources of the DNA used to construct the BAC library (RPCI-44). The source of DNA for RPCI-44 was a pooled sample with equal contributions from 4 male crossbred pigs, each composed of 3/8 Landrace, 3/8 Yorkshire and ¼ Meishan. The probability of identifying SNP increases with the diversity of genomes sampled. For the cases of 2 overlapping BACs, the probability of sampling different genomes is 87.5%, different breeds is 65.7%, and one BAC of western (Landrace or Yorkshire) origin and the other of Meishan origin is 37.5%. The probability of sampling diverse genomes is higher for the case of 3 overlapping BACs. The probability of sampling more than one genome is 98.4%, more than one breed is 87.9%, and at least one BAC of western origin combined with one BAC of Meishan origin is 56.25%. The fact that we did not observe SNP in one of the three pairs of overlapping BACs is not that unusual, given that the probability of sampling identical genomes with at least one of the 3 pairs of overlapping BACs is 33% (1 - 0.875³). On the other hand, the fact that we did not observe SNP within the group of 3 overlapping BACs, given the relatively high probabilities of diverse genomes being sampled, is unexpected.
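The sampling probabilities above can be reproduced with a few lines of exact arithmetic. This sketch assumes that each BAC is drawn independently and with equal probability from the 8 haploid genomes of the 4 pigs, and that the breed of origin follows the 3/8, 3/8, 1/4 proportions; these modelling assumptions and all names in the code are ours, inferred from the quoted figures.

```python
from fractions import Fraction

N_GENOMES = 8  # assumption: 4 diploid pigs give 8 equally likely haploid genomes
BREED = {"Landrace": Fraction(3, 8),
         "Yorkshire": Fraction(3, 8),
         "Meishan": Fraction(1, 4)}
WESTERN = BREED["Landrace"] + BREED["Yorkshire"]
MEISHAN = BREED["Meishan"]


def p_not_all_same_genome(k):
    """Probability that k BACs do not all come from the same haploid genome."""
    return 1 - Fraction(1, N_GENOMES) ** (k - 1)


def p_more_than_one_breed(k):
    """Probability that k BACs represent more than one breed of origin."""
    return 1 - sum(p ** k for p in BREED.values())


def p_western_and_meishan(k):
    """Probability that k BACs include both western and Meishan origin."""
    return 1 - WESTERN ** k - MEISHAN ** k


# Probability that at least one of 3 independent BAC pairs samples
# the same genome twice: 1 - 0.875^3.
p_identical_in_any_pair = 1 - p_not_all_same_genome(2) ** 3

for k in (2, 3):
    print(k,
          float(p_not_all_same_genome(k)),
          float(p_more_than_one_breed(k)),
          float(p_western_and_meishan(k)))
print(float(p_identical_in_any_pair))
```

Under these assumptions the two-BAC breed probability evaluates to 21/32 (65.625%), close to the 65.7% quoted, and the remaining values match the text to the precision given.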
MPRE61 size differences are not randomly distributed throughout the phylogenetic tree. Different sizes cluster on different branches of the tree; however, the clustering is not complete. This indicates that insertions and deletions (evolutionary events that cause size differences) occurred throughout the evolution of MPRE61, and in some cases while the element was still replicating. The incomplete clustering of sizes indicates evolutionary plasticity and as a result recurrent insertions and deletions.
From our experience, although some available programs may help with the identification of REs, a degree of judgment is also required. BLAST and phylogenetic analyses proved useful for improving the efficacy of identification, particularly when comparisons were made across species. Discovering the REs in one dataset and characterizing their prevalence and diversity in another was crucial to our effort.
Using an approach similar to previously published work but modified to fit our specific goals and data, several repetitive elements were identified in porcine and bovine genomes that do not exist in mouse or human. These elements do not contain signatures of previously identified retrotransposons, but seem to have undergone replication and mutation. Because these elements are in a lower copy number than most of the REs that make up mammalian genomes, they could be exploited in mapping or whole-genome association studies. As the porcine genome sequencing effort progresses, we should know more about the distribution, history and possible contribution of these repeats to the genomic architecture in artiodactyls.
Genome sequencing and assembly remain a genuine challenge and would benefit from an improved understanding of repeat elements and their distributions, especially those repeat elements that are species specific.
Two hundred-twenty fully sequenced porcine BACs generated by the Comparative Vertebrate Sequencing Initiative [30, 31] were downloaded from the RPCI-44 clone library, totaling 36.4 Mb. RepeatMasker masked out 36% of this sequence. All unmasked fragments of sequence that were at least 50 bp long were compared (BLAST) to the original data set. The BLAST parameters used were those recommended by Korf et al. (2003) for finding repeat elements, namely -r 1 -q -1 -G 2 -E 2 -W 9 -F "m D" -e 1 for NCBI-BLAST. The output, which contained 1,334,953 hits, was analyzed using two similar methods. One was to use the RECON software downloaded from its website, and the other used separate, original Perl scripts that performed several of the same functions included in the RECON package.
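Extracting the unmasked fragments of at least 50 bp from RepeatMasker output could look like the following. This is a sketch that assumes hard-masked sequence in which masked bases appear as N (or X); it is not one of the Perl scripts used in the study, and the function name and toy sequence are hypothetical.

```python
import re


def unmasked_fragments(masked_seq, min_len=50, mask_chars="NX"):
    """Return (start, end, sequence) for every run of unmasked bases of at
    least min_len bp. Coordinates are 1-based and inclusive.
    Assumption: masked bases are written as one of mask_chars."""
    pattern = re.compile("[^" + re.escape(mask_chars) + "]{" + str(min_len) + ",}")
    return [(m.start() + 1, m.end(), m.group())
            for m in pattern.finditer(masked_seq.upper())]


# Toy masked sequence: 60 bp unmasked, 30 N, 20 bp unmasked, 10 N, 70 bp unmasked.
toy = "ACGT" * 15 + "N" * 30 + "ACGT" * 5 + "N" * 10 + "GATTACA" * 10
fragments = unmasked_fragments(toy)
for start, end, seq in fragments:
    print(start, end, len(seq))
```

The 20 bp stretch is dropped because it falls below the 50 bp threshold, leaving two fragments to use as BLAST queries. Note that genuinely ambiguous bases recorded as N in the source sequence would also be treated as masked by this sketch.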
PCR and sequencing
Primer pairs for amplification of genomic DNA were designed from consensus MPRE61 sequences using Primer3. Primer sequences were 5'-TTTTCCTGTGGTGATTTGTGA-3' and 5'-GGGCGCTGGACTGCTCAAA-3' (positions 278–298 and 953–935 (5' to 3' on opposite strand) of MPRE61, respectively). PCR was performed in a PTC-225 DNA engine (MJ Research Inc, Watertown, MA) using 0.25 U Hot Star® Taq polymerase (Qiagen, Valencia, CA, USA), 1X of supplied buffer, 1.5 mM MgCl2, 200 μM dNTPs, 0.8 μM each primer, and 100 ng of genomic DNA in 25 μl reactions. The PCR mixture was held at 94°C for 15 min, then cycled 44 times (denaturation at 94°C for 20 sec, annealing at 57°C for 30 sec and extension at 72°C for 1.5 min), followed by a final extension at 72°C for 5 min. Five μl of the PCR reaction was electrophoresed in 1.5% agarose gels to determine quality of amplification and a portion (2–4 μl) was used for cloning in pCR4-TOPO vector (Invitrogen, Carlsbad, CA). Plasmid DNA was prepared using standard alkaline lysis and PTFE filter plates (Millipore, Bedford, MA) and was sequenced with T7 primer.
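As a quick consistency check on the primer coordinates, the primer lengths should match the stated positions, and the expected product on the consensus runs from position 278 to 953. The sketch below uses only the sequences and positions given above; the variable names are ours, and real amplicons would vary in size with the insertions and deletions seen among MPRE61 copies.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


forward = "TTTTCCTGTGGTGATTTGTGA"   # positions 278-298 of MPRE61
reverse = "GGGCGCTGGACTGCTCAAA"     # positions 953-935, opposite strand
fwd_start, fwd_end = 278, 298
rev_start, rev_end = 935, 953

# Primer lengths implied by the coordinates.
fwd_len = fwd_end - fwd_start + 1
rev_len = rev_end - rev_start + 1

# Sequence expected on the consensus (top) strand at positions 935-953.
top_strand_at_reverse_site = reverse_complement(reverse)

# Expected product length on the consensus, primers included.
amplicon_len = rev_end - fwd_start + 1

print(fwd_len, rev_len, top_strand_at_reverse_site, amplicon_len)
```

The check gives primer lengths of 21 and 19 nt, in agreement with the sequences, and an expected consensus product of 676 bp.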
The authors wish to thank Sue Hauver for expert technical assistance. Mention of trade names or commercial products is solely for the purpose of providing information and does not imply recommendation, endorsement or exclusion of other suitable products by the U.S. Department of Agriculture. This work was supported by USDA CRIS Project No. 5438-31000-071-00D and 5438-31000-073-00D.
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, International Human Genome Sequencing Consortium, et al: Initial sequencing and analysis of the human genome. Nature. 2001, 409: 860-921. 10.1038/35057062.
- Han JS, Szak ST, Boeke JD: Transcriptional disruption by the L1 retrotransposon and implications for mammalian transcriptomes. Nature. 2004, 429: 268-274. 10.1038/nature02536.
- Fondon JW, Garner HR: Molecular origins of rapid and continuous morphological evolution. PNAS. 2004, 101 (52): 18058-18063. 10.1073/pnas.0408118101.
- Singer MF: SINEs and LINEs: highly repeated short and long interspersed sequences in mammalian genomes. Cell. 1982, 28: 433-434. 10.1016/0092-8674(82)90194-5.
- Singer M, Berg P: Genes and Genomes. 1991, University Science Books, Mill Valley, California
- Bennett EA, Coleman LE, Tsui C, Pittart WS, Devine SE: Natural genetic variation caused by transposable elements in humans. Genetics. 2004, 168: 933-951. 10.1534/genetics.104.031757.
- Nekrutenko A, Li W-H: Transposable elements are found in a large number of human protein-coding genes. Trends in Genetics. 2001, 17 (11): 619-621. 10.1016/S0168-9525(01)02445-3.
- Deininger PL, Batzer MA: Mammalian retroelements. Genome Research. 2002, 12: 1455-1465. 10.1101/gr.282402.
- Holmes I: Transcendent elements: whole-genome transposon screens and open evolutionary questions. Genome Research. 2002, 12: 1152-1155. 10.1101/gr.453102.
- Bao Z, Eddy SR: Automated de novo identification of repeat sequence families in sequenced genomes. Genome Research. 2002, 12: 1269-1276. 10.1101/gr.88502.
- Campagna D, Romualdi C, Vitulo N, Del Favero M, Lexa M, Cannata N, Valle G: RAP: a new computer program for de novo identification of repeated sequences in whole genomes. Bioinformatics. 2005, 21 (5): 582-588. 10.1093/bioinformatics/bti039.
- Price AL, Jones NC, Pevzner PA: De novo identification of repeat families in large genomes. Bioinformatics. 2005, 21 (Suppl 1): i351-i358. 10.1093/bioinformatics/bti1018.
- Edgar RC, Myers EW: PILER: identification and classification of genomic repeats. Bioinformatics. 2005, 21 (Suppl 1): i152-i158. 10.1093/bioinformatics/bti1003.
- Taneda A: Adplot: detection and visualization of repetitive patterns in complete genomes. Bioinformatics. 2004, 20 (5): 701-708. 10.1093/bioinformatics/btg470.
- Caspi A, Pachter L: Identification of transposable elements using multiple alignments of related genomes. Genome Research. 2006, 16: 260-270. 10.1101/gr.4361206.
- Quesneville H, Bergman CM, Andrieu O, Autard D, Nouaud D, Ashburner M, Anxolabehere D: Combined evidence annotation of transposable elements in genome sequences. PLoS Comp Biol. 2005, 1 (2): e22. 10.1371/journal.pcbi.0010022.
- Smit AFA, Hubley R, Green P: RepeatMasker Open-3.0. 1996–2004, [http://www.repeatmasker.org]
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410. 10.1006/jmbi.1990.9999.
- The Ensembl archive of swine (Sus scrofa) sequences. [ftp://ftp.ensembl.org/pub/traces/sus_scrofa/fasta/]
- The Gene Index Project. [http://compbio.dfci.harvard.edu/tgi/]
- Pevzner PA, Tang H, Tesler G: De novo repeat classification and fragment assembly. Genome Research. 2004, 14: 1786-1796. 10.1101/gr.2395204.
- Zhi D, Raphael BJ, Price AL, Tang H, Pevzner PA: Identifying repeat domains in large genomes. Genome Biology. 2006, 7: R7. 10.1186/gb-2006-7-1-r7.
- Shedlock AM, Okada N: SINE insertions: powerful tools for molecular systematics. BioEssays. 2000, 22: 148-160. 10.1002/(SICI)1521-1878(200002)22:2<148::AID-BIES6>3.0.CO;2-Z.
- Shimamura M, Abe H, Nikaido M, Ohshima K, Okada N: Geneology of families of SINEs in Cetaceans and Artiodactyls: The presence of a huge superfamily of tRNAGLU-derived families of SINEs. Mol Biol Evol. 1999, 16 (8): 1046-1060.
- NCBI BLAST cow sequences. [http://www.ncbi.nlm.nih.gov/genome/seq/BlastGen/BlastGen.cgi?taxid=9913]
- Bovine Genome Project. [http://www.hgsc.bcm.tmc.edu/projects/bovine/]
- The R Project for Statistical Computing. [http://www.r-project.org/]
- BAC PAC Resources, Children's Hospital Oakland Research Institute (CHORI). [http://bacpac.chori.org/mporcine44.htm]
- Hafez EE, Ghany AGAA, Zaki EA: LTR- retrotransposons-based molecular markers in cultivated Egyptian cottons G. barbadense L. African Journal of Biotechnology. 2006, 5: 1200-1204.
- Comparative Vertebrate Sequencing Initiative. [http://www.nisc.nih.gov/]
- Cooper GM, Stone EA, Asimenos G, NISC Comparative Sequencing Program, Green ED, Batzoglou S, Sidow A: Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 2005, 15: 901-13. 10.1101/gr.3577405.
- Korf I, Yandell M, Bedell J: BLAST. 2003, O'Reilly & Associates, 143-
- RECON software package. [http://selab.janelia.org/recon.html]
- Rozen S, Skaletsky HJ: Primer3 on the WWW for general users and for biologist programmers. Bioinformatics Methods and Protocols Methods in Molecular Biology. Edited by: Krawetz S, Misener S. 2000, Humana Press, Totowa, NJ, 365-386. [http://frodo.wi.mit.edu/primer3/primer3_code.html]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.