Integration of mate pair sequences to improve shotgun assemblies of flow-sorted chromosome arms of hexaploid wheat
© Belova et al.; licensee BioMed Central Ltd. 2013
Received: 26 July 2012
Accepted: 22 March 2013
Published: 4 April 2013
The assembly of the bread wheat genome sequence is challenging due to allohexaploidy and extreme repeat content (>80%). Isolation of single chromosome arms by flow sorting can be used to overcome the polyploidy problem, but the repeat content cause extreme assembly fragmentation even at a single chromosome level. Long jump paired sequencing data (mate pairs) can help reduce assembly fragmentation by joining multiple contigs into single scaffolds. The aim of this work was to assess how mate pair data generated from multiple displacement amplified DNA of flow-sorted chromosomes affect assembly fragmentation of shotgun assemblies of the wheat chromosomes.
Three mate pair (MP) libraries (2 Kb, 3 Kb, and 5 Kb) were sequenced to a total coverage of 89x and 64x for the short and long arm of chromosome 7B, respectively. Scaffolding using SSPACE improved the 7B assembly contiguity and decreased gene space fragmentation, but the degree of improvement was greatly affected by scaffolding stringency applied. At the lowest stringency the assembly N50 increased by ~7 fold, while at the highest stringency N50 was only increased by ~1.5 fold. Furthermore, a strong positive correlation between estimated scaffold reliability and scaffold assembly stringency was observed. A 7BS scaffold assembly with reduced MP coverage proved that assembly contiguity was affected only to a small degree down to ~50% of the original coverage.
The effect of MP data integration into pair end shotgun assemblies of wheat chromosome was moderate; possibly due to poor contig assembly contiguity, the extreme repeat content of wheat, and the use of amplified chromosomal DNA for MP library construction.
KeywordsWheat Assembly Scaffold Mate-pair MDA Improvement
Bread wheat is one of the most important food crops worldwide. However, present wheat production is far from the expected increased global demand in the near future [1, 2]. Development of better yielding varieties with improved adaptation to the new climatic challenges is therefore important for global food security. A ‘tool’ with a great potential to revolutionize wheat breeding and production is a publicly available reference genome sequence. Genome sequences enable cost-effective identification of genomic variation which subsequently can be used to improve agricultural traits of interest through marker-assisted selection (MAS) and genomic selection programs . A rapidly increasing number of genomes from important food crops are becoming available. In 2011 potato and cacao [4, 5], in 2010 soybean , and in 2009 maize, sorghum and cucumber genomes were published [7–9]. However, even though wheat is one of the top five food commodities in the world, a wheat genome sequence is not yet available.
The main reason why the wheat genome sequencing is lagging behind is related to technical challenges due to large size (17Gb) and the complexity of the hexaploid wheat genome. Bread wheat is allohexaploid and carries three distinct, but closely related homoelogous genomes (2n = 6x = 42, AABBDD) [10, 11]. A distinction between homoeolog sequences in post sequencing processing of genomic sequence data is essentially impossible. Fortunately, the hexaploid wheat genome can be dissected to small parts by flow cytometric sorting of single chromosomes and chromosome arms [12, 13]. This technological breakthrough has enabled production of wheat chromosome specific BAC-libraries  and facilitated construction of physical maps of hexaploid wheat chromosomes . For some genomic applications, such as shotgun sequencing, large amount of DNA are required. In order to obtain sufficient DNA to sequence purified chromosome arms, millions of chromosomes must be sorted, a process, which is highly labor intensive . Including an amplification step of flow-sorted DNA can significantly reduce the labor and consequently the cost of acquiring chromosome specific DNA for sequencing. Multiple displacement amplification (MDA) is the most common method for genome amplification for sequencing purposes as MDA generate relatively long amplification products (majority between 5-20 kb) . However, MDA is known to give rise to chimeras, which can bring down the utility of the amplified DNA .
Shotgun sequencing of MDA DNA from flow-sorted chromosome arms, especially in combination with genetic maps and synteny information, has proven to be a highly cost effective way of gene discovery and construction of syntenic chromosome assemblies [19–21]. Unfortunately, the fragmentation level of the shotgun assemblies has been very high, which limits the information value of the assemblies. De novo assemblies of 7DS and 7BS using Illumina paired-end (PE) sequences with a chromosome arm coverage of 30-34×, resulted roughly in 600,000-1,000,000 contigs per chromosome arm, an N50 of ~500-1200 bp, and maximum contig sizes of just over 30,000 bp [21, 22]. Consequently, many contigs do not contain complete gene sequences, and the relative order of genes can only be identified for a small subset of genes found on contigs containing multiple genes (i.e. multigene contigs).
High levels of DNA sequence assembly fragmentation is closely associated with the repeat content of the genome , and the wheat genome is extreme with respect to repeat content, having more than 80% repetitive DNA . One way of reducing assembly fragmentation is to include additional sequencing libraries with large insert sizes, referred to as mate pair (MP) libraries . MP reads can vary in insert sizes between 1-20 kb and the idea of these ‘long jump’ paired sequences is to span repetitive regions that cause assembly fragmentation, and thereby link multiple contigs into longer scaffolds. This will improve the information value of an assembly by (1) improving the assembly contiguity (2) increasing the proportion of full length genes contained in single sequences (i.e. link exons from different contigs), and (3) increase the number of linearly ordered genes.
A number of recent publications describe the effect of MP data on assemblies of plant genomes [4, 9, 25]. One example is the potato genome assembly, which had on average an N50 increase of 37 Kb for every 1 Kb increase in MP insert size . Although the potato genome (1C = 865 Mbp) has a relatively high repeat content (total repeat content ≈ 62%, TE-derived repeats ≈ 32%), it does not compare to the hexaploid wheat genome (1C = 17,000 Mbp) that has >80% of TE-derived repetitive DNA . It is thus not clear to what extent MP data may improve shotgun assemblies of genomes with extreme repeat content such as wheat. Additionally, the utility of MP data from MDA DNA from flow-sorted chromosomes is unknown. The aim of this paper is therefore to study the effects of MP from MDA DNA on assembly contiguousness and gene content in shotgun assemblies of a flow-sorted hexaploid wheat chromosome.
Preparation of DNA from chromosome arms 7BS and 7BL
A double ditelosomic line of wheat Triticum aestivum L. cv. Chinese Spring carrying both arms of chromosome 7B as telosomes (2n = 40 + 2t7BS + 2t7BL) was used to purify the 7BS and 7BL arms. The seeds were provided by Dr. Bikram Gill (Kansas State University, Manhattan, USA). The chromosome arms were purified by flow cytometry. 68,000 and 45,000 of 7BS and 7BL arms, respectively, corresponding to 50 ng of DNA, were isolated in several batches. In order to estimate contamination with other chromosomes, 1000 chromosomes were sorted onto a microscope slide and used for fluorescence in situ hybridization (FISH) with probes for Afa family and telomeric repeats. Batches with the highest purity of the sorted fraction (93 and 88% for 7BS and 7BL, respectively) were used for further processing. DNA was purified and subsequently amplified using Illustra GenomiPhi V2 DNA Amplification Kit (GE Healthcare, Chalfont St. Giles, United Kingdom) as previously described . Three independent amplifications were performed for each arm to reduce amplification bias. Totally, 15.9 and 14 micrograms were prepared for 7BS and 7BL, respectively.
Sequencing library construction
PE libraries with a mean insert size of ~350 bp (Illumina protocol) and 2 Kb MP libraries (in-house modified Roche MP protocol) were constructed and sequenced at Fasteris SA (Geneva, Switzerland). The PE reads were 100 bp, while the 2 Kb MP reads were 45 bp. 3 Kb and 5 Kb MP libraries were prepared according to “Mate Pair Library v2 Sample Preparation Guide”  at Aarhus University (Denmark). The read length of the 3 and 5 Kb MP libraries were trimmed to 35 bp. All MP libraries were sequenced using HiSeq2000 technology (Illumina) according to manufacturer’s recommendations.
Contigs were assembled with PE reads using ABySS  which is based on a de Bruijn graph approach. This method collects the information generated from fix-length words of k –mers shared by overlapping reads . Initially, multiple assemblies were generated using different values of k and assessed using assembly quality statistics such as N50, maximum contig length, number of contigs in the assembly and the total amount of bases in the assembly. A k-mer length of 71 was chosen as the optimal value. A seed value of 150 was used (s parameter) and a minimum of 10 pairs were required to join contigs (n parameter). After assembly, contigs shorter than 200 bp were removed to generate a filtered dataset for scaffolding.
To accurately determine the mean insert size and insert size variation of each MP library, we mapped all mate pair reads back to the 7B contigs using BWA v0.6.0  with the parameters BWA aln –t 10 -q 10. Based on the BWA results we identified the number of MP reads aligning to contigs, the proportion of MP read pairs mapping to the same or different contigs, and the orientation of the MP reads that mapped to the same contig. We also assessed if the genomic origin of MP reads were biased towards different fractions of the genome (i.e. repeat or conserved fraction). This was done by mapping reads to an in house repeat content database (TREP10 and the repeats identified in Choulet et al. 2011) and the NCBI nr database.
We initially tested three software packages for scaffolding of pre-assembled contigs: ABySS, SOAPdenovo and SSPACE. Unfortunately, we were not able to scaffold contigs using ABySS due to the large proportion of MP reads that mapped in forward-forward direction (see results and discussion for more details). SOAP and SSPACE both produced scaffolds, but as the N50 and gene space assembly statistics of SSPACE assemblies exceeded SOAPdenovo at all parameters tested, we chose to use SSPACE for further investigation of the effect of MP on shotgun assemblies. In SSPACE the key parameter that defines the stringency of the scaffolding is ‘number of links’ (k), i.e. number of independent read pairs that uniquely support a connection between two contigs. We performed SSPACE scaffolding with k equal to 3, 5, 7, 10, 15 and 20.
The protein annotation (v1.2) excluding splice variants of Brachypodium distachyon (referred to as Brachypodium) was used as query sequences in a TBLASTN search  to assess gene content in contigs and scaffold assemblies. Blast result filtering were carried out as follows: (1) Only query proteins having at least one exon hit with minimum 30 amino acid length and a minimum per cent identity of 70 were considered in the analyses. (2) Duplicated exon hits on one contig/scaffold were removed. Duplicate hits were defined as two or more query hits with identical query start and query end positions, identical mismatches, identical gap length, and identical hit identity. (3) For each query protein, the mean e-value of all hits were calculated and overlapping exon hits (overlapping >5 bp) from proteins with higher e-value were discarded. (4) Two types of gene coverage were calculated: ‘total coverage’ and ‘adjusted coverage’. Total coverage was calculated as the total length of all the hits from a protein query relative to the query sequence length. Adjusted coverage was calculated as the number of unique query amino acid residues with a blast hit in the target sequence(s). To exclude gene hits from repetitive DNA (e.g. TE-associated coding regions) and spurious protein homology, genes with total coverage of >5 and genes with <10% adjusted coverage was not considered in any analyses.
A gene fragmentation index (GFI) was estimated to compare gene space fragmentation in different assemblies. The average blast hit coverage of Brachypodium gene homologs in the entire assembly, referred to as assembly coverage (AC) represents an approximation of the theoretically optimal situation, when each gene is contained within a single DNA sequence (i.e. no fragmentation). The AC estimate was then compared with the average Brachypodium gene coverage per contig or scaffolds, referred to as sequence coverage (SC), to calculate a gene fragmentation index (GFI) defined as (AC-SC)/AC. Hence, the GFI measures gene fragmentation as the difference in percent between SC and AC, and approaches 0 as SC and AC become similar.
Evaluation of scaffold reliability
As we cannot directly measure the level of scaffolding errors due to the lack of any reference assembly, we estimated the level of scaffold errors by (1) utilizing information from synteny with Brachypodium and (2) comparing the 7BL scaffold assemblies with the sequence content of 50 random BAC clones from 7BL. Because the number of chimeric contigs is assumed to be very low, the level of errors introduced by scaffolding can be estimated by comparing the synteny levels in contigs with synteny in scaffolds of similar sizes. If homologs of two Brachypodium genes are present in a single wheat contig, these homologs have a probability of representing closely linked loci (referred to as neighbouring genes) on the Brachypodium chromosome. This probability depends on the synteny level between wheat and Brachypodium in that exact region. If the scaffolding process does not introduce structural assembly errors, the proportion of neighbouring Brachypodium homologs should be similar in contigs and scaffolds of similar size. In our analyses we defined a neighbour gene pair as genes originating from Brachypodium loci with < =50 genes distance from each other. A bootstrap test was performed to test if the difference in proportions of neighbouring loci in contigs and scaffolds were likely to occur as a consequence of random sampling error. One thousand contig datasets were re-sampled (with replacement) and the P-value was calculated as the proportion of bootstrapped contig datasets with equal or lower proportions of neighbour genes as found in scaffolds.
In addition to the synteny approach we also utilized the sequence content of 50 BAC clones originating from 7BL to evaluate scaffold reliability (See Additional file 1 for assembly methods and Additional file 2 for sequence contigs). Raw sequencing reads from 7BL BACs are available upon request. We first identified scaffolds containing sequences derived from the BACs by BLASTN, using a threshold of >99% identity across minimum 2.5 Kb. With the assumption that identified scaffolds truly are derived from one of these 50 BACs, an estimate of scaffold reliability can therefore be defined as the proportion of contigs within a scaffold that originate from a certain BAC. To assess if contigs in scaffolds originate from the BAC we used BLASTN and defined a significant contig-to-BAC hit as having >99% identity across >50% of the contig length. Because longer scaffolds are more likely contain sequences belonging to multiple BACs (i.e. lower proportion of contigs originating from a single BAC) and scaffolding stringency affect scaffold length distribution, we normalized the scaffold reliability by dividing on scaffold length (i.e. proportion of contigs in a scaffold with a BLASTN hit to BAC/scaffold length). Normalized scaffold reliability is hereafter referred to as scaffold reliability index (SRI).
Shotgun assembly of 7BS and 7BL
Contig assembly summary statistics
Mean length (bp)
Max length (bp)
Mate pair data
Summary table of mate pair sequence data
Mate pair library
Mapping of MP reads classified as having RF, FR, and FF/RR orientation to the repeat database showed no apparent difference in repeat content (~36% mapped to repeat database, data not shown). It is not uncommon to have PE oriented reads (i.e. FR) in MP libraries (c.f. mate pair library sample preparation guide), but the occurrence of FF/RR reads is more difficult to explain. Hence, we specifically analysed the content of the FF/RR-mapped reads by BLASTN against the NCBI nr database to assess if the FF/RR reads originated from other sources than wheat DNA. All target hits with >80% identity in the NCBI nr nucleotide database were collected and the species information of each hit was extracted. More than 96% of the reads had a best BLAST hit to other grasses (Additional file 6), implying that the FF/RR-mapping reads truly were derived from wheat DNA.
Effect of MP integration on 7B assemblies
Scaffold assembly summary statistics
MP used (%)
Contigs in scaffolds
Sum scaffolds (Mbp)
Assembly N50* (Kb)
Even though only 1-2% of MP data was used for scaffolding the assembly N50 was improved substantially at low stringency levels; at k = 3 assembly N50 increased by 6 and 7.2 fold for 7BS and 7BL, respectively (scaffolds are available upon requests). However, as expected, the reduction in assembly fragmentation was dramatically affected when k was increased (Table 3). The number of contigs incorporated in scaffolds decreased from 42% to 8% and from 52% to 11% for 7BL and 7BS, respectively, and the total number of scaffolds decreased by ~70% when increasing k from 3 to 20. Furthermore the total number of residues included in scaffolds was reduced from ~40% to 18% of the total chromosome length (Table 3). The mean scaffold length however, was not affected much when k was increased due to a change in the distribution of length of contigs included in scaffolds; as k increased, the proportion of long contigs included in scaffolds also increased. Scaffold content was strongly biased towards gene containing contigs. Although a maximum of 40-50% of 7BS and 7BL contigs were incorporated into scaffolds (Table 3), as many as 75% (k = 20) and 95% (k = 3) of sequences containing full length genes (> = 70% Brachypodium homolog coverage) were included in scaffolds.
Gene content in ABySS and SSPACE assemblies
Brachypodium homolog coverage1(mean)
Full length genes
(>30 aa, >70 pident)
Scaffold content reliability
Effect of MP coverage
Modest effect of MP integration in 7B shotgun assemblies
De novo assembly of shotgun sequences from large plant genomes like wheat remains a challenging task, mainly due to prevalence of repetitive DNA . Its presence can lead to complex, misassembled rearrangements and the collapse of reads coming from distinct copies of repetitive DNA into single assembled sequences  which results in contracted and fragmented assemblies. Ideally, genomes with high repeat content should therefore be assembled using reads longer then the length of the repeats. Wheat consists of >80% TE-derived repetitive DNA with a mean length of ~4600 bp , far longer than the read length of sequence reads from next generation sequencing data. One assembly strategy to reduce the fragmentation generated by repetitive DNA is therefore to link neighboring chromosome regions belonging to different sequence contigs with long insert size MP reads that bridge the repetitive segments.
Our results show that integration of MP data in shotgun assemblies of flow-sorted wheat chromosomes improves assembly contiguity and decrease gene space fragmentation, but that the degree of assembly statistics improvement is greatly affected by scaffolding stringency (Table 1, Table 3). For example, at low stringency the assembly N50 increased by 6–7.5-fold, while at the highest stringency (k = 20) N50 was only increased by 1.3-1.8 compared to contig assemblies. Although a negative correlation between stringency and assembly improvements also was observed for the gene space statistics, the MP effect on gene space were less affected by scaffold assembly stringency compared to the N50 statistic (Table 4). This is likely explained by the fact that genes are more often found in longer contigs, hence even at high stringency a large proportion of the gene containing contigs were joined into scaffolds.
Even for the least stringent scaffold assembly, the assembly improvement for 7B does not seem to be in the same magnitude as reported for some recently sequenced plant genomes. For the cucumber (1C = 367 Mbp) and cacao (1C = 430 Mbp) genomes, addition of long insert libraries improved N50 by 14-fold (172 Kb) and 60-fold (473.8 Kb), respectively [4, 9]. Assembly metrics from these shotgun assemblies is difficult to compare directly due to the use of different sequencing platforms, different PE and MP libraries, and different sequencing coverage. However, it is possible to estimate the MP-effectiveness based on the total gain in N50 per Kb of MP insert size length (N50 gain/max MP insert size). For cucumber and cacao these numbers are 14/2 Kb and 60/8 Kb compared to for example 5/5 Kb in wheat k = 5.
Potato is another recently published plant genome  for which different MP libraries were added to improve assembly. Integration of a 2 Kb MP library produced a 3-fold increase in N50, while using both 2 Kb + 5Kb MP libraries increased the N50 by 8-fold. Comparable metrics were obtained in 7B SSPACE assemblies using k = 5; the 2 Kb MP libraries produced 1.9- and 2.6-fold changes in N50 for 7BS and 7BL, respectively, while the final N50 fold change was ~5 after addition of the 5 Kb MP data. Thus, even though the actual scaffold N50 was much higher in potato after adding 2 + 5 Kb MP libraries (173 Kb) compared to the 7B assemblies with 2 + 3 + 5 Kb MP data (7BS = 11 Kb/7BL = 8.3 Kb), the relative N50 gain was not that different.
The modest impact of MP data in the chromosome 7B assemblies compared to other plant genomes could be explained by the inherent repeat characteristics of the wheat genome. While the wheat genome consists of >80% TE-derived repetitive DNA , potato, cacao and cucumber genomes are much smaller and all have <35% TE-derived repetitive DNA . This difference will undoubtedly cause large differences in the effect of MP data on assembly contiguity. Another reason for the relatively low impact of MP data on the scaffold N50 could be related to the quality of the MP data. Only a small fraction (~1%) of the MP reads from MDA chromosomal DNA satisfied requirements of SSPACE for being included in scaffold construction, mostly due to a very high portion of reads having a different orientation or discrepancy between expected and estimated insert size for MP reads (Table 2). Lastly, the fragmentation level of the contig assembly is important for the MP effect. It is evident that small contigs have less chance of being put into scaffolds due to the fact that small sequences will have few MP reads originating from them. An improved contig assembly N50, for example by increasing PE sequencing coverage or adding additional PE libraries with different insert sizes, could therefore be a good strategy to be able to include a larger proportion of the contig assembly into scaffolds, and hence increase scaffold N50.
Scaffold reliability and assembly stringency
Assembly errors can be introduced at the scaffolding stage when the software has to choose between two similar solutions and falsely connects contigs from non-adjacent chromosome regions or links two adjacent contigs in the wrong orientation. Our synteny- and BAC-based scaffold reliability estimates provides measures of reliability at two types of different genomic landscapes. Our synteny approach did not detect signatures of erroneous contig joining in small scaffolds from gene dense regions in the assemblies; however when using sequence contents from 50 BACs to assess scaffold reliability a strong correlation between estimated scaffold reliability and scaffold assembly stringency was observed (Figure 3). We interpret these differences in test conclusions to reflect that scaffolds from non-genic genomic regions are more prone to contain errors (especially at low stringency parameters), likely due to higher content of repetitive DNA in the intragenic space.
The origin of erroneous MP orientation
The MP data contained a high percentage of forward-reverse reads (i.e. PE) as well as contamination of read pairs that map in the same direction (FF/RR) (Table 2). The high proportion of FR reads in our MP data is most likely explained by contamination with PE reads, which represent non-biotinylated fragments that were not removed during the wash step in library preparation (c.f. mate pair library sample preparation guide). This is supported by the fact that these PE oriented reads have a smaller estimated insert size of around 500 bp (Figure 1, Additional files 4 and 5). The origin of MP reads oriented in FF/RR direction, which make up ~38% of the total MP data, is less obvious. There is no evidence for FF/RR reads containing non-wheat DNA contamination, nor do the FF/RR reads have increased proportions of reads from TE-repetitive DNA. Moreover, since the 2 Kb and 3/5 Kb libraries were produced and sequenced by different labs using different protocols it is highly unlikely that systematic technical errors have been introduced. One possible explanation to the high FF/RR fraction is that they originate from rearranged DNA generated in the multiple displacement amplification (MDA) step, which was used to increase DNA amount after chromosome flow-sorting. It has been shown that MDA generates genomic rearrangement in the amplified DNA with a frequency of 1 rearrangement per 10 Kb, and majority of chimeras are inverted sequences . In a de-novo assembly of a single bacteria cell MDA, >50% of the MP pairs were chimeric pairs . Hence, even though MDA has proven to be very useful to prepare DNA from flow sorted chromosomes for single-end and short insert size PE sequencing [20, 21, 34], the use of MDA DNA in long insert size MP library construction and scaffolding might not be an optimal strategy for wheat genome scaffolding. Another limitation due to the high proportion of FF/RR pairs is that it restricted us from using any type of scaffold-assembler. For example, when trying to integrate MP data using ABySS, the software did not handle the large proportion of the MP reads with non-MP orientation.
The wheat chromosome 7B was sequenced and assembled using PE reads with an insert size of ~350 bp in combination with 2, 3 and 5 Kb MP libraries. MP integration improved both assembly contiguity and reduced fragmentation of the gene space, but only to a modest extent. Scaffold reliability increased with increasing assembly stringency, emphasizing the need to use high stringency scaffolding parameters to avoid scaffolding errors. Scaffold assemblies of 7BS with reduced MP coverage showed that MP sequence coverage of ~40-50× would be sufficient to produce assemblies with slightly reduced N50 but comparable results for gene space improvement compared to the full coverage assembly (89×). In conclusion, MP assembly improvements was lower than for other recently assembled plant genomes, possibly due to the extreme repeat content of wheat, high fragmentation of contig assemblies, and the use of MDA DNA to construct MP libraries.
The project was funded by grants from the Norwegian Research Council (project no. 199387/I99) and Graminor A/S to Odd-Arne Olsen. Hana Šimková and Jaroslav Doležel were supported by the Czech Science Foundation (award no. P501/12/2554) and by Ministry of Education, Youth and Sports of the Czech Republic and the European Regional Development Fund (Operational Programme Research and Development for Innovations No. ED0007/01/01).
- Global Perspective Studies Unit - Food and Agriculture Organization of the United Nation: World agriculturae: towards 2030/2050 - Interim report - Prospects for food, nutrition, agriculture and major commodity groups. 2006, Rome: Food and Agriculture Organization of the United NationGoogle Scholar
- The Government Office for Science: Foresight: The Future of Food and Farming - Final Project Report. 2011, London, United Kingdom: Government Office for ScienceGoogle Scholar
- Jannink J-L, Lorenz AJ, Iwata H: Genomic selection in plant breeding: from theory to practice. Brief Funct Genomics. 2010, 9 (2): 166-177. 10.1093/bfgp/elq001.View ArticlePubMedGoogle Scholar
- Argout X, Salse J, Aury JM, Guiltinan MJ, Droc G, Gouzy J, Allegre M, Chaparro C, Legavre T, Maximova SN: The genome of Theobroma cacao. Nat Genet. 2011, 43 (2): 101-108. 10.1038/ng.736.View ArticlePubMedGoogle Scholar
- The Potato Sequencing Consortium: Genome sequence and analysis of the tuber crop potato. Nature. 2011, 475 (7355): 189-195. 10.1038/nature10158.View ArticleGoogle Scholar
- Schmutz J, Cannon SB, Schlueter J, Ma JX, Mitros T, Nelson W, Hyten DL, Song QJ, Thelen JJ, Cheng JL: Genome sequence of the palaeopolyploid soybean (vol 463, pg 178, 2010). Nature. 2010, 465 (7294): 120-10.1038/nature08957.View ArticleGoogle Scholar
- Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S, Liang C, Zhang J, Fulton L, Graves TA: The B73 Maize Genome: Complexity, Diversity, and Dynamics. Science. 2009, 326 (5956): 1112-1115. 10.1126/science.1178534.View ArticlePubMedGoogle Scholar
- Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A: The Sorghum bicolor genome and the diversification of grasses. Nature. 2009, 457 (7229): 551-556. 10.1038/nature07723.View ArticlePubMedGoogle Scholar
- Huang S, Li R, Zhang Z, Li L, Gu X, Fan W, Lucas WJ, Wang X, Xie B, Ni P: The genome of the cucumber, Cucumis sativus L. Nat Genet. 2009, 41 (12): 1275-1281. 10.1038/ng.475.View ArticlePubMedGoogle Scholar
- Dvorak J, Terlizzi P, Zhang HB, Resta P: The evolution of polyploid wheats: identification of the A genome donor species. Genome. 1993, 36 (1): 21-31. 10.1139/g93-004.View ArticlePubMedGoogle Scholar
- Dvorak J, Zhang HB: Variation in repeated nucleotide sequences sheds light on the phylogeny of the wheat B and G genomes. Proc Natl Acad Sci USA. 1990, 87 (24): 9640-9644. 10.1073/pnas.87.24.9640.PubMed CentralView ArticlePubMedGoogle Scholar
- Doležel J, Kubalakova M, Paux E, Bartos J, Feuillet C: Chromosome-based genomics in the cereals. Chromosome Res. 2007, 15 (1): 51-66. 10.1007/s10577-006-1106-x.View ArticlePubMedGoogle Scholar
- Vraná J, Kubalakova M, Simkova H, Cihalikova J, Lysak MA, Dolezel J: Flow sorting of mitotic chromosomes in common wheat (Triticum aestivum L.). Genetics. 2000, 156 (4): 2033-2041.PubMed CentralPubMedGoogle Scholar
- Šafář J, Simkova H, Kubalakova M, Cihalikova J, Suchankova P, Bartos J, Dolezel J: Development of chromosome-specific BAC resources for genomics of bread wheat. Cytogenet Genome Res. 2010, 129 (1–3): 211-223.PubMedGoogle Scholar
- Paux E, Sourdille P, Salse J, Saintenac C, Choulet F, Leroy P, Korol A, Michalak M, Kianian S, Spielmeyer W: A physical map of the 1-gigabase bread wheat chromosome 3B. Science. 2008, 322 (5898): 101-104. 10.1126/science.1161847.View ArticlePubMedGoogle Scholar
- Doležel JŠH, Kubaláková M, Šafář J, Suchánková P, Číhalíková J, Bartoš J, Valárik M: Chromosome genomics in the Triticeae. Genetics and Genomics of the Triticeae. Edited by: Feuillet C, Muehlbauer G. 2009, New York: Springer, 285-316.View ArticleGoogle Scholar
- Simková H, Svensson JT, Condamine P, Hribova E, Suchankova P, Bhat PR, Bartos J, Safar J, Close TJ, Dolezel J: Coupling amplified DNA from flow-sorted chromosomes to high-density SNP mapping in barley. BMC Genomics. 2008, 9: 294-10.1186/1471-2164-9-294.PubMed CentralView ArticlePubMedGoogle Scholar
- Lasken RS, Stockwell TB: Mechanism of chimera formation during the Multiple Displacement Amplification reaction. BMC Biotechnol. 2007, 7: 19-10.1186/1472-6750-7-19.PubMed CentralView ArticlePubMedGoogle Scholar
- Mayer KF, Martis M, Hedley PE, Simkova H, Liu H, Morris JA, Steuernagel B, Taudien S, Roessner S, Gundlach H: Unlocking the barley genome by chromosomal and comparative genomics. Plant Cell. 2011, 23 (4): 1249-1263. 10.1105/tpc.110.082537.PubMed CentralView ArticlePubMedGoogle Scholar
- Hernandez P, Martis M, Dorado G, Pfeifer M, Galvez S, Schaaf S, Jouve N, Simkova H, Valarik M, Dolezel J: Next-generation sequencing and syntenic integration of flow-sorted arms of wheat chromosome 4A exposes the chromosome structure and gene content. Plant J. 2012, 69 (3): 377-386. 10.1111/j.1365-313X.2011.04808.x.View ArticlePubMedGoogle Scholar
- Berkman PJ, Skarshewski A, Lorenc MT, Lai K, Duran C, Ling EY, Stiller J, Smits L, Imelfort M, Manoli S: Sequencing and assembly of low copy and genic regions of isolated Triticum aestivum chromosome arm 7DS. Plant Biotechnol J. 2011, 9 (7): 768-775. 10.1111/j.1467-7652.2010.00587.x.View ArticlePubMedGoogle Scholar
- Berkman PJ, Skarshewski A, Manoli S, Lorenc MT, Stiller J, Smits L, Lai K, Campbell E, Kubalakova M, Simkova H: Sequencing wheat chromosome arm 7BS delimits the 7BS/4AL translocation and reveals homoeologous gene conservation. Theor Appl Genet. 2012, 124 (3): 423-432. 10.1007/s00122-011-1717-2.View ArticlePubMedGoogle Scholar
- Treangen TJ, Salzberg SL: Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nat Rev Genet. 2012, 13 (1): 36-46.Google Scholar
- Choulet F, Wicker T, Rustenholz C, Paux E, Salse J, Leroy P, Schlub S, Le Paslier MC, Magdelenat G, Gonthier C: Megabase level sequencing reveals contrasted organization and evolution patterns of the wheat gene and transposable element spaces. Plant Cell. 2010, 22 (6): 1686-1701. 10.1105/tpc.110.074187.PubMed CentralView ArticlePubMedGoogle Scholar
- Xu X, Pan S, Cheng S, Zhang B, Mu D, Ni P, Zhang G, Yang S, Li R, Wang J: Genome sequence and analysis of the tuber crop potato. Nature. 2011, 475 (7355): 189-195. 10.1038/nature10158.View ArticlePubMedGoogle Scholar
- Mate Pair Library v2 Sample Preparation Guide. [https://shell.cgrb.oregonstate.edu/sites/default/files/Files/Docs/Illumina/prep/MatePair_v2_2-5kb_SamplePrep_Guide_15008135_A.pdf]
- Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I: ABySS: a parallel assembler for short read sequence data. Genome Res. 2009, 19 (6): 1117-1123. 10.1101/gr.089532.108.PubMed CentralView ArticlePubMedGoogle Scholar
- Compeau PE, Pevzner PA, Tesler G: How to apply de Bruijn graphs to genome assembly. Nat Biotechnol. 2011, 29 (11): 987-991. 10.1038/nbt.2023.View ArticlePubMedGoogle Scholar
- Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009, 25 (14): 1754-1760. 10.1093/bioinformatics/btp324.PubMed CentralView ArticlePubMedGoogle Scholar
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.View ArticlePubMedGoogle Scholar
- Li WL, Zhang P, Fellers JP, Friebe B, Gill BS: Sequence composition, organization, and evolution of the core Triticeae genome. Plant J. 2004, 40 (4): 500-511. 10.1111/j.1365-313X.2004.02228.x.View ArticlePubMedGoogle Scholar
- Zhu W, Ouyang S, Iovene M, O'Brien K, Vuong H, Jiang J, Buell CR: Analysis of 90 Mb of the potato genome reveals conservation of gene structures and order with tomato but divergence in repetitive sequence composition. BMC Genomics. 2008, 9: 286-10.1186/1471-2164-9-286.PubMed CentralView ArticlePubMedGoogle Scholar
- Rodrigue S, Malmstrom RR, Berlin AM, Birren BW, Henn MR, Chisholm SW: Whole genome amplification and de novo assembly of single bacterial cells. PLoS One. 2009, 4 (9): e6864-10.1371/journal.pone.0006864.PubMed CentralView ArticlePubMedGoogle Scholar
- Vitulo N, Albiero A, Forcato C, Campagna D, Dal Pero F, Bagnaresi P, Colaiacovo M, Faccioli P, Lamontanara A, Simkova H: First Survey of the Wheat Chromosome 5A Composition through a Next Generation Sequencing Approach. PLoS One. 2011, 6 (10): e26421-10.1371/journal.pone.0026421.PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.