Identification and retrieval of sugarcane LTR retrotransposon sequences
All BACs used are from the R570 sugarcane cultivar library. BACs sequenced for the BIOEN Project and public sugarcane BAC sequences available at the National Center for Biotechnology Information (NCBI) website as of 01/02/2011 were screened for full-length LTR elements using LTR_STRUC at its most thorough stringency setting (1). Sixty sequences were retrieved and provisionally assigned to the Gypsy or Copia superfamily by BLASTX searches against the cores in the Gypsy Database (GyDB). To determine whether the sequences were complete elements, we identified target site duplications (TSDs) by submitting each full-length sequence as both query and subject to blast2seq on the NCBI website.
Sugarcane LTR-RTs, including the probes used for fluorescence in situ hybridization, were assigned to previously described plant LTR lineages [6–8] by phylogenetic analysis using the translated reverse transcriptase (RT) domain excised from all the sugarcane LTR-RTs and published RT sequences.
For both phylogenies we downloaded RT alignments from the Gypsy Database (GyDB) and removed non-plant sequences. Additional Gypsy sequences were taken from Du et al. (2010), and Copia sequences from Wicker and Keller (2007). All sequences were renamed to reflect published lineage names. Sequences were aligned using MUSCLE with default settings and then adjusted manually. The optimal model of amino acid substitution was estimated using MEGA5 with default settings. Neighbor-joining and maximum-likelihood phylogenies were estimated with MEGA5 using the highest-ranked substitution model available and 500 bootstrap replicates.
Assignment to families within lineages and naming of sequences
Sugarcane LTR-RTs were assigned to families within lineages on the basis of 80% sequence identity in at least 80% of their LTRs. Although previous reports assign names to some sugarcane LTR-RT families [11, 14, 20], we opted to standardize the names of sugarcane LTR-RT sequences using a more straightforward strategy based on the universal classification of TEs proposed by Wicker et al. (2007). Each name consists of 'RLC' (Copia) or 'RLG' (Gypsy), 'sc' for 'sugarcane', the lineage name (e.g. 'Ale'), and the family number (e.g. '1'), followed by a sequential number for each sequence within the family. For example, 'RLC_scAle_1.1' is the first sequence named in family 1 of the Ale lineage, superfamily Copia.
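The naming scheme can be illustrated with a short Python sketch. The function name, the argument names, and the dictionary of superfamily codes are introduced here for illustration only and are not part of the original analysis pipeline.

```python
# Minimal sketch of the naming convention described above.
# The helper and its argument names are hypothetical.
SUPERFAMILY_CODES = {"Copia": "RLC", "Gypsy": "RLG"}


def ltr_rt_name(superfamily, lineage, family, member, species_tag="sc"):
    """Build a name such as 'RLC_scAle_1.1'.

    superfamily: 'Copia' or 'Gypsy'
    lineage:     lineage name, e.g. 'Ale'
    family:      family number within the lineage
    member:      sequential number of the sequence within the family
    """
    code = SUPERFAMILY_CODES[superfamily]
    return f"{code}_{species_tag}{lineage}_{family}.{member}"


# First sequence named in family 1 of the Ale lineage (Copia).
print(ltr_rt_name("Copia", "Ale", 1, 1))  # RLC_scAle_1.1
```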
Analysis of the structure of sugarcane LTR-RTs
Coding domains were identified using Pfam, or by alignment with MUSCLE against the domain alignments from the GyDB. Full-length sequences were aligned and analyzed using BioEdit, using the toggle translate option so that we could align the coding domains as well as the LTRs, TSDs, and the regions between the LTRs and the coding domains. LTRs were identified by submitting the sequence of the entire sugarcane LTR-RT as both query and subject to a MEGABLAST analysis. The beginning of the LTRs, the regions between the LTRs and the coding domains, and the TSDs were manually aligned in BioEdit. Coordinates of the beginning of all features of each element were recorded in an Excel table, and the information was submitted to DomainDraw to create a schematic representation of each sugarcane LTR-RT.
General features of each sequence, such as element size, LTR size, target site duplications (TSDs) and GenBank accession numbers, are presented in Additional file 6.
Sugarcane EST database screening
All full-length LTR-RTs were used as queries in a BLASTN search against EST sequences from the sugarcane cultivar SP80-3280. The ESTs were obtained using Entrez at NCBI (http://www.ncbi.nlm.nih.gov/Entrez/). A total of 155,354 sugarcane ESTs were analyzed, all of them from the SUCEST (Sugarcane EST) project.
ESTs similar to LTR-RTs were assigned to a family according to criteria based on Wicker et al.: 80% coverage with 80% nucleotide identity.
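As a rough illustration of the 80% coverage / 80% identity criterion, the following Python sketch filters a single BLASTN hit. It assumes that coverage is measured as alignment length relative to EST length, which the text does not specify, and the function and parameter names are hypothetical.

```python
# Sketch of an 80/80 filter for one BLASTN hit.
# Assumption: coverage = alignment length / EST length.
def passes_80_80(percent_identity, alignment_length, est_length,
                 min_identity=80.0, min_coverage=0.80):
    """Return True if the hit meets both thresholds."""
    coverage = alignment_length / est_length
    return percent_identity >= min_identity and coverage >= min_coverage


# Illustrative values only, not data from the study.
print(passes_80_80(85.0, 400, 450))  # True: 85% identity, ~89% coverage
print(passes_80_80(95.0, 300, 450))  # False: coverage is only ~67%
```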
The number of hits for each library was normalized by dividing the raw number of hits by the total number of valid reads. The normalized numbers of hits per library were then combined according to tissue type. The final number was multiplied by 100,000, so that in Figure 4 the X axis represents the number of ESTs per 100,000 transcripts from each tissue.
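The normalization described above can be sketched in Python as follows. The sketch assumes that "combined" means summing the normalized values of all libraries from the same tissue, which is one possible reading of the text. The library counts in the example are invented for illustration.

```python
from collections import defaultdict


def ests_per_100k(libraries):
    """Normalize hits per library, combine by tissue, scale to 100,000.

    libraries: iterable of (tissue, raw_hits, valid_reads) tuples.
    Assumption: libraries of the same tissue are combined by summation.
    """
    per_tissue = defaultdict(float)
    for tissue, raw_hits, valid_reads in libraries:
        # Raw number of hits divided by total valid reads of the library.
        per_tissue[tissue] += raw_hits / valid_reads
    # Scale so the value reads as ESTs per 100,000 transcripts.
    return {tissue: value * 100_000 for tissue, value in per_tissue.items()}


# Hypothetical library counts, not data from the study.
example = [("leaf", 5, 10_000), ("leaf", 10, 20_000), ("root", 2, 8_000)]
for tissue, value in ests_per_100k(example).items():
    print(tissue, round(value, 2))
```

If the libraries were instead averaged or pooled before normalization, only the accumulation line would need to change.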
Association of cDNAs to full-length LTR-RTs
Thirty manually curated sugarcane cDNAs related to LTR-RTs, described using an older nomenclature, were assigned to a family according to the same criteria used for the ESTs.
RNA extraction and reverse transcriptase (RT) PCR analysis
Leaf blade tissues were collected from one-month-old sugarcane plants (cultivar SP80-3280) grown under greenhouse conditions. Mature eight-month-old plants of the same cultivar were used to obtain lateral buds. Stalk pieces with one bud (single eye sets) were planted in plastic trays containing a commercial planting mix (Plantmax, Eucatex, Brazil). After five days, developing buds were collected for RNA extraction. Two independent biological replicates were collected for leaf blade and lateral bud tissues. Total RNA was extracted using TRIzol reagent (Invitrogen) according to the manufacturer's instructions.
Primers were designed within the reverse transcriptase domain using Primer3Plus to amplify all known elements from a family. Total RNA was treated with DNase I Amp Grade (Invitrogen) to remove any residual genomic DNA. Three micrograms of DNase-treated RNA was used to generate first-strand cDNA using ImProm II Reverse Transcriptase (Promega) according to the manufacturer's instructions. The reaction mixture was placed in a GeneAmp 9700 thermocycler (Applied Biosystems) and incubated at 16°C for 30 minutes, followed by 60 cycles of pulsed reverse transcription at 30°C for 30 seconds, 42°C for 30 seconds, and 50°C for one second. cDNA dilutions were used in PCR reactions as follows: 1.0 μL of cDNA, 10 pmol of each primer, GoTaq master mix, and 1 U of GoTaq DNA Polymerase (Promega) in a total volume of 25 μL. The reactions were run in the thermocycler under the following conditions: 94°C for three minutes, followed by an appropriate number of cycles of 94°C for 30 seconds, 55°C or 60°C for 30 seconds, and 72°C for 45 seconds. All reactions were repeated at least twice.
Small RNA library construction and bioinformatic analysis
To evaluate the small RNA (sRNA) "landscape" of sugarcane LTR-RTs, we prepared an sRNA library from leaves of one-month-old plants of the sugarcane cultivar SP80-3280 grown under greenhouse conditions. Ten micrograms of total RNA, prepared using TRIzol reagent (Invitrogen) according to the manufacturer's instructions, were used to generate the sRNA library following Illumina's modified protocol. The 19-28 nt sRNA fraction was purified by size fractionation on a 15% TBE-urea polyacrylamide gel. A 5'-adenylated single-stranded adapter was first ligated to the 3' end of the RNA using T4 RNA ligase without ATP, followed by ligation of a second single-stranded adapter to the 5' end of the RNA using T4 RNA ligase in the presence of ATP. The resulting products were fractionated on a 10% TBE-urea polyacrylamide gel and then used for cDNA synthesis and PCR amplification. The resulting library was sequenced on an Illumina Genome Analyzer (GA-IIx) following the manufacturer's protocol available at http://www.fasteris.com.
A total of 4,388,665 raw 20-25 nt sequences were retrieved in a FASTQ-formatted file, and the adapter sequences were removed using Perl scripts. After adapter trimming, the inserts were sorted into separate files according to their lengths. We used the program MAQ to map the 20-25 nt sRNA reads against sugarcane LTR-RT reference sequences (sequence 1 from each family). MAQ rapidly aligns short reads to reference sequences; in this study we allowed 0-2 nt mismatches between the sRNA and LTR-RT sequences. Three percent of the total library, that is, 131,641 high-quality raw 20-25 nt sequences, matched the sugarcane LTR-RT sequences. These sRNA sequences have been submitted to the NCBI Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo) under accession number GSE35143.
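The trimming and length-sorting steps were performed with Perl scripts that are not reproduced here. The following Python sketch only illustrates the general idea under simplifying assumptions: the adapter is located by exact string match, reads without a detectable adapter are discarded, and the adapter sequence shown is a placeholder rather than the one actually used. The last line checks the reported mapping fraction from the two read counts given above.

```python
from collections import defaultdict

# Placeholder 3' adapter; the actual adapter sequence is not given in the text.
HYPOTHETICAL_ADAPTER = "TCGTATGCCG"


def trim_and_bin(reads, adapter=HYPOTHETICAL_ADAPTER, min_len=20, max_len=25):
    """Remove the 3' adapter and group the inserts by length.

    Assumptions: exact adapter match only; reads lacking the adapter
    are skipped because their insert length cannot be determined.
    """
    bins = defaultdict(list)
    for read in reads:
        pos = read.find(adapter)
        if pos == -1:
            continue
        insert = read[:pos]
        if min_len <= len(insert) <= max_len:
            bins[len(insert)].append(insert)
    return dict(bins)


# Toy reads for illustration only.
reads = ["A" * 21 + HYPOTHETICAL_ADAPTER,   # kept, 21 nt insert
         "C" * 30 + HYPOTHETICAL_ADAPTER,   # insert too long
         "G" * 22]                          # no adapter found
print({length: len(seqs) for length, seqs in trim_and_bin(reads).items()})

# Fraction of the library matching LTR-RTs, from the counts reported above.
print(f"{100 * 131_641 / 4_388_665:.2f}%")  # 3.00%
```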
Fluorescence in situ hybridization (FISH)
The distribution of the sugarcane LTR-RTs was analyzed by fluorescence in situ hybridization (FISH) on metaphase chromosomes. To compare the distribution of the LTR-RTs relative to the centromere, a centromeric BAC was also used as a probe. A single representative probe was used for each evolutionary lineage (Figure 3). The sequence of each probe was submitted as a query to a BLASTN analysis against a database of sugarcane cDNAs related to TEs identified in our lab [11, 20] to check that, at 85% stringency, it would not hybridize against other elements.
All LTR-RT probes were 1.9 to 2.9 kb and covered the reverse transcriptase domain. For the Ale1 and Ivana1 families, probes were selected from previously reported cDNA sequences [11, 20]. We used cDNA TE137 (GenBank accession [GenBank:JN786875]) for Ale1 and cDNA TE049 (GenBank accession [GenBank:DQ115032]) for Ivana1, on the basis of size (> 1.9 kb) and the presence of the reverse transcriptase domain. For all other lineages, primers were designed from alignments of the RT domain using Primer3Plus. All kits were used according to the manufacturer's instructions. The probe sequences were PCR amplified from R570 cultivar genomic DNA using Elongase (Invitrogen) or GoTaq (Promega) with 2 mM MgCl2, 0.2 mM dNTPs, 0.2 μM primers, 1 ng/μL genomic DNA and 0.025 units/μL of enzyme. Cycling conditions were as described for the Expand Long Template PCR System (Roche).
The resulting amplicons were separated on 1% agarose, gel-purified using the NucleoSpin Extract II kit (Macherey Nagel), ligated into the pGEM T-Easy Vector (Promega), and cloned into DH10B electrocompetent cells according to standard procedures. Minipreps from three clones from each PCR reaction and from the cDNA clones were prepared using standard alkaline precipitation methods and sequenced using the vector primers M13F/R. To obtain a probe that consisted of the insert only, one miniprep for each lineage was diluted 1:1000 and used as template in a 100 μL PCR reaction with M13F/R primers, using GoTaq (Promega) under the same reaction conditions as above but with the following cycling conditions: initial denaturation at 95°C for 3 min; 35 cycles of 95°C for 1 min, 55°C for 30 sec and 72°C for 2 min; and a final extension at 72°C for 3 min. The resulting amplicons were separated on 1% agarose, gel-purified using the NucleoSpin Extract II kit (Macherey Nagel) and quantified using a NanoDrop Spectrometer (ThermoScientific).
For the centromeric BAC probe, BAC DNA was extracted using the Large-Construct Kit (Qiagen).
Between 350 and 700 ng of probe DNA was used in a 20 μL nick translation reaction with digoxigenin (DIG)-11-dUTP (Invitrogen) or biotin-16-dUTP (Invitrogen) and the NT mix (Roche). Labeling efficiency was tested according to Heslop (2000), protocol 4.7. The probe was used only if the 1:1000 dilution was clearly visible.
Sections of sugarcane stalk from the cultivar SP80-3280 were planted in a 1:1 mixture of soil and vermiculite, and root tips were harvested within 1-3 days and placed directly into 2 mM 8-hydroxyquinoline for 6 hours at 18°C. Next, they were transferred to 3:1 ethanol:acetic acid fixative and stored at -20°C. Root samples were prepared according to Heslop (2000), protocol 5.3, except that they were digested in either 2% cellulase/0.2% macerozyme/20% pectinase or 1% cellulase/0.2% macerozyme for 2.5 to 3 hours (depending on the size of the root tip) at 37°C.
Hybridization and detection were performed according to Heslop (2000) using protocols 8.1, 8.4, 9.1 and 9.2, with the following conditions: the slide was dried for 30 min at 50-60°C and pretreated with both RNase A and pepsin (20 min at 37°C); 1 μL of each labeled probe was added to a 20 μL hybridization mix of 50% formamide/2x SSC/10% dextran sulphate/1% SDS; the slide was denatured in 50 mL of 70% formamide/2x SSC at 70°C for 2 min and then dehydrated through an ice-cold ethanol series (70%, 85%, 100% ethanol); washes were 80-82% stringent (20% formamide with 0.1x or 0.2x SSC at 42°C); DIG-labeled probes were detected with anti-digoxigenin-rhodamine (Roche) and biotin-labeled probes with NeutrAvidin-Oregon Green-488 (Molecular Probes).
The slide was stained with DAPI and observed with a Zeiss AxioPlan2 microscope; images were captured using an Axiocam MR camera and the Isis Fluorescence Imaging System (MetaSystems). Nine to 25 metaphases were photographed for each probe. Slides were stripped by carefully removing the immersion oil, soaking in 4x SSC/0.1% Tween 20 at 37°C until the coverslip floated off, transferring to fresh 4x SSC/0.1% Tween 20 for 3 hours with gentle shaking, transferring to 3:1 ethanol:acetic acid fixative for 30 min, dehydrating through an ethanol series (70%, 85%, 100% ethanol) for 5 min each at room temperature, and air drying for 1 hour.