- Methodology article
- Open Access
Bulk segregant RNA-seq reveals expression and positional candidate genes and allele-specific expression for disease resistance against enteric septicemia of catfish
- Ruijia Wang†1,
- Luyang Sun†1,
- Lisui Bao†1,
- Jiaren Zhang1,
- Yanliang Jiang1,
- Jun Yao1,
- Lin Song1,
- Jianbin Feng1,
- Shikai Liu1 and
- Zhanjiang Liu1Email author
© Wang et al.; licensee BioMed Central Ltd. 2013
Received: 30 July 2013
Accepted: 18 December 2013
Published: 30 December 2013
The application of RNA-seq has accelerated gene expression profiling and identification of gene-associated SNPs in many species. However, the integrated studies of gene expression along with SNP mapping have been lacking. Coupling of RNA-seq with bulked segregant analysis (BSA) should allow correlation of expression patterns and associated SNPs with the phenotypes.
In this study, we demonstrated the use of bulked segregant RNA-seq (BSR-Seq) for the analysis of differentially expressed genes and associated SNPs with disease resistance against enteric septicemia of catfish (ESC). A total of 1,255 differentially expressed genes were found between resistant and susceptible fish. In addition, 56,419 SNPs residing on 4,304 unique genes were identified as significant SNPs between susceptible and resistant fish. Detailed analysis of these significant SNPs allowed differentiation of significant SNPs caused by genetic segregation and those caused by allele-specific expression. Mapping of the significant SNPs, along with analysis of differentially expressed genes, allowed identification of candidate genes underlining disease resistance against ESC disease.
This study demonstrated the use of BSR-Seq for the identification of genes involved in disease resistance against ESC through expression profiling and mapping of significantly associated SNPs. BSR-Seq is applicable to analysis of genes underlining various performance and production traits without significant investment in the development of large genotyping platforms such as SNP arrays.
Performance is related to the subtle variation in gene expression and this relationship differs among individuals . In well-defined families, the first level of variation comes from genetic segregation and recombination of chromosomes. As a result of segregation and chromosomal recombination, each individual has different genetic makeup. Upon a given genetic background, genetic potential carried on DNA can only be realized when the genes are expressed. At the whole genome level, expression of each gene is affected by its genetic regulatory element as well as trans-acting factors including the impact of environment. A composite of genes, transcriptional regulation, post-transcriptional modification and regulation, translational regulation and post-translational modification and regulation, along with environmental impact and genotype-environment interactions eventually determines the phenotypes of individuals. When considered at the whole genome level, expression of tens of thousands of genes and combination of these genes make the variation of performance traits extremely complex with huge variability. The task of modern agricultural genomics is to gain understanding of such variations and their relationship in determination of production and performance traits.
Traditionally, genetic and molecular biological studies are conducted to dissect these variables at different levels. For instance, the effect of alleles can be dissected through genetic and QTL mapping analysis [2–5]. Gene expression can be analyzed using high throughput methodologies such as microarrays and RNA-seq analysis [6–8]. Various epigenetic regulations have also been studied to understand the differences in gene expression with similar genetic background. Such analyses have been very powerful in determination of genetic and epigenetic factors affecting performance and production traits [9–11].
However, performance and production traits are often highly complex and the outcome of agricultural operations is affected by variations at all levels. For example, genetic background is very important because disease resistance genes allow the organism to survive the serious infections [12, 13]. In most cases, disease resistance genes have been studied through genetic linkage and QTL analyses that allow the identification of genomic regions containing disease resistance genes to be identified. Even with the most powerful molecular approaches, analysis of complex traits such as disease resistance can be extremely challenging. In 1991, Michelmore et al. developed a method called bulked segregant analysis (BSA) to study disease resistance in plants [13–20]. The basic idea of BSA was that phenotypic extremes should have drastic differences in genotypes. When samples are selected from phenotypic extremes, say the best and the worst performers, and their genotypes are analyzed in bulk, a correlation of genotypes with phenotypes can be attained. In other word, the variation among individuals may be quite subtle and difficult to detect; however, the pooled samples (bulk) of the phenotypic extremes should pose a strong contrast in their genotypes at the genomic location linked to the trait. BSA has been used in numerous studies to associate phenotypes with related genomic locations.
BSA has been evolving along with various types of molecular markers including Restriction Fragment Length Polymorphisms (RFLPs) , Random Amplified Polymorphic DNAs (RAPDs) , Simple Sequence Repeats (SSRs, or microsatellites) [15, 16], Amplified Fragment Length Polymorphisms (AFLPs) [16, 17] and Single Nucleotide Polymorphisms (SNPs) . With the development of Next-generation sequencing (NGS) technologies, BSA was first enhanced by the application of sequence-based markers such as restriction-site associated DNA (RAD) markers  and whole genome sequencing .
In recent years, the application of RNA-seq [7, 8, 23–27] has allowed rapid and comprehensive understanding of transcriptome level of variations. Coupling of BSA with RNA-seq has led to the development of bulked segregant RNA-seq (BSR-Seq), and it has been successfully applied in plants [28, 29], but not yet demonstrated in animals. Apparently, BSR-Seq possesses the advantages of both BSA and RNA-seq, with the high throughput for deep coverage of the transcriptome as well as the strong ability to detect genetic differences underlining the traits. Such a technique is best suited to organisms with high fecundity such as many species of fish.
Catfish is the major aquaculture species in the United States, accounting for over 60% of all US aquaculture production. The two major cultivated catfish species are channel catfish (Ictalurus punctatus) and blue catfish (Ictalurus furcatus). An inter-specific hybrid (channel catfish female × blue catfish male) has been popular for aquaculture because of strong heterosis . Not only is the interspecific hybrid is popular for aquaculture, it is also a superior system to study disease resistance because of their strong phenotypic difference. Blue catfish is extremely resistant against ESC disease while channel catfish is relatively susceptible. Genetic linkage analysis of F2 generation of the interspecific hybrids [31–33] would allow identification of disease resistance/susceptibility genes. In this study, we take advantage of BSR-Seq for the analysis of disease resistant genes using the F2 generation backcross progenies (F1 hybrid backcrossed with the susceptible channel catfish) of the interspecific hybrids. Here we demonstrate that BSR-Seq is capable of 1) revealing differentially expressed genes; 2) revealing positional candidates containing genes related to disease resistance after mapping SNPs on the whole genome; and 3) revealing allele-specific expression after bacterial infection.
Sequence assembly and analysis
Summary of Illumina sequencing of the catfish liver transcriptome with extreme phenotypes after ESC infection
Number of reads
Read length (bp)
Number of reads after trimming
Percentage kept after trimming
Average read length after trimming
Summary of de novo assembly of the catfish liver transcriptome with infection of ESC generated by Illumina sequencing and assembled with Trinity
Number of non-redundant contigs
Large contigs (≥1,000 bp)
Length of the largest contig
Average length of non-redundant contigs
% reads mapped to the final reference
Number of contigs with hits
Known gene matches
Unknown hypothetical gene matches
Differentially expressed genes after infection
Analysis of differentially expressed genes (fold change ≥2, p ≤ 0.05) after infection with Edwardsiella ictaluri
Fold change > 5
Fold change > 10
Number of genes differentially expressed in resistant fish
Number of genes differentially expressed in susceptible fish
Regulated genes only in resistant fish
Regulated genes only in susceptible fish
Although a relatively small number of genes were differentially expressed in resistant fish, a large number of genes were differentially expressed in susceptible fish, with a total of 1,240 genes being differentially expressed (Table 3). Not only the number of differentially expressed genes was drastically more in susceptible fish, the number of highly regulated genes was also much greater in susceptible fish, with 233 genes being up- or down-regulated 10 folds or more, and additional 287 genes were up- or down-regulated 5–10 folds (Table 3).
Comparison of gene expression in resistant fish and susceptible fish after infection
Comparison of gene expression between resistant fish and susceptible fish after infection with Edwardsiella ictaluri
Differentially expressed genes between resistant and susceptible fish
Gene expressed higher in resistant fish
Genes expressed lower in resistant fish
Identification of SNPs and significant SNPs
Identification of SNPs and significant SNPs (allele frequencies statistically different between the resistant and susceptible groups) from the assembled catfish liver transcriptome
Total number of SNPs
Number of contigs containing SNPs
Number of significant SNPs
Number of contigs containing significant SNPs
Number of contigs with significant hits to genes
Number of genes containing significant SNPs
Number of genomic scaffolds containing significant SNPs
Bulk frequency ratios
Genes with large BFR caused by genetic segregation
As RNA-seq data is analyzed in terms of RPKM at the RNA level, the allele ratios obtained by RNA-seq are compounded by two factors: the genotype allele frequencies at the DNA level and the relative expression levels of the two alleles at the RNA level. For instance, the two alleles may have very different genotype allele frequencies in the two bulked samples, and in these cases, even if the expression is not regulated at the transcriptional level, the final ratio of the two alleles between the bulked samples are expected to be different. However, if one of the two alleles is differentially regulated, the final allele ratio at the RNA level would be different from the allele ratio at the DNA level.
In order to differentiate SNPs with large BFR caused by genotype allele frequency difference from those with large BFR caused by allele-specific expression, the ratio of the two alleles was analyzed with combined bulk of both resistance and susceptible bulks. Theoretically, now the combined bulk should include alleles at expected segregation ratios without any connection with the phenotypes.
Allele ratio combination in two families
1st family/2nd family
AA x AG
AG x AG
AA x AG
AG x AG
AA x AA
Of the 359 genes with large BFR (≥4), 347 had a combined bulk allele ratio of 7 or less. The vast majority of these had a combined bulk allele ratio of 1–3, suggesting that the large BFR of these genes were not caused by allele-specific expression, and likely caused by genetic segregation.
Genes with large BFR caused by allele-specific expression
As shown in Figure 3, a large number of genes harboring significant SNPs had a significantly higher combined allele ratio, with 286 genes had an allele ratio of greater than 9:1 (genes above the threshold of 9 in Figure 3). Considering that most of the genes with BFR >4 had a combined allele ratio of 1–3 (see above and Figure 3), many of the genes with combined allele ratio of greater than 9 could be caused by allele-specific expression. On the cautious side, even in the extreme cases of 7:1 allele ratios at the DNA levels, twice the largest possible allele ratio at the DNA level should be 14:1. As shown in Figure 3, after the fisher’s exact test on the significant different level between the two alleles on the genes with allele ratio ≥ 14, 98 genes had a combined allele ratio of greater than 14 with FDR p-value smaller than 0.05, indicating that these large allele ratios are caused, at least in part, by allele-specific expression. A list of these allele-specific expressed genes was provided in Additional file 4: Table S4. Of the 98 genes with high combined allele ratios, 4 genes were with BFR higher than 4 and allele ratio higher than 14. They are plasminogen activator inhibitor 1, interferon-induced very large GTPase 1-like, uncharacterized protein LOC101157921 and CC chemokine SCYA108. Apparently, these large combined allele ratios were caused both by genetic segregation and by allele-specific expression.
Location of genes with high bulk frequency ratio (BFR)
To determine the genomic location of SNPs with high levels of BFR, genes containing SNPs with high BFR (BRF ≥ 4) were initially used as query for BLAST searches against the draft catfish genome sequence scaffolds in relation to the linkage map. A total of 354 genes were identified to contain significant SNPs with high BFR (Additional file 3: Table S3). BLASTN searches were conducted to determine the locations of the 354 genes on the scaffolds of the catfish genome draft sequence (unpublished data), and they were found to be within 201 genomic sequence scaffolds. Of the 201 scaffolds, 134 can be located in linkage groups, and the remaining cannot be located on linkage groups because no markers from these scaffolds were mapped.
Parental origin of highly expressed alleles
In this study, we conducted BSR-Seq [28, 29] by combining the NGS-based RNA-seq  with bulk segregant analysis  for the analysis of genes involved in disease response and disease resistance against enteric septicemia of catfish (ESC). Such a simple combination of RNA-seq and BSA analysis allowed identification of differentially expressed genes, localization of disease resistance-related genes in linkage groups by mapping SNPs on the whole genome, and analysis of allele-specific expression.
BSR-Seq carried the full capability of RNA-seq that allowed identification of differentially expressed genes. Comparison of expression in resistant fish pool and susceptible fish pool with the control allowed the identification of differentially expressed genes after infection. A total of 1,240 and 224 genes were identified to be differentially expressed after ESC infection in susceptible fish and resistant fish, respectively. In the susceptible fish, many of the up-regulated genes represent the acute phase response protein genes, as previously reported [36, 37]. Apparently, microarray studies were limited to the gene probes existing on the array, while RNA-seq analysis has the ability to detect all induced genes, depending on their expression levels. Clearly, the greater numbers of genes identified from this study after infection indicated that RNA-seq is more sensitive than the microarray analysis. Although RNA-seq analysis was also previously conducted after ESC infection , the tissues were different in these studies. In the work of Li et al. , intestine tissue was used, while liver was used in this study. Nevertheless, many of the differentially expressed genes identified here in the liver were among the differentially expressed genes in the intestine, as well as those identified in the microarray studies. For instance, the acute phase response (APR) genes such as CC chemokines, Toll-like receptors, complement component proteins, catechol-O-methyltransferase domain containing 1, apolipoprotein proteins, fibrinogens, angiotensinogen, MHC class I and II, Tumor necrosis factors were all found to be up-regulated, as found previously [36, 37].
The use of phenotypic extremes, resistant and susceptible fish, allowed comparison of expression patterns of genes involved in immune responses, although the time point was quite different. For instance, a number of immune-related genes were found to be expressed higher in resistant fish than in susceptible fish including apolipoprotein A IV, apolipoprotein Ab, apolipoprotein Eb, apolipoprotein Bb, apolipoprotein B100, apolipoprotein M and complement component 1q (C1q), complement component 1 s (C1s), complement component 3, complement component 3a, fibrinogen alpha, fibrinogen gamma, MHC class I, and MHC II. These genes expressed at higher levels in resistant fish could indicate their importance in the related disease resistance. Apolipoproteins have been shown to be important for disease resistance in mice [39–41] and chickens [42, 43].
Apparently, a much larger number of differentially expressed genes were identified in susceptible fish than in resistant fish as compared with the control. We believe that much of this difference was caused by the different sampling time (3–5 days versus 2 weeks) between these two groups. As heavy mortalities occurred 4–6 days after infection, the vast majority of differentially expressed genes in susceptible fish represented the massive responses of the host against the infection including the acute phase proteins and genes involved in inflammation and immune responses [44, 45]. In contrast, the resistant fish samples were collected two weeks after infection from the survivors. As such, these resistant fish were either “resistant” or not infected, or may have cleared the bacteria from their system. The massive host responses to infection may have been over by the time of two weeks after infection.
One gene can harbor many SNPs, but not all of them are relevant to disease resistant. The low BFR of a SNP means the allele frequency of that SNP is similar between the susceptible group and resistant group, and such SNPs are irrelevant SNPs in relation to disease resistance. As to the fact that different SNPs within a single gene can have different BFR, there may be several explanations including: 1) although the number of sequences from each allele at an SNP site should be proportional to the “allele frequency”, that may not be the case practically, simply because the sequencing depth is limited; 2) We are dealing with pooled samples from two families that were derived from interspecific hybrids. Therefore, there are SNP sites that are polymorphic only within channel catfish, only within blue catfish, polymorphic in both channel catfish and blue catfish, or not polymorphic within channel catfish or blue catfish, but are polymorphic between the two species. Thus allelic ratios at different SNP sites are expected to be different. Given such complexities, it is reasonable to use SNPs with the highest BFR within each gene. In contrast, either use of median or mean BFR or use of all the SNPs for each gene may bring irrelevant SNPs into the consideration for the analysis of SNP localization, which can lead to the underestimate of the BFR in the candidate region or even miss the candidate region due to the incorrect decay pattern of LD. In addition, it is worth to mention that we used pooled sample for the BSR-seq, which could induce the difficulty to the assessment of the within-group variance, however, this is the innate limitation of the Bulked segregant analysis. Currently, there is no optimal resolution to avoid this limitation, but some studies claimed that this flaw will not cause a serious bias in the pooled sample RNA-Seq analysis .
Bulk frequency ratio (BFR) was previously used as an effective parameter for genetic analysis in BSA or BSR-Seq . However, in those cases, genotypes were determined using DNA. Here in this study, the “allele frequency” was calculated from the mapped reads of RNA samples, and thus the calculated BFR could be compromised by allele-specific expression. In order to identify positional candidates for resistance using transcriptome datasets generated from BSR-Seq, we need to differentiate allele frequencies caused by genetic segregation and those caused by allele-specific induction/suppression: Significant SNPs with large BFR and small combined allele ratio are likely to be caused by genetic segregation; significant SNPs with small BFR but large combined allele ratio are likely caused by allele-specific expression, while significant SNPs with large BFR and large combined allele ratios may have been caused by both genetic segregation and allele-specific expression. For instance, if the allele ratio at an A/G SNP site is 10 to 1 in resistance pool, and 1 to 10 in susceptible pool, the BFR should be 10. When the two bulks were combined, now the allele ratio of A/G is 11:11 = 1. This SNP, with a high BFR and a low combined allele ratio, should be a SNP with allele frequency difference between the bulks caused by genetic segregation. In contrast, when one of the two alleles is differentially up-regulated, the combined allele ratio will stay large. For instance, at an A/G SNP site, if A is significantly up-regulated in the resistant fish, say 100A:5G, and in susceptible fish, A and G are roughly expressed equally, both at low levels, say 5A:5G. In this case, the BFR = (100/105)/(5/10) = 1.91; when the two bulks are combined, the combined allele ratio would be 10.5. Apparently in this case, the large allele ratio is caused by allele-specific expression.
In this study, each bulk was made of 24 fish with 12 fish from one family and 12 fish from a second family. As the exact allele ratio at each SNP site is unknown in the two families, we made several assumptions for the analysis of SNPs due to genetic segregation and those due to allele-specific expression. At an A/G SNP site, the parent in one family could be AA x GG, AA x AG, or AG x AG. In these cases, the largest allele ratio can be 3:1 (in the case of AA x AG) at the DNA level. When two families were used, as in this study, the largest allele ratio at the DNA level could be 7:1, i.e., AA x AG in one family, and homozygous AA x AA in another family. Any SNPs with significantly larger combined allele ratio than 7:1 would suggest allele-specific expression. We identified SNPs with combined allele ratio of greater than 14 (twice the largest possible allele ratio at the DNA level) as being allele-specifically expressed. Apparently, many SNPs fell between the two possibilities, and to obtain reasonable assessment of those caused by genetic segregation and those caused by allele-specific expression, we neglected those ones between the two possibilities (Figure 3).
True linked SNPs are characteristic in that many significant SNPs can be observed in nearby genomic locations because of genetic linkage. In catfish, the whole genome sequence assembly is still under way. We therefore, mapped the significant SNPs to scaffolds and then examined the patterns of the SNP distribution. Within a QTL region, statistical significance should be the highest with the gene underlining the performance trait, and LD should decay gradually on both sides of the chromosome surrounding the gene . In our study, quite many significant SNPs were located on unmapped scaffolds, but many were also mapped to genetic linkage groups including LG6, LG15, and LG17 that included at least 10 genes with high BFR (≥5) and low combined allele ratio (≤4). As shown in Figures 5, 6, and 7, the LD appeared to be decaying around the most significant SNPs, suggesting that these genomic regions indeed contain resistance-related genes. For instance, in LG6, the gene containing the most significant SNP was protein G7c-like gene located at the 24 Mb position, and the BFR values on both sides of this gene were significant, but lower than BFR within the G7c-like gene (Figure 5). Similarly, the gene containing the highest BFR was acidic chitinase-like gene that located in the middle of a 23 Mb DNA in LG15, and the BFRs were lower on both sides along this LG (Figure 6). In LG17, the SNPs with highest BFR was located within the DnaJ subfamily A member 2 gene close to the end of the 12 Mb DNA in LG17, and the BFR on both sides were lower (Figure 7). It was unknown if the detected genes with the highest BFR were themselves involved in disease resistance. This was because some linked genes with even greater BFR were not yet mapped to the linkage group, staying as isolated scaffolds, and therefore cannot be viewed under the same “Manhattan plot”, or because the expression level of the real disease resistance gene was so low that it was not detected under the BSR-Seq analysis.
Differentially expressed genes after infection with high bulk frequency ratios and low combined allele ratio
Combined allele ratio
Regulates iron ion homeostasis and involved in endocytosis
Tumor suppressor candidate 5 homolog
Inhibits breast tumor formation in vivo
Effect the susceptibility to lung tumors
Participates in the defense against nematodes, fungi and bacteria. Plays a role in T-helper cell type 2 (Th2) immune response.
Mannan-binding lectin serine peptidase 2
Lectin complement pathway actication
Inter-alpha-trypsin inhibitor heavy chain H4
Type II acute-phase protein (APP) involves in inflammatory responses to trauma. May also play a role in liver development or regeneration.
D-amino acid oxidase
Regulation of the glutamatergic neurotransmission; may play a role in the glutamatergic mechanisms of schizophrenia
MAWD binding protein like
Inhibits proliferation and invasion in gastric cancer
Mediate epithelial innate defense through their antimicrobial properties
Involves in polyamine homeostasis
Stonustoxin subunit alpha-like
Induces hemolytic activities, displays edema-inducing activities, increases vascular permeability and interferes irreversibly with neuromuscular function.
Catalyzes the epimerization of UDP-glucose to UDP-galactose and the epimerization of UDP-N-acetylglucosamine to UDP-N-acetylgalactosamine.
Major histocompatibility complex class I UDA
Play an important role in immune response and antigen processing and presentation
Involves in the gram negative bacteria recognition. Against the intracellular pathogen.
Most prominent cellular substrate for protein kinase C. It can bind calmodulin, actin, and synapsin.
3-oxoacid CoA transferase 1a
Key enzyme for ketone body catabolism. Also plays and important roles in the energy metabolism of spermatozoa.
MHC class II beta chain
Involves in antigen processing and presentation of peptide or polysaccharide antigen via MHC class II
A total of 98 genes were identified as genes with allele-specific expression (Figure 3). One obvious question is what causes allele-specific expression (ASE). Two hypotheses were previously proposed to account for ASE [45, 46]. In the first hypothesis, mutations in cis-acting DNA elements can cause differences in binding of trans-acting factors, especially when such mutations are located in the promoter or enhancer regions. Inversely, mutations in the trans-acting factors would also cause their differences in binding to their target sequence. In the second hypothesis, epigenetic factors such as differential methylation of the two alleles can cause differences in expression levels. It has long been known that mutations in non-coding regions which affect gene expression can cause human genetic disease [45, 47]. A differentially expressed gene exhibits cis-acting variation when the differential expression is caused by factors linked to the differentially expressed alleles, such as differences in promoter sequences or chromatin state. The list of examples in which cis-acting regulatory variation plays a key role in phenotypic variation are increasing . In vitro experiments prove that variants in gene promoter regions effect rates of transcription and these variants may also lead to a significant proportion of differential allelic expression . In addition, expressed genes contain trans-acting variation when the differential expression is caused by factors unlinked to the differentially expressed alleles, such as differences caused by genetic background and regulatory networks.
The importance of DNA methylation as a driving factor in allele specific expression has been claimed and proved by a number of studies [46, 48]. In these studies, a direct correlation between allele specific expression and methylation was observed. Clearly, the different epigenetic state of each haploid genome is a major factor in the expression of the two alleles. Although X-chromosome inactivation and silencing are usually considered to be mainly related to epigenetic effect , some studies also suggested that the change of gene regulation caused by epigenetic modification of sequence variation might be a common pathogenic mechanism in mammals . One hypothetical role for epigenetics is genetic imprinting leading to mono-allelic expression . Both in vivo and in vitro experiments indicate that allele-specific differences in the rate of transcription are common existed, if not all genes are likely to show differential allelic expression in different individuals . However, the role of ASE genes in complex traits is still not clear.
Most ASE studies have been conducted in humans, and no studies have been conducted in fish. In addition to the above discussed possibilities, it is noteworthy that we used an interspecific hybrid system in this study. In the channel catfish backcrossed progenies, overall 50% of chromosomes are “homozygous” from channel catfish while 50% chromosomes are heterozygous with one chromosome being from channel catfish and the other from blue catfish. If a trans-acting factor is transcribed from channel catfish genes, it would bind the cis-acting elements from channel catfish with greater affinity, causing allelic expression. This could be another explanation for the observed ASE.
The significance of the observed allele-specific expression in relation to phenotype is unknown at present. To date, the majority of expression analysis focus on the total amount of the transcripts. However, emerging evidence underlies the importance of understanding the allele specific transcript in cases of disease [45, 52]. In our studies here, among the 98 allele-specifically expressed genes, parental origin of alleles can be determine for only 18 genes. Among these 18 genes, the channel catfish allele was expressed higher in 11 genes (six high in resistant fish and five high in susceptible fish), and blue catfish allele was expressed higher in 7 genes (four high in resistant fish and three high in susceptible fish). There was no correlation of resistance with a specific parent. However, it is possible that certain combinations of alleles would warrant resistance and certain combinations of alleles would lead to susceptibility. This clearly warrants future studies.
In this study, we demonstrated the application of BSR-Seq to study disease resistance by combining RNA-seq with bulk segregant analysis. It has the full capacity for the identification of differentially expressed genes, the capacity to identify significant SNPs between phenotypic bulks, the capacity to potentially identify the positional candidate genes, and the ability to identify allelic expressed genes. Among many differentially expressed genes, 17 genes were also among the genes that had high BFR and low combined allele ratio, suggesting that these genes could be potentially involved in disease resistance.
ESC bacterial challenge
All procedures involving the handling and treatment of fish used during this study were approved by the Auburn University Institutional Animal Care and Use Committee (AU-IACUC) prior to initiation. Four families of backcross progenies (average size 35 ± 1.3 g) were reared at the Auburn University Fish Genetics Research Unit prior to challenge. Fish were challenged in 500 L (400 L water) aquaria with control group containing 400 fish (100 per family) and treatment group containing 1200 fish (300 per family). The MS-S97-773 isolate of E. ictaluri bacteria was obtained from a natural outbreak and utilized in the experimental challenge. Bacteria were re-isolated from a single symptomatic fish and biochemically confirmed by appearance (small, punctate white colonies) and through biochemical assay (oxidase negative, fermentative in O/F glucose or glucose motility deeps (GMD), triple sugar iron (TSI) slant reaction K/A with no H2S, and negative for indole production in tryptone broth). The confirmed bacteria were then cultured in Brain Heart Infusion broth (BHI) and incubated in a shaker incubator at 28°C overnight. The concentration of the bacteria was determined using colony forming unit (CFU) per mL by plating 10 ml of 10-fold serial dilutions onto BHI agar plates. During challenge, 1000 mL bacterial culture with a concentration of 4 × 108 CFU/ml was added into the aquaria. Water was turned off in the aquaria for 2 h of immersion exposure, and then continuous water flow-through resumed for the duration of the challenge experiment. Control group was treated with same volume of brain heart infusion (BHI) medium at the same time. During 3–5 days after challenge, all dying fish with classical ESC clinic signs were collected as susceptible fish from two families. After two weeks of the challenge, all survival fish were collected as resistant fish from the same two families. Also, the fish in control group were collected at that time. The fish were euthanized with tricaine methanesulfonate (MS 222) at 300 mg/l before tissue collection.
Sampling and RNA isolation
Equal amount of liver tissue was used from each of the 72 fish (12 fish/family, 24 fish/group each for resistant, susceptible, and control) used for RNA isolation. The tissue samples were ground separately to a fine powder in the presence of liquid nitrogen. Total RNA was extracted using the RNeasy Universal Tissue Kit (Qiagen, USA). The samples belonging to the same group were then diluted to the same concentration and pooled together prior to library construction.
Sequencing libraries were prepared with 2.14-3.25 μg of starting total RNA and processed using the Illumina TruSeq RNA Sample Preparation Kit, as dictated by the TruSeq protocol. The library were amplified with 15 cycles of PCR and contained TruSeq indexes within the Illumina adaptors, specifically indexes barcode 1–3 to label the three groups. The final, amplified library yields were 30 μl of double-stranded product (19.8-21.4 ng/μl) with an average length of 275 base pair (bp), indicating a concentration of 110–140 nM. After quantitation performed using KAPA Library Quant Kits (Kapa Biosystems, USA) and dilution, the library were clustered 3 in one lane and sequenced on a Hiseq 2500 instrument with 100 bp paired end (PE) reads at the HudsonAlpha Genomic Services Lab (Huntsville, AL, USA). The image analysis, base calling and quality score calibration were processed using Illumina Pipeline Software v1.5, and FASTQ reads files containing the sequencing read, quality scores and paired reads information were exported for the following trimming and assembly process. Raw reads were processed for initial trimming by CLC Genomics Workbench (version 5.5.2; CLC bio, Aarhus, Denmark). Adaptor sequences, ambiguous nucleotides (’N’ in the end of reads), and low quality sequences (quality scores < 30 or read length < 30 bp) were removed.
As the primary algorithm used in RNA-seq assembly, the de bruijn graph method was utilized in this study. Trinity version 2013-02-25 was chosen in this study due to its good performance . Briefly, the raw reads were assembled into unique sequences of transcripts in Inchworm via greedy K-mer extension (k-mer 25). After mapping of reads to Inchworm contigs, Chrysalis incorporated reads into de bruijn graphs and the Butterfly module processed the individual graphs to generate full-length transcripts. And then CD-hit  and CAP3  were used to remove assembly redundancy by setting global sequence identity in CD-hit to 1, the minimal overlap length and percent identity in CAP3 to 100 bp and 99%.
Gene identification and annotation
The final Trinity assembly contigs were used as queries against the NCBI non-redundant (nr) protein database and the zebrafish protein database using BlastX by setting the cut-off value (E-value, the likelihood that the matching sequence is obtained by chance) of 1e-5 and returning only the top 10 hit results of each query. The top gene identifications and names were initially assigned to each contig. “Hypothetical” or “uncharacterized” top BLAST results were replaced by more informative hits from the top ten lists when available.
Transcript-level gene expression analysis of different groups
The modified assembly of Trinity including all unigenes with blast hits was used as the pseudo-reference against which trimmed reads were mapped for gene expression analysis. Reads per kilobase of exon model per million mapped reads (RPKM)  were calculated as the original expression value. The original expression values were scaling normalized in order to ensure that samples were comparable . The expression fold change was calculated based on the modified expression value between the infection and control groups. The Kal’s test  was used to test the significance of the expression fold change. The initial p-value were adjusted by False discovery rate (FDR) method . Analysis was performed using the RNA-seq module and the expression analysis module in CLC. The threshold of gene expression selection was set to: FDR adjusted p-value <0.05, mapped reads >5, weighted proportions of fold change ≥ |2|, or unless otherwise clarified for more stringent analysis.
Sequencing mapping and significant SNP identification
Sequencing mapping for SNP identification analysis was performed using CLC Genomics Workbench (version 5.5.2; CLC bio, Aarhus, Denmark). Trimmed sequence reads were aligned against the Trinity assembly contigs. Mapping of reads from each group to the reference sequence was performed with mismatch cost of 2, deletion cost of 3 and insertion cost of 3. The highest scoring matches that shared ≥ 95% similarity with the reference sequence across ≥ 90% of their length should be included in the alignment. The non-unique mappings were removed. Finally, all mapping results were converted to BAM format.
The initial SNP identification was conducted by the rmdup and mpileup function of SAMtools version 0.1.18 . PoPoolation2 version 1.201  was used to call the genotype at each variant with the lowest criteria setting in order to get all possible real SNPs, SNP loci that has more than 2 allele variants were discarded. Two factors that are important for increasing quality of putative SNPs were set based on the sequence error rate and total coverage: 1) minimum reads in each group ≥ 6, and 2) Total minor allele reads count >3. SNPs that passed all the optimal factors were considered as initial SNPs.
Significant SNPs were identified between resistant catfish group and susceptible catfish group. SNPs which displayed heterozygous genotype (allele variants = 2 and minor allele reads ≥2 in each group) were retrieved to test the difference level of allele frequencies between resistant catfish group and susceptible catfish group using two-tailed Fisher’s Exact test . The threshold was set as FDR p-value ≤0.05.
Analysis of bulk frequency ratios and allele specific expression
In order to compare the SNP allele frequencies more directly, bulk frequency ratios (BFR) were generated from the RNA-seq data between the two bulks, the resistant fish and the susceptible fish, as follows: the frequency of informative base (the major allele of resistant fish) was first calculated for resistant and susceptible bulks and then BFR was equal to the ratio of these frequencies between the bulks. The ratio of the two alleles was also analyzed with combined bulks of both resistance and susceptible fish (i.e. total count of one allele in both resistant and susceptible group divided by total count of the other allele) to help classify the genes containing significant SNPs into several categories: 1) The genes containing significant SNPs with BFR ≥ 4 and allele ratio ≤ 9; 2) The genes containing significant SNPs with BFR ≥ 4 and allele ratio ≥ 14; 3) The genes containing significant SNPs with BFR ≤ 4 and allele ratio ≥ 14; 4) The genes containing significant SNPs with BFR ≤ 4 and allele ratio ≤ 9; 5)The genes containing significant SNPs with allele ratio from 9 to 14. The genes with BFR ≥4 and allele ratio ≤ 9 were defined as segregation involved candidate genes, the genes with BFR <4 and allele ratio ≥ 14 were defined as allele specific expression (ASE) involved candidate genes, and the genes with BFR ≥4 and allele ratio ≥ 14 were defined as both segregation and ASE involved genes. An extra fisher’s exact test were used to check the significant different level between the two alleles on the genes with allele ratio ≥ 14, by setting the expect allele ration equal to 7:1 and FDR adjusted p-value ≤ 0.05. Analysis of parental origin of the alleles for the genes with high allele ratio was then performed. The inter-species SNPs database of blue and channel catfish  and the parents’ genotype of two target families on SNP chip (unpublished data) were used. The inter-species SNPs were mapped to the genes containing significant SNPs with high allele ratio (allele ratio > 14) to check whether they shared the same position with the significant SNPs. The major allele origin of mapped SNPs was labeled based on the genotype information of the inter-specific SNPs.
Genomic location of ESC resistance-related genes
Genes harboring significant SNPs with BFR ≥5 and combined allele ratio ≤4 were used as query to map to the whole genome scaffold and linkage groups (unpublished data) by BLASTN with e-value of 1e-20. The mapped scaffolds were then located to the linkage groups by 2nd generation catfish linkage map . The linkage groups contain more than 10 genes with significant SNPs and at least one gene harboring significant SNPs with BFR ≥ 10 were identified as potential genomic regions harboring candidate genes for ESC resistance.
Availability of Supporting Data
Raw RNA-seq reads data supporting the results of this article are available in the NCBI Sequence Read Archive (SRA) under Accession SRP028159 (http://www.ncbi.nlm.nih.gov/sra).
This project was supported by Agriculture and Food Research Initiative Competitive Grant no. 2009-35205-05101, 2010-65205-20356 and 2012-67015-19410 from the USDA National Institute of Food and Agriculture (NIFA). The authors wish to thank Chao Li, Yu Zhang, Yun Li, Xin Geng, Ailu Chen, Chen Jiang, Huseyin Kucuktas and Ludmilla Kaltenboeck, for their technical assistance in various experiments. Ruijia Wang, Luyang Sun, Lisui Bao, Jiaren Zhang, Lin Song and Shikai Liu were supported by a scholarship from the China Scholarship Council (CSC) for studying abroad.
- Oleksiak MF, Roach JL, Crawford DL: Natural variation in cardiac metabolism and gene expression in Fundulus heteroclitus. Nat Genet. 2004, 37: 67-72.PubMed CentralPubMedGoogle Scholar
- Hallman D, Boerwinkle E, Saha N, Sandholzer C, Menzel H, Csazar A, Utermann G: The apolipoprotein E polymorphism: a comparison of allele frequencies and effects in nine populations. Am J Hum Genet. 1991, 49: 338-PubMed CentralPubMedGoogle Scholar
- He G, Chen B, Wang X, Li X, Li J, He H, Yang M, Lu L, Qi Y, Wang X: Conservation and divergence of transcriptomic and epigenomic variation in maize hybrids. Genome Biol. 2013, 14: R57-10.1186/gb-2013-14-6-r57.PubMed CentralView ArticlePubMedGoogle Scholar
- Bevan M, Uauy C: Genomics reveals new landscapes for crop improvement. Genome Biol. 2013, 14: 206-10.1186/gb-2013-14-6-206.PubMed CentralView ArticlePubMedGoogle Scholar
- Young N: QTL mapping and quantitative disease resistance in plants. Annu Rev Phytopathol. 1996, 34: 479-501. 10.1146/annurev.phyto.34.1.479.View ArticlePubMedGoogle Scholar
- Bunney WE, Bunney BG, Vawter MP, Tomita H, Li J, Evans SJ, Choudary PV, Myers RM, Jones EG, Watson SJ: Microarray technology: a review of new strategies to discover candidate vulnerability genes in psychiatric disorders. Am J Psychiatry. 2003, 160: 657-666. 10.1176/appi.ajp.160.4.657.View ArticlePubMedGoogle Scholar
- Wang Z, Gerstein M, Snyder M: RNA-seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009, 10: 57-63. 10.1038/nrg2484.PubMed CentralView ArticlePubMedGoogle Scholar
- Oshlack A, Robinson MD, Young MD: From RNA-seq reads to differential expression results. Genome Biol. 2010, 11: 220-10.1186/gb-2010-11-12-220.PubMed CentralView ArticlePubMedGoogle Scholar
- Molinié B, Georgel P: Genetic and epigenetic regulations of prostate cancer by genistein. Drug News Perspect. 2009, 22: 247-254. 10.1358/dnp.2009.22.5.1378633.View ArticlePubMedGoogle Scholar
- Ng J-H, Heng J-CD, Loh Y-H, Ng H-H: Transcriptional and epigenetic regulations of embryonic stem cells. Mutat Res. 2008, 647: 52-58. 10.1016/j.mrfmmm.2008.08.009.View ArticlePubMedGoogle Scholar
- He H, Hua X, Yan J: Epigenetic regulations in hematopoietic Hox code. Oncogene. 2010, 30: 379-388.View ArticlePubMedGoogle Scholar
- Gururani MA, Venkatesh J, Upadhyaya CP, Nookaraju A, Pandey SK, Park SW: Plant disease resistance genes: current status and future directions. Physiol Mol Plant Pathol. 2012, 78: 51-65.View ArticleGoogle Scholar
- Zhao S, Zhu M, Chen H: Immunogenomics for identification of disease resistance genes in pigs: a review focusing on Gram-negative bacilli. J Anim Sci. 2012, 3: 1-13.Google Scholar
- Guangli S, Zhanjing H, Congfen H, Yinzhu S, Wang J: Identification of the molecular markers linked to the salt-resistance locus in the wheat using RAPD-BSA technique. Acta Bot Sin. 2001, 43: 598-Google Scholar
- Cheema KK, Grewal NK, Vikal Y, Sharma R, Lore JS, Das A, Bhatia D, Mahajan R, Gupta V, Bharaj TS: A novel bacterial blight resistance gene from Oryza nivara mapped to 38 kb region on chromosome 4 L and transferred to Oryza sativa L. Genet Res. 2008, 90: 397-10.1017/S0016672308009786.View ArticleGoogle Scholar
- Dussle C, Quint M, Melchinger A, Xu M, Lübberstedt T: Saturation of two chromosome regions conferring resistance to SCMV with SSR and AFLP markers by targeted BSA. Theor Appl Genet. 2003, 106: 485-493.PubMedGoogle Scholar
- Asnaghi C, Roques D, Ruffel S, Kaye C, Hoarau J-Y, Telismart H, Girard J, Raboin L, Risterucci A, Grivet L: Targeted mapping of a sugarcane rust resistance gene (Bru1) using bulked segregant analysis and AFLP markers. Theor Appl Genet. 2004, 108: 759-764. 10.1007/s00122-003-1487-6.View ArticlePubMedGoogle Scholar
- Becker A, Chao D-Y, Zhang X, Salt DE, Baxter I: Bulk segregant analysis using single nucleotide polymorphism microarrays. PLoS One. 2011, 6: e15993-10.1371/journal.pone.0015993.PubMed CentralView ArticlePubMedGoogle Scholar
- Lee BY, Penman D, Kocher T: Identification of a sex-determining region in Nile tilapia (Oreochromis niloticus) using bulked segregant analysis. Anim Genet. 2003, 34: 379-383. 10.1046/j.1365-2052.2003.01035.x.View ArticlePubMedGoogle Scholar
- Ruyter-Spira CP, Gu Z, Van der Poel J, Groenen M: Bulked segregant analysis using microsatellites: mapping of the dominant white locus in the chicken. Poult Sci. 1997, 76: 386-391.View ArticlePubMedGoogle Scholar
- Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, Lewis ZA, Selker EU, Cresko WA, Johnson EA: Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One. 2008, 3: e3376-10.1371/journal.pone.0003376.PubMed CentralView ArticlePubMedGoogle Scholar
- Wenger JW, Schwartz K, Sherlock G: Bulk segregant analysis by high-throughput sequencing reveals a novel xylose utilization gene from Saccharomyces cerevisiae. PLoS Genet. 2010, 6: e1000942-10.1371/journal.pgen.1000942.PubMed CentralView ArticlePubMedGoogle Scholar
- Ayers KL, Davidson NM, Demiyah D, Roeszler KN, Grutzner F, Sinclair AH, Oshlack A, Smith CA: RNA sequencing reveals sexually dimorphic gene expression before gonadal differentiation in chicken embryos and allows comprehensive annotation of W-chromosome genes. Genome Biol. 2013, 14: R26-10.1186/gb-2013-14-3-r26.PubMed CentralView ArticlePubMedGoogle Scholar
- Cloonan N, Forrest AR, Kolle G, Gardiner BB, Faulkner GJ, Brown MK, Taylor DF, Steptoe AL, Wani S, Bethel G: Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat Methods. 2008, 5: 613-619. 10.1038/nmeth.1223.View ArticlePubMedGoogle Scholar
- Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y: RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008, 18: 1509-1517. 10.1101/gr.079558.108.PubMed CentralView ArticlePubMedGoogle Scholar
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-seq. Nat Methods. 2008, 5: 621-628. 10.1038/nmeth.1226.View ArticlePubMedGoogle Scholar
- Pepke S, Wold B, Mortazavi A: Computation for ChIP-seq and RNA-seq studies. Nat Methods. 2009, 6: S22-S32. 10.1038/nmeth.1371.PubMed CentralView ArticlePubMedGoogle Scholar
- Trick M, Adamski NM, Mugford SG, Jiang C-C, Febrer M, Uauy C: Combining SNP discovery from next-generation sequencing data with bulked segregant analysis (BSA) to fine-map genes in polyploid wheat. BMC Plant Biol. 2012, 12: 14-10.1186/1471-2229-12-14.PubMed CentralView ArticlePubMedGoogle Scholar
- Liu S, Yeh C-T, Tang HM, Nettleton D, Schnable PS: Gene mapping via bulked segregant RNA-seq (BSR-Seq). PLoS ONE. 2012, 7: e36406-10.1371/journal.pone.0036406.PubMed CentralView ArticlePubMedGoogle Scholar
- Chatakondi N, Yant D, Dunham R: Commercial production and performance evaluation of channel catfish, Ictalurus punctatus female× blue catfish, Ictalurus furcatus male F-1 hybrids. Aquaculture. 2005, 247: 8-Google Scholar
- Liu Z, Karsi A, Li P, Cao D, Dunham R: An AFLP-based genetic linkage map of channel catfish (Ictalurus punctatus) constructed by using an interspecific hybrid resource family. Genetics. 2003, 165: 687-694.PubMed CentralPubMedGoogle Scholar
- Kucuktas H, Wang S, Li P, He C, Xu P, Sha Z, Liu H, Jiang Y, Baoprasertkul P, Somridhivej B: Construction of genetic linkage maps and comparative genome analysis of catfish using gene-associated markers. Genetics. 2009, 181: 1649-1660. 10.1534/genetics.108.098855.PubMed CentralView ArticlePubMedGoogle Scholar
- Ninwichian P, Peatman E, Liu H, Kucuktas H, Somridhivej B, Liu S, Li P, Jiang Y, Sha Z, Kaltenboeck L: Second-generation genetic linkage map of catfish and its integration with the BAC-based physical map. G3. 2012, 2: 1233-1241. 2012.PubMed CentralView ArticlePubMedGoogle Scholar
- Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, Miller CA, Mardis ER, Ding L, Wilson RK: VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome research. 2012, 22: 568-576. 10.1101/gr.129684.111.PubMed CentralView ArticlePubMedGoogle Scholar
- Michelmore RW, Paran I, Kesseli R: Identification of markers linked to disease-resistance genes by bulked segregant analysis: a rapid method to detect markers in specific genomic regions by using segregating populations. Proc Natl Acad Sci. 1991, 88: 9828-9832. 10.1073/pnas.88.21.9828.PubMed CentralView ArticlePubMedGoogle Scholar
- Peatman E, Baoprasertkul P, Terhune J, Xu P, Nandi S, Kucuktas H, Li P, Wang S, Somridhivej B, Dunham R: Expression analysis of the acute phase response in channel catfish (Ictalurus punctatus) after infection with a Gram-negative bacterium. Dev Comp Immunol. 2007, 31: 1183-1196. 10.1016/j.dci.2007.03.003.View ArticlePubMedGoogle Scholar
- Peatman E, Terhune J, Baoprasertkul P, Xu P, Nandi S, Wang S, Somridhivej B, Kucuktas H, Li P, Dunham R: Microarray analysis of gene expression in the blue catfish liver reveals early activation of the MHC class I pathway after infection with Edwardsiella ictaluri. Mol Immunol. 2008, 45: 553-566. 10.1016/j.molimm.2007.05.012.View ArticlePubMedGoogle Scholar
- Li C, Zhang Y, Wang R, Lu J, Nandi S, Mohanty S, Terhune J, Liu Z, Peatman E: RNA-seq analysis of mucosal immune responses reveals signatures of intestinal barrier disruption and pathogen entry following Edwardsiella ictaluri infection in channel catfish, Ictalurus punctatus. Fish Shellfish Immunol. 2012, 32: 816-827. 10.1016/j.fsi.2012.02.004.View ArticlePubMedGoogle Scholar
- Sigel S, Bunk S, Meergans T, Doninger B, Stich K, Stulnig T, Derfler K, Hoffmann J, Deininger S, von Aulock S: Apolipoprotein B100 is a suppressor of Staphylococcus aureus-induced innate immune responses in humans and mice. Eur J Immunol. 2012, 42: 2983-2989. 10.1002/eji.201242564.View ArticlePubMedGoogle Scholar
- Kesavalu L, Lucas A, Verma R, Liu L, Dai E, Sampson E, Progulske-Fox A: Increased atherogenesis during Streptococcus mutans infection in ApoE-null mice. Dent Res J. 2012, 91: 255-260. 10.1177/0022034511435101.View ArticleGoogle Scholar
- Peterson MM, Mack JL, Hall PR, Alsup AA, Alexander SM, Sully EK, Sawires YS, Cheung AL, Otto M, Gresham HD: Apolipoprotein B is an innate barrier against invasive Staphylococcus aureus infection. Cell Host Microbe. 2008, 4: 555-566. 10.1016/j.chom.2008.10.001.PubMed CentralView ArticlePubMedGoogle Scholar
- Morgan RW, Sofer L, Anderson AS, Bernberg EL, Cui J, Burnside J: Induction of host gene expression following infection of chicken embryo fibroblasts with oncogenic Marek's disease virus. J Virol. 2001, 75: 533-539. 10.1128/JVI.75.1.533-539.2001.PubMed CentralView ArticlePubMedGoogle Scholar
- Wozniak MA, Shipley SJ, Dobson CB, Parker SP, Scott FT, Leedham-Green M, Breuer J, Itzhaki RF: Does apolipoprotein E determine outcome of infection by varicella zoster virus and by Epstein Barr virus?. Eur J Hum Genet. 2007, 15: 672-678. 10.1038/sj.ejhg.5201812.View ArticlePubMedGoogle Scholar
- Shifman S, Kuypers J, Kokoris M, Yakir B, Darvasi A: Linkage disequilibrium patterns of the human genome across populations. Hum Mol Genet. 2003, 12: 771-776. 10.1093/hmg/ddg088.View ArticlePubMedGoogle Scholar
- Buckland PR: Allele-specific gene expression differences in humans. Hum Mol Genet. 2004, 13: R255-R260. 10.1093/hmg/ddh227.View ArticlePubMedGoogle Scholar
- Bell CG, Beck S: Advances in the identification and analysis of allele-specific expression. Genome Med. 2009, 1: 56-10.1186/gm56.PubMed CentralView ArticlePubMedGoogle Scholar
- Wray GA: The evolutionary significance of cis-regulatory mutations. Nat Rev Genet. 2007, 8: 206-216. 10.1038/nrg2063.View ArticlePubMedGoogle Scholar
- Milani L, Lundmark A, Nordlund J, Kiialainen A, Flaegstad T, Jonmundsson G, Kanerva J, Schmiegelow K, Gunderson KL, Lönnerholm G: Allele-specific gene expression patterns in primary leukemic cells reveal regulation of gene expression by CpG site methylation. Genome Res. 2009, 19: 1-11.PubMed CentralView ArticlePubMedGoogle Scholar
- Petronis A: The genes for major psychosis: aberrant sequence or regulation?. Neuropsychopharmacology. 2000, 23: 1-12. 10.1016/S0893-133X(00)00127-5.View ArticlePubMedGoogle Scholar
- Morgan HD, Sutherland HG, Martin DI, Whitelaw E: Epigenetic inheritance at the agouti locus in the mouse. Nat Genet. 1999, 23: 314-318. 10.1038/15490.View ArticlePubMedGoogle Scholar
- Kato MV, Shimizu T, Nagayoshi M, Kaneko A, Sasaki MS, Ikawa Y: Genomic imprinting of the human serotonin-receptor (HTR2) gene involved in development of retinoblastoma. Am J Hum Genet. 1996, 59: 1084-PubMed CentralPubMedGoogle Scholar
- Koivisto U-M, Palvimo JJ, Jänne OA, Kontula K: A single-base substitution in the proximal Sp1 site of the human low density lipoprotein receptor promoter as a cause of heterozygous familial hypercholesterolemia. Proc Natl Acad Sci. 1994, 91: 10526-10530. 10.1073/pnas.91.22.10526.PubMed CentralView ArticlePubMedGoogle Scholar
- Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q: Full-length transcriptome assembly from RNA-seq data without a reference genome. Nat Biotechnol. 2011, 29: 644-652. 10.1038/nbt.1883.PubMed CentralView ArticlePubMedGoogle Scholar
- Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006, 22: 1658-1659. 10.1093/bioinformatics/btl158.View ArticlePubMedGoogle Scholar
- Huang X, Madan A: CAP3: a DNA sequence assembly program. Genome Res. 1999, 9: 868-877. 10.1101/gr.9.9.868.PubMed CentralView ArticlePubMedGoogle Scholar
- Brockman W, Alvarez P, Young S, Garber M, Giannoukos G, Lee WL, Russ C, Lander ES, Nusbaum C, Jaffe DB: Quality scores and SNP detection in sequencing-by-synthesis systems. Genome Res. 2008, 18: 763-770. 10.1101/gr.070227.107.PubMed CentralView ArticlePubMedGoogle Scholar
- Robinson MD, Oshlack A: A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010, 11: R25-10.1186/gb-2010-11-3-r25.PubMed CentralView ArticlePubMedGoogle Scholar
- Kal AJ, van Zonneveld AJ, Benes V, van den Berg M, Koerkamp MG, Albermann K, Strack N, Ruijter JM, Richter A, Dujon B: Dynamics of gene expression revealed by comparison of serial analysis of gene expression transcript profiles from yeast grown on two different carbon sources. Mol Biol Cell. 1999, 10: 1859-1872. 10.1091/mbc.10.6.1859.PubMed CentralView ArticlePubMedGoogle Scholar
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Stat Methodol. 1995, 57: 289-300.Google Scholar
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The sequence alignment/map format and SAMtools. Bioinformatics. 2009, 25: 2078-2079. 10.1093/bioinformatics/btp352.PubMed CentralView ArticlePubMedGoogle Scholar
- Kofler R, Pandey RV, Schlötterer C: PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics. 2011, 27: 3435-3436. 10.1093/bioinformatics/btr589.PubMed CentralView ArticlePubMedGoogle Scholar
- Fisher RA: On the interpretation of χ 2 from contingency tables, and the calculation of P. J R Stat Soc. 1922, 85: 87-94.View ArticleGoogle Scholar
- Liu S, Zhou Z, Lu J, Sun F, Wang S, Liu H, Jiang Y, Kucuktas H, Kaltenboeck L, Peatman E: Generation of genome-scale gene-associated SNPs in catfish for the construction of a high-density SNP array. BMC Genomics. 2011, 12: 53-10.1186/1471-2164-12-53.PubMed CentralView ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.