Experimental Animals and GeneChip® Studies
Sprague Dawley rats used for these studies were described in detail previously [3]. Rats at various ages were fed modified AIN-93G rodent diets (Dyets Inc.; Bethlehem, PA), which contained either 198 ppm Fe (DYET# 115135; control diet) or 3 ppm Fe (DYET# 115102; low Fe diet). The diets were identical except for the addition of pure ferric citrate to the control diet. Rats were supplied with food and tap water ad libitum. For all studies, only male rats were used and groups of 3–5 animals were considered as one group (n = 1).
GeneChip® experiments were also described in detail previously [2, 3]. Briefly, RNA was purified from duodenal and jejunal scrapes and enzymatically converted to cRNA for reaction with Affymetrix gene chips (RAE230A and RAE230B). Experiments at each age were repeated three times with samples derived from separate groups of control and experimental rats. GeneChip® data have been deposited in the GEO repository with accession # GSE1892.
GeneChip® Data Processing and Analysis
Six experimental groups were studied along with corresponding control groups: duodenum of suckling (8-days-of-age), weanling (21-days-of-age), adolescent (6 weeks-of-age) and adult rats (12- and 36 weeks-of-age), and jejunum of the 12-week-old group only. For data generated from 72 Affymetrix Gene Chips (2 chips [RAE230A and RAE230B] × 6 groups × 2 conditions [control and iron-deficient] × 3 repetitions = 72 chips), we utilized the default RMA function included in the "Affy" package of Bioconductor in the R statistical computing environment [21]. This default function employs median polish for expression summary and quantile normalization for data normalization. We also used MAS5.0 "present calls" to filter out probe sets whose expression intensities were close to the background noise across the majority of the samples, before performing the differential gene analysis. We applied the filtering of at least two "present calls" out of 3 replicated samples in either the control or iron-deficient rat groups. This led to a 52.5% – 57.4% data reduction for the comparisons. To detect gene expression values significantly different between groups, we employed SAM software [22] by controlling the false discovery rate (FDR) < 0.04 and expression fold changes > 1.5.
Clustering Analysis
For clustering analysis, we used 1484 probe sets whose expression values were significantly different between the control and experimental groups in at least one postnatal developmental stage of duodenum or jejunum. We built a 1484 × 6 matrix of average fold change between iron-deficiency and control rat groups and used this as input for clustering analysis. Two clustering algorithms were used; K-mean clustering (cluster number set at 3) and Hierarchical clustering, with Euclidean distance as the distance matrix for both algorithms. Based on biological knowledge of the genetic effect of iron-deficiency in gut function [2, 3, 23], two of the resulting clusters were selected for further analyses; one cluster of up-regulated genes from K-mean clustering contained 228 probe sets, representing ~163 unique genes, and another cluster of up-regulated genes from Hierarchical clustering (called Cluster 7) contained 29 probe sets, representing 25 unique genes.
Search for Iron-Response Elements (IREs)
For genes in the above two targeted clusters, we obtained the corresponding rat mRNA sequences from GenBank, searching in particular for ones that contained full 3' untranslated regions (UTRs) (as evidenced by the poly A tail). For the predicted rat genes without an identifiable 3' UTR, we used the orthologous mouse genes as replacements. In the case of genes with multiple reference sequences, we compared the mRNA sequences independently for each gene to get the one with the longest 3' UTR (which was more likely to be full length). The resulting 3' UTRs of all genes in each cluster were submitted to UTRscan [24] to identify putative IREs.
Acquiring Promoter Sequences
We downloaded promoter sequences within 1-kb upstream of the annotated transcription start site (TSS) for each gene in the target clusters for rat, mouse, and human from Cold Spring Harbor Laboratory Mammalian Promoter Database [25]. For genes with multiple mRNA reference sequences, we selected the one with the TSS determined from experimentally identified promoter regions (Database of Transcriptional Start Sites) [26]. In the cases of a reference sequence with alternative promoters, we selected the one defined as the "best" by Xuan et. al. [25]. Out of 163 unique up-regulated genes from K-mean clustering, we were able to obtain promoter sequences for 133, 144, and 159 genes for rat, mouse, and human, respectively. For the 25 unique up-regulated genes from hierarchical clustering, we were able to get promoter sequences for 20 rat, 23 mouse, and 21 human genes. For background sequences, we used randomly selected promoter sequences from Cold Spring Harbor Laboratory within 1-kb upstream of the TSS for 9,201 rat, 9,475 mouse, and 12,367 human unique mRNA RefSeq genes.
Enrichment Analysis of Transcription Factor Binding Sites
We employed the program CLOVER [9] and 565 vertebrate position weight matrices (PWM) in TRANSFAC (version 9.1) to conduct a search for enriched single TFBSs. Three sets of promoter sequences were used for enrichment analysis of the 163 up-regulated genes from K-mean clustering: 1) 133 rat genes, 2) 144 mouse genes, and 3) 159 human genes. We set the parameters of CLOVER for 1,000 randomizations and a p-value threshold of 0.05. For the estimation of p-values, we supplied 3 sets of background sequences to the algorithm, which estimates p-values for each background set separately (only p values from the species corresponding to that for the experimental promoters are reported in the Results section). The background sequences included were promoter sequences from rat, mouse, and human (described above).
We employed a 3-step approach to control for false positives. We first selected TFBSs with p-values < 0.05 across all 3 sets of background sequences, and then computed the average p-values for each TFBS in each set of promoter sequences. Since TRANSFAC contains multiple PWMs, we then applied the method of false discovery rates for multiple test correction to adjust p-values as shown in the following formula:
where N is the number of PWMs from TRANSFAC, and R is the ascending rank order of the respective p-value at a certain cutoff P
c
. We selected those TFBSs with q-value < 0.1, for the average p-values from the 3 sets of promoter sequences. Finally, we selected the enriched TFBSs which intersected among the above selected rat, mouse, and human comparisons.
Synergistic TFBS analysis
To predict whether any two individual TFs may interact to co-regulate genes, we sought to determine if any combinations of TFBSs were present in the same promoter sequence of the genes in our cluster more often than in a randomly selected group of unrelated gene promoters. We first obtained background probability of TFBS pairs in randomly selected rat, mouse, or human promoter sequences (PS) within 1-kb upstream of the TSS according to the following formula:
For this analysis, we utilized 9,201 promoter sequences for rat, 9,475 for mouse, and 12,367 for human, downloaded from Cold Spring Harbor Laboratory. We then computed the number of genes whose promoter sequences contain combinations of two TFBSs in the cluster. Finally, we calculated the probability of observing an equal or larger number of gene promoters in the cluster with both TFBSs than in randomly selected promoter sequences by chance, by summing the binomial distribution probabilities:
where n is the number of genes in the cluster with both TFs, and N is the total number of genes in the cluster. We selected TFBS pairs with p < 0.05 across rat, mouse, and human.
qRT-PCR
Male Sprague-Dawley rats were acquired at 3-weeks-of-age and fed either a control diet with 198 ppm Fe or an identical diet with 3 ppm Fe. When animals were ~12-weeks-of-age, they were sacrificed by CO2 narcosis followed by cervical dislocation. The low Fe animals were obviously iron-deficient as evidenced by pale eyes and extremities, piloerection and pale internal organs. Animals on this dietary regiment also exhibit microcytic, hypochromic iron-deficiency anemia, characterized by decreased hematocrit and hemoglobin levels [2, 3]. Mucosal scrapes were taken from the duodenum, consisting of ~20 cm of the small intestine distal to the pyloric sphincter, and snap frozen in liquid nitrogen and subsequently stored at -80°C. Tissue from 2–3 animals in the iron-deficient or control groups was pooled and RNA was purified by Trizol reagent. RNA was quantified by UV spectrophotometry and was enzymatically converted to cDNA using the iScript kit (BioRad; Hercules, CA). cDNA was subsequently used as a template for real-time PCR using SYBR Green mix (BioRad) and primers specific for SP6. One primer set was designed based upon the target sequence on the Affymetrix Rat Genome RAE230B array (probe set ID 1381534_at) and one primer set was designed based upon the rat SP6 cDNA sequence in the GenBank (Accession # XM_001081357). The primer sequences were as follows: SP6 forward primer 1 5'-TGT-GCT-ACC-AAG-ACA-ACC-TT-3'; SP6 reverse primer 1, 5'-AAG-TGG-GTT-CAC-AGC-AGT-T-3'; SP6 forward primer 2, 5'-GGA-CAT-GTC-ACA-CCA-CTA-CGA-ATC-3'; SP6 reverse primer 2, 5'-ACA-GAG-CTG-CTC-GTC-TCC-GA-3'. 18S rRNA primers were utilized as constitutive controls. These primer sequences were as follows: 18S forward primer, 5'-TAC-CTG-GTT-GAT-CCT-GCC-A-3'; 18S reverse primer, 5'-TCC-AAG-GAA-GGC-AGC-AGG-C-3'. 18S levels were very similar between all groups (not shown), indicating that 18S expression was not affected by iron status of the animals. All primers had no sequence homology with other known genes, as determined by BLAST searches. Preliminary experiments comparing DNAse treated to non-DNAse treated samples showed no differences in SP6 amplification, demonstrating that amplicons were not the result of amplification from genomic DNA. This control was especially important as the SP6 gene does not contain introns (NCBI GeneID: 363672). PCR amplification parameters were 42 cycles of 58°C annealing for 20 seconds and 72°C extension for one minute. Melt curves were routinely run and single peaks were detected indicating that only one template was being amplified (not shown).
Each RT reaction was analyzed in duplicate for both 18S and SP6 in each experiment. Then, the 18S average was subtracted from the SP6 average to generate the ΔCt value. Data were analyzed by routine methods. Briefly, ΔΔCt values from each gut segment were calculated from SP6 Ct and 18S Ct for the iron-deficient groups vs. the control groups. The ΔΔCt was the exponent of 2 for mean fold induction; its standard deviation was the exponent of 2 as an estimate of range. Statistical analyses were done by t test.