Development of a chicken 5 K microarray targeted towards immune function

Background: The development of microarray resources for the chicken is an important step towards profiling the gene expression changes that occur in birds in response to different challenges and stimuli. An immune-related array is highly valuable for determining the host immune response to infection with a wide variety of bacterial and viral diseases.
Results: Here we report the development of chicken immune-related cDNA libraries and the subsequent construction of a microarray containing 5190 elements (in duplicate). Clones on the array originate from tissues known to contain high levels of cells related to the immune system, namely Bursa, Peyer's patch, thymus and spleen. Represented on the array are genes that are known to cluster with existing chicken ESTs as well as genes that are unique to our libraries. Some of these genes have no known homologies and represent novel genes in the chicken collection. A series of reference genes (i.e. genes of known immune function) are also present on the array. Functional annotation data are also provided for as many of the genes on the array as possible.
Conclusion: Six new chicken immune cDNA libraries have been created and nearly 10,000 sequences submitted to GenBank [GenBank: AM063043-AM071350; AM071520-AM072286; AM075249-AM075607]. A 5 K immune-related array has been developed from these libraries. Individual clones and arrays are available from the ARK-Genomics resource centre.


Background
In recent years, the tools available to the field of chicken genomics have increased greatly. Detailed genetic and physical maps have been constructed [1], as well as BAC contig maps [2,3] and a radiation hybrid panel [4]. There is also a substantial EST collection [5] and a SNP database, and many full-length cDNAs have been sequenced. The development of these resources has culminated in the recent publication of the chicken draft sequence [6]. The chicken can now be regarded as an important model organism for use in comparative genomics, residing in a potentially informative position in the evolutionary ladder. The chicken is also an extremely useful model for developmental biologists and geneticists as well as being a commercially important species.
The latest tools being developed for the chicken are microarrays. Several small tissue-specific arrays are in use in individual labs. These include an intestinal array (3,072 clones) [7], a macrophage-specific array (4,906 clones) [8], a lymphocyte array (3,011 clones) [9] and an 11 K array based on genes found in heart progenitor cells [10]. A 13 K genome-wide array is also available from ARK-Genomics [11] (Roslin, UK) and from the Fred Hutchinson Cancer Research Centre (Seattle, USA) [12]. We have designed a 5 K immune-related array from libraries derived from tissues (Bursa, spleen, Peyer's patch and thymus) of birds that had previously been inoculated with a combination of vaccines against common avian diseases of bacterial, protozoan and viral origin (E. coli, Newcastle Disease Virus (NDV), Infectious Bursal Disease Virus (IBDV), coccidiosis, Marek's Disease (MD) and salmonella). The tissues we chose are highly representative of T and B cell populations and were used in order to optimise the number of immunologically related genes present in our libraries. Many known immune genes that have recently been identified in the chicken EST collections [13] have also been added to the array. This array provides a valuable, cost-effective resource for the investigation of immunological gene expression. It has been created from a pool of stimulated immune tissues and contains genes that represent a wide spectrum of immune functions as well as previously unidentified sequences. Each gene on the array is also functionally annotated as far as possible. Gene ontology [14] data and Blast [15] information are provided for each clone, where that information is available.

Construction of the array
Six immune-related libraries were specifically developed for the construction of a 5 K chicken array. Immune tissue from birds inoculated with different vaccine regimes (see Methods) was used to develop two standard libraries. Each of these underwent two rounds of normalization, giving six libraries in total. Initially, 10,173 clones were randomly chosen from the libraries for sequencing. The number chosen from each library depended on the titre (colonies/microlitre) of that particular library. The 10,173 sequenced clones were screened for poor quality sequence (<100 bp after removal of vector, repeats etc.) and unwanted Blast homologies, as described in the Methods section. The numbers of high quality sequences from each library (9,434 in total, all of which have been submitted to GenBank) are shown in Table 1. Cluster analysis was then undertaken, which grouped the clones from which we would choose the 5,000 to be represented on the array.
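The length screen described above can be illustrated with a minimal sketch. It assumes that vector and repeat regions have already been masked with 'X' or 'N' characters; the masking convention, the function names and the example reads are hypothetical and are not taken from the pipeline actually used.

```python
# Illustrative sketch only: the masking convention ('X'/'N') and all names
# here are assumptions, not the pipeline used for the libraries.
MIN_LENGTH = 100  # reads with fewer than 100 usable bp are discarded


def usable_length(seq: str) -> int:
    """Count unmasked bases (A, C, G or T) in a read."""
    return sum(1 for base in seq.upper() if base in "ACGT")


def filter_reads(reads: dict[str, str], min_length: int = MIN_LENGTH) -> dict[str, str]:
    """Keep only reads whose unmasked length is at least min_length."""
    return {name: seq for name, seq in reads.items()
            if usable_length(seq) >= min_length}


# Hypothetical example reads
reads = {
    "clone_a": "ACGT" * 30,             # 120 usable bases: kept
    "clone_b": "ACGT" * 20 + "X" * 60,  # 80 usable bases: discarded
}
kept = filter_reads(reads)
print(sorted(kept))
```

In practice the unwanted Blast homologies mentioned in the text would be removed in a separate step, which is not shown here.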

Genes on the array
The clones on the array are derived from custom-made immune-related chicken cDNA libraries. Libraries developed from Bursa, spleen and Peyer's patch tissue were our representative 'B cell' libraries, and libraries developed from thymus were our 'T cell' libraries (the names 'B and T cell libraries' are used purely for ease of reference and in no way indicate that the libraries consist of pure cell populations). Clones from both standard and normalized libraries are present on the array. One clone representing each of the 3,811 clusters is included on the array, along with a random selection of singleton clones (1,067). The sequence of each clone was also subjected to a Blast search of the SwissProt and TREMBL databases and the highest hit to each sequence was reported. Searches were carried out at a stringency of 1e-10. This relatively low stringency was chosen to ensure that we identified as many immune homologies as possible, because chicken immune genes have a relatively low level of sequence conservation with mammals. We also wanted to ensure that certain genes were represented on the array as 'reference' genes. These included a range of known immune-related genes for which a clone was already available, either from the existing EST databases [12] or from our novel libraries. Various cytokines, chemokines, cell surface antigens, receptors and MHC molecules were all included (Table 2). The expression profile of genes of unknown function can thus be compared with the profiles of these genes whose roles are known. Standard array controls were also spotted on the array, including various spot report buffers (positive and negative controls for the Cy3 and Cy5 dyes), salmon sperm DNA, calf thymus DNA, bovine genomic DNA (negative controls), chicken genomic DNA, gamma actin and GAPDH (positive controls). Each clone is represented in duplicate.
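As an illustration of reporting the highest hit per clone at the 1e-10 cut-off, the sketch below filters a list of (query, subject, E-value) tuples. The tuple layout, the identifiers and the treatment of a hit exactly at the cut-off as passing are all assumptions made for the example; they do not describe the scripts actually used.

```python
# Illustrative sketch only: the data layout and identifiers are hypothetical.
E_VALUE_CUTOFF = 1e-10  # cut-off quoted in the text


def top_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Return the lowest E-value hit per query, ignoring hits above the cut-off.

    A hit whose E-value equals the cut-off is kept (an assumption).
    """
    best = {}
    for query, subject, evalue in hits:
        if evalue > cutoff:
            continue  # too weak to report
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical Blast results
hits = [
    ("clone_1", "P_CYTOKINE", 3e-42),
    ("clone_1", "P_OTHER", 1e-15),
    ("clone_2", "P_WEAK", 2e-5),      # above the cut-off: no hit reported
    ("clone_3", "P_RECEPTOR", 1e-10),
]
best = top_hits(hits)
for query, (subject, evalue) in sorted(best.items()):
    print(query, subject, evalue)
```

Clones such as the hypothetical clone_2, which have no hit passing the cut-off, would fall into the group of sequences without an identified homology.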

Analysis of the immune clones
All the sequences of the clones on the array were subjected to Blast homology searches against the SwissProt and TREMBL databases using a cut-off value of 1e-10. Using this means of detection, many known immune-related molecules were identified, including cytokines, interferons, interleukins, transcription factors, receptors, cell differentiation antigens, MHC molecules and genes for proteins belonging to the TOLL receptor pathway. Proteins homologous to hypothetical human proteins and mouse cDNAs were also identified.
[Table 1 (fragment). High quality sequences per library: Chicken immune 4 ('T cell' standard), 2,563; Chicken immune 5 ('T cell' normalized 1), 918; Chicken immune 6 ('T cell' normalized 2), 984. A count of 1,600 also survives from a preceding row whose library name is missing.]
[Table 2 (fragment). Surviving row: CTst_C0000877n08.q1kT7SCF, C0000877N8, AM069687. Footnote: the genes in bold come from the immune libraries described in this paper.]
Sequences that gave no Blast homology to anything in the nucleotide or protein databases accounted for about 38% of the clones. Either the search parameters were too stringent to identify these genes or the chicken sequence was sufficiently divergent to be undetectable in a standard Blast search. This is a common feature of immune-related genes, and it is often very difficult to identify such genes by sequence homology to mammalian homologues. Some of these sequences may also represent non-conserved 3' UTR regions of genes. This set of clones may also include genes that have never been identified before and are not represented in the sequence databases. Further, more detailed analysis of these sequences can sometimes help elucidate the nature of the gene in question. Protein sequences can be predicted from the EST nucleotide sequence using programs such as ESTscan [16,17], which takes into account sequencing errors and thus the potential frame-shift mutations that are often present when only one EST sequence is available for study. Conserved motifs and domains can then sometimes be identified, for example using the Pfam database [18], which is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains and families. PSI-Blast searches can also help identify the type of family to which a gene belongs.
During clustering analysis, our 10,000 immune sequences were aligned with the 398,000 chicken ESTs available at the time. This highlighted 3,845 clusters that contained one or more sequences from our immune libraries and 1,959 singleton clones. This analysis also identified 40 novel clusters that contained only sequences from our new libraries. Upon Blast analysis, 7 of these clusters were found to represent known chicken genes (initially appearing unique because they aligned to a different part of the gene sequence from existing ESTs), 18 showed homology to genes in other species and 15 proved to have no known homology to anything then in the databases. There are now 550,510 chicken ESTs in the databases (dbEST release 080505). A current search has shown that 9 of our sequences are still unique to our libraries and have no known identifiable homologue, although two of the sequences do show some similarity to two predicted chicken sequences (AM065333 and the hypothetical protein XP_429359; AM065802 and the predicted P114-RHO-GEF protein XP_418249). Eight of these sequences are identifiable in the whole genome sequence, as shown in Table 3.

Gene ontology (GO) annotations
To elucidate the function of the genes on the array further, we assigned as much annotation to the sequences as possible. GO annotations were assigned to some sequences after searching the GGGI and UMIST databases [19], while other annotation was derived from hits to orthologous human sequences from the ENSEMBL [20] and GENSCAN [21] databases, as described in the Methods section. Having annotation derived from orthologous human genes means that cross-species comparisons between chicken and human array data may be possible. A search of the ENSEMBL database provided information on 2,292 GO-term associations, the GGGI database 1,542 and GENSCAN 566, while the UMIST full-length cDNA database provided a further 365 annotations. The sequences on the array cover a total of 227 GO terms, with 73% of all the sequences having at least one GO entry assigned. The available annotation for the array sequences is broken down as follows: 52% of genes have a 'cellular component' term assigned, 60% have a 'molecular function' term and 56% have a 'biological process' term. 83% of all the genes on the array have some kind of gene description and, after searching each sequence against the sequences in the Ensembl chicken genome collection (July 2005 genebuild [22]), 78% of sequences were found to have a known chromosomal location. Now that all these sequences have been added to GenBank and thus have an accession number that can be linked directly into the ENSEMBL databases (work currently underway), obtaining comprehensive, up-to-date annotation data will become much easier.
A file showing the complete annotation for all the sequences on the array is available as supplementary material (Additional file 1). In addition, Additional file 2 provides an overview of the broad functional classes that are represented by the genes on the array. These are based on more general GO annotations derived from the GOslims database at EBI, and give an insight into the different classes of genes present on the array without the need to look at detailed functional annotation for each individual gene.
Annotation is also available for 9,137 of the ESTs in the UMIST collection. By comparing the relevant GO slims [23] terms for the sequences in this collection with those present on our array, we are able to see which types of genes appear to be enriched in our set, compared with a larger, more general collection of EST sequences. As shown in bold in Additional file 2, certain classes of gene appear to be more highly represented. For instance, genes involved in protein transport are more abundant in our set of clones, as are those involved in the response to stimulus. This is consistent with our attempts to pre-select for higher numbers of genes involved in the immune system.
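One simple way to express such a comparison is the ratio of the fraction of annotations carrying a given GO slim term in the array set to the corresponding fraction in the reference collection. The sketch below is illustrative only: the counts are invented, the function names are hypothetical, and the text does not state which measure was actually used to decide that a class was more highly represented.

```python
# Illustrative sketch only: counts are invented and the ratio measure is an
# assumption; the comparison in Additional file 2 may have been done differently.
def term_fractions(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-term annotation counts into fractions of the total."""
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def enrichment(array_counts: dict[str, int],
               reference_counts: dict[str, int]) -> dict[str, float]:
    """Ratio of array fraction to reference fraction for each shared term."""
    array_frac = term_fractions(array_counts)
    ref_frac = term_fractions(reference_counts)
    return {term: array_frac[term] / ref_frac[term]
            for term in array_frac if ref_frac.get(term, 0) > 0}


# Hypothetical GO slim counts (not the real data)
array_counts = {"protein transport": 30, "response to stimulus": 50, "metabolism": 120}
reference_counts = {"protein transport": 50, "response to stimulus": 100, "metabolism": 850}

ratios = enrichment(array_counts, reference_counts)
for term, ratio in sorted(ratios.items()):
    print(f"{term}: {ratio:.2f}")
```

A ratio above 1 indicates a term that is relatively more common in the array set than in the reference; whether a given difference is meaningful would need a statistical test, which this sketch does not attempt.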

Quality of the array
To assess the quality of the array, various hybridization comparisons were undertaken. Three different conditions were addressed: 1) self v self, 2) biological replicate A v biological replicate B and 3) control sample v activated sample. Dye swap experiments were also carried out for conditions 2 and 3. The 'self' sample was a reference RNA consisting of a pool of various chicken lung samples. The biological replicates were lung samples from two 6-week-old chickens that had not been treated or challenged in any way. In the third group of hybridisations, the 'control' sample was from a similarly untreated bird and the 'activated' sample was obtained from the lungs of a bird that had been challenged with the avian influenza strain H9N2 five days previously. The graphs in Fig 1 show the tight correlation between self/self (R² = 0.9273) and between replicates (R² = 0.8766), whereas a much higher level of variance is seen when an activated sample is compared against a control (R² = 0.7601).
The boxplots in Fig 2 also demonstrate the differing variances between the comparisons. As would be expected, the greatest variance is shown for the activated animals compared with the controls. Regression analysis of each data set confirms the increased variance, with correlation coefficients of r = 0.872 for the activated samples, r = 0.936 for the replicate samples and r = 0.963 for the self/self samples.
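For a simple linear regression with one predictor, the coefficient of determination equals the square of the Pearson correlation coefficient, so the R² and r values quoted above can be checked against each other. The short sketch below does this, assuming that both sets of values describe the same data sets and that r was rounded to three decimal places; the function name is hypothetical.

```python
# Illustrative consistency check: assumes R^2 and r refer to the same
# simple linear regressions and that r was rounded to three decimals.
reported = {
    "self/self": (0.963, 0.9273),
    "replicates": (0.936, 0.8766),
    "activated/control": (0.872, 0.7601),
}


def consistent(r: float, r_squared: float, r_decimals: int = 3) -> bool:
    """True if r_squared lies within the range implied by rounding of r."""
    half_step = 0.5 * 10 ** -r_decimals
    low = (r - half_step) ** 2
    high = (r + half_step) ** 2
    return low <= r_squared <= high


results = {name: consistent(r, r2) for name, (r, r2) in reported.items()}
for name, ok in results.items():
    print(name, "consistent" if ok else "inconsistent")
```

Under these assumptions, all three pairs agree to within the rounding of r.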

Using the array
This array is available from the ARK-Genomics resource facility at Roslin Institute. For anyone interested in immune research, this immune-focused array offers a more cost-effective and time-saving platform for gene expression experiments than the large oligo arrays, which carry thousands more genes, many of which will be of no interest. Analysis of the data is thus also much easier and far less time-consuming. Information on the array has been deposited in ArrayExpress (Accession: A-MEXP-307) [24,25] (Additional file 1), and all the sequences will very soon be submitted to the Ensembl database with links to all the GO annotation information in the GOA database [26].

Conclusion
We have constructed a 5 K chicken cDNA microarray that is highly selected for genes expressed in tissues with an immune function. This targeted array contains enough widely expressed genes (whose expression is not expected to change) to enable good normalization, as well as numerous known immune genes (from our novel libraries and from existing EST collections). The array also contains many genes with as yet unknown homology and function, as well as a few novel genes that are specific to the libraries from which the array was created. These genes of unknown function could well have a role in either the adaptive or innate immune response, and thus provide a valuable resource for analysis of gene expression changes occurring in birds that have been subject to immune challenge. The array has been shown to provide highly reproducible results and is now available to the chicken/microarray community as a whole.

Library construction
Six libraries were constructed at Incyte Genomics (Palo Alto, CA): one standard and two normalized Bursa/spleen/Peyer's patch libraries, and one standard and two normalized thymus libraries. cDNA synthesis was initiated using an oligo (dT) primer, using methylated C in the first strand synthesis reaction. Following this first strand reaction, double-stranded cDNA was blunted, ligated to NotI adapters, digested with EcoRI, size-selected, and cloned into the NotI and EcoRI compatible sites of a custom modified MCS of the pBluescript (KS+) vector. Normalization was done in two rounds using conditions adapted from [27,28], except that a significantly longer re-annealing hybridization was used. Around 10,000 clones were then sequenced at the Sanger Institute according to their protocols. Using the T7 primer, sequence was generated from the 5' end of each clone by the dideoxy chain termination method using an ABI 3700 sequence analyser (Applied Biosystems, Foster City, CA).

EST sequence analysis
Bioinformatic analysis commenced with 10,173 sequences. After eliminating poor quality sequence and [...]
Figure 1. Scatter plots showing the variance between A) self/self hybridisation, B) two biological replicates and C) a control sample compared with an activated sample. Very little spread is seen with the self/self hybridisation and between the two replicates, as would be expected. However, differences in gene expression can be seen between the activated and control samples.
[...] Slides were then treated using succinic anhydride and 1-methyl-2-pyrrolidinone (Sigma, Poole, UK) to block unbound amino groups, followed by a wash in 95°C MilliQ water before hybridisation.

RNA preparation and labelling
Total RNA was isolated from lung tissue using a Trizol extraction according to the manufacturer's protocol (Invitrogen, Paisley, UK) and subsequently purified using the RNeasy Midi RNA Purification kit (Qiagen Ltd., Crawley, UK). RNA concentration was determined spectrophotometrically and RNA quality was determined using an Agilent 2100 Bioanalyser (Agilent Technologies, Waldbronn, Germany). Cy3 or Cy5 was incorporated into each sample using the Fairplay labelling kit (Stratagene, La Jolla, CA) and the labelled cDNA was cleaned up by passage through DyeEx columns (Qiagen Ltd., Crawley, UK). Labelling efficiency was determined by running 0.5 μl of each sample on a 1% agarose gel and measuring the intensity of fluorescence on a GeneTac LS IV scanner (Genomic Solutions, Huntingdon, UK).

Hybridizations
Microarray hybridizations were carried out overnight using a GeneTAC automated hybridization system [37] (Genomic Solutions, Huntingdon, UK). Hybridizations (125 μl) were carried out in Genomic Solutions hybridization solution (Cat. no. RP#0025) in a stepped hybridization: 55°C for 3 hr, 50°C for 3 hr and then 45°C for 12 hr. Slides were then washed in Genomic Solutions wash buffers (Cat. nos. CS#0038, CS#0039 and CS#0040). Upon removal from the hybridization stations, slides were washed for 1 min in Post-Wash buffer (CS#0040) and a further minute in isopropanol, followed by centrifugation at 1000 rpm for 6 min. Dried slides were scanned in a Scanarray 5000 scanner (GSI Lumonics, Rugby, UK) fitted with Cy3 and Cy5 filters.
Figure 2. Box plots showing the variance between self/self hybridisation, two biological replicates and a control sample compared with an activated sample. Boxes represent the interquartile range (25-75%), with the median marked. Outliers to this range are also shown. Categories plotted: self/self, replicates, activated/control.