Transcriptome sequencing has become an increasingly popular means of developing genomic resources for non-model organisms. However, to sample a transcriptome exhaustively usually requires animals to be sacrificed so that mRNA can be harvested from multiple tissues. To circumvent this problem in a natural population of Antarctic fur seals, we 454 sequenced cDNA from various tissues obtained at necropsy from nine adult males that died of natural causes. This appears to have been remarkably successful, leading to measurable increases in the number and diversity of transcripts characterised, a greater representation of transcripts involved in immunity, and many more genetic markers discovered.
It is important to acknowledge that this study is not the first to apply high throughput sequencing to tissues obtained after death. For example, Cherel et al.
 recently explored links between gene expression and meat quality traits in recently slaughtered pigs, and Kang et al.
 characterised the developmental progression of gene expression patterns in the human brain. However, such studies are few and far between, and tend to focus on humans and their companion species. Our study is novel in that we have harvested multiple organs from animals dying of natural causes in a natural population of a non-model organism in order to generate an exhaustive as possible de novo transcriptome assembly.
Given that some of our samples were obtained from adult males that had died up to thirty hours prior to necropsy (Table
1), we initially held concerns over the quantity of the RNA that could be extracted and whether this could have downstream impacts on the amount of resulting 454 sequence data. As anticipated, significant variation was found in the amount of total RNA that could be extracted from the different tissues, the more fibrous heart yielding the lowest average concentration despite our having used a bead mill. However, there was no clear relationship between RNA yield and the time elapsed between death and necropsy. Moreover, 454 read lengths were actually longer on average than previously obtained for the skin transcriptome. Although this is almost certainly due to our having switched from FLX to FLX+ sequencing chemistry, it nevertheless appears that using RNA from recently dead animals did not have a major detrimental impact on sequence length.
An important caveat to the above is that organs were harvested shortly after death (1–30 hours, mean = 13.9 hours) in order to minimise the risk of RNA degradation. Moreover, the prevailing climate at Bird Island during the austral summer is cool (usually between 1.5 and 4.5 degrees centigrade). It remains unclear how much high-quality RNA will be retained in carcasses at more advanced stages of decomposition, nor how dependent RNA degradation will be upon ambient temperature, making it difficult to extrapolate to other study systems. To evaluate this in fur seals would require significant further effort, since carcasses would need to be repeatedly sampled at different time intervals, generating multiple cDNA pools for sequencing. Moreover, such an experiment would ideally evaluate the importance of factors such as carcass size and prevailing environmental conditions.
For the previous assembly based on skin samples
, we employed the cDNA option within Roche Newbler assembler version 2.3. However, a recent paper comparing the merits of five different assemblers found that this program performed poorly, generating the assembly with the greatest sequence redundancy and lowest proportion of mappings to reference sequences
. In contrast, a pre-release version of Newbler 2.5 was identified as the joint top performing assembler, with further improvements being obtained through the concurrent use of more than one assembler. To obtain the best possible quality transcriptome, we therefore assembled our data consecutively with Newbler 2.5 and CAP 3. After filtering on the basis of size and annotation, we obtained 23,096 contigs, which is considerably more than the equivalent value of 14,271 obtained when the original skin data were re-assembled and filtered based on the same criteria. Taken together with the limited degree of overlap between the necropsy and skin assemblies, this suggests that our improved transcriptome is larger and contains numerous transcripts that had not previously been described. This is consistent with previous studies that have found tissue-specific patterns of gene expression
Quality filtering for size and annotation led to many small contigs being discarded from the final assembly. However, this was reflected in a greatly improved rate of annotation (65.3% of contigs yielded BLAST hits with an e-value cutoff of 1e-10) relative to the original skin transcriptome (47.0% of contigs yielded BLAST hits with an e-value cutoff of 1e-4). As found previously, the majority of matches were to vertebrates, with the giant panda and the dog being the two most frequently represented species. Interestingly, more matches were obtained to the panda than the dog for the necropsy and skin assemblies, but this pattern was reversed for the combined assembly. This difference may reflect the fact that the combined assembly contains substantially more larger transcripts than either the necropsy or skin assemblies (number of transcripts >1,000bp in length = 7,780, 3,789 and 4,320 respectively) due to the greater total number of reads available.
Reassuringly, the proportion of BLAST matches to bacterial sequences was actually lower for the necropsy than the skin assembly (0.5% versus 2.1%). This not only implies that we managed to avoid contaminating the samples during the necropsy procedure, but may also indicate that the seals we necropsied were not suffering from heavy bacterial infections at the time of death. The difference between the two assemblies if genuine would also be consistent with a previous study that isolated a diverse bacterial assemblage from fight wounds and lesions in adult males of this species
. Thus, the greater representation of bacterial sequences in skin could quite possibly reflect the relatively unsanitary conditions of life in a crowded fur seal breeding colony.
Although it was not critical that our transcriptome be perfectly representative of normal patterns of gene expression, we initially held concerns that a transcriptome derived from necropsy samples could be dominated by transcripts involved in cell death. To evaluate this possibility, we therefore interrogated each of the assemblies, including the tissue-specific assemblies, for GO terms relating to apoptosis. Contigs with apoptosis-related GO terms were around twice as numerous in the necropsy than in the skin assembly. However, their overall percentage was low, at only 2.6% for the combined assembly and rising to a maximum of 4.3% for the heart and lung tissues. Thus, although our data are consistent with a certain degree of up-regulation of genes involved in apoptosis, there is no suggestion of a major bias being present.
This study was partly motivated by the limited representation of immune-related transcripts in our original skin assembly, together with the fact that many of these had insufficient depth of sequence coverage to allow SNPs to be identified with high confidence. Using the same approach as above, but this time filtering for transcripts based on GO annotations relating to immunity, we identified a total of 521 immune-related contigs, almost four times more than in the skin assembly. This improvement appears largely attributable to the inclusion of samples from the spleen, which carried by far the greatest proportion of immune-related transcripts, at 13.2%. However, the main causes of death in adult male Antarctic fur seals are fight wounds and pneumonia, raising the possibility that at least some of these individuals could have been mounting an immune response to bacterial infection prior to death. This is difficult to judge given the low proportion of sequence matches to bacteria, but could potentially have contributed towards the elevated representation of immune-related transcripts in the necropsy assembly.
Through comparative genomics with the dog, we also analysed the tissue-specific representation of transcripts revealing homology to 21 different canine MHC genes. All but three of these genes, including class I, II and III genes, were represented in at least one tissue. Consistent with immune-related contigs being four times more common in the spleen, the total number of transcripts per million mapping to MHC class I and II genes was also highest for the spleen, followed by the lung. A contrasting pattern was obtained for the MHC class III, with the lung having almost twice as many transcripts per million mapping as the spleen. This may reflect the role of class III genes in mounting the immune response via innate immunity, inflammation and immunomodulation
, processes that could conceivably be of greater importance in sensitive mucosal tissues such as the lung. Overall, we also found that MHC-related transcipts had the lowest representation in the skin, providing further justification for our having expanded and improved upon our original transcriptome.
Despite seals and dogs having diverged approximately 43 million years ago
, the two genomes appear to have retained large syntenic blocks
. Sequence homology is also strong enough to allow the flanking sequences of most pinniped microsatellites to be mapped to unique locations in the dog
[28, 40], making the canine genome a powerful resource for comparative purposes. We therefore took previous seal studies a step further by inferring the genomic distribution of 1,521,212 reads (61.1%), 17,121 contigs (74.1%), 2,119 microsatellite loci (81.8%) and 9,382 SNPs (58.0%) by reference to the dog. Much as expected, the resulting distributions appear relatively even (Figure
6), as supported by a strong positive correlation between the number of contigs mapping to a given chromosome and the length of that chromosome in the dog. Moreover, although the mapping locations should be regarded as putative, we have reasons to believe that a substantial proportion will be correct. For example, Hoffman et al.
 previously found that four microsatellite loci, described as putatively X-linked because they were homozygous in 84 males but carried the expected proportions of heterozygote genotypes in females, all mapped to the X chromosome in the dog
. Similarly, a SNP recently developed within a transcript revealing homology to mitochondrial NADH dehydrogenase revealed a pattern in which all individuals were called as homozygotes but for different alleles
. Nevertheless, to shed further light on synteny between the two genomes, it would be desirable to develop a high-density comparative linkage map. This is made feasible for the first time by the large number of genetic markers we have identified in this study (see below).
Mining the combined transcriptome assembly for microsatellites yielded marginally more markers than found in the original skin assembly (2,592 loci versus 2,271 loci respectively), although these numbers are not strictly comparable because different programs were used. A more obvious improvement was observed for SNPs, with SWAP454 identifying 2.5 times as many SNPs at the 'strict' level (1,585 versus 642) and 1.8 times more SNPs at the 'relaxed' level (11,454 versus 6,261). This is perhaps to be expected given the increased number and diversity of contigs and the improved depth of coverage, to which SNP discovery is particularly sensitive. Even more SNPs were identified by the Newbler mapping program (n = 14,574), with a clear peak in the parameter space corresponding to a MAF of around 0.3 and a depth of coverage of approximately 16x. This suggests that many of these SNPs may comfortably exceed the minimum selection criteria of at least three non-duplicate reads showing the variant and seven or more high-quality reads in total. One possible reason for the difference in the number of SNPs called by the two programs could be that SWAP454 relies on mapping the raw sequence reads back to a reference sequence, in this case the transcriptome assembly. This can lead to some loss of data due to incomplete mapping. Moreover, because the program only calls SNPs on the basis of reads that map reliably to a single contig, redundancy within the assembly, whether unintentional or due to the use of assembly methods that classify related contigs into 'isogroups' as constructed using the Newbler assembler, could potentially reduce the total number of SNPs called.