The utility of PacBio circular consensus sequencing for characterizing complex gene families in non-model organisms
© Larsen et al.; licensee BioMed Central Ltd. 2014
Received: 9 October 2013
Accepted: 8 August 2014
Published: 26 August 2014
Molecular characterization of highly diverse gene families can be time consuming, expensive, and difficult, especially when considering the potential for relatively large numbers of paralogs and/or pseudogenes. Here we investigate the utility of Pacific Biosciences single molecule real-time (SMRT) circular consensus sequencing (CCS) as an alternative to traditional cloning and Sanger sequencing of PCR amplicons for gene family characterization. We target vomeronasal gene receptors, one of the most diverse gene families in mammals, with the goal of better understanding intra-specific V1R diversity of the gray mouse lemur (Microcebus murinus). Our study compares intragenomic variation for two V1R subfamilies found in the mouse lemur. Specifically, we compare gene copy variation within and between two individuals of M. murinus as characterized by different methods for nucleotide sequencing. By including the same individual animal from which the M. murinus draft genome was derived, we are able to cross-validate gene copy estimates from Sanger sequencing versus CCS methods.
We generated 34,088 high quality circular consensus sequences of two diverse V1R subfamilies (here referred to as V1RI and V1RIX) from two individuals of Microcebus murinus. Using a minimum threshold of 7× coverage, we recovered approximately 90% of V1RI sequences previously identified in the draft M. murinus genome (59% being identical at all nucleotide positions). When low coverage sequences were considered (i.e. < 7× coverage) 100% of V1RI sequences identified in the draft genome were recovered. At least 13 putatively novel V1R loci were also identified using CCS technology.
Recent upgrades to the Pacific Biosciences RS instrument have improved the CCS technology and offer an alternative to traditional sequencing approaches. Our results suggest that the Microcebus murinus V1R repertoire has been underestimated in the draft genome. In addition to providing an improved understanding of V1R diversity in the mouse lemur, this study demonstrates the utility of CCS technology for characterizing complex regions of the genome. We anticipate that long-read sequencing technologies such as PacBio SMRT will allow for the assembly of multigene family clusters and serve to more accurately characterize patterns of gene copy variation in large gene families, thus revealing novel micro-evolutionary patterns within non-model organisms.
Keywords: Chemosensory genes; Microcebus murinus; Multigene family; Pacific Biosciences; Pheromone detection; Single molecule real-time sequencing
Multigene families have played a fundamental role in the evolution of metazoan genomes [1–5]. Mechanisms such as gene duplication, gene conversion, and lineage diversification underlie multigene family complexity and contribute to genetic patterns that can be extremely difficult to molecularly characterize [6, 7]. Whereas processes such as positive selection and lineage diversification can yield gene copies of increasing nucleotide divergence, opposing processes such as gene duplication and gene conversion can yield copies that are so similar that they are virtually impossible to distinguish from sequencing error [8, 9]. The accurate characterization of gene copy number is fundamental to the differentiation of paralogy and orthology, and by extension, to the identification of heterozygotes versus homozygotes. This latter distinction is in turn central to determining the effects of genotype on phenotype, with the MHC (Major Histocompatibility Complex) gene family offering a classic example [10–13].
Given the intrinsic interest of accurate gene copy representation, it follows that methods of molecular characterization should be highly sensitive both to levels of low nucleotide diversity and to regions of high complexity. Unfortunately, such is not presently the case for organisms that lack a well-characterized genome: i.e., non-model organisms. Although low-coverage “draft” genomes are increasingly available for non-model organisms, these draft genomes are notoriously unreliable for accurate gene calling, particularly for regions of high genomic complexity [6, 14]. Thus, until such time that high-coverage, fully-assembled and annotated genomes are available for all species of interest, alternative molecular methods are desirable.
The vomeronasal organ (VNO) is the primary olfactory organ responsible for pheromone detection in mammals and two families of VNO G protein-linked receptors (V1R and V2R) allow for the recognition of different classes of chemosensory cues. V1Rs are typically encoded by a single coding exon (~900-1000 base pairs) and are distantly related to bitter taste receptors whereas V2Rs are encoded by multiple coding exons and are closely related to Ca2+-sensing receptors [19–21]. Much research has been directed to distinguishing V1R receptors, owing to their single coding exon and relatively short sequence length [18, 20, 22]. However, despite recent advancements in the understanding of V1R sequence diversity in mammals [18, 23, 24], relatively little is known about the intra-specific or intra-individual (i.e., intragenomic) V1R diversity of many non-model species. This is because V1R repertoires are hypothesized to evolve rapidly and are likely lineage-specific, resulting in relatively few one-to-one orthologs between species [18, 25]. These factors make characterization of the V1R repertoire difficult for DNA sequence based studies because i) traditional approaches to sequencing large, closely related gene families can be time consuming and expensive (e.g. cloning and subsequent Sanger sequencing of PCR amplicons), and ii) short-read high-throughput sequence data of such gene families are difficult to assemble given high sequence similarity and the potential for multiple paralogs to exist throughout the genome.
Table 1. Intra- and intergenomic comparisons of Microcebus murinus V1R subfamily diversity examined herein
M. murinus 1: CCS vs. draft genome; CCS vs. draft genome
M. murinus 2: CCS vs. Sanger
M. murinus 1 vs. M. murinus 2: CCS vs. CCS; draft genome vs. CCS; draft genome vs. Sanger; CCS vs. CCS; draft genome vs. CCS
CCS quality and clustering
Table 2. Results of CCS and cluster analyses of the V1RI and V1RIX repertoires of two individuals of M. murinus
Reported quantities: raw CCS reads; post quality filter; V1RI CCS reads; V1RIX CCS reads; V1RI clusters (≥7×); V1RIX clusters (≥7×); CCS V1RI loci (98 to 99%); CCS V1RIX loci (98 to 99%); estimated V1RI repertoire; estimated V1RIX repertoire
Clustering analyses resulted in 8,545 and 8,694 V1RI clusters and 2,936 and 3,673 V1RIX clusters for M. murinus 1 and 2, respectively (Additional file 1: Figure S2). Of these, approximately 18% and 17% of V1RI clusters and 5.4% and 4.6% of V1RIX clusters were identified as putative chimeras for M. murinus 1 and 2, respectively (Additional file 1: Figure S2). The majority of the chimeras (85%; n = 2,861) consisted of singleton clusters and only 13 chimeras had cluster sizes greater than a 7× threshold. Clusters consisting of putative chimeras were removed prior to all downstream analyses. Results of cluster analyses, including de novo chimera detection results, are presented in Table 2 and Additional file 1: Figure S2. We identified 15 clusters as consisting of putative pseudogenes and these were also excluded from downstream analyses. Final analyses were performed on consensus sequences from 106 and 114 V1RI clusters and 61 and 85 V1RIX clusters for M. murinus 1 and 2, respectively (≥7× coverage; Table 2). These consensus sequences were aligned using 98% and 99% similarity thresholds (see Methods) in order to determine the minimum and maximum number of V1R genes obtained using PacBio CCS technology (see Table 2). For M. murinus 1, 13 CCS V1R genes were not identified in sequences mined from the draft genome and are considered novel.
Comparisons between CCS and draft genome V1R sequences
Table 3. Polymorphism statistics of DNA sequence data for the V1R subfamilies examined herein
Labels: fragment size (bp); M. murinus 2
M. murinus V1R repertoire comparisons
Insights into the Microcebus murinus V1R repertoire
Chemosensory communication has played a critical role in mammalian evolution from both physiological and behavioral perspectives. In particular, many species rely heavily on pheromones for intraspecific communication, especially with respect to sexual and social behaviors. Of all mammalian orders, Primates exhibits what is perhaps the greatest variation of functional versus pseudogenized V1R diversity [18, 22, 23]. For example, no functional V1R genes have been identified in the macaque (Macaca mulatta) genome whereas ~214 intact V1R genes have been identified in M. murinus. This proportion of functional versus pseudogenized V1R genes is likely correlated with a reduction of pheromone communication in some primate lineages (e.g. Old World catarrhines; [23, 32]), whereas lineages that have maintained or evolved enhanced chemical communication typically exhibit diverse V1R repertoires (e.g. strepsirrhine primates [18, 29]). In particular, mouse lemurs practice complex chemosensory communication and M. murinus has one of the highest proportions of functional vs. pseudogenized V1R repertoires of all mammalian species characterized thus far. Our results reinforce this finding and suggest that the functional V1R repertoire of M. murinus has likely been underestimated, perhaps by as much as 25% in the V1RI and V1RIX subfamilies, collectively. This observation, coupled with recent documentation of strong positive selection throughout the mouse lemur V1R repertoire, strengthens hypotheses regarding the highly specialized pheromone communication mechanisms used by species of lemurs [26–29].
While the putative function of the V1RI subfamily is unknown [29, 30], the available data indicate that this subfamily binds a diverse variety of ligands. In contrast, the genetic variation within the V1RIX subfamily suggests this subfamily is more conserved, perhaps binding to a reduced number of ligand classes (Table 3; Figure 8). This hypothesis is reinforced by the observation that the two individuals examined herein share ~84% of their V1RIX repertoires, compared to ~60% shared V1RI loci, at the 98% sequence similarity threshold. Moreover, we identified three V1RIX loci that were identical between M. murinus 1 and 2, whereas zero V1RI loci were identical between the two individuals. These results support previous studies that have hypothesized differing rates of evolution within lemurid V1R subfamilies [29, 30]. Based on comparisons with putative mouse orthologs, Hohenbrink et al. hypothesized that the V1RIX subfamily was closely related to the mouse V1Rc subfamily, a subfamily that has been shown to detect female, heterospecific, and predator cues in mice. Future studies focused on the identification of the ligands associated with the nine known V1R subfamilies present in M. murinus will be an important advance for understanding the functional roles of these gene families and whether or not genetic variation underlying V1R repertoires contributes to the maintenance of species boundaries within the genus.
Utility of CCS for gene family characterization and discovery in non-model species
Sanger sequencing of cloned inserts is a well-established and common approach for characterizing multigene family diversity [29, 34–36]. Although effective, this method can be time-consuming, labor-intensive, and expensive. A growing number of studies utilize next generation sequencing technologies for targeted approaches to gene validation and discovery [37, 38]. These technologies have limitations, however, and issues such as systematic and stochastic error rates as well as average read lengths must be considered when developing experimental designs. We selected Pacific Biosciences SMRT sequencing technology because the long read lengths eliminated the necessity for downstream assembly of highly similar fragments and because CCS reduces stochastic error rates. Moreover, the option of filtering reads by number of CCS passes provides greater flexibility in quality control of PacBio sequence data. Recent advancements to the PacBio RS sequencing instrument and sequencing chemistry have improved read length and accuracy and, as of this writing, Pacific Biosciences has released the RS II upgrade, which allows for higher throughput and even greater read accuracy.
Relatively few studies have utilized PacBio CCS for targeted sequencing in non-model species [39, 40]. In the absence of high quality genome assemblies, the long read lengths provided by SMRT CCS offer new opportunities for characterizing complex multigene families (e.g. immunoglobulin, MHC, olfactory receptors, V1R, etc.). The observation that our clustering approach of CCS reads resulted in capturing 100% of V1RI sequences mined from the draft genome assembly (including low coverage clusters), coupled with the identification of 13 putatively novel genes (Figures 3, 4, 5, 6, 7 and 8), documents that the methods reported herein are useful for gene discovery and for describing the diversity of large gene families. Moreover, no bias was detected in the nucleotide variation of sequences originating from CCS clusters with respect to those mined from the draft genome or generated via Sanger sequencing of cloned inserts (Table 3; Additional file 1: Table S2). Our results concerning the reduced coverage of the V1RIX subfamily (when compared to V1RI) likely stem from PCR amplification bias and/or preferential sequencing of the shorter V1RI sequences in pooled sequencing libraries. This finding, in addition to the identification of putative PCR chimeras by de novo chimera detection software (Additional file 1: Figure S2), generally agrees with other studies that have identified PCR bias and PCR artifacts within data originating from high-throughput sequencing of PCR amplicons.
Although our experimental design is useful for identifying potentially unrecognized gene diversity, a major drawback is the inability to distinguish closely related paralogs and to reliably identify orthologs among individuals. This problem is compounded by the observation that V1Rs are encoded by a single exon [21, 23] and therefore lack intronic sequences that may help to identify orthologs and/or paralogs. Thus, future studies aimed at characterizing V1R gene diversity in non-model species may benefit from other methods such as the targeted capture and sequencing of genomic regions harboring V1R genes (e.g., using biotinylated probes in combination with PacBio sequencing of long templates). Such an approach would be useful for identifying orthologous and paralogous genes and for characterizing allelic variation in multigene families of non-model species.
Our findings suggest that the V1R repertoire of M. murinus is larger than previously hypothesized and underscore previous observations that low coverage genome assemblies provide a limited view of multigene-family diversity [14, 18]. Even so, it is probable that we have still underestimated V1R diversity given the potential for the clustering of closely related paralogs (i.e. < 2% sequence divergence). Importantly, the forthcoming availability of a high coverage (~150×) M. murinus genome (Human Genome Sequencing Center, Baylor College of Medicine) will allow our hypotheses regarding the V1R repertoire size to be more definitively tested.
Pacific Biosciences SMRT CCS provides an alternative to traditional Sanger sequencing of cloned inserts. We anticipate that the methods described herein will be useful for the characterization of diverse gene families in other non-model species where genome sequences are unavailable or consist of low coverage draft assemblies. Our results concerning the presence of putative PCR artifacts agree with previous observations and necessitate the implementation of strict quality control measures when high-throughput sequencing is performed on libraries constructed from PCR amplicons. Modifications to our approach, such as barcoding and advanced targeted capture methodologies, will be useful for increasing sample size and gene discovery. These methods will greatly advance genome assembly and annotation of multigene families in non-model species.
We examined V1R sequences mined from the draft Microcebus murinus genome by Young et al. and selected two diverse subfamilies (V1RI and V1RIX sensu Hohenbrink et al.) for circular consensus sequencing (Additional file 1: Figure S3). These subfamilies were amplified from whole genomic DNA, isolated from two non-related individuals of M. murinus, using primers targeting conserved transmembrane regions 2 and 7 (V1RI) and 1 and 7 (V1RIX; Additional file 1: Table S1). We refer to the individual from which the draft genome was derived as M. murinus 1, and the second individual, included in the study by Yoder et al., is referred to as M. murinus 2 (Duke Lemur Center voucher number 7013). Animal procedures were reviewed and approved by the Duke University Institutional Animal Care and Use Committee under protocol number A250-12-09.
Amplicons were obtained using a high fidelity Taq DNA polymerase (Platinum Taq; Invitrogen) and PCRs were conducted in 50 μl reactions with the following final concentrations: 1× high fidelity buffer, 2 mM MgCl2, 200 μM dNTPs, 0.8 μM primers, 0.625 units Taq, and ~15 ng DNA template. The following touchdown thermal profile was used for all amplifications: initial denaturation at 95°C for 3 min followed by 15 cycles of 95°C for 1 min, 60°C (1°C decrease per cycle) for 1 min, 72°C for 1 min 30 sec, then another 20 cycles of 95°C for 1 min, 45°C for 1 min, 72°C for 1 min 30 sec, and a final extension at 72°C for 10 min. PCR products were visualized on a 2% agarose gel using SYBR Green I (Lonza Rockland, Inc.) and bands within the expected size ranges (V1RI = ~725 bp and V1RIX = ~800 bp) were excised and extracted using the Mo Bio gel purification kit (Mo Bio Laboratories, Inc.).
Three PCR reactions per individual per locus were pooled separately and quantified using a NanoDrop spectrophotometer (Thermo Scientific). V1RI and V1RIX amplicons were then pooled (1.5 μg V1RI and 1.0 μg V1RIX) resulting in two 2.5 μg samples for the construction of two sequencing libraries. V1RI amplicons were enriched to ensure sequence coverage given the increased variation observed within the V1RI subfamily when compared to V1RIX [18, 30]. Samples were submitted to the Duke IGSP Genome Sequencing & Analysis Core Resource for real-time circular consensus sequencing using a Pacific Biosciences RS instrument and C2 chemistry. Two small-insert libraries (one per individual) were prepared following the manufacturer's protocols and were sequenced using two SMRT cells (one SMRT cell per library) with 2 × 55 min movie run times. The resulting bas.h5 files were used for downstream analyses.
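The touchdown profile above can be written out cycle by cycle as a quick check on the annealing schedule. The following Python sketch assumes that the first touchdown cycle anneals at 60°C and that the 1°C decrement applies from the second cycle onward; the function name and its parameters are illustrative only and do not correspond to any thermocycler software.

```python
def touchdown_annealing_temps(start_c=60, step_c=1, touchdown_cycles=15,
                              plateau_c=45, plateau_cycles=20):
    """Return the annealing temperature (in degrees C) for every PCR cycle.

    Assumption: cycle 1 anneals at start_c and each later touchdown
    cycle is step_c degrees cooler than the one before it.
    """
    touchdown = [start_c - step_c * i for i in range(touchdown_cycles)]
    plateau = [plateau_c] * plateau_cycles
    return touchdown + plateau


temps = touchdown_annealing_temps()
print(len(temps), temps[0], temps[14], temps[15])
```

Under this reading the touchdown phase ends at 46°C, one step above the 45°C plateau, and the full profile comprises 35 cycles.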
Quality filtering and sequence clustering
CCS sequences were quality filtered using pbh5tools (https://github.com/PacificBiosciences/pbh5tools) and the Galaxy platform [42–44]. The pbh5tools package was used to extract CCS fastq sequences from bas.h5 files according to minimum number of CCS passes, thus allowing for inspection of average read quality as a function of the number of CCS passes (see Figure 2 and Results). We used FastQC software (v0.10.1; http://www.bioinformatics.babraham.ac.uk/projects) to summarize average Phred score per CCS pass category (Figure 2). Cluster analyses were performed on sequences that originated from a minimum of 4 CCS passes and within which 90% of the bases averaged a quality score ≥ Phred 20 (1% error rate). Our pooled samples consisted of amplicons separated by ~65 bp in length, thus allowing for demultiplexing V1RI and V1RIX sequences according to length.
The USEARCH software package (v6.0) was used for clustering, de novo PCR chimera detection, and preliminary cluster alignment. The UCHIME algorithm (as implemented within USEARCH) was used to detect putative chimeric sequences with the de novo mode and an -abskew parameter of 2.0. Clusters containing putative chimeras were not included in downstream analyses. Quality filtered CCS sequences were clustered based on a 98% similarity threshold with the -cluster_fast option, and resulting alignments of clusters containing ≥ 7 sequences (i.e., a 7× threshold) were imported into the Geneious software package (v6.1; http://www.geneious.com), re-aligned using the MAFFT (v7.017) alignment plugin, and then manually edited for accuracy. We selected the 7× coverage threshold based on our chimera detection results (i.e., only 0.4% of all clusters consisting of putative chimeras contained 7 or more sequences; see Results and Additional file 1: Figure S2). Cluster consensus sequences were identified as V1R using NCBI BLAST (http://blast.ncbi.nlm.nih.gov/) and V1R subfamily membership was confirmed by phylogenetic comparisons with Hohenbrink et al.
The minimum number of distinct V1R genes for each subfamily was estimated following Rodriguez et al., whereby cluster consensus sequences sharing greater than 98% nucleotide identity were considered identical. This approach reduced concerns of spurious results due to sequencing error and/or repertoire inflation due to paralogous loci, but at the same time it is likely to underestimate the total number of genuine V1R paralogous copies. To overcome this limitation, to some extent, we also used a 99% minimum genetic similarity threshold to estimate maximum V1R repertoire size. Moreover, 99% is the minimum genetic similarity separating V1R sequences mined from distinct regions of the draft M. murinus genome. Cluster consensus sequences were translated into amino acids and were checked for a complete open reading frame to identify putatively functional and pseudogenized loci. Final alignments for all consensus sequences are provided in Additional file 2: Dataset 1. Given the clustering approaches described above, in combination with the observation that we used primers that bound to conserved regions within the V1R exon, we anticipated that closely related paralogs would be clustered together and thus we refrained from attempting to identify allelic variation within potentially non-homologous loci.
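The filtering criteria above can be illustrated with a short Python sketch. It is not the pbh5tools or Galaxy workflow itself. It assumes Phred+33 quality strings, interprets the quality criterion as "at least 90% of bases have a score of 20 or higher", and takes the CCS pass count as an input supplied by the caller. The 760 bp cutoff used to separate the two amplicon pools is a hypothetical value chosen between the expected fragment sizes, and all function names are illustrative.

```python
MIN_CCS_PASSES = 4       # minimum number of CCS passes (from the text)
MIN_PHRED = 20           # Phred 20 corresponds to a 1% error rate
MIN_FRACTION = 0.90      # fraction of bases that must reach MIN_PHRED
LENGTH_CUTOFF_BP = 760   # hypothetical cutoff between V1RI and V1RIX reads


def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (assumed Phred+33) into integers."""
    return [ord(ch) - offset for ch in quality_string]


def passes_quality(quality_string, min_phred=MIN_PHRED,
                   min_fraction=MIN_FRACTION):
    """True if at least min_fraction of bases score min_phred or higher."""
    scores = phred_scores(quality_string)
    if not scores:
        return False
    good = sum(1 for q in scores if q >= min_phred)
    return good / len(scores) >= min_fraction


def keep_read(ccs_passes, quality_string):
    """Apply the pass-count criterion and the base-quality criterion."""
    return ccs_passes >= MIN_CCS_PASSES and passes_quality(quality_string)


def assign_subfamily(read_length, cutoff_bp=LENGTH_CUTOFF_BP):
    """Demultiplex by length: shorter reads are treated as V1RI."""
    return "V1RI" if read_length < cutoff_bp else "V1RIX"
```

In practice the length cutoff would need to be chosen from the observed read-length distribution of each library rather than fixed in advance.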
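The cluster selection step and the rationale for the 7× threshold can be sketched in Python as follows. This is not USEARCH output parsing: the dictionary layout, the function names, and the cluster identifiers are hypothetical. The total number of chimeric clusters is not stated directly in the text, so the sketch back-calculates it from the singleton figures given in the Results (85%; n = 2,861), which is an inference rather than a reported value.

```python
def select_clusters(clusters, chimeric_ids, min_size=7):
    """Keep non-chimeric clusters holding at least min_size sequences.

    clusters: dict mapping a cluster id to its list of member sequences.
    chimeric_ids: set of cluster ids flagged as putative chimeras.
    """
    return {
        cluster_id: members
        for cluster_id, members in clusters.items()
        if cluster_id not in chimeric_ids and len(members) >= min_size
    }


def chimera_percent_above_threshold(singleton_chimeras=2861,
                                    singleton_fraction=0.85,
                                    chimeras_above_threshold=13):
    """Percentage of chimeric clusters that reach the coverage threshold.

    The total is estimated as singleton_chimeras / singleton_fraction.
    """
    estimated_total = singleton_chimeras / singleton_fraction
    return 100.0 * chimeras_above_threshold / estimated_total


print(round(chimera_percent_above_threshold(), 1))
```

With the figures from the Results, the estimated share of chimeric clusters at or above the threshold rounds to 0.4%, in agreement with the value quoted above.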
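The use of two thresholds to bracket repertoire size can be illustrated with a minimal Python sketch. It assumes that consensus sequences are already aligned to equal length, ignores alignment columns containing a gap in either sequence, and merges sequences transitively (single linkage) whenever a pair exceeds the threshold. The transitive merging rule and all names are assumptions made for illustration; the procedure of Rodriguez et al. may differ in detail.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity of two aligned sequences, skipping gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def count_loci(aligned, threshold):
    """Count groups of sequences linked by identity above threshold.

    aligned: dict mapping a consensus name to its aligned sequence.
    """
    names = list(aligned)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if percent_identity(aligned[first], aligned[second]) > threshold:
                parent[find(first)] = find(second)
    return len({find(name) for name in names})


def repertoire_range(aligned):
    """Minimum (98%) and maximum (99%) locus counts for one subfamily."""
    return count_loci(aligned, 98.0), count_loci(aligned, 99.0)
```

Two consensus sequences that are 98.5% identical would therefore count as one locus in the minimum estimate and as two loci in the maximum estimate.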
Phylogenetic and statistical analyses
Alignments of PacBio CCS cluster consensus sequences with V1R data from Young et al. (Additional file 2: Dataset 1) and Yoder et al. [GenBank:KF272289–KF272350] were performed using MAFFT v7.017 (gap open penalty 1.53; offset value 0.123) as implemented within the Geneious software package. Sequences originating from Yoder et al. were also clustered based on the 98% threshold described above in order to avoid the incorporation of potentially paralogous loci in the analyses presented herein (Additional file 2: Dataset 2). Phylogenetic analyses were performed using MrBayes v.3.2 and RAxML v.7.7. The GTR + gamma model of substitution was used for all Bayesian and Maximum Likelihood analyses. Statistical support for nodes was evaluated using Bayesian posterior probabilities (resulting from 5 million iterations, 4 heated chains, 25% burn-in length) and maximum likelihood bootstrap support values (percentage of 1,000 iterations). Resulting trees were edited using FigTree v1.4 software (http://tree.bio.ed.ac.uk/software/figtree/). Pairwise sequence similarity was measured using custom BLAST searches with the percent identity output option. Sequence similarity was visualized using hive plots (jHive v0.0.18) and the arcdiagram R package (https://github.com/gastonstat/arcdiagram). The software package DNAsp v5.1 was used to calculate basic polymorphism statistics for each V1R subfamily including number of segregating sites (S), average number of nucleotide differences between alleles (k), nucleotide diversity (π), Watterson’s estimator of population mutation rate (θW), and number of synonymous and nonsynonymous mutations. Genetic distances were calculated using the Kimura 2-parameter (nucleotide) and p-distance (amino acid) algorithms as implemented within Mega v5.2 software. Genetic divergence among V1R repertoires was assessed using Chi-square statistical tests as implemented in DNAsp.
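For readers unfamiliar with the summary statistics named above, the following Python sketch shows their textbook definitions. It is not the DNAsp or Mega implementation. It assumes gap-free aligned sequences of equal length containing only A, C, G, and T, and it reports Watterson's estimator per sequence rather than per site; all function names are illustrative.

```python
import math
from itertools import combinations


def segregating_sites(seqs):
    """S: number of alignment columns with more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """k: average number of nucleotide differences over all pairs."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total / len(pairs)


def nucleotide_diversity(seqs):
    """pi: k divided by the alignment length."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def watterson_theta(seqs):
    """theta_W (per sequence): S divided by the harmonic number a1."""
    a1 = sum(1.0 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a1


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences."""
    purines = {"A", "G"}
    transitions = transversions = 0
    for x, y in zip(seq_a, seq_b):
        if x == y:
            continue
        if (x in purines) == (y in purines):
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    p = transitions / len(seq_a)
    q = transversions / len(seq_a)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

The Kimura 2-parameter formula weights transitions and transversions separately, which is why the two counts are tallied independently before the logarithmic correction is applied.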
Availability of supporting data
Final V1R consensus sequences generated by this study have been deposited in GenBank and have the following accession numbers [KF721294 - KF721403]. CCS sequence data generated from M. murinus 1 (origin of the M. murinus draft genome) are identified with a specimen-voucher number of DGM01. Additional file 2: Dataset 1 contains alignments of all CCS data used in the final analyses as well as V1R data mined from Young et al. Additional file 2: Dataset 2 contains filtered V1RI sequences originating from Yoder et al. Both Additional file 2: Datasets are located at http://www.labarchives.com with the following doi:10.6070/H4G73BN0.
CCS: Circular consensus sequencing
V1R: Vomeronasal type 1 receptor
V2R: Vomeronasal type 2 receptor
SMRT: Single molecule real-time.
We greatly appreciate the assistance of the Duke GCB Genome Sequencing Shared Resource staff, especially Graham Alexander and Olivier Fedrigo. We thank the anonymous reviewers for their comments and suggestions that helped to improve this manuscript. PAL thanks the American Society of Mammalogists for financial support. This project was funded by Duke University start-up funds to ADY. This is Duke Lemur Center publication number 1266.
1. Nei M, Rooney AP: Concerted and birth-and-death evolution of multigene families. Annu Rev Genet. 2005, 39: 121. 10.1146/annurev.genet.39.073003.112240.
2. Holland PW, Garcia-Fernàndez J, Williams NA, Sidow A: Gene duplications and the origins of vertebrate development. Development. 1994, 1994: 125-133.
3. Amores A, Force A, Yan Y-L, Joly L, Amemiya C, Fritz A, Ho RK, Langeland J, Prince V, Wang Y-L: Zebrafish hox clusters and vertebrate genome evolution. Science. 1998, 282: 1711-1714.
4. Iwabe N, Kuma K-i, Miyata T: Evolution of gene families and relationship with organismal evolution: rapid divergence of tissue-specific genes in the early evolution of chordates. Mol Biol Evol. 1996, 13: 483-493. 10.1093/oxfordjournals.molbev.a025609.
5. Lundin LG: Evolution of the vertebrate genome as reflected in paralogous chromosomal regions in man and the house mouse. Genomics. 1993, 16: 1-19. 10.1006/geno.1993.1133.
6. Alkan C, Sajjadian S, Eichler EE: Limitations of next-generation genome sequence assembly. Nat Methods. 2010, 8: 61-65.
7. Hirsch CN, Robin Buell C: Tapping the Promise of Genomics in Species with Complex, Nonmodel Genomes. Annu Rev Plant Biol. 2013, 64: 89-110. 10.1146/annurev-arplant-050312-120237.
8. Cheung J, Estivill X, Khaja R, MacDonald JR, Lau K, Tsui L-C, Scherer SW: Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence. Genome Biol. 2003, 4: R25. 10.1186/gb-2003-4-4-r25.
9. Strand T, Wang B, Meyer-Lucht Y, Höglund J: Evolutionary history of black grouse major histocompatibility complex class IIB genes revealed through single locus sequence-based genotyping. BMC Genet. 2013, 14: 29.
10. Klein J: Natural History of the Major Histocompatibility Complex. 1986, New York: Wiley.
11. Garrigan D, Hedrick PW: Perspective: detecting adaptive molecular polymorphism: lessons from the MHC. Evolution. 2003, 57: 1707-1722. 10.1111/j.0014-3820.2003.tb00580.x.
12. Ilmonen P, Penn DJ, Damjanovich K, Morrison L, Ghotbi L, Potts WK: Major histocompatibility complex heterozygosity reduces fitness in experimentally infected mice. Genetics. 2007, 176: 2501-2508. 10.1534/genetics.107.074815.
13. Brouwer L, Barr I, Van De POL M, Burke T, Komdeur J, Richardson DS: MHC-dependent survival in a wild population: evidence for hidden genetic benefits gained through extra-pair fertilizations. Mol Ecol. 2010, 19: 3444-3455. 10.1111/j.1365-294X.2010.04750.x.
14. Zhang X, Goodsell J, Norgren RB: Limitations of the rhesus macaque draft genome assembly and annotation. BMC Genomics. 2012, 13: 206. 10.1186/1471-2164-13-206.
15. Roberts RJ, Carneiro MO, Schatz MC: The advantages of SMRT sequencing. Genome Biol. 2013, 14: 405. 10.1186/gb-2013-14-6-405.
16. Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, Peluso P, Rank D, Baybayan P, Bettman B: Real-time DNA sequencing from single polymerase molecules. Science. 2009, 323: 133-138. 10.1126/science.1162986.
17. Travers KJ, Chin C-S, Rank DR, Eid JS, Turner SW: A flexible and efficient template format for circular consensus sequencing and SNP detection. Nucleic Acids Res. 2010, 38: e159. 10.1093/nar/gkq543.
18. Young JM, Massa HF, Hsu L, Trask BJ: Extreme variability among mammalian V1R gene families. Genome Res. 2010, 20: 10-18. 10.1101/gr.098913.109.
19. Yang H, Shi P, Zhang Y-p, Zhang J: Composition and evolution of the V2r vomeronasal receptor gene repertoire in mice and rats. Genomics. 2005, 86: 306-315. 10.1016/j.ygeno.2005.05.012.
20. Grus WE, Zhang J: Origin and evolution of the vertebrate vomeronasal system viewed through system-specific genes. Bioessays. 2006, 28: 709-718. 10.1002/bies.20432.
21. Dulac C, Axel R: A novel family of genes encoding putative pheromone receptors in mammals. Cell. 1995, 83: 195-206. 10.1016/0092-8674(95)90161-2.
22. Young JM, Kambere M, Trask BJ, Lane RP: Divergent V1R repertoires in five species: Amplification in rodents, decimation in primates, and a surprisingly small repertoire in dogs. Genome Res. 2005, 15: 231-240. 10.1101/gr.3339905.
23. Grus WE, Shi P, Zhang Y-p, Zhang J: Dramatic variation of the vomeronasal pheromone receptor gene repertoire among five orders of placental and marsupial mammals. Proc Natl Acad Sci USA. 2005, 102: 5767-5772. 10.1073/pnas.0501589102.
24. Shi P, Bielawski JP, Yang H, Zhang Y: Adaptive diversification of vomeronasal receptor 1 genes in rodents. J Mol Evol. 2005, 60: 566-576. 10.1007/s00239-004-0172-y.
25. Grus WE, Zhang J: Rapid turnover and species-specificity of vomeronasal pheromone receptor genes in mice and rats. Gene. 2004, 340: 303-312. 10.1016/j.gene.2004.07.037.
26. Delbarco-Trillo J, Burkert B, Goodwin T, Drea C: Night and day: the comparative study of strepsirrhine primates reveals socioecological and phylogenetic patterns in olfactory signals. J Evol Biol. 2011, 24: 82-98. 10.1111/j.1420-9101.2010.02145.x.
27. Irwin MT, Samonds KE, Raharison J-L, Wright PC: Lemur latrines: observations of latrine behavior in wild primates and possible ecological significance. J Mammal. 2004, 85: 420-427. 10.1644/1545-1542(2004)085<0420:LLOOLB>2.0.CO;2.
28. Andrew RJ, Klopman RB: Urine washing: comparative notes. Prosimian Biology. Edited by: Martin RD, Doyle GA, Walker AC. 1974, London: Duckworth, 303-312.
29. Yoder A, Chan L, dos Reis M, Larsen P, Campbell C, Rasolarison R, Barrett M, Roos C, Kappeler P, Bielawski J, Yang Z: Molecular evolutionary characterization of a novel V1R subfamily in strepsirrhine primates. Genome Biol Evol. 2014, 6: 213-227. 10.1093/gbe/evu006.
30. Hohenbrink P, Radespiel U, Mundy NI: Pervasive and ongoing positive selection in the vomeronasal-1 receptor (V1R) repertoire of mouse lemurs. Mol Biol Evol. 2012, 29: 3807-3816. 10.1093/molbev/mss188.
31. Dulac C, Torello AT: Molecular detection of pheromone signals in mammals: from genes to behaviour. Nat Rev Neurosci. 2003, 4: 551-562. 10.1038/nrn1140.
32. Zhang J, Webb DM: Evolutionary deterioration of the vomeronasal pheromone transduction pathway in catarrhine primates. Proc Natl Acad Sci USA. 2003, 100: 8337-8341. 10.1073/pnas.1331721100.
33. Isogai Y, Si S, Pont-Lezica L, Tan T, Kapoor V, Murthy VN, Dulac C: Molecular organization of vomeronasal chemoreception. Nature. 2011, 478: 241-245. 10.1038/nature10437.
34. Steiger SS, Fidler AE, Valcu M, Kempenaers B: Avian olfactory receptor gene repertoires: evidence for a well-developed sense of smell in birds?. Proc R Soc Lond B Biol Sci. 2008, 275: 2309-2317. 10.1098/rspb.2008.0607.
35. Del Punta K, Rothman A, Rodriguez I, Mombaerts P: Sequence diversity and genomic organization of vomeronasal receptor genes in the mouse. Genome Res. 2000, 10: 1958-1967. 10.1101/gr.10.12.1958.
36. Go Y, Satta Y, Kawamoto Y, Rakotoarisoa G, Randrianjafy A, Koyama N, Hirai H: Frequent segmental sequence exchanges and rapid gene duplication characterize the MHC class I genes in lemurs. Immunogenetics. 2003, 55: 450-461. 10.1007/s00251-003-0613-6.
37. Liang B, Luo M, Scott-Herridge J, Semeniuk C, Mendoza M, Capina R, Sheardown B, Ji H, Kimani J, Ball BT: A comparison of parallel pyrosequencing and sanger clone-based sequencing and its impact on the characterization of the genetic diversity of HIV-1. PLoS ONE. 2011, 6: e26745. 10.1371/journal.pone.0026745.
- Hughes GM, Gang L, Murphy WJ, Higgins DG, Teeling EC: Using Illumina Next Generation Sequencing technologies to sequence multigene families in de novo species. Mol Ecol Res. 2013, 13: 510-521. 10.1111/1755-0998.12087.View ArticleGoogle Scholar
- Larsen P, Smith T: Application of circular consensus sequencing and network analysis to characterize the bovine IgG repertoire. BMC Immunol. 2012, 13: 52-10.1186/1471-2172-13-52.PubMed CentralPubMedView ArticleGoogle Scholar
- Neves LG, Davis JM, Barbazuk WB, Kirst M: Whole-exome targeted sequencing of the uncharacterized pine genome. Plant J. 2013, 75: 146-156. 10.1111/tpj.12193.PubMedView ArticleGoogle Scholar
- Sommer S, Courtiol A, Mazzoni CJ: MHC genotyping of non-model organisms using next-generation sequencing: a new methodology to deal with artefacts and allelic dropout. BMC Genomics. 2013, 14: 1-17. 10.1186/1471-2164-14-1.View ArticleGoogle Scholar
- Goecks J, Nekrutenko A, Taylor J, Team TG: Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010, 11: R86-10.1186/gb-2010-11-8-r86.PubMed CentralPubMedView ArticleGoogle Scholar
- Blankenberg D, Kuster GV, Coraor N, Ananda G, Lazarus R, Mangan M, Nekrutenko A, Taylor J: Galaxy: A Web-Based Genome Analysis Tool for Experimentalists. Curr Protoc Mol Biol. 2010, 19 (10): 11–19.10. 21-Google Scholar
- Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J: Galaxy: a platform for interactive large-scale genome analysis. Genome Res. 2005, 15: 1451-1455. 10.1101/gr.4086505.PubMed CentralPubMedView ArticleGoogle Scholar
- Edgar RC: Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010, 26: 2460-2461. 10.1093/bioinformatics/btq461.PubMedView ArticleGoogle Scholar
- Rodriguez I, Del Punta K, Rothman A, Ishii T, Mombaerts P: Multiple new and isolated families within the mouse superfamily of V1r vomeronasal receptors. Nat Neurosci. 2002, 5: 134-140. 10.1038/nn795.PubMedView ArticleGoogle Scholar
- Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP: MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 2012, 61: 539-542. 10.1093/sysbio/sys029.PubMed CentralPubMedView ArticleGoogle Scholar
- Stamatakis A: RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006, 22: 2688-2690. 10.1093/bioinformatics/btl446.PubMedView ArticleGoogle Scholar
- Krzywinski M, Birol I, Jones SJ, Marra MA: Hive plots—rational approach to visualizing networks. Brief Bioinform. 2012, 13: 627-644. 10.1093/bib/bbr069.PubMedView ArticleGoogle Scholar
- Librado P, Rozas J: DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics. 2009, 25: 1451-1452. 10.1093/bioinformatics/btp187.PubMedView ArticleGoogle Scholar
- Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S: MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 2011, 28: 2731-2739. 10.1093/molbev/msr121.PubMed CentralPubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.