
Mammalian keratin associated proteins (KRTAPs) subgenomes: disentangling hair diversity and adaptation to terrestrial and aquatic environments



Adaptation of mammals to terrestrial life was facilitated by the unique vertebrate trait of body hair, which occurs in a range of morphological patterns. Keratin associated proteins (KRTAPs), the major structural hair shaft proteins, are largely responsible for hair variation.


We exhaustively characterized the KRTAP gene family in 22 mammalian genomes, confirming the existence of 30 KRTAP subfamilies evolving at different rates with varying degrees of diversification and homogenization. Within the two major classes of KRTAPs, the high cysteine (HS) class experienced strong concerted evolution, high rates of gene conversion/recombination and high GC content. In contrast, high glycine-tyrosine (HGT) KRTAPs showed evidence of positive selection and low rates of gene conversion/recombination. Species with more hair and of higher complexity tended to have more KRTAP genes (gene expansion). The sloth, with long and coarse hair, had the most KRTAP genes (175, of which 141 were intact). By contrast, the “hairless” dolphin had 35 KRTAPs and the highest pseudogenization rate (74% relative to the 19% mammalian average). Unique hair-related phenotypes, such as scales (armadillo) and spines (hedgehog), were correlated with changes in KRTAPs. Gene expression variation probably also influences hair diversification patterns; for example, humans have a KRTAP repertoire nearly identical to that of apes, but much less hair.


We hypothesize that differences in KRTAP gene repertoire and gene expression, together with distinct rates of gene conversion/recombination, pseudogenization and positive selection, are likely responsible for micro- and macro-phenotypic hair diversification among mammals as adaptations to ecological pressures.


Terrestrial life in extant vertebrates was accompanied by the formation of diverse and rigid body coverings (scales, feathers and hairs), along with other cornified appendages (e.g. horns, hoofs, claws, nails), that evolved in response to strong selective pressures. These body coverings helped protect vertebrates and allowed them to successfully adapt to environmental pressures like heat, ultraviolet radiation, water loss, and mechanical forces [1, 2]. Since keratinization helps protect the body by forming a barrier between the body and the outside world, genes involved in keratinization evolve rapidly in response to changing environments (e.g., KRTAP4-5 showed evidence of positive selection in chimpanzee and hominids) [3]. Changes in gene families (gene gain, and gene loss/pseudogenization) are involved in adaptive evolution, and changes in gene family size could affect expression levels [4]. In terrestrial vertebrates, the formation of hard cornified skin appendages involves interactions between fibrous proteins (keratins) and matrix proteins (KRTAPs) [5-7]. The fibrous alpha-keratins, type I and II, appear to have evolved in stem vertebrates [8, 9] and recent studies suggest the presence of hair-specific alpha-keratin orthologs in amphibians, reptiles, and birds [10-12]. However, there is no evidence of KRTAP proteins in fishes and amphibians, suggesting that these proteins originated only after the divergence of sauropsids (sKRTAPs/beta-keratins) and mammals (mKRTAPs), leading to the formation and diversification of hard keratin appendages: feathers, claws and scales in sauropsids, and hairs in mammals [5, 6, 11]. The structural and functional conservation of keratin intermediate filaments (KIFs) within mammals contrasts with the large diversity of mammalian hair phenotypes [13-15] and highlights the importance of understanding the molecular diversification of the keratin associated protein (KRTAP) multigene family.

Hair is a dynamic mini-organ formed by ectodermal-mesodermal interactions [16-19] and is broadly divided into the root sheath (outer and inner), hair shaft, and matrix zone. Hair has microscopic differences (e.g., cuticular, medullar and cross section), which have long been used as forensic markers for identifying human ethnicity and classifying mammalian species [20-23]. Hair-fiber formation is a cyclical process, which involves growth (anagen), regression (catagen), and resting phases (telogen), followed by the shedding of the hair shaft. The process involves the expression of both hair keratin intermediate filament proteins and their keratin associated proteins [24-28]. This cycle is of particular importance in diverse processes such as determining hair size, shedding fur for body surface cleansing, and changing the body cover to adapt to changing environments, such as from hot summers to cold winters [29].

The current diversity of hair in extant mammals is due to innovations and changes in numerous genes and their corresponding proteins. Humans have 54 functional alpha-keratin genes comprising 28 type I and 26 type II keratins [30, 31, 13] arranged in two clusters on chromosomes 17q21.2 and 12q13.13 [32, 33], which include 11 type I and 6 type II hair keratins [34, 35]. Hair keratin types I and II undergo higher-order copolymerization to form keratin intermediate filaments (KIFs) [36-39], which are embedded into a matrix formed by keratin associated proteins (KRTAPs) involved in the formation of hard, cornified, resilient hair shafts [40-42]. The KRTAP multigene family is divided into two broad groups, high cysteine and high glycine-tyrosine, which together comprise 30 subfamilies based on amino acid composition and phylogenetic relationships [14]. In humans, KRTAPs include approximately 100 gene members that are arranged in tandem and are clustered on chromosomes 11p15.5, 11q13.4, 17q21.2, 21q22.1, and 21q22.3 [43, 26, 44-46, 7, 25]. Given the role of the KRTAP multigene family in the formation of hair morphology, we have characterized them in the genomes of 22 diverse mammalian species to provide insights into KRTAP evolution and diversification. We found contrasting KRTAP gene family repertoires among mammals, as well as differences in rates of gene expansion, contraction and pseudogenization. The two major groups of KRTAPs showed distinct evolutionary patterns, with high concerted evolution influencing species-specific copy number variation and gene homogenization in high cysteine KRTAPs. In contrast, high glycine-tyrosine genes had more dynamic evolutionary patterns with less gene conversion and recombination, lower GC content, and evidence of positive selection (e.g. subfamily 20), which may also have been an important force in the evolution of the high glycine-tyrosine subfamilies.


Genome scans

Advances in genome sequencing have made it easier to explore multigene families across different genomes. Expansion, contraction and pseudogenization, along with genomic/chromosomal organization (gene clusters) of gene families, are important mechanisms driving genome evolution and influencing fitness within lineages or species [47], as suggested by lineage- or species-specific variations in genes involved in pathogen recognition, stress response and structural proteins [48-51]. Here, we explored the KRTAP multigene family in the genome assemblies of 22 mammalian species, available at the Ensembl and NCBI websites [52, 53] and [54]:

(1) alpaca (Vicugna pacos), low-coverage 2.51×, assembly vicPac1, Jul 2008
(2) armadillo (Dasypus novemcinctus), low-coverage 2×, assembly dasNov2, Jul 2008
(3) bushbaby (Otolemur garnettii), low-coverage 1.5×, assembly otoGar1, May 2006
(4) cow (Bos taurus), coverage 7×, assembly Btau_4.0, Oct 2007
(5) dolphin (Tursiops truncatus), low-coverage 2.59×, assembly turTru1, Jul 2008
(6) elephant (Loxodonta africana), coverage 7×, assembly Loxafr3.0, Jul 2009
(7) gibbon (Nomascus leucogenys), whole genome coverage 5.6×, assembly Nleu1.0, Jan 2010
(8) gorilla (Gorilla gorilla), gorGor3, Dec 2009
(9) guinea pig (Cavia porcellus), high-coverage 6.79×, assembly cavPor3, Mar 2008
(10) hedgehog (Erinaceus europaeus), low-coverage 1.86×, assembly eriEur1, Jun 2006
(11) horse (Equus caballus), coverage 6.79×, assembly Equ Cab 2, Sep 2007
(12) marmoset (Callithrix jacchus), NCBI build 1.1
(13) megabat (Pteropus vampyrus), low-coverage 2.63×, assembly pteVam1, Jul 2008
(14) mouse lemur (Microcebus murinus), low-coverage 1.93×, assembly micMur1, Jun 2007
(15) orangutan (Pongo abelii), NCBI build 1.2
(16) panda (Ailuropoda melanoleuca), high-coverage, assembly ailMel1, Jul 2009
(17) pig (Sus scrofa), from NCBI build 3.1, high-coverage, assembly Sscrofa10, Jun 27, 2011
(18) rabbit (Oryctolagus cuniculus), high-coverage, assembly oryCun2, Nov 2009
(19) sloth (Choloepus hoffmanni), low-coverage 2.05×, assembly choHof1, Sep 2008
(20) tarsier (Tarsius syrichta), low-coverage 1.82×, assembly tarSyr1, Jul 2008
(21) tree shrew (Tupaia belangeri), low-coverage 2×, assembly tupBel1, Jun 2006
(22) wallaby (Macropus eugenii), coverage 2×, assembly Meug_1.0, Dec 2008

Characterization of KRTAP gene family

The KRTAP multigene family consists of ~100-180 gene members divided into two major classes, high cysteine (HS) and high glycine-tyrosine (HGT), which in turn are divided into many subfamilies with unique motifs and sequence repeats. We assigned all KRTAP multigene family members to their respective subfamilies following previously published guidelines [14]. We built species-specific phylogenetic trees to classify the gene subfamilies for each genome (Additional file 1: Figures S1-21), as well as a phylogenetic tree incorporating all members belonging to the high glycine-tyrosine KRTAP class from the 22 genomes (Figure 1 and Additional file 1: Figure S24). We also observed that one-to-one orthologous relationships diminished as species diverged over time (Additional file 1: Figures S22 and S23). We used the amino acid composition, unique motifs and sequence repeats, as well as BLAST results, to classify intact genes, partial genes and pseudogenes. The most closely related subfamilies are generally located in close proximity and in tandem arrangements in the genome, as are the members of the same subfamily. Due to the incomplete nature of the genomes analyzed, not all genes may have been retrieved. Therefore, in the dolphin and other species with low genome coverage, we assumed that missing genes may reflect low coverage and remain to be characterized by future research. However, we found a large number of pseudogenes in the dolphin genome compared with other low-coverage genomes.

Figure 1

The phylogeny of all high glycine-tyrosine gene family members of 22 mammalian genomes. The neighbor-joining method was used with p-distance and the interior-branch test with 1,000 replications. Different colors represent different subfamilies of high glycine-tyrosine KRTAPs.

To confirm the absence of KRTAP genes in high-coverage genomes with intact gene clusters, we performed synteny analysis and searched for the human orthologs that should flank the missing KRTAP genes. For example, in pig the 5’ and 3’ human orthologs flanking KRTAP cluster 5 were missing, indicating that this region has most likely not been sequenced and that further research and/or higher genomic coverage is needed for confirmation. We verified the synteny of conserved orthologs flanking the missing genes for subfamily KRTAP25 in marmoset (Callithrix), cow and elephant, along with KRTAP25, KRTAP19 and KRTAP29 in guinea pig (Cavia), and KRTAP12 in rabbit.

Species-specific subfamily differences, changes in the total number of genes, functional genes, pseudogenes, amino acid content (changes in sulfur content are responsible for disulphide bonds, which provide rigidity, strength and flexibility to hair) and size polymorphism in genes within subfamilies may be responsible for the species-specific hair characteristics and the marked variability found in hair patterns among mammalian species.

Genomic organization of the KRTAP gene family

The KRTAP gene family consists of 30 subfamilies, 24 of which are high cysteine and six are high glycine-tyrosine. The complete KRTAP gene family is arranged into five clusters at five different genomic locations (Figure 2). Each cluster contains members of one or more subfamilies arranged in a tandem array. The genomic organization of the KRTAP gene family is similar in all species studied, with only slight variations. Subfamilies KRTAP 1, 2, 3, 4, 9, 16, 17 and 29 are present in cluster one. All high glycine-tyrosine (HGT) KRTAP subfamilies, together with KRTAP 11, 13, 24-27 subfamilies, form cluster two. Subfamilies KRTAP10 and KRTAP12 form cluster three, whereas cluster four consists of subfamily KRTAP28 and cluster five of subfamily KRTAP5.

Figure 2

Genomic organization of the KRTAP gene family in the gorilla genome. The KRTAP gene family is arranged in five clusters; for each cluster, the name, size in base pairs (bp) and chromosome are shown. Each triangle represents a gene member, where p denotes a pseudogene; members of the same subfamily are shown in the same color. Each triangle points in the direction of transcription. The distance between genes is not to scale.

Cluster 5 shows some variation. For example, in primates KRTAP cluster 5 is divided into two paralogous gene clusters, most likely through segmental duplication, with both clusters having members of the KRTAP5 subfamily (Figure 2). In all of the other mammals studied, genes of the KRTAP5 subfamily form a single cluster. The KRTAP subfamilies that are clustered together in the genome (Figure 2) are phylogenetically closely related (Additional file 1: Figure S1); e.g. all subfamilies of high glycine-tyrosine KRTAPs are located in close proximity in cluster 2 (represented by HGT in Figure 2), which supports their functional relatedness and common ancestry arising from duplication and divergence. The conserved genomic organization of the KRTAP gene clusters over more than 166 Myr (i.e. since the divergence of therian mammals from monotremes) [55] indicates strong evolutionary constraint acting on their genomic arrangement [56]. The conserved clustering of KRTAPs seems to be related to their ordered expression in the hair follicle [57].

KRTAP Gene family dynamics and hair characteristics

Previously, the KRTAP gene repertoire had been assessed in eight mammalian species [14], all terrestrial species with few characteristic differences in hair phenotypes. Here we expanded on previous results by analyzing 22 additional mammalian species, a much more diverse group that includes species from different mammalian orders with diverse hair characteristics, such as the armadillo (modified scales), hedgehog (spines), alpaca (fiber), sloth (symbionts) and dolphin (mostly hairless and aquatic), along with several more closely related species, e.g. members of the family Hominidae in primates.

We identified near-complete KRTAP gene repertoires in 22 mammalian genomes, including 11 high-coverage genomes (Table 1, Figure 3, and Additional file 2). Our findings suggest that the most recent common ancestor of mammals had 53% (16 of 30) of the known KRTAP subfamilies (1-5, 8, 10, 11, 13, 16, 17, 20, 21, 26, 28, and 29) (Figure 3). Extant monotremes (platypus) and marsupials (opossum and wallaby) have slightly different subfamily representation (60%, 18 of 30 subfamilies, and 50%, 15 of 30, respectively), while eutherians have up to 93% (28 of 30) of the KRTAP subfamilies. This shows that the diversification of the KRTAP gene family occurred early in mammalian evolution, likely starting after the split of sauropsids (leading to birds and reptiles) and synapsids (leading to mammal-like reptiles) around 350 Myr ago [58, 55]. Sauropsids developed sKRTAPs (the beta-keratins) in hard appendages like feathers, beaks, scales and claws, and synapsids developed mKRTAPs present in hair, nails, hoofs, claws, etc. The presence of glycine-rich proteins such as HGT in mammals and HGP (high glycine-proline) in reptiles and birds is evidence of their radiation from a common ancestor [5] and suggests that these changes may have contributed to the successful radiation of mammals, reptiles and birds. Further expansion and diversification of the KRTAP gene family, favored by high rates of concerted evolution in HS-KRTAPs and positive selection in HGT-KRTAPs, led to the species-specific hair characteristics observed in extant mammals. Additional analyses of sauropsid and mammalian KRTAPs are likely to reveal insights into the patterns of adaptive radiation present in extant reptilian, avian and mammalian KRTAPs. Subfamilies 7 and 12 first appear in therian mammals after their divergence from monotremes around 166 Myr ago [55].
Subfamilies 6, 9, 19, 24, and 27 are specific to placental mammals (eutherians), and thus appeared after their divergence from marsupials around 148 Myr ago [55]. Subfamily 25 is absent in Afrotheria and Xenarthra, which suggests an origin within placental mammals only after the divergence from the Atlantogenata clade (Figure 3 and Table 1). Monotremes and marsupials lack subfamily 9, which we observed to have expanded dramatically, to 50 members, in the sloth, a xenarthran and basal placental mammal. We noted that the KRTAP gene family shows species-specific variation, as expected under concerted evolution, and some of the subfamilies are restricted to particular species, e.g. subfamilies 30, 31, and 34 are present only in mouse and rat, subfamily 35 in mouse, and subfamilies 32 and 33 in platypus [14] (Figure 3 and Table 1). We also observed remarkable differences among these KRTAP genes (Table 1, Figure 3 and Additional file 2), including a dramatic gene expansion to 175 members (including 50 in subfamily 9 and 37 in subfamily 20) in sloth (Choloepus hoffmanni), a nocturnal hairy mammal with long, coarse and shaggy fur that serves as a host for different microorganisms [59] (Table 1, Figures 3 and 4). Similarly, we found gene expansion in subfamily 20 in the rodent guinea pig (Cavia porcellus; 27 gene copies) and in the marsupial wallaby (Macropus eugenii; 38 gene copies). Subfamily 28 has expanded in rabbit (Oryctolagus cuniculus; 23 gene copies), which belongs to the order Lagomorpha. In the HS-KRTAP subfamilies 11, 16, 17, 24-27 and 29, we typically observed between one and three functional genes. Subfamilies 11, 16, 17 and 25 have a maximum of one functional gene member, subfamilies 24 and 29 have a maximum of two functional genes (present in orangutan and cow, respectively), and subfamily 26 has a maximum of three members (present in sloth and elephant).
Subfamily 7, belonging to the high glycine-tyrosine group, has a maximum of one functional gene member (Table 1). We found that closely related species, e.g. within the family Hominidae (human, chimpanzee, gorilla and orangutan), have very similar gene repertoires with only slight differences (e.g. humans have the highest number of HGT pseudogenes; Figure 3). We also observed an apparent reduction in the KRTAP gene repertoire in alpaca (fibre), armadillo (modified scales), hedgehog (spines), and dolphin (mostly hairless and aquatic) (Figures 3 and 5), probably due to the replacement or modification of hair function or to extensive specialization and subsequent selection and pseudogenization.

Table 1 Number of KRTAP genes present in each subfamily in twenty-two mammalian species

Figure 3

Topological tree representing the evolution of KRTAP gene family repertoires in 30 mammalian species (twenty-two from the present study and eight from Wu et al., 2008 [14], marked with an asterisk). Stars and circles show the gain and loss of subfamilies, respectively, with the subfamily numbers given below.

Figure 4

Hair characteristic adaptation in terrestrial and aquatic mammals. Sloth, an arboreal mammal with a high density of hair harboring algae (image credits: Representation of sloth hair with algal growth and cross section of hair showing the major layers of the hair shaft (A). Bottlenose dolphin (image credits: Public source Dolphin NASA) with the rostrum circled and detailed in an image with arrows pointing to the hairless vibrissal crypts of the dolphin (image credits: Élio A. Vicente, Zoomarine) (B). Overall number of KRTAP genes and percentage of pseudogenes present in sloth and dolphin (C).

Figure 5

Variation in the KRTAP gene family in mammals and its relation to characteristic hair features.

For example, we observed high rates of pseudogenization (74% compared to the mammalian average of 19%) and only nine intact genes in the dolphin (Tursiops truncatus) (Figures 3, 4 and 5 and Table 1). This aquatic mammal is almost hairless, with only a few hairs (bristles) on the upper lip of the rostrum, which are shed soon after birth, leaving hairless pits on the rostrum of adults that have a specialized sensory function [60-65] (Figure 4). The epidermal surface also undergoes high proliferation and sloughing of epidermal cells in order to maintain a smooth skin, a major advantage for swimming [66, 67].
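The dolphin figures quoted above can be checked with simple arithmetic. The sketch below is a minimal illustration and not the authors' pipeline: the function name `non_intact_fraction` is hypothetical, and it assumes that every gene that is not intact is counted as a pseudogene (the study also distinguishes partial genes, which this simplification ignores).

```python
def non_intact_fraction(total_genes: int, intact_genes: int) -> float:
    """Fraction of genes in a repertoire that are not intact.

    Assumption (not from the paper): non-intact genes are treated as
    pseudogenes; partial genes are not tracked separately.
    """
    if total_genes <= 0 or not 0 <= intact_genes <= total_genes:
        raise ValueError("need total_genes > 0 and 0 <= intact_genes <= total_genes")
    return (total_genes - intact_genes) / total_genes


# Counts quoted in the text for the dolphin: 35 KRTAP genes, nine intact.
dolphin_rate = non_intact_fraction(35, 9)
print(f"dolphin non-intact fraction: {dolphin_rate:.1%}")
```

With 35 genes and nine intact, 26 of 35 genes are non-intact, which is about 74% and agrees with the pseudogenization rate reported for the dolphin.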

Concerted evolution, GC bias and sequence divergence

Tandemly arranged members of multigene families often show more similarity to each other than to their orthologous counterparts in other species, which suggests that they evolved in a concerted fashion. This would further lead to species-specific variation, as observed in the KRTAP gene family. Two mechanisms play an important role in concerted evolution. Recombination increases gene copy number, providing raw material for further functional innovation and diversification, whereas gene conversion, which principally homogenizes genes, can help ensure the rapid synthesis of a gene product (protein) that may be required during a precise stage of the cell cycle [68]. Gene conversion also decreases the evolutionary distance among paralogous members and shifts substitutions from weak (A or T) to strong (G or C) bases, increasing GC content through GC-biased gene conversion (gBGC) [69-72]. The negative correlation between evolutionary distance, calculated from synonymous substitution rates, and GC content indicates the level of divergence between the members of a subfamily [73].

Using Geneconv [74] and RDP3 [75], we found higher rates of gene conversion and recombination events in the high cysteine KRTAPs compared with the high glycine-tyrosine KRTAP genes (Additional files 3 and 4). KRTAP subfamilies also displayed different rates of gene conversion in different species. For example, in gorilla we found 44 gene pairs of the KRTAP10 subfamily under gene conversion, compared to only 17 gene pairs for this subfamily in gibbon (Additional file 3). The high level of gene conversion also reduces orthologous relationships between genes of two different species. The high cysteine genes had higher overall GC content (GC%) and third-codon-position GC content (GC3%), and lower synonymous substitution rates (dS), than the high glycine-tyrosine genes. The negative correlation between GC content and synonymous substitution rate (dS) is consistent with the higher rates of concerted evolution observed in high cysteine KRTAPs (Figure 6A and B). The high GC content in the HS-KRTAP genes compared with the HGT-KRTAP genes could be a consequence of the high number of gene conversion events.
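As an illustration of the two GC measures used above, the following minimal sketch computes overall GC content (GC%) and third-codon-position GC content (GC3%) for a single in-frame coding sequence. The function names and the toy sequence are hypothetical and are not taken from the study; ambiguous bases such as N are simply ignored, which is an assumption of this sketch.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G or C among the A/C/G/T bases of a sequence."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def gc3_content(cds: str) -> float:
    """GC percentage at third codon positions of an in-frame coding sequence."""
    seq = cds.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of three")
    # Positions 3, 6, 9, ... of the sequence (indices 2, 5, 8, ...).
    return gc_content(seq[2::3])


# Toy coding sequence (hypothetical): codons ATG TGC TGC CAG TAA.
toy_cds = "ATGTGCTGCCAGTAA"
print(f"GC%  = {gc_content(toy_cds):.1f}")
print(f"GC3% = {gc3_content(toy_cds):.1f}")
```

Computing both values per gene, and then relating them to pairwise dS estimates obtained from a dedicated tool, is one way to reproduce the kind of comparison summarized in Figure 6.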

Figure 6

GC-content dynamics. GC-biased gene conversion (gBGC) and evolutionary distance between the KRTAP genes, shown by the correlation between the synonymous substitution rates (dS) and GC content (GC%) among paralogous members of each subfamily (A) and third-codon-position GC content (GC3%) (B). A negative correlation points towards gene conversion. High cysteine KRTAPs (HS) and high glycine-tyrosine KRTAPs (HGT) are represented by blue and red squares, respectively. The linear regression is shown.

Adaptive evolution

Gene expansion provides the essential raw material for positive selection to act on [76], which in turn accelerates the diversification of duplicated copies by increasing the number of nonsynonymous substitutions (dN) relative to synonymous substitutions (dS) (dN/dS > 1). The PAML package [77] was used to identify signatures of positive selection. Specifically, we used likelihood ratio tests for positive selection [78, 79] on site-specific models, comparing twice the difference in log-likelihood between two nested models to a chi-square distribution with two degrees of freedom. For expanded subfamilies, such as KRTAP20 in wallaby with 38 members, we tested whether the species-specific expansion has been influenced by adaptive evolution. We tested two nested pairs of site-specific models (M1a vs. M2a and M7 vs. M8), where M1a and M7 do not allow positive selection (ω ≤ 1) and M2a and M8 allow a class of sites under positive selection (ω > 1). In both cases the model allowing positive selection fitted significantly better (p < 0.0001), and both comparisons retrieved similar sites under positive selection. The likelihood ratio test is a conservative approach, but it can be biased by false positives in the presence of high recombination rates [80]. Thus, we evaluated the possibility of gene conversion/recombination in the KRTAP20 subfamily that has expanded dramatically in the wallaby genome, but did not detect such evidence. The results of positive selection tests are shown in Table 2. Positive selection acting on KRTAPs probably favored diversification and adaptation to different environments.
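The likelihood ratio test described above can be sketched in a few lines. This is an illustrative example only: the function name `lrt_two_df` and the log-likelihood values are hypothetical and are not results from Table 2. The sketch relies on the fact that, for two degrees of freedom, the chi-square upper-tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def lrt_two_df(lnl_null: float, lnl_alt: float) -> tuple[float, float]:
    """Likelihood ratio test for nested models differing by two parameters.

    Returns (test statistic, p-value), where the statistic is
    2 * (lnL_alt - lnL_null) and the p-value is the chi-square upper tail
    with two degrees of freedom, which equals exp(-statistic / 2).
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    # For nested models the alternative should fit at least as well;
    # clamp small negative values caused by optimization noise.
    statistic = max(statistic, 0.0)
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical log-likelihoods for a null (e.g. M1a) and alternative (e.g. M2a) model.
stat, p = lrt_two_df(lnl_null=-2500.0, lnl_alt=-2480.0)
print(f"2*delta lnL = {stat:.1f}, p = {p:.2e}")
```

The same function applies to either nested pair (M1a vs. M2a or M7 vs. M8), since each comparison is evaluated here against two degrees of freedom.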

Table 2 Likelihood ratio test for PAML site models within Wallaby

Differential evolution of the HS and HGT KRTAPs

The KRTAP multigene family has experienced dynamic evolution and diversification within and among genomes, as observed in the 30 diverse subfamilies of the high cysteine and high glycine-tyrosine groups. These two groups have evolved differently, with the high cysteine group showing high rates of gene conversion within subfamilies; some subfamilies exhibit characteristic differences in copy number, while others have been more conserved. This may be an adaptive mechanism promoting extensive amplification of similar copies to meet the high demand for the structural proteins required to adapt to changing environmental conditions (e.g. sloths have extensive hair that can harbor symbiotic microorganism communities, while the dolphin is “hairless” in response to a more predictable and constant environment and to create less resistance when swimming). We also compared the evolutionary patterns of high cysteine and high glycine-tyrosine genes using the Pearson correlation coefficient for the number of genes in each subfamily between species. The coefficient value for high cysteine is significantly higher than for high glycine-tyrosine (Figure 7A) and the coefficient values for the two are positively correlated (p < 0.001) (Figure 7B). The high GC content and the negative correlation between GC content and synonymous substitution rates also support the higher rates of gene conversion observed in high cysteine genes relative to high glycine-tyrosine KRTAPs, suggesting that high cysteine genes are under high rates of concerted evolution promoted by gene conversion and recombination events (see Additional files 3 and 4). By contrast, HGT-KRTAPs had a more dynamic evolutionary pattern, with less evidence of gene conversion or recombination, but with signatures of positive selection.
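To make the between-species comparison concrete, the sketch below computes a Pearson correlation coefficient between the per-subfamily gene counts of two species. It is a minimal illustration under stated assumptions: the function name `pearson_r` and the gene counts are hypothetical placeholders, not values from Table 1.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical gene counts for the same five subfamilies in two species.
species_a_counts = [3, 1, 4, 10, 6]
species_b_counts = [2, 1, 4, 12, 5]
print(f"r = {pearson_r(species_a_counts, species_b_counts):.3f}")
```

Repeating this for every pair of species, separately for high cysteine and high glycine-tyrosine subfamilies, would give two sets of coefficients of the kind compared in Figure 7.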

Figure 7

Pearson correlation coefficients (r) show the evolutionary differentiation of KRTAP genes. Pearson correlation coefficient (r) values of the high cysteine and high glycine-tyrosine KRTAPs are positively correlated; the linear regression is shown (A). The boxplot of Pearson correlation coefficients (r) of gene numbers of each subfamily between species shows that high cysteine KRTAP genes have higher correlation coefficients than high glycine-tyrosine KRTAP genes (B).

Size polymorphism and amino acid composition affect KRTAP matrix formation and interactions with hair KIFs

The KRTAP family is broadly grouped into three major categories based on amino acid composition: (i) high sulfur (<30% cysteine content), including subfamilies 1, 2, 3, 10-13, 16, 24-27, 29, 31, 34 and 35; (ii) ultrahigh sulfur (>30% cysteine content), including subfamilies 4, 5, 9, 17, 28, 30, 32 and 33; and (iii) high glycine/tyrosine, including subfamilies 6, 7, 8, 19, 20 and 21. The amino acid composition is shown in Table 3. Subfamily gene members also showed size polymorphism [81-83], mostly due to cysteine-rich repeats, which create differences in cysteine content. Cysteine is important for the formation of strong disulphide bonds. Thus, changes in cysteine composition can result in differential interactions among KRTAPs and between KIFs and KRTAPs, leading to combinatorial complexity and thereby creating morphological differences in hair fiber strength, rigidity and flexibility [40].
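The 30% cysteine threshold separating the two sulfur-rich categories can be illustrated with a short sketch. The function names and the toy peptides are hypothetical; the sketch only separates high sulfur from ultrahigh sulfur proteins, because the text gives no numeric threshold for the high glycine/tyrosine category, and it treats a value of exactly 30% as high sulfur, which is an assumption.

```python
def cysteine_percent(protein: str) -> float:
    """Percentage of cysteine (C) residues in a one-letter protein sequence."""
    seq = protein.upper().strip()
    if not seq:
        raise ValueError("empty protein sequence")
    return 100.0 * seq.count("C") / len(seq)


def sulfur_class(protein: str, threshold: float = 30.0) -> str:
    """Label a sulfur-rich KRTAP by cysteine content.

    Assumption: exactly `threshold` percent is labelled 'high sulfur';
    the text only specifies <30% and >30%.
    """
    if cysteine_percent(protein) > threshold:
        return "ultrahigh sulfur"
    return "high sulfur"


# Toy peptides (hypothetical, not real KRTAP sequences).
for name, peptide in [("toy_1", "CCSCCKPCCS"), ("toy_2", "MSCGSAGCTS")]:
    print(name, f"{cysteine_percent(peptide):.0f}% Cys", sulfur_class(peptide))
```

Because cysteine-rich repeats vary in number between gene copies, applying such a calculation to each member of a subfamily would show how size polymorphism shifts cysteine content.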

Table 3 Amino acid composition of KRTAPs subfamily genes in mammals


Gene families are formed by gene duplication, a process that provides important raw material for functional innovation and adaptive selection. Gene families vary in size from a few to thousands of gene members, which makes it difficult to identify and characterize them without sufficient genome sequences. Genome sequencing projects have made it possible to explore complex gene families involved in different phenotypes. Here we explored the mammal-specific KRTAP gene family, which is the major constituent of the hair proteome, plays a primary role in hair formation, and has thus long been associated with phenotypic differences in hair and wool. This study assessed patterns of variation in the KRTAP gene family using comparative genomic approaches. Our study used 22 diverse mammalian genomes that encompassed closely related species, e.g. the family Hominidae of primates (comparing apes with dense hair cover with humans with much less hair cover), along with species with diverse hair-related characteristics, e.g. alpaca (fibre), armadillo (modified scales), hedgehog (spines), sloth (hosting hair symbionts) and dolphin (mostly hairless and aquatic), to obtain greater insights into KRTAP gene family evolution relative to mammalian hair and phenotypic variation.

We found high molecular diversity within the KRTAP gene family, with 30 subfamilies (24 belonging to high cysteine and six belonging to high glycine-tyrosine KRTAPs) (Additional file 2 and Table 1) and approximately 100-180 KRTAP gene members, which are arranged in five clusters at five different chromosome locations in a genome (Figure 2). Most KRTAP subfamilies are found in all mammalian orders, with variations in expansion, contraction, presence/absence, rates of pseudogenization and sequence variation (length polymorphisms and amino acid changes). For example, we found species-specific differences in the size and composition of some subfamilies (e.g. subfamilies 4, 5 and 9), probably caused by unequal crossing over accompanied by high GC content. Moreover, we also found lineage-specific trends, such as in marsupials, where both wallaby and opossum lacked subfamilies 13, 21 and 26 and showed expansion of the KRTAP20 subfamily, which is under positive selection (Table 2). However, highly conserved sequences and the maintenance of the same number of members in subfamilies 1, 2, and 3 suggest that high rates of gene conversion maintain homogeneity, with evolution occurring through a process of punctuated equilibrium [14, 84, 85]. Similarly, the conserved synteny of KRTAP gene clusters shows that there are also strong constraints acting on this gene family and supports the important role of the KRTAP gene family in shaping hair characteristics.

Together with the high molecular diversity of the KRTAP gene family observed in our study, considerable intraspecific diversification has been reported earlier, with copy number variations [86-91] in ethnic human populations and allelic variations in sheep. Such sequence polymorphism found in KRTAP gene members may influence their expression, protein structure, and/or post-translational modifications and consequently affect wool/hair fibre structure and wool/hair quality traits [92, 23, 93]. For example, the linkage reported between KRTAP6-8 and wool fiber diameter (a quantitative trait) in sheep [94] may be related to similar characteristics in alpaca (fiber), which has one of the largest numbers of KRTAP6 genes (n = 9) in mammals. Further exploration of the KRTAP gene family in sheep could help shed light on the improvement of hair/wool traits [57, 95, 94, 96].

Interestingly, we found differences in KRTAP gene repertoire related to hair features. A very expanded KRTAP gene family repertoire (175 total genes and 141 intact genes) was found in sloth, an arboreal mammal with a long, dense and coarse body hair cover that also serves as a host for symbiotic microorganisms (e.g., cyanobacteria). By contrast, we detected a reduced number of functional KRTAP genes and a high percentage of KRTAP pseudogenes (74%) in dolphin (an aquatic mammal), highlighting the much lower KRTAP gene requirement in this smooth-skinned species that has only a few hairs (bristles) at the rostrum. These are lost soon after birth, and in adults the hairless pits are adapted for sensory functions (Figure 4). This example illustrates the adaptive potential of hair follicles to diversify into more specialized sensory organs.

We also observed that several unique hair-related phenotypes in some of the species, such as scales in armadillo, fiber in alpaca and spines in hedgehog, are linked with an inverse correlation between the number of intact KRTAP genes and the number of pseudogenes. Armadillo and hedgehog have specialized hair features, whereas the pattern in alpaca could be due to inbreeding and genome homogenization during domestication. The “hairless” dolphin had a large number of KRTAP pseudogenes relative to intact genes (Figure 5). By comparison, the sloth showed a high positive correlation with intact KRTAP genes, suggesting that changes in KRTAPs can be related to the morphological diversity of hair phenotypes (Figure 5). However, we did not find any such correlation when comparing the comparatively hairless human with other primates, which favors the hypothesis that diversification of keratinized structures in mammals may be collectively explained by KRTAP gene number variation together with other biological mechanisms, such as gene expression variation in KRTAPs (which can be further influenced by KRTAP gene polymorphism).

We suggest that the diverse repertoire and variability of KRTAPs (at the gene, family and genome levels) provide extraordinary combinatorial complexity [97] for interactions between KRTAPs and keratin intermediate filaments, resulting in a rich diversity of pathways for evolutionary change. Together with differences in higher-order expression of KRTAP genes, this results in the diverse hair morphological characteristics visible in extant mammals. Overall, we conclude that KRTAPs play an important role in the evolution and diversification of hair characters across mammals and are responsible for unique features of hair.


The present study explored KRTAP gene family evolution in various mammalian species inhabiting diverse terrestrial and aquatic environments. The two groups of the KRTAP gene family, high cysteine and high glycine-tyrosine KRTAP genes, have evolved differently, resulting in species-specific diversification of this multigene family and leading to wide morphological diversity in hair characteristics among extant mammals. We conclude that differences in KRTAP gene family repertoires, together with changes in expression patterns, are responsible for shaping unique hair characteristics in diverse mammalian species. These differences are more pronounced between aquatic and terrestrial species and demonstrate the important adaptive role of hair in terrestrial colonization and the radiation of mammals from water to land. Future studies comparing the KRTAP repertoire in key model organisms, e.g. alpaca and sheep, may provide insights into the role of KRTAP gene variation in hair fibre traits and its use in the textile industry.


Gene identification

All KRTAP genes are relatively small (ca. 1 kb) and generally have a single exon [83]. Some KRTAP genes appear to possess small introns. However, these are similar to repeat regions present in the gene [98] and can be translated in-frame with the coding exon, leading to the conclusion that all KRTAPs are intronless [14, 99]. The presence of KRTAP gene clusters in mammalian genomes makes it easy to identify and fully characterize the gene family in high-coverage genomes, but low-coverage genomes require much more manual inspection and in-depth screening to ensure an almost complete, or maximum possible, repertoire of non-redundant KRTAP genes (Additional file 2). To identify the complete KRTAP gene repertoire, all previously annotated gene sequences were used as queries in BLAST searches against the genomes in the Ensembl and NCBI genome databases, using the BLASTN algorithm [100] with an E-value cut-off of 10. We retrieved multiple hits for each query and selected all non-redundant hits, extending each by 500 bp at both the 5’ and 3’ ends. Non-redundant hits clustered in the same region (chromosome, contig, genescaffold, scaffold or supercontig) were merged into a single extended DNA fragment bearing all these hits, and the ends of this fragment were further extended to a maximum of 0.3 Mbp wherever possible. Finally, all hits were used to identify and annotate KRTAP genes using BLAST 2 Sequences [101], TFASTX and TFASTY from the FASTA package [102], and ORF Finder from NCBI and Mobyle [103]. The identified genes were searched with BLAST against the non-redundant NCBI database, and all sequences whose best hits were KRTAP or KRTAP-like sequences were accepted as KRTAP genes. The KRTAP genes were further classified into intact (complete) genes, partial genes, and pseudogenes with interrupting frame-shift mutations and/or stop codons.
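The hit-merging step described above can be illustrated with a minimal Python sketch. It extends each hit by 500 bp on both sides and merges hits on the same sequence whenever the extended intervals overlap. The function name `merge_hits`, the tuple layout and the example coordinates are hypothetical, and treating "clustered in the same region" as interval overlap is an assumption; the further extension to 0.3 Mbp and the manual inspection used in the study are not reproduced here.

```python
from collections import defaultdict

FLANK = 500  # bp added at both ends of each hit, as described in the text


def merge_hits(hits, flank=FLANK):
    """Extend (seq_id, start, end) hits by `flank` bp and merge overlapping ones.

    Coordinates are 1-based and inclusive.
    Returns a dict mapping seq_id to a sorted list of (start, end) regions.
    """
    by_seq = defaultdict(list)
    for seq_id, start, end in hits:
        lo, hi = sorted((start, end))  # minus-strand hits may be reported reversed
        by_seq[seq_id].append((max(1, lo - flank), hi + flank))

    merged = {}
    for seq_id, intervals in by_seq.items():
        intervals.sort()
        regions = [intervals[0]]
        for lo, hi in intervals[1:]:
            last_lo, last_hi = regions[-1]
            if lo <= last_hi:  # overlaps the previous region: fuse them
                regions[-1] = (last_lo, max(last_hi, hi))
            else:
                regions.append((lo, hi))
        merged[seq_id] = regions
    return merged


# Hypothetical hits, not taken from the study.
example_hits = [
    ("scaffold_1", 1000, 1600),
    ("scaffold_1", 2900, 2300),  # reversed coordinates
    ("scaffold_1", 9000, 9500),
    ("scaffold_2", 200, 700),
]
regions = merge_hits(example_hits)
```

In this toy input the first two hits on `scaffold_1` fuse into one candidate region once extended, while the third stays separate; a real pipeline would likely also cap the fragment length and track strand.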

Phylogenetic analysis

We used phylogenetic tree building to assign the identified KRTAP genes to their respective subfamilies. For each species, the intact genes were used to build the phylogenetic tree. All intact KRTAP genes were translated into amino acid sequences and aligned using ClustalW as incorporated in MEGA4.0 [104, 105] with the BLOSUM protein weight matrix; manual adjustments were made whenever needed to correct the final alignment. This final protein sequence alignment was used to build the KRTAP gene tree with the Neighbour-Joining method using p-distance, and the interior branch test was evaluated with 1,000 replications [106] (Additional file 1: Figures S1-24). We used the unique motifs and repeat sequence structures present in KRTAP subfamilies, along with phylogeny and BLAST results, to help identify partial genes and pseudogenes and assign them to their respective subfamilies.
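For readers unfamiliar with the p-distance used as input to Neighbour-Joining, the following Python sketch computes it for pairs of aligned protein sequences as the proportion of differing residues. The function names and the toy sequences are hypothetical, and skipping gapped columns pair by pair (pairwise deletion) is an assumption, since the gap treatment chosen in MEGA is not stated in the text.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # pairwise deletion of gapped columns (assumption)
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(alignment):
    """Return {(name_x, name_y): p-distance} for every ordered pair of sequences."""
    names = sorted(alignment)
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x in names
        for y in names
    }


# Toy aligned sequences, invented for illustration only.
toy = {
    "geneA": "MCCSPC-CQP",
    "geneB": "MCCRPCSCQP",
    "geneC": "MSCSPC-CEP",
}
matrix = distance_matrix(toy)
```

A matrix of this kind is what a Neighbour-Joining implementation takes as input; the tree building and the interior branch test themselves are not sketched here.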

Gene conversion and recombination study

We used the program Geneconv [74] to detect statistically significant sequence homogenization events among paralogs, using global Bonferroni-corrected P values. Lower P values indicate greater support for gene conversion. The multiple protein sequence alignments were back-translated and used as input for Geneconv to obtain both global and pairwise fragments involved in gene conversion. We also used the RDP3 software [75] to detect recombination events using the RDP, Bootscan, MaxChi and Chimaera methods with 1,000 permutations and a P value cut-off of 0.01 with Bonferroni correction.
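The general idea of a Bonferroni correction at the 0.01 cut-off can be sketched in Python as follows. This is the generic textbook correction only, not the internal procedure of Geneconv or RDP3; the function name `bonferroni` and the raw P values are hypothetical.

```python
def bonferroni(p_values, alpha=0.01):
    """Multiply each raw P value by the number of tests (capped at 1.0).

    Returns the adjusted P values and a parallel list of booleans that are
    True where the adjusted value falls below `alpha`.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p_adj < alpha for p_adj in adjusted]
    return adjusted, significant


# Invented raw P values for four candidate events.
raw = [0.0004, 0.003, 0.02, 0.5]
adjusted, significant = bonferroni(raw)
```

With four tests, only the first invented event remains below the 0.01 cut-off after correction, which shows how the correction guards against false positives when many sequence pairs are screened.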

The evolutionary distance between genes can be calculated from synonymous substitutions, which are immune to selection and are not decreased by negative selection [50]. Sequence divergence was estimated as the approximate synonymous substitution rate (dS) in MEGA using the modified Nei-Gojobori (p-distance) method with a transition/transversion ratio of 2. GC content was estimated using MEGA5.0 [107]. More than two sequences are needed to detect signals of recombination; therefore, subfamilies having more than three genes were used for studies of gene conversion (Additional files 3 and 4).
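As an illustration of the two GC measures used in this study, overall GC% and GC% at third codon positions (GC3%), a minimal Python sketch is given below. It is not the MEGA implementation; the function names and the example coding sequence are hypothetical, and the sketch assumes an in-frame sequence starting at codon position 1 and counts the stop codon along with the other codons.

```python
def gc_percent(seq):
    """GC content (%) over unambiguous bases (A, C, G, T) only."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def gc3_percent(cds):
    """GC content (%) at third codon positions of an in-frame coding sequence."""
    third_positions = cds.upper()[2::3]  # indices 2, 5, 8, ... are codon position 3
    return gc_percent(third_positions)


# Invented in-frame coding sequence: ATG TGC TGC TCC CCG TGA
cds = "ATGTGCTGCTCCCCGTGA"
overall = gc_percent(cds)
third = gc3_percent(cds)
```

Because third codon positions are largely synonymous, GC3% is often more sensitive than overall GC% to processes such as GC-biased gene conversion, which is why the two measures are reported separately.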

Statistical analysis

To study the differential evolutionary patterns of high cysteine and high glycine-tyrosine KRTAP genes, we compared the pairwise Pearson correlation coefficients (Figure 7) of the number of genes present in each subfamily (Table 1). We also examined the correlation between GC content (GC% and GC3%) and synonymous substitution rates (Figure 6), estimated using the Nei-Gojobori (p-distance) method with a transition/transversion ratio of 2 in MEGA4.0 [105].
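The pairwise Pearson correlation of per-subfamily gene counts can be sketched in Python as below. The function name `pearson_r` and the count vectors are hypothetical and are not values from Table 1; the sketch only shows how one coefficient would be obtained for one pair of species.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Invented gene counts per subfamily for three species (same subfamily order).
species_a = [2, 4, 6, 8]
species_b = [1, 2, 3, 4]
species_c = [8, 6, 4, 2]

r_ab = pearson_r(species_a, species_b)
r_ac = pearson_r(species_a, species_c)
```

Repeating this for every pair of species gives a matrix of coefficients of the kind summarized in a pairwise correlation figure.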

Availability of supporting data

All the supporting data are included as additional files.


  1. Alibardi L: Adaptation to the land: the skin of reptiles in comparison to that of amphibians and endotherm amniotes. Biologia. 2003, 41: 12-41. doi:10.1002/jez.b.00024

  2. Chuong C-M, Homberger DG: Development and evolution of the amniote integument: current landscape and future horizon. J Exp Zool B Mol Dev Evol. 2003, 298: 1-11. doi:10.1002/jez.b.23

  3. George RD, McVicker G, Diederich R, Ng SB, MacKenzie AP, Swanson WJ, Shendure J, Thomas JH: Trans genomic capture and sequencing of primate exomes reveals new targets of positive selection. Genome Res. 2011, 21: 1686-1694. doi:10.1101/gr.121327.111

  4. Sun Y-B, Zhou W-P, Liu H-Q, Irwin DM, Shen Y-Y, Zhang Y-P: Genome-Wide Scans for Candidate Genes Involved in the Aquatic Adaptation of Dolphins. Genome Biol Evol. 2012, doi:10.1093/gbe/evs123

  5. Alibardi L, Valle LD, Nardi A, Toni M: Evolution of hard proteins in the sauropsid integument in relation to the cornification of skin derivatives in amniotes. J Anat. 2009, 214: 560-586. doi:10.1111/j.1469-7580.2009.01045.x

  6. Alibardi L: Embryonic keratinization in vertebrates in relation to land colonization. Acta Zoologica. 2009, 90: 1-17. doi:10.1111/j.1463-6395.2008.00327.x

  7. Rogers MA, Winter H, Langbein L, Wollschläger A, Praetzel-Wunder S, Jave-Suarez LF, Schweizer J: Characterization of human KAP24.1, a cuticular hair keratin-associated protein with unusual amino-acid composition and repeat structure. J Invest Dermatol. 2007, 127: 1197-1204. doi:10.1038/sj.jid.5700702

  8. Zimek A, Weber K: Terrestrial vertebrates have two keratin gene clusters; striking differences in teleost fish. Eur J Cell Biol. 2005, 84: 623-635. doi:10.1016/j.ejcb.2005.01.007

  9. Alibardi L: Structural and immunocytochemical characterization of keratinization in vertebrate epidermis and epidermal derivatives. Int Rev Cytol. 2006, 253: 177-259. doi:10.1016/S0074-7696(06)53005-0

  10. Alibardi L, Jaeger K, Valle LD, Eckhart L: Ultrastructural localization of hair keratin homologs in the claw of the lizard Anolis carolinensis. J Morphol. 2011, 272: 363-370. doi:10.1002/jmor.10920

  11. Vandebergh W, Bossuyt F: Radiation and functional diversification of alpha keratins during early vertebrate evolution. Mol Biol Evol. 2011, doi:10.1093/molbev/msr269

  12. Eckhart L, Valle LD, Jaeger K, Ballaun C, Szabo S, Nardi A, Buchberger M, Hermann M, Alibardi L, Tschachler E: Identification of reptilian genes encoding hair keratin-like proteins suggests a new scenario for the evolutionary origin of hair. Proc Natl Acad Sci U S A. 2008, 105: 18419-18423. doi:10.1073/pnas.0805154105

  13. Hesse M, Zimek A, Weber K, Magin TM: Comprehensive analysis of keratin gene clusters in humans and rodents. Eur J Cell Biol. 2004, 83: 19-26. doi:10.1078/0171-9335-00354

  14. Wu D-D, Irwin DM, Zhang Y-P: Molecular evolution of the keratin associated protein gene family in mammals, role in the evolution of mammalian hair. BMC Evol Biol. 2008, 8: 241. doi:10.1186/1471-2148-8-241

  15. Alibardi L: Fine structure of marsupial hairs, with emphasis on trichohyalin and the structure of the inner root sheath. J Morphol. 2004, 261: 390-402. doi:10.1002/jmor.10257

  16. Botchkarev VA, Paus R: Molecular biology of hair morphogenesis: development and cycling. J Exp Zool B Mol Dev Evol. 2003, 298: 164-180. doi:10.1002/jez.b.33

  17. Millar SE: Molecular Mechanisms Regulating Hair Follicle Development. J Invest Dermatol. 2002, 118: 216-225. doi:10.1046/j.0022-202x.2001.01670.x

  18. Schneider MR, Schmidt-Ullrich R, Paus R: The hair follicle as a dynamic miniorgan. Curr Biol. 2009, 19: R132-R142. doi:10.1016/j.cub.2008.12.005

  19. Hardy MH: The secret life of the hair follicle. Trends Genet. 1992, 8: 55-61. doi:10.1016/0168-9525(92)90044-5

  20. Franbourg A, Hallegot P, Baltenneck F, Toutaina C, Leroy F: Current research on ethnic hair. J Am Acad Dermatol. 2003, 48: S115-S119. doi:10.1067/mjd.2003.277

  21. Sahajpal V, Goyal S, Singh K, Thakur V: Dealing Wildlife Offences in India: Role of the Hair as Physical Evidence. Int J Trichology. 2009, 1: 18-26. doi:10.4103/0974-7753.51928

  22. Bahuguna A, Mukherjee SK: Use of SEM to recognise Tibetan antelope (Chiru) hair and blending in wool products. Sci Justice. 2000, 40: 177-182. doi:10.1016/S1355-0306(00)71973-3

  23. Jenkins BJ, Powell BC: Differential expression of genes encoding a cysteine-rich keratin family in the hair cuticle. J Invest Dermatol. 1994, 103: 310-317. doi:10.1111/1523-1747.ep12394770

  24. Shimomura Y, Aoki N, Rogers MA, Langbein L, Schweizer J, Ito M: hKAP1.6 and hKAP1.7, two novel human high sulfur keratin-associated proteins are expressed in the hair follicle cortex. J Invest Dermatol. 2002, 118: 226-231. doi:10.1046/j.0022-202x.2001.01653.x

  25. Rogers MA, Langbein L, Praetzel-Wunder S, Giehl K: Characterization and expression analysis of the hair keratin associated protein KAP26.1. Br J Dermatol. 2008, 159: 725-729. doi:10.1111/j.1365-2133.2008.08743.x

  26. Rogers MA, Langbein L, Winter H, Ehmann C, Praetzel S, Schweizer J: Characterization of a first domain of human high glycine-tyrosine and high sulfur keratin-associated protein (KAP) genes on chromosome 21q22.1. J Biol Chem. 2002, 277: 48993-49002. doi:10.1074/jbc.M206422200

  27. Powell BC, Arthur J, Nesci A: Characterization of a gene encoding a cysteine-rich keratin associated protein synthesized late in rabbit hair follicle differentiation. Differentiation. 1995, 58: 227-232. doi:10.1046/j.1432-0436.1995.5830227.x

  28. Pruett ND, Tkatchenko TV, Jave-Suarez L, Jacobs DF, Potter CS, Tkatchenko AV, Schweizer J, Awgulewitsch A: Krtap16, characterization of a new hair keratin-associated protein (KAP) gene complex on mouse chromosome 16 and evidence for regulation by Hoxc13. J Biol Chem. 2004, 279: 51524-51533. doi:10.1074/jbc.M404331200

  29. Stenn KS, Paus R: Controls of hair follicle cycling. Physiol Rev. 2001, 81: 449-494.

  30. Rogers GE: Hair follicle differentiation and regulation. Int J Dev Biol. 2004, 48: 163-170. 10.1387/ijdb.15272381. doi:10.1387/ijdb.021587gr

  31. Schweizer J, Bowden PE, Coulombe PA, Langbein L, Lane EB, Magin TM, Maltais L, Omary MB, Parry DD, Rogers MA, Wright MW: New consensus nomenclature for mammalian keratins. J Cell Biol. 2006, 174: 169-174. doi:10.1083/jcb.200603161

  32. Rogers M, Winter H, Langbein L, Bleiler R: The human type I keratin gene family: characterization of new hair follicle specific members and evaluation of the chromosome 17q21.2 gene domain. Differentiation. 2004, 72: 527-540. doi:10.1111/j.1432-0436.2004.07209006.x

  33. Rogers MA, Edler L, Winter H, Langbein L, Beckmann I, Schweizer J: Characterization of new members of the human type II keratin gene family and a general evaluation of the keratin gene domain on chromosome 12q13.13. J Invest Dermatol. 2005, 124: 536-544. doi:10.1111/j.0022-202X.2004.23530.x

  34. Rogers MA, Winter H, Wolf C, Heck M, Schweizer J: Characterization of a 190-kilobase pair domain of human type I hair keratin genes. J Biol Chem. 1998, 273: 26683-26691. doi:10.1074/jbc.273.41.26683

  35. Rogers MA, Winter H, Langbein L, Wolf C, Schweizer J: Characterization of a 300 kbp region of human DNA containing the type II hair keratin gene domain. J Invest Dermatol. 2000, 114: 464-472. doi:10.1046/j.1523-1747.2000.00910.x

  36. Steinert PM, North AC, Parry DA: Structural features of keratin intermediate filaments. J Invest Dermatol. 1994, 103: 19S-24S.

  37. Powell BC, Nesci A, Rogers GE: Regulation of keratin gene expression in hair follicle differentiation. Ann N Y Acad Sci. 1991, 642: 1-20.

  38. Powell BC, Rogers GE: The role of keratin proteins and their genes in the growth, structure and properties of hair. EXS. 1997, 78: 59-148.

  39. Fujikawa H, Fujimoto A, Farooq M, Ito M, Shimomura Y: Characterization of the Human Hair Keratin–Associated Protein 2 (KRTAP2) Gene Family. J Invest Dermatol. 2012, 132: 1806-1813. doi:10.1038/jid.2012.73

  40. Shimomura Y, Ito M: Human hair keratin-associated proteins. J Invest Dermatol Symp Proc Soc Invest Dermatol Inc Eur Soc Dermatol Res. 2005, 10: 230-233. doi:10.1111/j.1087-0024.2005.10112.x

  41. Lee YJ, Rice RH, Lee YM: Proteome analysis of human hair shaft: from protein identification to posttranslational modification. Mol Cell Proteomics. 2006, 5: 789-800. doi:10.1074/mcp.M500278-MCP200

  42. Koehn H, Clerens S, Deb-Choudhury S, Morton JD, Dyer JM, Plowman JE: The proteome of the wool cuticle. J Proteome Res. 2010, 9: 2920-2928. doi:10.1021/pr901106m

  43. Rogers MA, Langbein L, Winter H, Ehmann C, Praetzel S, Korn B, Schweizer J: Characterization of a cluster of human high/ultrahigh sulfur keratin-associated protein genes embedded in the type I keratin gene domain on chromosome 17q12-21. J Biol Chem. 2001, 276: 19440-19451. doi:10.1074/jbc.M100657200

  44. Rogers MA, Langbein L, Winter H, Beckmann I, Praetzel S, Schweizer J: Hair keratin associated proteins: characterization of a second high sulfur KAP gene domain on human chromosome 21. J Invest Dermatol. 2004, 122: 147-158. doi:10.1046/j.0022-202X.2003.22128.x

  45. Shibuya K, Obayashi I, Asakawa S, Minoshima S, Kudoh J, Shimizu N: A cluster of 21 keratin-associated protein genes within introns of another gene on human chromosome 21q22.3. Genomics. 2004, 83: 679-693. doi:10.1016/j.ygeno.2003.09.024

  46. Yahagi S, Shibuya K, Obayashi I, Masaki H, Kurata Y, Kudoh J, Shimizu N: Identification of two novel clusters of ultrahigh-sulfur keratin-associated protein genes on human chromosome 11. Biochem Biophys Res Commun. 2004, 318: 655-664. doi:10.1016/j.bbrc.2004.04.074

  47. Fumasoni I, Meani N, Rambaldi D, Scafetta G, Alcalay M, Ciccarelli FD: Family expansion and gene rearrangements contributed to the functional specialization of PRDM genes in vertebrates. BMC Evol Biol. 2007, 7: 187. doi:10.1186/1471-2148-7-187

  48. Lespinet O, Wolf YI, Koonin EV, Aravind L: The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res. 2002, 1048-1059. doi:10.1101/gr.174302.eages

  49. Leister D: Tandem and segmental gene duplication and recombination in the evolution of plant disease resistance gene. Trends Genet. 2004, 20: 116-122. doi:10.1016/j.tig.2004.01.007

  50. Zhang J: Evolution by gene duplication: an update. Trends Ecol Evol. 2003, 18: 292-298. doi:10.1016/S0169-5347(03)00033-8

  51. Lynch M, Force A: The probability of duplicate gene preservation by subfunctionalization. Genetics. 2000, 154: 459-473.

  52. Ensembl Genome Browser.

  53. Flicek P, Amode MR, Barrell D, Beal K, Brent S, Denise C-S, Clapham P, Coates G, Fairley S, Fitzgerald S, Gil L, Gordon L, Hendrix M, Hourlier T, Johnson N, Kähäri AK, Keefe D, Keenan S, Kinsella R, Komorowska M, Koscielny G, Kulesha E, Larsson P, Longden I, McLaren W, Muffato M, Overduin B, Pignatelli M, Pritchard B, Riat HS, et al: Ensembl 2012. Nucl Acids Res. 2011, doi:10.1093/nar/gkr991

  54. Map Viewer - National Center for Biotechnology Information.

  55. Bininda-Emonds ORP, Cardillo M, Jones KE, MacPhee RDE, Beck RMD, Grenyer R, Price SA, Vos RA, Gittleman JL, Purvis A: The delayed rise of present-day mammals. Nature. 2007, 446: 507-512. doi:10.1038/nature05634

  56. Walsh JB, Stephan W: Multigene Families: Evolution. Encyclopedia Life Sci. 2001, 1-6.

  57. McLaren RJ, Rogers GR, Davies KP, Maddox JF, Montgomery GW: Linkage mapping of wool keratin and keratin-associated protein genes in sheep. Mamm Genome. 1997, 8: 938-940. doi:10.1007/s003359900616

  58. Warren WC, Hillier LW, Graves JAM, Birney E, Ponting CP, Grützner F, Belov K, Miller W, Clarke L, Chinwalla AT, Yang S-P, Heger A, Locke DP, Miethke P, Waters PD, Veyrunes F, Fulton L, Fulton B, Graves T, Wallis J, Puente XS, López-Otín C, Ordóñez GR, Eichler EE, Chen L, Cheng Z, Deakin JE, Alsop A, Thompson K, Kirby P, et al: Genome analysis of the platypus reveals unique signatures of evolution. Nature. 2008, 453: 175-183. doi:10.1038/nature06936

  59. Suutari M, Majaneva M, Fewer DP, Voirin B, Aiello A, Friedl T, Chiarello AG, Blomster J: Molecular evidence for a diverse green algal community growing in the hair of sloths and a specific association with Trichophilus welckeri (Chlorophyta, Ulvophyceae). BMC Evol Biol. 2010, 10: 86. doi:10.1186/1471-2148-10-86

  60. Palmer E, Weddell G: The Relationship between structure, innervation and function of the skin of the bottle nose dolphin (Tursiops truncatus). Proc Zool Soc London. 1964, 143: 553-568. doi:10.1111/j.1469-7998.1964.tb03881.x

  61. Meyer W, Schmidt J, Busche R, Jacob R, Naim HY: Demonstration of free fatty acids in the integument of semi-aquatic and aquatic mammals. Acta Histochem. 2012, 114: 145-150. doi:10.1016/j.acthis.2011.03.011

  62. Czech-Damal NU, Liebschner A, Miersch L, Klauer G, Hanke FD, Marshall C, Dehnhardt G, Hanke W: Electroreception in the Guiana dolphin (Sotalia guianensis). Proc R Soc B. 2012, doi:10.1098/rspb.2011.1127

  63. Mauck B, Eysel U, Dehnhardt G: Selective heating of vibrissal follicles in seals (Phoca vitulina) and dolphins (Sotalia fluviatilis guianensis). J Exp Biol. 2000, 203: 2125-2131.

  64. Jenkins J: “Tursiops truncatus” (On-line), animal diversity web. 2009, Accessed July 23, 2012.

  65. Thewissen JGM, Cooper LN, George JC, Bajpai S: From land to water: the origin of whales, dolphins, and porpoises. Evol Educ Outreach. 2009, 2: 272-288. doi:10.1007/s12052-009-0135-2

  66. Fish FE, Hui CA: Dolphin swimming–a review. Mammal Rev. 1991, 21: 181-195. doi:10.1111/j.1365-2907.1991.tb00292.x

  67. Hicks BD, Aubin DJS, Geraci JR, Brown WR: Epidermal growth in the Bottlenose Dolphin, Tursiops truncatus. J Invest Dermatol. 1985, 85: 60-63. doi:10.1111/1523-1747.ep12275348

  68. Brown TA: Genomes. How Genomes Evolve. NCBI Bookshelf. Chapter 15. Oxford: Wiley-Liss.

  69. Kostka D, Hubisz MJ, Siepel A, Pollard KS: The role of GC-biased gene conversion in shaping the fastest evolving regions of the human genome. Mol Biol Evol. 2012, 29 (3): 1047-1057. doi:10.1093/molbev/msr279. Epub 2011 Nov 10

  70. Duret L, Arndt PF: The impact of recombination on nucleotide substitutions in the human genome. PLoS Genet. 2008, 4: e1000071. doi:10.1371/journal.pgen.1000071

  71. Escobar JS, Glémin S, Galtier N: GC-biased gene conversion impacts ribosomal DNA evolution in vertebrates, angiosperms, and other eukaryotes. Mol Biol Evol. 2011, 28: 2561-2575. doi:10.1093/molbev/msr079

  72. Galtier N, Duret L: Adaptation or biased gene conversion? extending the null hypothesis of molecular evolution. Trends Genet. 2007, 23: 273-277. doi:10.1016/j.tig.2007.03.011

  73. Noonan JP, Grimwood J, Schmutz J, Dickson M, Myers RM: Gene conversion and the evolution of protocadherin gene cluster diversity. Genome Res. 2004, 14: 354-366. doi:10.1101/gr.2133704

  74. Sawyer S: Statistical tests for detecting gene conversion. Mol Biol Evol. 1989, 6: 526-538.

  75. Martin DP, Lemey P, Lott M, Moulton V, Posada D, Lefeuvre P: RDP3: a flexible and fast computer program for analyzing recombination. Bioinformatics. 2010, 26: 2462-2463. doi:10.1093/bioinformatics/btq467

  76. Han MV, Demuth JP, Mcgrath CL, Casola C, Hahn MW: Adaptive evolution of young gene duplicates in mammals. Genome Res. 2009, 19 (5): 859-867. doi:10.1101/gr.085951.108

  77. Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007, 24: 1586-1591. doi:10.1093/molbev/msm088

  78. Yang Z: Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol Biol Evol. 1998, 15: 568-573. doi:10.1093/oxfordjournals.molbev.a025957

  79. Nielsen R, Yang Z: Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics. 1998, 148: 929-936.

  80. Anisimova M, Nielsen R, Yang Z: Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites. Genetics. 2003, 164: 1229-1236.

  81. Kariya N, Shimomura Y, Ito M: Size polymorphisms in the human ultrahigh sulfur hair keratin-associated protein 4, kap4, gene family. J Invest Dermatol. 2005, 124: 1111-1118. doi:10.1111/j.0022-202X.2005.23662.x

  82. Parry DD, Smith TA, Rogers MA, Schweizer J: Human hair keratin-associated proteins: sequence regularities and structural implications. J Struct Biol. 2006, 155: 361-369. doi:10.1016/j.jsb.2006.03.018

  83. Rogers M, Schweizer J: Human KAP genes, only the half of it? extensive size polymorphisms in hair keratin-associated protein genes. J Invest Dermatol. 2005, 124: vii-ix. doi:10.1111/j.0022-202X.2005.23728.x

  84. Gould SJ, Eldredge N: Punctuated equilibrium comes of age. Nature. 1993, 366: 223-227. doi:10.1038/366223a0

  85. Mattila TM, Bokma F: Extant mammal body masses suggest punctuated equilibrium. Proc Biol Sci. 2008, 275: 2195-2199. doi:10.1098/rspb.2008.0354

  86. Marotta M, Chen X, Inoshita A, Stephens R, Budd GT, Crowe J, Lyons J, Kondratova A, Tubbs R, Tanaka H: A common copy number breakpoint of ERBB2 amplification in breast cancer co-localizes with a complex block of segmental duplications. Breast Cancer Res. 2012, 14: R150. doi:10.1186/bcr3362

  87. Gautam P, Jha P, Kumar D, Tyagi S, Varma B, Dash D, Mukhopadhyay A, Mukerji M: Spectrum of large copy number variations in 26 diverse Indian populations: potential involvement in phenotypic diversity. Hum Genet. 2012, 131: 131-143. doi:10.1007/s00439-011-1050-5

  88. Gong H, Zhou H, Yu Z, Dyer J, Plowman JE, Hickford J: Identification of the ovine keratin-associated protein KAP1-2 gene (KRTAP1-2). Exp Dermatol. 2011, 20: 815-819. doi:10.1111/j.1600-0625.2011.01333.x

  89. Gong H, Zhou H, Plowman JE, Dyer JM, Hickford JGH: Analysis of variation in the ovine ultra-high sulphur keratin-associated protein KAP5-4 gene using PCR-SSCP technique. Electrophoresis. 2010, 31: 3545-3547. doi:10.1002/elps.201000301

  90. Gong H, Zhou H, Hickford JGH: Diversity of the glycine/tyrosine-rich keratin-associated protein 6 gene (KAP6) family in sheep. Mol Biol Rep. 2011, 38: 31-35. doi:10.1007/s11033-010-0074-6

  91. Zhou H, Gong H, Yan W, Luo Y, Hickford JGH: Identification and sequence analysis of the keratin-associated protein 24-1 (KAP24-1) gene homologue in sheep. Gene. 2012, 511: 62-65. doi:10.1016/j.gene.2012.08.049

  92. Yu Z, Gordon SW, Nixon AJ, Bawden CS, Rogers MA, Wildermoth JE, Maqbool NJ, Pearson AJ: Expression patterns of keratin intermediate filament and keratin associated protein genes in wool follicles. Differentiation. 2009, 77: 307-316. doi:10.1016/j.diff.2008.10.009

  93. Gong H, Zhou H, Dyer JM, Hickford JGH: Identification of the ovine KAP11-1 gene (KRTAP11-1) and genetic variation in its coding sequence. Mol Biol Rep. 2011, 38: 5429-5433. doi:10.1007/s11033-011-0697-2

  94. Parsons YM, Cooper DW, Piper LR: Evidence of linkage between high-glycine-tyrosine keratin gene loci and wool fibre diameter in a Merino half-sib family. Anim Genet. 1994, 25: 105-108. doi:10.1111/j.1365-2052.1994.tb00414.x

  95. McKenzie GW, Abbott J, Zhou H, Fang Q, Merrick N, Forrest RH, Sedcole JR, Hickford JG: Genetic diversity of selected genes that are potentially economically important in feral sheep of New Zealand. Genet Sel Evol. 2010, 42: 43. doi:10.1186/1297-9686-42-43

  96. Purvis IW, Jeffery N: Genetics of fibre production in sheep and goats. Small Ruminant Res. 2007, 70: 42-47. doi:10.1016/j.smallrumres.2007.01.002

  97. Henikoff S: Gene families: the taxonomy of protein paralogs and chimeras. Science. 1997, 278: 609-614. doi:10.1126/science.278.5338.609

  98. Shibuya K, Kudoh J, Obayashi I, Shimizu A, Sasaki T, Minoshima S, Shimizu N: Comparative genomics of the keratin-associated protein (KAP) gene clusters in human, chimpanzee, and baboon. Mamm Genome. 2004, 15: 179-192. doi:10.1007/s00335-003-2313-9

  99. Wu D-D, Irwin DM, Zhang Y-P: Correction: molecular evolution of the keratin associated protein gene family in mammals, role in the evolution of mammalian hair. BMC Evol Biol. 2009, 9: 213. doi:10.1186/1471-2148-9-213

  100. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410. doi:10.1016/S0022-2836(05)80360-2

  101. Tatusova TA, Madden TL: BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences. FEMS Microbiol Lett. 1999, 174: 247-250. doi:10.1111/j.1574-6968.1999.tb13575.x

  102. Pearson WR, Wood T, Zhang Z, Miller W: Comparison of DNA sequences with protein sequences. Genomics. 1997, 46: 24-36. doi:10.1006/geno.1997.4995

  103. Néron B, Ménager H, Maufrais C, Joly N, Maupetit J, Letort S, Carrere S, Tuffery P, Letondal C: Mobyle: a new full web bioinformatics framework. Bioinformatics. 2009, 25: 3005-3011. doi:10.1093/bioinformatics/btp493

  104. Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680. doi:10.1093/nar/22.22.4673

  105. Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol. 2007, 24: 1596-1599. doi:10.1093/molbev/msm092

  106. Saitou N, Nei M: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987, 4: 406-425.

  107. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S: MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 2011, 28: 2731-2739. doi:10.1093/molbev/msr121


The authors acknowledge the Portuguese Fundação para a Ciência e a Tecnologia (FCT) for financial support to IK (SFRH/BD/48518/2008) and for the projects PTDC/AAC-AMB/104983/2008 (FCOMP-01-0124-FEDER-008610), PTDC/AAC-AMB/121301/2010 (FCOMP-01-0124-FEDER-019490) and PesT-C/MAR/LA0015/2011 to AA. SJOB was supported as PI by the Russian Ministry of Science Mega-grant no. 11.G34.31.0068. We also thank Siby Philip and João Paulo Machado for useful discussions during this work, and the anonymous reviewers for constructive comments on an earlier version of this manuscript.

Author information



Corresponding author

Correspondence to Agostinho Antunes.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

IK performed all the genomics, phylogenetics, and evolutionary analyses and drafted the manuscript. EM participated in the genome mining analysis and drafting of the study. VV participated in the drafting and coordination of the study. SJOB participated in the drafting and coordination of the study. WEJ participated in the drafting and coordination of the study. AA participated in the design, genetic analyses, drafting and coordination of the study. All authors read and approved the final manuscript.

Electronic supplementary material


Additional file 1: Figures S1-S24: The phylogeny of high cysteine KRTAP genes in 22 mammalian species. The trees were built with the neighbor-joining method using p-distance and the interior branch test with 1,000 replications (values shown on the branches). Figures S23 and S24 show the loss of one-to-one orthologous relationships between two species due to concerted evolution. The KRTAP members are labeled with species abbreviation, Gene ID and KRTAP subfamily (Additional file 2). Figures S1-S21 show, in order: gorilla, Pongo, gibbon, marmoset, tarsier, mouse lemur, bushbaby, treeshrew, Cavia, rabbit, cow, pig, alpaca, horse, panda, bat, hedgehog, elephant, armadillo, sloth and wallaby. Figures S22 (gorilla and gibbon) and S23 (gorilla and Cavia) show reduced orthology with increasing divergence time. Figure S24 shows the relationship between all HGT members in the 22 genomes. (PDF 4 MB)

Additional file 2: Table S2: Excel file listing the genomic coordinates of the KRTAP gene repertoires in the 22 mammalian species studied. The Gene ID corresponds to the genomic location. (XLS 338 KB)

Additional file 3: Table S3: Gene pairs showing significant gene conversion, as detected by the GeneConv program. (TXT 49 KB)

Additional file 4: Table S4: RDP3 results showing unique recombination events that are statistically significant (P < 0.01 after Bonferroni correction). (XLSX 11 KB)

About this article

Cite this article

Khan, I., Maldonado, E., Vasconcelos, V. et al. Mammalian keratin associated proteins (KRTAPs) subgenomes: disentangling hair diversity and adaptation to terrestrial and aquatic environments. BMC Genomics 15, 779 (2014).


  • Concerted evolution
  • Gene family
  • Keratin Associated Proteins
  • Keratin
  • Hair
  • Gene conversion
  • Recombination
  • Positive selection