Comparative sequence analysis of leucine-rich repeats (LRRs) within vertebrate toll-like receptors

Background: Toll-like receptors (TLRs) play a central role in innate immunity. TLRs are membrane glycoproteins that contain leucine-rich repeat (LRR) motifs in the ectodomain. TLRs recognize and respond to molecules such as lipopolysaccharide, peptidoglycan, flagellin, and RNA from bacteria or viruses. The LRR domains in TLRs have been inferred to be responsible for molecular recognition. All LRRs include a highly conserved segment, LxxLxLxxNxL, in which "L" is Leu, Ile, Val, or Phe, "N" is Asn, Thr, Ser, or Cys, and "x" is any amino acid. There are seven classes of LRRs, including "typical" ("T") and "bacterial" ("S"). All known LRR domain structures adopt an arc or horseshoe shape. Vertebrate TLRs form six major families. The repeat numbers of LRRs and their "phasing" in TLRs differ with isoforms and species, and they are aligned differently in various databases. We identified and aligned LRRs in TLRs by a new method described here.

Results: The new method utilizes known LRR structures to recognize and align new LRR motifs in TLRs and incorporates multiple sequence alignments and secondary structure predictions. TLRs from thirty-four vertebrates were analyzed. The repeat number of the LRRs ranges from 16 to 28. The LRRs found in TLRs frequently consist of LxxLxLxxNxLxxLxxxxF/LxxLxx ("T") and sometimes of short motifs including LxxLxLxxNxLxxLPx(x)LPxx ("S"). The TLR7 family (TLR7, TLR8, and TLR9) contains 27 LRRs. The LRRs in the N-terminal part have a super-motif of STT with about 80 residues. The super-repeat is represented by STTSTTSTT or _TTSTTSTT. The LRRs in TLRs form one or two horseshoe domains and are mostly flanked by two cysteine clusters containing two or four cysteine residues.

Conclusion: Each of the six major TLR families is characterized by its constituent LRR motifs, their repeat numbers, and their patterns of cysteine clusters. The central parts of the TLR1 and TLR7 families and of TLR4 have more irregular or longer LRR motifs. These central parts are inferred to play a key role in the structure and/or function of their TLRs. Furthermore, the super-repeat in the TLR7 family suggests strongly that "bacterial" and "typical" LRRs evolved from a common precursor.


Background
Toll-like receptors (TLRs) play a central role in innate immunity [1][2][3]. TLRs are type I integral membrane glycoproteins consisting of leucine-rich repeat (LRR) motifs in the ectodomain (ECD) and cytoplasmic signaling domains known as Toll/interleukin-1 receptor (TIR) domains, joined by a single transmembrane helix (Figure 1). They recognize and respond to a variety of components derived from pathogenic or commensal microorganisms, principally bacteria and viruses. These molecules include lipids such as lipopolysaccharide (LPS) from Gram-negative bacteria and peptidoglycan fragments from bacterial cell walls, proteins such as flagellin, and nucleic acids such as single-stranded and double-stranded RNA and unmethylated CpG DNA from bacteria or viruses. The ECDs, including the LRRs, have been inferred to recognize various ligands directly. The TLR family has 10 members in humans and 12 in mice and in Takifugu rubripes. Six major families of vertebrate TLRs have been proposed in a molecular dendrogram [4].
Leucine-rich repeat (LRR)-containing domains are present in over 6000 proteins listed in the PFAM, PRINTS, SMART, and InterPro databases [5][6][7][8]. All LRRs can be divided into a highly conserved segment (HCS) and a variable segment (VS). The HCS part consists of an eleven-residue stretch, LxxLxLxxNxL, or a twelve-residue stretch, LxxLxLxxCxxL, in which "L" is Leu, Ile, Val, or Phe, "N" is Asn, Thr, Ser, or Cys, and "C" is Cys, Ser or Asn [7,9]. Seven classes of LRRs have been proposed, characterized by different lengths and consensus sequences of the VS part of the repeats [9,10]. They are "RI-like", "CC", "bacterial", "SDS22-like", "plant specific", "typical", and "TpLRR". Each subfamily of small leucine-rich repeat proteoglycans (SLRPs) has LRRs from more than one of the seven classes [8,11]. The structures of twenty-two different proteins that contain LRRs are available. They include TLR3 and CD14 [48][49][50]. The LRR domains in all known structures adopt an arc shape. Most of the known LRR structures have a cap, which shields the hydrophobic core of the first LRR at the N-terminus or the last LRR at the C-terminus. In extracellular proteins or extracellular domains, these caps frequently consist of cysteine clusters containing two or four cysteine residues [8,9].
The reported repeat number of LRRs and their "phasing" (that is, which segment or residue corresponds to the beginning of a repeating unit) in individual TLRs differ among databases (or researchers) and among species. This difference reflects the irregularity of LRR motifs in TLRs. Over one hundred complete TLR sequences are available. Several methods of protein secondary structure prediction, such as Proteus and SSpro4.0, show a correspondence of about 75% [52][53][54]. For the identification of LRRs we propose a new method, which uses the known structures of several LRRs, multiple sequence alignments, and secondary structure predictions of TLRs. This new method indicates that each of the six recognized TLR families can be characterized by its LRR motifs, their repeat number, and the motifs of the two cysteine clusters flanking the LRRs. The actual repeat number of LRRs is generally larger than that reported in the databases. The present analysis leads to the hypothesis that all the LRRs in TLRs form one or two horseshoe domains.

A new method for the identification of LRRs within TLRs
Known LRR structures
All of the LRR domains in one protein form a single continuous structure and adopt an arc or horseshoe shape. On the inner, concave face there is a stack of parallel β-strands, and on the outer, convex face there are a variety of secondary structures such as α-helix, 3₁₀-helix, polyproline II helix, or a tandem arrangement of β-turns [8,55]. The HCS part of all the LRRs consists of LxxLxLxxNxL or LxxLxLxxCxxL, as noted, in which "L" is Leu, Ile, Val, or Phe, "N" is Asn, Thr, Ser, or Cys, and "C" is Cys, Ser or Asn [7,9]. The short β-strands are mostly formed by three residues at positions 3 through 5 in the HCS part. In most LRR proteins the β-strands on the concave face and the (mostly) helical elements on the convex face are connected by short loops or β-turns. Four leucine residues at positions 1, 4, 6 and 11 participate in the hydrophobic core of LRR arcs. Similarly, conserved hydrophobic residues in the VS parts of the seven LRR classes participate in the hydrophobic core. The side chains of asparagine at position 9 form hydrogen bonds in the loop structure [6].
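The HCS consensus described above can be expressed as a pair of regular expressions. The following is only a minimal sketch of a strict consensus scan, not the identification method used in this work. The function name `find_hcs` and the labels "11-mer" and "12-mer" are illustrative assumptions. The first test sequence is the HCS of LRR6 in human TLR2 quoted in the text, and the second is one variant of the LRR9 pattern of TLR6, which begins with Pro and is therefore missed by a strict scan.

```python
import re

# Residue classes taken from the consensus definitions in the text.
L = "[LIVF]"                                             # "L": Leu, Ile, Val, or Phe
HCS_11 = re.compile(f"{L}..{L}.{L}..[NTSC].{L}")         # LxxLxLxxNxL
HCS_12 = re.compile(f"{L}..{L}.{L}..[CSN]..{L}")         # LxxLxLxxCxxL


def find_hcs(sequence):
    """Return (start, label, matched text) for every strict HCS match.

    Every start position is tested separately so that overlapping
    matches are reported as well.
    """
    hits = []
    for start in range(len(sequence)):
        for label, pattern in (("11-mer", HCS_11), ("12-mer", HCS_12)):
            match = pattern.match(sequence, start)
            if match:
                hits.append((start, label, match.group()))
    return hits


# HCS of LRR6 in human TLR2, as quoted in the text: a canonical match.
canonical_hits = find_hcs("LEELEIDASDL")

# One variant of the TLR6 LRR9 pattern; Pro at position 1 breaks the strict consensus.
irregular_hits = find_hcs("PTLLNFTLNHI")
```

A strict scan of this kind misses the irregular motifs discussed in this work, which is why relaxed criteria and secondary structure predictions are needed in addition.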
Structural alignments of the known LRR structures reveal that the LRR motif is surprisingly variable (Table 1). The lengths of LRRs range from 20 to 43 residues. Leucines at positions 1, 4, 6 and 11 of the HCS part are sometimes replaced by Met, Ala, or Cys, as seen in TLR3 [49,50], Internalin A (Inl-A) [26], and Internalin B (Inl-B) [22][23][24]. Positions 1 and 11 are also sometimes occupied by relatively hydrophilic residues such as Gly, Thr, Asn and Tyr.
Figure 1. Structural organization of vertebrate TLRs. Magenta is the signal peptide sequence. Green is LRRNT (the cysteine cluster on the N-terminal side of the LRRs) and LRRCT (the cysteine cluster on the C-terminal side of the LRRs).

Protein secondary structure prediction
The result of the protein secondary structure prediction of human TLR2, which has 20 LRRs, is shown in Figure 4. Both SSpro4.0 and Proteus predict that 15 of the 20 LRRs prefer β-strands at positions 3 through 5 and/or their neighboring positions in the HCS part. They include all five irregular motifs, LRR4, LRR5, LRR7, LRR9, and LRR11. The occurrence of β-strands in LRR6 is predicted only by Proteus. However, LRR6, with the HCS part LEELEIDASDL, is clearly a canonical LRR. All twenty, including LRR6, can be reasonably identified as LRR motifs.

The identification of LRRs within TLRs
These analyses of the known LRR structures, the multiple sequence alignments and the secondary structure predictions of TLR2 provide strong evidence that allows us to identify LRRs over an extended range of sequences and inferred structures. Taken together, they led to the following four steps, which were used for the identification of LRRs in each TLR.
Step 1. Detection of LRRs by the PFAM program.
Step 2. Identification of candidate LRRs that cannot be recognized by PFAM.
Step 3. Evaluation of protein secondary structure predictions by Proteus and SSpro4.0.
Step 4. Determination of all LRRs in each member based on the results obtained by Steps 1-3.
In Step 2, the LRR candidates are selected using the criteria that they are longer than 18 residues and that the HCS part, LxxLxLxxNxL, has hydrophobic residues at least at positions 4 and 6. The candidates include irregular motifs that are similar to LRRs observed in the known structures. When TLR sequences from many species are available, multiple sequence alignments are also considered for identification. In Step 3, the preference for β-strand in the HCS part of the LRR candidates selected by Step 1 and Step 2 is investigated. In Step 4, when the candidate prefers β-strand according to either Proteus or SSpro4.0 (in at least one species), it is identified as an LRR. In some cases, such as LRR12 in TLR14, the initial LRR candidate was changed into another LRR based on the results of the secondary structure prediction. The crystal structure of human TLR3 [52,53] contains 25 LRRs. The present method confirmed this. In contrast, the PFAM and SMART programs predicted only 16-17 LRRs, and the databases have reported 22 LRRs (Table 2).
There are two exceptions. In five mammalian TLR6s with 20 LRRs, LRR9, PTLLN(F/V/L)TL(N/Q)H(I/V), which contains Pro at position 1, is not predicted to have a β-strand by either prediction method (Figure 4). However, this pattern is seen in FSHr (Table 1). Similarly, in human and pig TLR10 with 20 LRRs, LRR10, GGK(A/V)YLDHNSF, is not predicted to have a β-strand by either program (Figure 4). However, this pattern shows remarkable similarity to the sixteenth LRR (LRR16) in TLR7 and TLR8 with 27 LRRs.
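The Step 2 selection criterion described above (longer than 18 residues, hydrophobic residues at HCS positions 4 and 6) can be sketched as a simple predicate. This is an illustrative sketch under stated assumptions, not the authors' implementation. The hydrophobic set (Leu, Ile, Val, Phe plus the occasional substitutes Met, Ala, Cys mentioned earlier) is an assumption, the function name `is_lrr_candidate` is hypothetical, and the VS tail appended to the TLR6 LRR9 segment is made up purely to reach a realistic length.

```python
# Assumed hydrophobic set: L/I/V/F plus the substitutes Met, Ala, Cys noted in the text.
HYDROPHOBIC = set("LIVFMAC")


def is_lrr_candidate(segment, min_length=19):
    """Relaxed Step 2 check on a segment that starts at HCS position 1.

    The segment must be longer than 18 residues and carry hydrophobic
    residues at HCS positions 4 and 6 (0-based indices 3 and 5).
    """
    if len(segment) < min_length:
        return False
    return segment[3] in HYDROPHOBIC and segment[5] in HYDROPHOBIC


# HCS of TLR6 LRR9 (one variant of the pattern in the text) followed by a
# made-up VS tail; only the first eleven residues come from the source.
HYPOTHETICAL_VS = "SSLPAGAFQHLPK"
irregular_segment = "PTLLNFTLNHI" + HYPOTHETICAL_VS

accepted = is_lrr_candidate(irregular_segment)              # Pro at position 1 is tolerated
too_short = is_lrr_candidate("PTLLNFTLNHI")                 # only 11 residues
bad_core = is_lrr_candidate("PTLDNFTLNHI" + HYPOTHETICAL_VS)  # Asp at position 4
```

A candidate that passes this filter would still need support from the secondary structure predictions of Step 3 before being counted as an LRR in Step 4.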

LRRs in the six major families of TLRs
Figure 2. The multiple sequence alignment of LRRs within mammalian TLR2 from 14 species.
[...] LRRs. Ten of these 22 LRRs are clearly "typical". LRR15 in mammalian TLR5 is only 19 residues long (Figure 7); the homolog in Takifugu rubripes and rainbow trout is 24 residues long. The TLR11 family contains 24-27 LRRs. Most of the LRRs in mouse TLR11, TLR12, and TLR13 are "typical". The same feature is observed in Takifugu rubripes TLR21, TLR22 and TLR23. Two Japanese lamprey TLRs appear to belong to the TLR1 family.
The TLR7 family consists of TLR7, TLR8 and TLR9 and contains 27 LRRs. Cross dot plots were computed for all pairs of TLR7, TLR8, and TLR9 from human and mouse, and green puffer TLR. More importantly, the super-motif is about 80 residues long. Superposition of the 21 ((7 × 6)/2) cross dot plots for the seven proteins emphasizes the super-repeat of LRRs in the N-terminal part of the LRR domain (Figure 11) [11]. This super-motif comes from nine LRRs, LRR1 to LRR9, in TLR7 and TLR8, and from eight LRRs, LRR2 to LRR9, in TLR9 (Figure 12). The sequence alignment [...]
Figure 3. The multiple sequence alignment of LRRs within mammalian TLR2 from 14 species. This panel, continued from Figure 2, shows the sequences from LRR11 to the C-termini.
Although the third, the sixth and the seventh LRRs are longer than the second, the fifth and the eighth LRRs, their C-terminal VS parts keep the pattern LxxxxFxxLxx that is seen in the "typical" motif. Consequently, these LRRs are type T. Thus, there are three super-repeats, STTSTTSTT, in TLR7 and TLR8, and two and two-thirds super-repeats, _TTSTTSTT, in TLR9. Green puffer TLR forms two horseshoe domains of LRRs. The first domain is homologous to the TLR7 family and thus also contains the super-repeat STTSTTSTT (Figure 12). LRR15, located in the central part of the 27 LRRs, consists of a long amino acid sequence with 73 residues in TLR7, 64 in TLR8, and 58 in TLR9. Such a long LRR motif is also observed in chicken TLR15. In all cases the next LRR, LRR16, is an irregular LRR described by (G/N)xLxLxxNx(I/L)xxVxxxxFxxLxx, which is similar to the "typical" motif, although position 1 in the HCS part is not occupied by leucine.
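The count of 21 pairwise comparisons and the idea of a dot plot revealing a repeat period can be illustrated with a short sketch. This is a deliberately simple windowed identity dot plot, offered as an assumption about how such a comparison might be set up and not as the program used for Figure 11. The protein labels, the toy repeat unit, and the parameter names `window` and `min_identical` are all hypothetical.

```python
from itertools import combinations

# Hypothetical labels for the seven proteins compared in the text.
PROTEINS = ["hTLR7", "hTLR8", "hTLR9", "mTLR7", "mTLR8", "mTLR9", "gpTLR"]
pairs = list(combinations(PROTEINS, 2))  # (7 x 6) / 2 unordered pairs


def dot_plot(seq_a, seq_b, window=5, min_identical=5):
    """Return the set of (i, j) where windows of seq_a and seq_b agree
    in at least min_identical positions."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            identical = sum(
                a == b
                for a, b in zip(seq_a[i:i + window], seq_b[j:j + window])
            )
            if identical >= min_identical:
                dots.add((i, j))
    return dots


def diagonal_offsets(dots):
    """Count dots per diagonal (j - i); populated diagonals reveal repeat periods."""
    counts = {}
    for i, j in dots:
        counts[j - i] = counts.get(j - i, 0) + 1
    return counts


# Toy sequence: an arbitrary 11-residue unit repeated three times.
toy = "LKELDLSNNRI" * 3
offsets = diagonal_offsets(dot_plot(toy, toy))
```

In this toy case the populated diagonals sit at multiples of the 11-residue unit length. By the same logic, a super-repeat of about 80 residues would be expected to appear as diagonals offset by roughly 80 positions.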

Two cysteine clusters flanking the LRR domain
The LRRs within most TLRs are flanked by two cysteine clusters, each of which contains two to five cysteine residues (Table 2 and Figures 5, 6, 7, 8, 9, and 10). Here the cysteine clusters on the N- and C-terminal sides of the LRRs are termed LRRNT and LRRCT, respectively [58]. The N-terminal cluster usually consists of two cysteines, Cx₅₋₁₄C, but sometimes of 3, 4 or 5 cysteines. With high frequency, as noted, the last cysteine of the cluster occupies a position structurally equivalent to those of the leucines in the HCS part of LRR1. The Cx₈C motif in TLR3 and the Cx₁₀C motif of TLR4 form a disulfide bond [52,59], as does the Cx₁₂C motif in GPIbα [41]. The Cx₅₋₁₄C motifs presumably form disulfide bonds. The C-terminal clusters, excepting those in three TLRs (

LRRs within human TLRs
The present analyses of LRRs within vertebrate TLRs indicate that there are at least two types of LRR motifs: the "typical" ("T") LRR, LxxLxLxxNxLxxLxxxxFxxLxx, and the "bacterial" ("S") LRR, LxxLxLxxNxLxxLPx(x)LPxx. Vertebrate TLRs contain 16-28 LRRs (Table 2 and Figures 5, 6, 7, 8, 9, and 10). Bell et al. [60] have proposed that the ECDs of human TLRs comprise 19-25 LRRs, including both "T" and "S" LRRs. In their assignment each human TLR contains one or two fewer LRRs than identified here. Furthermore, in the TLR1 family (TLR1, TLR2, TLR6 and TLR10) the LRRs in the central parts are aligned differently in the two analyses. Such a difference is also seen in TLR4, TLR5, and TLR7. The alignments of TLR3, TLR8 and TLR9 are nearly identical except for the first LRR at the N-terminus of the LRR domain and the last LRR at its C-terminus.
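The two consensus motifs quoted above can be turned into a toy classifier that labels each repeat "T" or "S" and writes out the resulting motif string, such as STT. This is a sketch on idealized sequences only. It assumes that "L" in the VS part uses the same Leu/Ile/Val/Phe class as in the HCS part, and the function names and synthetic repeats (consensus positions kept, every "x" filled with Ala) are illustrative, not data from this study.

```python
import re

L = "[LIVF]"                                  # assumed residue class for "L"
HCS = f"{L}..{L}.{L}..[NTSC].{L}"             # LxxLxLxxNxL

# Full-length consensus patterns built from the motifs quoted in the text.
TYPICAL = re.compile(HCS + f"..{L}....F..{L}..")    # ...xxLxxxxFxxLxx   ("T")
BACTERIAL = re.compile(HCS + f"..{L}P..?{L}P..")    # ...xxLPx(x)LPxx    ("S")


def classify_lrr(repeat):
    """Return "T", "S", or "?" for a single repeat sequence."""
    if TYPICAL.fullmatch(repeat):
        return "T"
    if BACTERIAL.fullmatch(repeat):
        return "S"
    return "?"


def motif_string(repeats):
    """Concatenate the class letters of consecutive repeats."""
    return "".join(classify_lrr(r) for r in repeats)


# Synthetic repeats: consensus residues kept, every "x" replaced by Ala.
SYNTHETIC_T = "LAALALAANAL" + "AALAAAAFAALAA"   # 24 residues
SYNTHETIC_S = "LAALALAANAL" + "AALPALPAA"       # 20 residues
SYNTHETIC_S_LONG = "LAALALAANAL" + "AALPAALPAA"  # 21 residues, optional x present

super_motif = motif_string([SYNTHETIC_S, SYNTHETIC_T, SYNTHETIC_T])
```

Real LRRs in TLRs are often irregular, so strict patterns like these would leave many repeats labeled "?"; the classification in this work relies on alignment and structural comparison rather than on exact matching.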
Figure 4. The secondary structure prediction of human TLR2 by SSpro4.0 and Proteus.
Figure 7. Sequence alignment of LRR domains within the six families of TLRs. This panel, continued from Figure 6, shows jfTLR3 (from LRR25 to CYTOP) in the TLR3 family, hTLR4 and dTLR4 in the TLR4 family, hTLR5 in the TLR5 family, and mTLR11 (from SIGNAL to LRR15) in the TLR11 family. The panel continued from Figure 7 shows mTLR11 (from LRR16 to CYTOP), mTLR12, mTLR13, and tTLR21 in the TLR11 family. The panel continued from Figure 9 shows hTLR8 (from LRR13 to CYTOP) and hTLR9 in the TLR7 family, and jlTLRa and cTLR15.

One or two horseshoe domains of LRRs within TLRs
The TLR7 family (TLR7, TLR8, TLR9 and green puffer TLR) has 27 LRRs and an additional 58-73 residues at the end of LRR15 (Figures 9 and 10). Such a long region is also observed in chicken TLR15 (Figure 10). Gibbard et al. [61] have considered two horseshoe domains of LRRs for human TLR8. That is, LRR15 has been separated into an LRR motif and 40 residues of undetermined structure. Most of the known LRR structures have a cap, which shields the hydrophobic core at the N- and C-termini of the LRRs. We suggest that these 40 residues function as a cap of the horseshoe structure that interrupts the hydrophobic core of the LRRs, a feature specific to TLRs. Thus, it can be concluded that the LRRs in vertebrate TLRs form one or two distinct horseshoe structures. Future structure determinations should resolve the question.
The TLR1 family (TLR1, TLR2, TLR6, and TLR10) and the TLR4 family share a common feature: the central part of the LRR domain has more irregular motifs than the N- and C-terminal parts. The LRR structures in the three families of TLR1, TLR4 and TLR7 might show structural flexibility in the central part. Alternatively, the central part might play a key role in function.

Is the LRR arc of TLRs flat?
The LRR arc structures can be characterized by three parameters: the inner radius of the arc (R), the mean rotation angle about the central axis relating one β-strand to the next (ϕ), and the tilt angle of the parallel β-strand direction per turn (θt). A 3D circle fitting method to calculate these geometrical parameters has been developed [55]. The TLR3-LRR arc yields R = 26.5-26.6 Å, ϕ = 10.8-10.9° and θt = 24.5-26.7°. The TLR3-LRR belongs to the "typical" type. This R value is comparable to the 22-36 Å found for the LRR arcs in Slit, FSHr, nogo receptor, decorin, and GPIbα with "typical" LRRs [8,55,58]. In contrast, the θt value is comparable only to those for Slit (-21°) and FSHr (-40°). The θt value also corresponds to the 19-40° found for ribonuclease inhibitor and the 15° found for tropomodulin with "RI-like" LRRs. That is, the TLR3-LRR arc is nearly flat. This suggests that all other TLRs, except for the TLR7 family and TLR15, might adopt a flat LRR arc.
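To make the parameters R and ϕ concrete, the following sketch fits a circle to points that lie in a plane and reports the radius and the mean rotation angle between consecutive points. It is a simplified planar stand-in (an algebraic least-squares fit), not the 3D circle fitting method of [55], and it does not compute the tilt angle θt. The function names are hypothetical, and the test points are synthetic, generated on an ideal arc with R = 26.5 Å and ϕ = 10.8° to mirror the values quoted for TLR3.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(m)
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for i in range(3):
            replaced[i][col] = rhs[i]
        solution.append(det3(replaced) / d)
    return solution


def fit_circle(points):
    """Algebraic least-squares fit of x^2 + y^2 + D*x + E*y + F = 0.

    Returns the center (cx, cy) and the radius.
    """
    sxx = sxy = syy = sx = sy = 0.0
    bx = by = b1 = 0.0
    for x, y in points:
        z = -(x * x + y * y)
        sxx += x * x
        sxy += x * y
        syy += y * y
        sx += x
        sy += y
        bx += x * z
        by += y * z
        b1 += z
    normal = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(len(points))]]
    d, e, f = solve3(normal, [bx, by, b1])
    cx, cy = -d / 2.0, -e / 2.0
    return (cx, cy), math.sqrt(cx * cx + cy * cy - f)


def mean_rotation_deg(points, center):
    """Mean angle (degrees) about the center between consecutive points."""
    cx, cy = center
    angles = [math.atan2(y - cy, x - cx) for x, y in points]
    steps = []
    for a, b in zip(angles, angles[1:]):
        diff = (b - a + math.pi) % (2.0 * math.pi) - math.pi  # wrap into [-pi, pi)
        steps.append(abs(math.degrees(diff)))
    return sum(steps) / len(steps)


# Synthetic points: 25 positions on an ideal arc, shifted off the origin.
arc = [(3.0 + 26.5 * math.cos(math.radians(10.8 * i)),
        -2.0 + 26.5 * math.sin(math.radians(10.8 * i))) for i in range(25)]
center, radius = fit_circle(arc)
phi = mean_rotation_deg(arc, center)
```

With real coordinates the β-strand positions would first have to be projected onto a best-fit plane, which is the part a full 3D treatment handles and this sketch omits.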

Super-motif of LRRs in the TLR7 family
The present analysis reveals that the TLR7 family, consisting of TLR7, TLR8 and TLR9, and green puffer TLR contain the super-motif STT. Such super-motifs have been observed in various LRR proteins [8,11]. One example is the SLRP family. The SLRP family forms five distinct subfamilies. Class I consists of biglycan, decorin, and asporin. Class II has three subclasses: lumican plus fibromodulin (IIA), PRELP plus keratocan (IIB), and osteoadherin (IIC). Class III consists of epiphycan, osteoglycin and opticin. Class IV is more distantly related and consists of chondroadherin and nyctalopin. Class V consists of podocan. All classes except class IV contain the super-motif. Super-motifs of S and T similar to those in SLRPs are also present in asporin-like proteins from human and mouse, mouse fibromodulin-like proteins, biglycan-like proteins from sea lamprey, oligodendrocyte-myelin glycoprotein (OMGP), the FLRT family from human, mouse and Xenopus, and human ECM2 [8,62]. Furthermore, a preliminary analysis indicates that nephrocan, a novel member of the SLRP family [63], contains an STT motif. These observations suggest strongly that "bacterial" and "typical" LRRs evolved from a common precursor.
Figure 11. Super-repeat of LRRs in the TLR7 family of TLR7, TLR8 and TLR9.

LRR variants in TLRs associated with diseases
A number of amino acid polymorphisms that occur in LRRs have been reported in TLRs. Arbour et al. [64] first identified two mutations of human TLR4, D299G and T399I, which were associated with diminished airway responsiveness to inhaled LPS. Since then, these two mutations have been studied for their association with various infectious and inflammatory diseases; results regarding the effects of these mutations have been inconclusive [65][66][67][68][69][70][71]. D299G and T399I occur in LRR11 and LRR15, respectively (Figure 7). D299G is near the convex part, while T399I is located on the loop C-terminal to the convex part. Very recently, Ohara et al. [72] reported that one mutation, T135A, was associated with poorly differentiated gastric adenocarcinomas. T135A in LRR5 occurs at position 9 in the HCS part (Figure 7). Such a mutation has been observed in many LRR proteins, such as nyctalopin, keratocan, GPIbα, GPIbβ and GPIX, which are associated with human diseases [58]. Position 9 is generally occupied by Asn or Cys and sometimes by Thr or Ser, whose side chains form hydrogen bonds in the loop structure [58]. The T135A mutation may disrupt the hydrogen bond pattern in the loop.
Mouse TLR9 plays a role in defense against systemic mouse cytomegalovirus infection. Mice with the mutation L499P are highly susceptible to mouse cytomegalovirus infection and show low levels of cytokine induction and natural killer cell activation on viral infection [73]. L499P is located at the short loop that connects the helical structure on the convex part (in LRR17) and the β-strand on the concave part (in LRR18) (Figure 10). That is, L499P in LRR18 occurs at position 1 in the HCS part. The side chain of L499 is completely buried in the LRR arc. Such a mutation is also observed in trk-A and nyctalopin, which are associated with human diseases [58]. The mutation D543A in human TLR8 abolishes the binding of CpG DNA [61]. D543A in LRR19 occurs at position 1 in the VS part. Thus, D543A is located at the edge between the convex and the concave parts of the LRR arc. The Cys-to-Ala mutations in the VS part of LRR9 (C257A, C260A, C267A, and C270A) completely abolish signaling by TLR8 [61].
Figure 12. Sequence alignment of the super-repeat of LRRs within TLR7, TLR8 and TLR9 from human and mouse and TLR from green puffer.
Hidaka et al. [74] detected one mutation, F303S, in human TLR3 in one of three patients with influenza-associated encephalopathy. This was a loss-of-function mutation. F303S in LRR12 is located at position 4 in the HCS part. The side chain of F303 is completely buried in the LRR arc. Two mutations, H539E and N541A, resulted in the loss of TLR3 activation and ligand binding functions [75]. These two mutations occur in LRR21.

Conclusion
The new method of alignment proposed here rationalizes the differences in the repeat numbers of LRRs and their "phasing" within TLRs in different databases and for various species and isoforms. Moreover, the new method indicates that each of the six TLR families is characterized by its LRR motifs, their repeat numbers, and the motifs of its cysteine clusters. The repeat number of LRRs is larger than that previously reported in databases. The central part of the LRR domains within the TLR1 family and TLR4 has more irregular motifs than the N- and C-terminal parts. Moreover, the TLR7 family contains a region of 58-73 residues in the central part of the LRR domain. The central parts are inferred to play a key role in the structure and/or function of their TLRs. The LRRs in TLRs form one or two horseshoe domains. The LRR arc of TLRs is also predicted to be nearly flat. Furthermore, the LRR super-motif in the TLR7 family suggests strongly that "bacterial" and "typical" LRRs evolved from a common precursor. The present analysis should stimulate and facilitate various experimental studies to understand the molecular mechanism of TLR-ligand interactions.