Four signature motifs define the first class of structurally related large coiled-coil proteins in plants.

Background Animal and yeast proteins containing long coiled-coil domains are involved in attaching other proteins to the large, solid-state components of the cell. One subgroup of long coiled-coil proteins are the nuclear lamins, which are involved in attaching chromatin to the nuclear envelope and have recently been implicated in inherited human diseases. In contrast to other eukaryotes, long coiled-coil proteins have been barely investigated in plants. Results We have searched the completed Arabidopsis genome and have identified a family of structurally related long coiled-coil proteins. Filament-like plant proteins (FPP) were identified by sequence similarity to a tomato cDNA that encodes a coiled-coil protein which interacts with the nuclear envelope-associated protein, MAF1. The FPP family is defined by four novel unique sequence motifs and by two clusters of long coiled-coil domains separated by a non-coiled-coil linker. All family members are expressed in a variety of Arabidopsis tissues. A homolog sharing the structural features was identified in the monocot rice, indicating conservation among angiosperms. Conclusion Except for myosins, this is the first characterization of a family of long coiled-coil proteins in plants. The tomato homolog of the FPP family binds in a yeast two-hybrid assay to a nuclear envelope-associated protein. This might suggest that FPP family members function in nuclear envelope biology. Because the full Arabidopsis genome does not appear to contain genes for lamins, it is of interest to investigate other long coiled-coil proteins, which might functionally replace lamins in the plant kingdom.


Background
Coiled-coil domains are protein oligomerization motifs, which consist of two or more alpha helices that twist around one another to form a supercoil [1]. Peptides with the capacity to form coiled coils are characterized by a heptad repeat pattern in which residues in the first and fourth position are hydrophobic, and residues in the fifth and seventh position are predominantly charged or polar. This pattern can be used by computational methods to predict coiled-coil domains in amino acid sequences [2,3].
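The heptad pattern described above can be illustrated with a minimal sketch. It is not the prediction method of [2,3]; it only measures how often the first ("a") and fourth ("d") heptad positions carry a hydrophobic residue. The residue set and the function names are assumptions made for this illustration.

```python
# Illustrative heptad-repeat check; NOT the published prediction algorithms [2,3].
# Assumption: a simple set of residues treated as hydrophobic.
HYDROPHOBIC = set("AILMFVWY")


def heptad_hydrophobic_fraction(seq, offset):
    """Fraction of 'a' and 'd' heptad positions that hold hydrophobic residues.

    `offset` is the index in `seq` of the first 'a' position (0 to 6).
    """
    hits = 0
    total = 0
    for i, residue in enumerate(seq.upper()):
        position = (i - offset) % 7  # 0 corresponds to 'a', 3 to 'd'
        if position in (0, 3):
            total += 1
            if residue in HYDROPHOBIC:
                hits += 1
    return hits / total if total else 0.0


def best_register(seq):
    """Return (offset, fraction) for the heptad register with the highest fraction."""
    offset = max(range(7), key=lambda off: heptad_hydrophobic_fraction(seq, off))
    return offset, heptad_hydrophobic_fraction(seq, offset)


# Synthetic example: leucine at every first and fourth position.
example = "LKELEDK" * 4
register, fraction = best_register(example)
```

Real predictors weigh all seven positions with residue-specific statistics, so a score of this kind is only a rough indication of a heptad periodicity.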
Coiled-coil proteins can be grouped into two general classes. The first class comprises short coiled-coil domains of six or seven heptad repeats, also called leucine zippers. They are frequently found as homo- and heterodimerization domains in transcription factors (e.g. [4]). The second class is defined by long coiled-coil domains of several hundred amino acids, which are found in a variety of proteins involved in structuring cellular processes [1].
One of the three main classes of cytoskeletal proteins, the intermediate-filament proteins, represents a well-characterized group of coiled-coil proteins. A subgroup of intermediate-filament proteins are the nuclear lamins, which are involved in attaching chromatin to the nuclear envelope and have recently been implicated in inherited human diseases [5,6]. Besides structural proteins of the cytoskeleton, the motor proteins that interact with them also contain coiled-coil motifs. Myosin, the actin motor protein, has an extended coiled-coil domain necessary for the assembly of the muscle thick filaments [7]. Kinesin and dynein, two microtubule motor proteins, also contain coiled-coil domains [8,9].
The coiled-coil motif has also been identified in a variety of proteins associated with centromeres, centrosomes, the nuclear matrix, and chromatin [10][11][12][13][14]. TPR, a protein associated with the inner filaments of the nuclear pore complex, is an extended coiled-coil protein [15]. The filamentous structures associated with both the outer and inner surface of the nuclear pore complex suggest that additional unknown coiled-coil proteins might be involved in its assembly. Several large coiled-coil proteins were also found associated with the cytoplasmic surface of the Golgi apparatus, and two of them have been shown to function in the docking of vesicles to the Golgi cisternae [16][17][18]. AKAPs are adapter proteins that attach protein kinase PKA pathways to cytoskeletal elements. At least one AKAP, AKAP450, is a large coiled-coil protein attaching PKA to the centrosome [19].
Despite the multitude of cellular functions in which coiled-coil proteins participate, the general theme is the association of proteins with the solid-state components of the cell. In some cases, as in the attachment of PKA, this function appears to be crucial for the spatial regulation of signal transduction. Importantly, mutations in individual coiled-coil proteins lead to specific developmental or even behavioral phenotypes. The Drosophila mutations Mushroom body defect (a gene involved in the proliferation of neuronal precursor cells) and quick-to-court (a mutation that causes elevated sexual behavior) lie in two tissue-specifically expressed coiled-coil proteins of unknown function [20,21]. In humans and mouse models, specific point mutations in nuclear lamins are the cause for autosomal Emery-Dreifuss muscular dystrophy [22,23].
In contrast to animals and yeast, only a small number of long coiled-coil proteins have been identified from plants. MFP1 contains an extended coiled-coil domain and is located at the nuclear periphery of tobacco suspension culture cells [24,25]. The carrot protein NMCP1 is also located at the nuclear rim and has been shown to migrate to the spindle poles in dividing carrot suspension culture cells [26,27]. The cellular function of these plant proteins is not known.
Several candidates for myosins and kinesins have been identified from plants and their function as motor proteins is under investigation [28,29]. Besides these few examples, nothing is presently known about plant long coiled-coils and their potential function in anchoring and structuring of different cellular events.
In the early nineties, several groups reported lamin-like proteins in different plant species [30][31][32]. A purification protocol adapted from animal lamins and applied to pea nuclei was used to purify four proteins between 49 and 66 kDa [31]. These proteins were recognized by antibodies against mammalian intermediate filament proteins, and by antibodies against a peptide derived from lamin B. By immunofluorescence and immunogold labeling, these antigens were located mostly in the internal nuclear matrix, and not predominantly at the nuclear rim. Similar results were obtained by other investigators [32]. Although these early reports were promising, no lamin ortholog has been identified molecularly from plants. In a recent search of all publicly available plant sequences, including the full Arabidopsis genome, no homologs of lamins were found ([33] and Meier, unpublished results). Similarly, despite earlier reports of lamin-like proteins in yeast [34], the fully sequenced S. cerevisiae genome [35] also contains no lamin genes. It is therefore likely that non-animal eukaryotes have a distinct set of nuclear envelope proteins that functionally replace the lamins.
Here, we have used a tomato protein containing long coiled-coil domains to search the Arabidopsis genome for related sequences. We have identified seven novel Arabidopsis proteins with extended coiled-coil domains, which form a protein family distinct from other Arabidopsis long coiled-coil proteins. They are characterized by four novel motifs, which are highly conserved in sequence and position, and which were not found in proteins outside of this family. In addition, a rice homolog was identified, indicating conservation between dicots and monocots.

LeFPP, a novel plant coiled-coil protein
A yeast two-hybrid screen was performed to identify interaction partners of the tomato nuclear envelope-associated protein LeMAF1 [36]. A prey plasmid (pAD-LeFPP) was identified that led to activation of the two reporter genes HIS and LacZ after retransformation with the pBD-LeMAF1 bait. We tested the specificity of this interaction by co-transforming pAD-LeFPP with two additional bait plasmids, pBD-AtMAF1 and pBD-AtRanGAP1. AtMAF1 is one of three Arabidopsis homologs of LeMAF1 (Patel and Meier, unpublished results), and has 45% amino acid identity with LeMAF1. AtRanGAP1 is one of the two Arabidopsis Ran GTPase activating proteins. Its N-terminus has 28% amino acid identity with LeMAF1 and 30% with AtMAF1 (Fig. 1C; [33,37]). Fig. 1A shows that pAD-LeFPP leads to activation of the HIS gene in combination with pBD-LeMAF1 (sector a) and with pBD-AtMAF1 (sector b), but not with pBD-AtRanGAP1 (sector c) or the binding domain vector pBD-GAL4 alone (sector d). This demonstrates that the observed interaction is specific for MAF1-like proteins. Fig. 1B shows the quantification of β-Galactosidase activity in the pBD-LeMAF1/pAD-LeFPP strain. Lanes one and two are the positive and negative controls provided with the kit, respectively, lane three shows the activity of the pBD-LeMAF1/pAD-LeFPP strain, and lane four the activity of the pBD-GAL4/pAD-LeFPP strain. Although yeast two-hybrid interactions cannot be easily quantified, the very high level of β-Galactosidase expressed by the pBD-LeMAF1/pAD-LeFPP strain suggests a comparably strong interaction in the yeast two-hybrid screen.
The 1927 bp pAD-LeFPP cDNA was sequenced and was shown to contain an uninterrupted open reading frame in frame with the GAL4 activation domain. The conceptual translation of this sequence leads to a protein of 582 amino acids (aa) with a calculated molecular weight of 64.4 kDa and a pI of 4.7. The first methionine is at position 71, and does not have a plant start ATG consensus sequence [38], suggesting that the cDNA is not full length.
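The two quantities reported above, the position of the first methionine and the calculated molecular weight, can be reproduced from a conceptual translation with a short sketch. The software actually used for these calculations is not stated here; the function names and the rounded average residue masses below are assumptions of this illustration. As a plausibility check, 64.4 kDa over 582 residues corresponds to roughly 111 Da per residue, a typical average for proteins.

```python
# Average residue masses in Da (standard values, rounded; an assumption of this sketch).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # one water molecule is added for the free termini


def first_methionine_position(protein):
    """Return the 1-based position of the first methionine, or None if absent."""
    index = protein.upper().find("M")
    return index + 1 if index >= 0 else None


def molecular_weight_kda(protein):
    """Approximate molecular weight in kDa from average residue masses."""
    total = sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER
    return total / 1000.0
```

For example, `first_methionine_position("AAMKM")` returns 3, and a Gly-Ala dipeptide has an approximate weight of 0.146 kDa.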
A BLAST search with the pAD-LeFPP open reading frame indicated that it represents a novel protein with weak similarity to filament-like proteins from animals and yeast (see below). Secondary structure prediction algorithms showed that it is organized almost entirely in alpha-helical domains and contains extensive stretches of coiled-coil domains (Fig. 2A and Fig. 3). The protein was named LeFPP for tomato filament-like plant protein. RNA blot analysis showed a single species of mRNA of ca. 2.4 kb. It is present in tomato leaves, fruits, flowers, light-grown seedlings, and dark-grown seedlings (Fig. 2B).

A family of LeFPP-like proteins in Arabidopsis
A BLAST search was performed with the LeFPP amino acid and DNA sequence. In a protein BLAST, sequences with highest scores represented a number of uncharacterized open reading frames in the Arabidopsis genome, followed by sequences for animal filament-like proteins such as myosin, kinesin, ankyrin, etc. While the e-values for the Arabidopsis ORFs were between 2e-56 and 1e-16, the best e-value for a non-plant sequence was 0.003 for an unknown human open reading frame. The Arabidopsis "hits" represented seven unique genes, some of which had repeated entries in GenBank (see Table 1).
A translated BLAST search identified one additional "hit" in the rice BAC H0212B02 of chromosome 4 (GenBank accession number AL442007). The "hit" lay between the annotated open reading frames H0212B02.17 and H0212B02.18 in a region of DNA not predicted to have coding capacity. However, running GenScan [39] on this
Fig. 3 is derived from the "coiled-coil" analysis software in the DNASTAR sequence analysis software package, and was confirmed in all cases by the MultiCoil algorithm [2]. In addition, MultiCoil predicted that all coiled-coil regions have a high probability to form dimers and a low probability to form trimers. The seven Arabidopsis ORFs were named AtFPP1 to AtFPP7 (Fig. 3 and Table 1). They fall into two groups according to size and extension of coiled-coil domains. AtFPP1, AtFPP2, and AtFPP3 are shorter (603 aa to 779 aa) and contain two extended coiled-coil domains connected by a short non-coiled-coil linker. AtFPP4 to AtFPP7 are longer (866 aa to 1054 aa) and contain a longer N-terminal and a shorter C-terminal coiled-coil domain, separated by a long non-coiled-coil linker domain (Fig. 3). The rice ORF (OsFPP) structurally resembles the second group of Arabidopsis ORFs more closely, with two shorter stretches of coiled-coil domains separated by a long linker.

Novel sequence motifs define a subfamily of plant coiled-coil proteins
Alignment of all nine amino acid sequences showed the presence of four novel sequence motifs, which are 100% conserved between the tomato and Arabidopsis proteins (Fig. 4A). Motifs II, III, and IV are also 100% conserved in OsFPP, while motif I is present, but slightly less conserved in the monocot sequence. Their location in each protein is indicated in Fig. 3. Their position with respect to the arrangement of the coiled-coil domains is strikingly conserved, despite differences in spacing between the four motifs due to the different length of the proteins and the coiled-coil domains.
Motifs I and II always correlate with the N-terminal cluster of coiled-coil domains, with motif I being located at the beginning and motif II at the end of this cluster. Motif III is always located in the linker between the N-terminal and C-terminal coiled-coil domains. Motif IV is always located at the very end of the most C-terminal coiled-coil domain. This positioning is conserved even in the rice sequence, where the C-terminal coiled-coil domain is extremely short (Fig. 3).
The amino acid identities between the eight dicot proteins are in the range of 20% to 50% (Fig. 4B), with the rice sequence showing between 13% and 26% identity with the dicot sequences. The Arabidopsis protein with the greatest similarity to LeFPP is AtFPP1 (40% identity). Table 1 shows an overview of the genomic organization of AtFPP1 through AtFPP7 and of the features of the predicted proteins. The genes are not clustered, and no two of them are arranged in tandem array. Except for chromosome 5, at least one member of this gene family is present on each chromosome. ESTs have been identified for all genes, indicating that they are expressed (Table 1). Consistent with the ubiquitous expression pattern found for LeFPP, AtFPP expression covers a variety of tissues and developmental stages including shoots, hypocotyls, roots, and siliques.
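Pairwise identity values such as those quoted above depend on how identity is counted. The following minimal sketch shows one common convention, identical residues divided by the number of alignment columns in which neither sequence has a gap. Whether the values in Fig. 4B were computed this way is not stated here, so the convention and the function name are assumptions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the gap-free columns of two aligned sequences.

    Both inputs must come from the same alignment and have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

With the toy alignment `percent_identity("MKL-EE", "MRLAEE")`, four of the five gap-free columns match, giving 80.0. Dividing by the full alignment length or by the shorter sequence length instead would give lower values for the same alignment.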
In order to identify more long coiled-coil proteins in the Arabidopsis genome, the complete list of annotated ORFs was searched with the keyword "myosin" to identify ORFs that had been annotated as myosin-like proteins. Because of the long stretches of conserved heptad repeats, BLAST searches with large coiled-coil proteins usually produce significant e-values with at least some of the large number of myosin heavy-chain sequences in GenBank, frequently leading to this annotation.
The 40 ORFs identified this way were analyzed by MultiCoil, and 37 were found to contain coiled-coil domains of at least 200 amino acids. Three ORFs showed no coiled-coil motifs and were discarded. AtFPP2 and AtFPP5 had also been annotated as "myosin-like" and were therefore identified in this search. They were discarded to avoid duplication. None of the remaining 35 new coiled-coil proteins contained any of the conserved motifs identified in the FPP family.
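The bookkeeping of this screen (40 annotated ORFs, 37 with coiled-coil domains of at least 200 amino acids, 35 after removing the two already known AtFPPs) can be sketched as a simple filter. The record layout, the function name, and the placeholder entries are hypothetical; they only mirror the counts given in the text and are not the actual data.

```python
def screen_candidates(orfs, min_coil_length=200, already_known=frozenset()):
    """Split ORF records into those with a long coiled coil and the new ones among them.

    Each record is a dict with a 'name' and the length of its longest
    predicted coiled-coil domain in 'longest_coil' (hypothetical layout).
    """
    with_coils = [o for o in orfs if o["longest_coil"] >= min_coil_length]
    new = [o for o in with_coils if o["name"] not in already_known]
    return with_coils, new


# Placeholder records that reproduce only the counts reported in the text.
placeholder_orfs = (
    [{"name": "AtFPP2", "longest_coil": 300}, {"name": "AtFPP5", "longest_coil": 300}]
    + [{"name": f"orf{i}", "longest_coil": 250} for i in range(35)]
    + [{"name": f"nocoil{i}", "longest_coil": 0} for i in range(3)]
)
with_coils, new = screen_candidates(
    placeholder_orfs, already_known=frozenset({"AtFPP2", "AtFPP5"})
)
```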
The 35 new sequences were aligned with the seven AtFPP sequences in a CLUSTAL analysis. Fig. 5 shows the results represented as a phylogenetic tree. The AtFPP family clearly forms a family of sequences separate from the other coiled-coil proteins (boxed in Fig. 5). The three "short" AtFPPs, AtFPP1, AtFPP2, and AtFPP3, form a sub-family, as do the four "long" AtFPPs, AtFPP4, AtFPP5, AtFPP6, and AtFPP7 (see Fig. 3). Interestingly, the two most closely related AtFPP genes, AtFPP1 and AtFPP2, are located in comparable positions on two duplicated segments of chromosome 1, indicating that they might be derived from a recent duplication event.

Discussion
A novel class of plant coiled-coil proteins has been identified that is conserved in dicots and monocots. Seven members of the FPP protein family are present in the Arabidopsis genome. They are characterized by the presence of two clusters of coiled-coil domains separated by a linker domain of variable size. Four highly conserved sequence motifs were identified, which are a signature feature of this protein family and which are also conserved in the homologs LeFPP and OsFPP from tomato and rice, respectively. The tomato homolog of the FPP family binds in a yeast two-hybrid assay to a nuclear envelope-associated protein, indicating that they might function in nuclear envelope biology.
Besides the presence of extended coiled-coil domains, the FPP family members have no sequence similarity to the vertebrate lamins. They also do not show significant sequence similarity to other vertebrate nuclear envelope-associated proteins such as LAP1, LAP2, MAN1, and emerin (GenBank accession numbers A55649, S55255, Q9WU40, and XP_048410, respectively). None of these proteins appears to have a convincing ortholog in the Arabidopsis genome (data not shown), consistent with the hypothesis that the composition of nuclear envelope-associated proteins differs significantly between the plant and animal kingdoms.

Figure 5
Phylogenetic tree derived from a CLUSTAL analysis of the full ORF sequences of Arabidopsis coiled-coil proteins. The seven AtFPP sequences (boxed) were aligned together with 35 additional Arabidopsis coiled-coil proteins that had been annotated as "myosin-like". The length of each pair of branches measures the distance between sequences. Units indicate the number of computed residue substitution events. Dotted lines have no unit length and were included for alignment purposes. ORFs were identified by their TAIR gene names, accessible through [http://www.arabidopsis.org].

Interestingly, a recent report about the protein composition of the nuclear envelope of the highly divergent eukaryote Trypanosoma brucei has identified a coiled-coil nuclear envelope protein (TbNUP-1) in this organism that also has no similarity with lamins [40]. TbNUP-1 is a 350 kDa protein with a striking repeat structure. Its full sequence is presently not known, but a partial open reading frame of 268 amino acids has been constructed [40]. The near-perfect 144-amino-acid repeats contain two coiled-coil domains each. Pairwise alignments of TbNUP-1 with AtFPP1 through AtFPP7 showed no significant sequence similarities beyond the typical low degree of similarity generally observed between coiled-coil proteins (data not shown). The TbNUP-1 open reading frame was also used for a BLAST search of the full Arabidopsis genome. Only comparably weak similarities with other "myosin-like proteins" were found.
The four conserved sequence motifs identified in the FPP family appear to be unique to this group of coiled-coil proteins and are shared neither by other coiled-coil open reading frames present in the Arabidopsis genome nor by coiled-coil proteins such as lamins from other organisms. Their role in the structure or function of the FPP proteins is presently not known, but their high degree of conservation in the tomato and rice sequences implies that they are of relevance. Several functional small conserved motifs have been identified in other families of long coiled-coil proteins. The nuclear lamins contain short, conserved sequences flanking the lamin rod domain, which are phosphorylated by cdc2 kinase [41]. The 90 amino acid PACT domain, which immediately follows the coiled-coil domain of the related proteins pericentrin and AKAP450 and two uncharacterized coiled-coil open reading frames from Drosophila and Schizosaccharomyces pombe, confers centrosomal localization [19]. A conserved domain of ca. 50 amino acids was found in several Golgi-associated large coiled-coil proteins. It has been named GRIP domain [42] or Golgi localization domain (GLD) [18] and is sufficient to specify Golgi targeting in mammalian cells. Nearly all intermediate filament proteins exhibit a highly conserved amino acid motif (YRKLLEGEE) at the C-terminal end of their central alpha-helical rod domain, which has been shown to be crucial for the formation of authentic tetrameric complexes and for the control of filament width [43].
No sequences with similarity to the consensus sequences of motifs II, III, and IV were found in the databases that might indicate their potential function. However, a sequence with some similarity to motif I was identified in the hemagglutinin-esterase fusion glycoprotein from influenza virus C (GenBank accession number S07412). Alignment of this sequence (amino acids 39  The conformational change of the protein necessary for membrane fusion is driven by ionic interactions between residues of the coiled-coil domain, and Glu74, the last amino acid aligning to motif I, is involved in this process [1]. In light of their putative interaction with the nuclear envelope-associated protein MAF1, it will be interesting to investigate if FPPs are associated with membrane systems of the plant cell.

Conclusions
A family of novel long coiled-coil proteins has been identified from plants. They are characterized by two clusters of long coiled-coil domains separated by a non-coiled-coil linker and by four novel sequence motifs. Seven members of the filament-like plant protein (FPP) family are present in the Arabidopsis genome, and one homolog each has been identified from tomato and rice. Thus, this family of proteins appears to be conserved among higher plants, but we have not found convincing homologs from yeast or animals by sequence similarity searches. Tomato FPP was originally isolated in a yeast two-hybrid screen with MAF1, a small plant-specific protein associated with the nuclear envelope [36]. Because the full Arabidopsis genome does not appear to contain genes for lamins, it is of particular interest to investigate which other long coiled-coil proteins might functionally replace lamins in the plant kingdom. The FPP family represents an exciting group of candidates for such proteins.

Materials and methods
Yeast two-hybrid screen
Total RNA from young tomato leaves (Lycopersicon esculentum) was isolated as described by Wanner and Gruissem [44]. mRNA was purified using the mRNA purification kit (Pharmacia Biotech, Piscataway, NJ). To construct the yeast two-hybrid library, the cDNA synthesis kit, the Gigapack III gold packaging extract, and the HybriZAP two-hybrid predigested vector kit (all Stratagene, La Jolla, CA) were used according to the protocol of the manufacturer.
The size of the primary library was determined to be 1.5 × 10^6 plaque forming units. The complete cDNA of LeMAF1 (GenBank accession number AF118113) was cloned as an EcoRI/XhoI fragment between the EcoRI and SalI sites of the bait vector pBD-GAL4 (Stratagene). Approximately one million primary transformants were screened for growth on histidine dropout plates. All steps of the library screening, including the transformation, the isolation of clones, the verification of the interactions, and the β-Galactosidase assays, were performed as described by the manufacturer. pBD-AtRanGAP1 [33] and pBD-AtMAF1 were both constructed by cloning an EcoRI fragment containing the complete open reading frame into the EcoRI site of pBD-GAL4. The AtMAF1 open reading frame was isolated by PCR from Arabidopsis genomic DNA using the 5' primer 5'-TCC ATG GCC GAA ACC GAA-3' and the 3' primer 5'-CTA AGT TCA CTT CGA ACT GCT C-3', which had been designed to match the Arabidopsis MAF1 homolog on the P1 clone MGG4 of chromosome 5 (GenBank accession number AB008267).
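Basic properties of the two AtMAF1 primers given above can be checked with a short sketch. It computes primer length, GC content, and the reverse complement, which is the form in which the 3' primer would be compared with the coding strand. The helper names are assumptions of this illustration and do not refer to any software used in this work.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def clean(primer):
    """Remove the spacing used in the printed primer and upper-case it."""
    return primer.replace(" ", "").upper()


def reverse_complement(dna):
    """Reverse complement of a DNA sequence containing only A, C, G, and T."""
    return "".join(COMPLEMENT[base] for base in reversed(clean(dna)))


def gc_content(dna):
    """Percentage of G and C bases in the sequence."""
    seq = clean(dna)
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


# Primer sequences as printed in the text.
FORWARD = clean("TCC ATG GCC GAA ACC GAA")
REVERSE = clean("CTA AGT TCA CTT CGA ACT GCT C")

forward_gc = gc_content(FORWARD)
reverse_on_coding_strand = reverse_complement(REVERSE)
```

The 5' primer is 18 nt long with a GC content of about 55.6%, and the 3' primer is 22 nt long.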

RNA isolation and RNA blotting
Total RNA from tomato leaves (5-15 mm), fruits (3-8 mm in diameter), whole flowers, and light-grown and dark-grown tomato seedlings was isolated with the Trizol Reagent from Gibco BRL (Gaithersburg, MD). Total RNA (20 µg each) was separated on a formaldehyde gel, blotted to a nitrocellulose membrane, and hybridized with a radioactively labeled fragment corresponding to bp 1140-1926 of the LeFPP cDNA, essentially as described by [45].

Database searches, secondary structure prediction, and sequence comparison
Sequences with similarity to LeFPP were identified by conducting BLAST (Basic Local Alignment Search Tool; [46]) searches for similarity to sequences contained in GenBank. GenBank data were accessed through NCBI