Medicago truncatula has been chosen as a model species for genomic studies. It is closely related to an important legume, alfalfa. Transporters are a large group of membrane-spanning proteins. They deliver essential nutrients, eject waste products, and assist the cell in sensing environmental conditions by forming a complex system of pumps and channels. Although studies have effectively characterized individual M. truncatula transporters in several databases, until now there has been no available systematic database that includes all transporters in M. truncatula.
The M. truncatula transporter database (MTDB) contains comprehensive information on the transporters in M. truncatula. Based on the TransportTP method, we have presented a novel prediction pipeline. A total of 3,665 putative transporters have been annotated based on International Medicago Genome Annotated Group (IMGAG) V3.5 V3 and the M. truncatula Gene Index (MTGI) V10.0 releases and assigned to 162 families according to the transporter classification system. These families were further classified into seven types according to their transport mode and energy coupling mechanism. Extensive annotations referring to each protein were generated, including basic protein function, expressed sequence tag (EST) mapping, genome locus, three-dimensional template prediction, transmembrane segment, and domain annotation. A chromosome distribution map and text-based Basic Local Alignment Search Tools were also created. In addition, we have provided a way to explore the expression of putative M. truncatula transporter genes under stress treatments.
In summary, the MTDB enables the exploration and comparative analysis of putative transporters in M. truncatula. A user-friendly web interface and regular updates make MTDB valuable to researchers in related fields. The MTDB is freely available now to all users at http://bioinformatics.cau.edu.cn/MtTransporter/.
Medicago truncatula is closely related to an important forage legume, alfalfa. Because of its advantageous characteristics such as small size, short generation time, self-fertility, and diploid genome, M. truncatula has been used as a model species in genomic studies [1, 2]. Arabidopsis thaliana is a model plant whose genome was sequenced by an international consortium and is well annotated. Very high sequence identity exists between genes from M. truncatula and their counterparts from alfalfa (98.7% at the amino acid level for isoflavone reductase and 99.1% at the amino acid level for vestitone reductase), so it serves as a genetically tractable model for alfalfa, which is tetraploid. In addition to alfalfa, M. truncatula can act as a model organism for economically important legumes such as soybeans . Second only to the grass family, the legume family is important to humans as a source of food, feed for livestock, and raw materials for industry . In a symbiotic association with rhizobia, legumes supply their own nitrogen by reducing N2 to NH3. This mutually beneficial association supplies a free and renewable source of available nitrogen for legumes and other crops . By establishing symbiosis with mycorrhizal fungi, legumes also help the plant obtain phosphorous and other nutrients from the soil .
Transporters represent a large and diverse group of membrane-spanning proteins. They deliver essential nutrients, eject waste products, and assist the cell in sensing environmental conditions by forming a complex system of pumps and channels. Differences in membrane topology, energy-coupling mechanisms, and substrate specificities are present. Numerous studies have demonstrated that transporters play indispensable roles in the fundamental cellular processes of all organisms . In addition, transporters provide pathogenic bacteria with resistance to antibiotics and provide cancer cells with resistance to chemotherapies. Systematic studies have been performed to identify and characterize the transporters in a variety of plant species, such as Arabidopsis and rice. With the assistance of databases containing known and characterized transport proteins, transporters in new species are identifiable and classified via sequence similarity. Perhaps the most comprehensive of these databases is the Transporter Classification Database (TCDB), which contains a large group of functionally characterized transporters. It also achieves the purpose of categorizing new transporters into families and subfamilies based on molecular, evolutionary, and functional properties [8, 9].
However, although studies have characterized individual M. truncatula transporters in several databases, there has been no systematic database that includes all transporters in M. truncatula. Extensive cDNA and genomic DNA sequencing of several legume species (e.g., M. truncatula, soybeans, and Lotus japonicas) have been implemented over the past few years and have enabled an interesting model system to analyze whole-genome transporters [10–13]. The genomic sequence of M. truncatula is being annotated by the International Medicago Genome Annotated Group (IMGAG), which described 47,529 genes in its version 3.5v3 of the genome sequence http://www.medicagohapmap.org/downloads_genome/Mt3.5/. Additional resources relevant to Medicago functional genomics include the Medicago genome portal at the Noble Foundation , which provides final annotation analysis results on Medicago genes. To help researchers interested in M. truncatula transport proteins, we report the development of the M. truncatula transporter database (MTDB), which contains information about M. truncatula transporters derived from a comparison to the protein sequences of TCDB and A. thaliana, the most well-studied genetic model plant. This archives 3,665 putative M. truncatula transport proteins belonging to 162 families. This represents 7.5% of all predicted proteins in Medicago and is in line with what has been found in other plant species. For example, transporter genes account for 4.6% of all Arabidopsis genes and 5% of all rice genes [16, 17]. The aim of the MTDB is to present the comprehensive transporter profiles of sequenced M. truncatula, as well as to provide comparative and phylogenetic trees to view, search, and compare the transporter data in an easy-to-navigate format.
Construction and content
Genome sequence data acquisition
Protein sequences of M. truncatula and their annotations were derived from the IMGAG. Transport protein sequences of A. thaliana and their annotations came from TransportDB . Our transporter data were downloaded from the TCDB web site in March 2011 and contained 6,068 transporters. Pfam annotations came from the Pfam database, version 24.0. Three-dimensional (3D) structure annotations were provided by the Protein Data Bank (PDB). Medicago transporter annotations based on the IMGAG V3.5 V3 were derived from the Medicago genome portal at the Noble Foundation.
Identification of putative transporters
In MTDB, we used Basic Local Alignment Search Tool (BLAST) and HMMER  searches in computational predictions to identify putative M. truncatula transporters [Figure 1]. First, we respectively used 1,278 transport protein sequences of A. thaliana from TransportDB and 6,080 transporters in TCDB to conduct a BLASTp search with 47,529 M. truncatula protein sequences provided by the IMGAG. Of the 6,080 transporters in TCDB, 248 were transport proteins of A. thaliana, of which 181 were also found in TransportDB. We set the e-value cut-off at 0.0001 and identity at > 30% when we used Perl http://www.perl.org and BioPerl  scripts to analyze the BLASTp search results. A total of 5,706 (12.0%) M. truncatula proteins were predicted by at least one procedure, of which 1,974 were identified by two procedures. We selected only the top five homologs for easily storage. The 47,529 M. truncatula proteins were then predicted from the genome sequence (IMGAG sequence release version 3.5v3) and analyzed for the presence of a potential transmembrane domain (TMD) using two algorithms: TMHMM  and HMMTOP 2.0. Of the IMGAG-annotated proteins, 17,471 (36.8%) were predicted by one or more programs to contain at least one TMD, of which 8,889 were identified by the two programs. In addition, we used the annotated sequences to conduct a HMMER search with the Pfam annotations that came from the Pfam database, version 24.0. We used Perl scripts to analyze the HMMER search result to obtain all annotated sequences whose pfamID were contained by the TCDB transporters and A. thaliana pfamID sets. In the end, 3,598 (7.5%) putative transport proteins that contained at least one TMD were annotated; they had sequence homology with proteins in TCDB and A. thaliana transporters. We also used SOSUI  web-based software to re-predict the protein transmembrane segment. Of the 3,598 putative transporters, 2,780 were predicted to contain at least one TMD (77.3%). Benedito et al. published a comprehensive analysis of M. truncatula transporters . We compared our analysis with Benedito et al.'s published results, which were based on the IMGAG V2.0 (2,582). We mapped between transporters based on IMGAG V3.5v3 and IMGAG V2. Of the 3,598 putative transporters, 2,507 were assigned to 2,047 published transporters; the overlap rate was 79.3% and the validated rate was 69.7%. In addition, all 3,598 proteins were compared with proteins of the Medicago transporter annotation obtained from the Noble Foundation based on the latest IMGAG V3.5 V3. Of the 3,598 predicted transport proteins, 2,622 (72.9%) were also found in the Medicago transporter annotation at the Noble Foundation. Furthermore, we searched the annotation of the transporters based on IMGAG V3.5v3 occurring in the bioinformatics lab at the Noble Foundation but absent in our predictions using the keywords "transporter" and "transport." Finally, an additional 67 proteins were predicted.
We also mapped Medicago EST data (M. truncatula Gene Index [MTGI] version 10.0) onto the 3,598 putative transport proteins using mutual BLASTn. We set the e-value cut-off at 10-30 when we used Perl scripts to analyze the BLASTn search results. Of the 68,848 Tentative Consensus (TC) sequences and singletons in the EST database, 6,623 encoded proteins similar to our putative transport proteins.
In total, 3,665 putative transporters were annotated and assigned to 162 families according to the transporters in the TC classification system. These families were further classified into seven types according to their transport mode and energy coupling mechanism [Table 1]. All matching information was imported to the MTDB database to facilitate web searches and displays.
Transport proteins in Medicago truncatula transporter database (MTDB) and classification according to Transporter Classification Database (TCDB) classes.
2. Electrochemical potential-driven transporters
3. Primary active transporters
4. Group translocators
5. Transmembrane electron carriers
8. Accessory factors involved in transport
9. Incompletely characterized transport systems
a MT number: The number of putative Medicago truncatula transporters.
b ARATH number: The number of transport proteins of Arabidopsis thaliana came from the TransportDB.
c TCDB number: The number of transport proteins came from the TCDB web site in March 2011.
We constructed and configured MTDB upon a typical LAMP (linux + Apache + MySQL + PHP) platform. The data set was stored in MySQL 4.1 http://www.mysql.com and a web interface was achieved using PHP scripts (PHP version 4.4; http://www.php.net) on Red Hat Linux, powered by an Apache sever. Schema of this database consists of five tables of the current version of MTDB [Additional file 1]. Table pro stores whole genome transport protein predictions and expressed sequence tag (EST) mapping data; table domain stores data related to the protein domain annotation predictions by Pfam; table tmhmm stores data related to transmembrane segment prediction by TMHMM; and table structure stores the experimentally determined 3D structures of membrane transporters. An additional table, express, stores information on the expression of putative M. truncatula transporter genes under stress treatments.
Utility and discussion
We designed a user-friendly web site interface. Users can browse or search different functions of content classes based on various choices. Using the search function, for example, users can search for one type of putative transporter or specify certain information such as gene family, transporter name, or M. truncatula ID (MtID). In addition, a batch search function is achieved, through which users can search the transporter information by inputting a set of MtIDs. A brief summary on the page can provide users with a useful platform for searching. The MTDB supports a comprehensive treeview-like navigation interface. Users can browse the TC numbering system of every type of M. truncatula transporter with a collapsible function. The results are grouped by transporter family. To make browsing more convenient and scientific, we performed phylogenetic tree analysis of each family. Protein sequences in each family were used to generate a midpoint-rooted neighbor-joining tree. The trees were created by Mega 4.0 with the default parameters. Individual members of the families were further clustered into groups based on TC numbering system and evolutionary analysis [Figure 2 and Additional file 2]. Each group contains links to individual protein pages. Each putative transport protein is presented on separate web pages where users can find detailed information such as transporter function annotations; TC classifications; transmembrane segment predictions by TMHMM; genomic locus information; EST mapping results; domain annotation predictions by Pfam; 3D template predictions; expression annotations; and protein, cDNA, and genome sequences [Figure 3]. The protein and CDS sequences in MTDB are readily available for BLAST searches. Users can submit a single peptide or nucleotide sequence in the "BLAST" section. Location distribution maps, expression annotations, and 3D templates can also be browsed quickly using the "Advanced Tools" function [Figure 4].
Comparative tools and references
A map containing gene loci located on the chromosomes was generated and visualized using GenomePixelizer , which gives users a direct view of the distribution of M. truncatula's putative transporter genes on chromosomes and is especially useful in observing tandem duplications [Additional file 3]. The sections are included in the advanced tools function: structure, transmembrane segment, and expression. The "structure" section has been added to MTDB and describes putative transporters that have sequence homology with experimentally determined 3D structures. We used the transport protein sequences of M. truncatula to conduct a Position-Specific Iterative (PSI)-BLAST search with protein sequences provided by the PDB. We set the maximum number of iterations at three and the e-value cut off at 0.001 (PSI-BLAST-based method). In addition, we used the FFAS  tools to filter the (PSI)-BLAST results. Lower (more negative) FFAS scores indicate stronger similarity. FFAS scores lower than -9.5 are expected to contain less than 3% of false positives as indicated by comprehensive benchmarks of known structures. A total of 1,950 putative transporters were represented by structures in the PDB. Links to the PDB and MTDB are also provided.
In the "transmembrane segment" section, users can submit a single protein sequence to the service at http://www.cbs.dtu.dk/services/TMHMM/, and then TMHMM outputs statistics and a list of the locations of the predicted transmembrane helices and the predicted locations of the intervening loop regions. This information can be shown graphically.
Mapping of probe sets onto transporter genes
The consensus sequence of each probe set was provided by Affymetrix . A total of 3,665 predicted transporter coding sequences were matched to probe sets using mutual BLASTn. The annotation of the best match was assigned to the probe set (best BLAST hit method). We set the e-value cut-off as 10-4 and the length of the high-scoring segment pair to be longer than 100 bp then we used Perl and BioPerl scripts to analyze the BLAST search results (Mapping method in MtED). In total, of the 3,665 putative transporters, 2,039 were represented by probe sets on the Affymetrix Medicago GeneChip. Probe sets mapping information for all identified transporters were imported into the MTDB database to facilitate web searches and displays.
Microarray expression value
We further provided a way to explore the expression of putative M. truncatula transporter genes under stress treatments. To explore the expression of M. truncatula transporter genes, we retrieved and categorized microarray expression data from Gene Expression Omnibus (GEO). We picked up two independent GEO series, GSE13921 and GSE27991. GSE13921 was provided by MtED which includes functional category analysis, some querying and maps tools, and tools for the comparison and visualization of expression profiles. We mainly used MtED's data because of its high quality and experimental continuity. MtED collects roots at 0 h, 6 h, 24 h, and 48 h after salt stress for microarray experiments. The expression of probe sets at one time point changed more or less than two-fold versus 0 h and was described as up-regulated or down-regulated, respectively. Based on the result obtained from express information analysis, at 6-h stress, 47.7% of the transporter genes (972) were up-regulated and 52.2% of the transporter genes (1,064) were down-regulated. At 24-h stress, 49.6% of the transporter genes (1,011) were up-regulated and 50.4% of the transporter genes (1,028) were down-regulated. At 48-h stress, 50.3% of the transporter genes (1,026) were up-regulated and 49.5% of the transporter genes (1,009) were down-regulated. Besides, another GEO series, GSE27991, collects expression data of M. truncatula roots treated with auxin transport inhibitors. We made pairwise comparisons within each series grown under same condition respectively and users can directly inspect gene expression values by searching any one of the MtID. Each result contains links to the experiment page, which provides users with the expression curve graphs and other annotation links.
The M. truncatula Gene Expression Atlas (MtGEA) is a comprehensive platform that provides complete transcription profiles of all major organ systems of M. truncatula. We included suitable links to MtGEA on the expression page and transporter detail page so that users can readily examine transcriptome information for their probe set of interest in the MTDB.
In the future, we will continue to incorporate new expression information. Regular update and relative analysis will provide user up-to-data transporter expression information.
MTDB was developed as a relational database for the comprehensive representation of M. truncatula transporter systems. As the M. truncatula genome is currently being annotated by an international consortium, available information on this model legume (including sequences, 3D structures, expression and pathway information) will become more comprehensive and accurate. MTDB will be routinely updated monthly with new annotation information.
In summary, we built a local database called MTDB that was constructed in the PHP scripting language as a MySQL relational database system based on a Linux server. The MTDB is the first convenient web-based index database concerning transporters in the model legume M. truncatula. It will assist searchers in related fields by providing comprehensive information on transporter gene families and members of these families. The MTDB enables the exploration and comparative analysis of putative transporters in M. truncatula. A total of 3,665 putative transport proteins have been annotated and assigned to 162 families according to the TC classification system. These families are further classified into seven types according to their transport mode and energy-coupling mechanism. Both manual management and automated searches were achieved for the identification of putative protein sequences. Extensive annotations referring to each protein were generated, including basic protein function, genome locus, sequence annotations, EST mapping results, 3D template predictions, transmembrane segments, and domain annotation. A chromosome distribution map and text-based and BLAST search tools against known sequences of M. truncatula were also created. A user-friendly web interface and regular updates make MTDB valuable to researchers in related fields. We further provided a way to explore expression of M. truncatula transporter genes under stress treatments. The MTDB is freely available now to all users at http://bioinformatics.cau.edu.cn/MtTransporter/.
We would like to thank Wenying Xu and Zhou Du for their assistance in the construction of the distribution map and for their helpful feedback on other aspects of the MTDB web site. This work was supported by grants from the Ministry of Science and Technology of China (2012CB215300) and the Ministry of Education of China (NCET-09-0735).
State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University
State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University
David G, Barker SB, Blondon Franqois, Yvette , Dattée GD, Essad Sadi, Flament Pascal, Philippe , Gallusci GG, Guy Pierre, Muel Xavier, Jacques , Tourneur JDaTH: Medicago truncatula, a model plant for studying the molecular genetics of the Rhizobium-legume symbiosis.Plant Molecular Biology Reporter 1990, 8:40–49.View Article
Cook DR: Medicago truncatula--a model in the making!Curr Opin Plant Biol 1999,2(4):301–304.PubMedView Article
Bell CJ, Dixon RA, Farmer AD, Flores R, Inman J, Gonzales RA, Harrison MJ, Paiva NL, Scott AD, Weller JW, et al.: The Medicago Genome Initiative: a model legume database.Nucleic Acids Res 2001,29(1):114–117.PubMedView Article
Graham PH, Vance CP: Legumes: importance and constraints to greater use.Plant Physiol 2003,131(3):872–877.PubMedView Article
Udvardi MK, Day DA: Metabolite Transport across Symbiotic Membranes of Legume Nodules.Annu Rev Plant Physiol Plant Mol Biol 1997, 48:493–523.PubMedView Article
Smith SE, Read DJ: Mycorrhizal Symbiosis. San Diego: Academic Press; 2008.
Saier MH Jr: A functional-phylogenetic classification system for transmembrane solute transporters.Microbiol Mol Biol Rev 2000,64(2):354–411.PubMedView Article
Benedito VA, Li H, Dai X, Wandrey M, He J, Kaundal R, Torres-Jerez I, Gomez SK, Harrison MJ, Tang Y, et al.: Genomic inventory and transcriptional analysis of Medicago truncatula transporters.Plant Physiol 2010,152(3):1716–1730.PubMedView Article
Saier MH Jr, Tran CV, Barabote RD: TCDB: the Transporter Classification Database for membrane transport protein analyses and information.Nucleic Acids Res 2006, (34 Database issue):D181–186.
Young ND, Cannon SB, Sato S, Kim D, Cook DR, Town CD, Roe BA, Tabata S: Sequencing the genespaces of Medicago truncatula and Lotus japonicus.Plant Physiol 2005,137(4):1174–1181.PubMedView Article
Young ND, Mudge J, Ellis TH: Legume genomes: more than peas in a pod.Curr Opin Plant Biol 2003,6(2):199–204.PubMedView Article
Sato S, Nakamura Y, Asamizu E, Isobe S, Tabata S: Genome sequencing and genome resources in model legumes.Plant Physiol 2007,144(2):588–593.PubMedView Article
Sato S, Nakamura Y, Kaneko T, Asamizu E, Kato T, Nakao M, Sasamoto S, Watanabe A, Ono A, Kawashima K, et al.: Genome structure of the legume, Lotus japonicus.DNA Res 2008,15(4):227–239.PubMedView Article
Bock KW, Honys D, Ward JM, Padmanaban S, Nawrocki EP, Hirschi KD, Twell D, Sze H: Integrating membrane transport with male gametophyte development and function through transcriptomics.Plant Physiol 2006,140(4):1151–1168.PubMedView Article
Amrutha RN, Sekhar PN, Varshney RK, Kishor PBK: Genome-wide analysis and identification of genes related to potassium transporter families in rice (Oryza sativa L.).Plant Science 2007,172(4):708–721.View Article
Ren Q, Kang KH, Paulsen IT: TransportDB: a relational database of cellular membrane transport systems.Nucleic Acids Res 2004, (32 Database issue):D284–288.
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank.Nucleic Acids Res 2000,28(1):235–242.PubMedView Article
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.Nucleic Acids Res 1997,25(17):3389–3402.PubMedView Article
Chukkapalli G, Guda C, Subramaniam S: SledgeHMMER: a web server for batch searching the Pfam database.Nucleic Acids Res 2004, (32 Web Server issue):W542–544.
Jones CE, Baumann U, Brown AL: Automated methods of predicting the function of biological sequences using GO and BLAST.BMC Bioinformatics 2005, 6:272.PubMedView Article
Li D, Su Z, Dong J, Wang T: An expression database for roots of the model legume Medicago truncatula under salt stress.BMC Genomics 2009, 10:517.PubMedView Article
Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Edgar R: NCBI GEO: mining tens of millions of expression profiles--database and tools update.Nucleic Acids Res 2007, (35 Database issue):D760–765.
Benedito VA, Torres-Jerez I, Murray JD, Andriankaja A, Allen S, Kakar K, Wandrey M, Verdier J, Zuber H, Ott T, et al.: A gene expression atlas of the model legume Medicago truncatula.Plant J 2008,55(3):504–513.PubMedView Article
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.