TRACER: a resource to study the regulatory architecture of the mouse genome
© Chen et al.; licensee BioMed Central Ltd. 2013
Received: 31 October 2012
Accepted: 22 March 2013
Published: 2 April 2013
Mammalian genes are regulated through the action of multiple regulatory elements, often distributed across large regions. The mechanisms that control the integration of these diverse inputs into specific gene expression patterns are still poorly understood. New approaches enabling the dissection of these mechanisms in vivo are needed.
Here, we describe TRACER (http://tracerdatabase.embl.de), a resource that centralizes information from a large on-going functional exploration of the mouse genome with different transposon-associated regulatory sensors. Hundreds of insertions have been mapped to specific genomic positions, and their corresponding regulatory potential has been documented by analysis of the expression of the reporter sensor gene in mouse embryos. The data can be easily accessed and provides information on the regulatory activities present in a large number of genomic regions, notably in gene-poor intervals that have been associated with human diseases.
TRACER data enables comparisons with the expression pattern of neighbouring genes, activity of surrounding regulatory elements or with other genomic features, revealing the underlying regulatory architecture of these loci. TRACER mouse lines can also be requested for in vivo transposition and chromosomal engineering, to analyse further regions of interest.
Recent progress in whole genome chromatin profiling has led to the identification of chromatin features that are strongly correlated with gene regulatory elements [21–26], opening ways to obtain a comprehensive catalogue of these elements, and a better annotation of the regulatory genome. Databases that document the in vivo activities of experimentally validated regulatory elements – mostly enhancers – further complement these approaches. Such datasets on regulatory activity can be compared to gene expression data in developing mouse embryos [29–35]. However, one cannot reduce gene expression to a catalogue of the many potential regulatory elements present in the genome (from a few hundred thousand to millions). It is equally important to understand the interplay between the different elements present at a locus and how their different inputs are integrated and conveyed to target gene(s). Yet, compared to enhancers, other cis-regulatory elements such as silencers are much more elusive, despite their essential role in gene expression. Similarly important are the mechanisms that define the range and specificity of enhancer-promoter interactions. Indeed, changes in the relative position of genes and regulatory elements by chromosomal rearrangements and structural variations can alter gene expression with dramatic consequences [36–40]. Understanding these situations and the associated mechanisms requires approaches that complement the available catalogues of elements and provide a functional integrated view of the genome regulatory architecture.
For this purpose, we have developed an approach based on the distribution of a regulatory sensor gene throughout the mouse genome (Figure 1B). The regulatory sensor consists of a LacZ reporter gene, which is driven by a minimal promoter that has no specific activity on its own but responds faithfully to endogenous enhancers. This regulatory sensor therefore uncovers the regulatory potential associated with a given genomic position, which results from the collective action of the different regulatory elements that act on this position. It thus reveals, in an operational manner, the gene regulatory activities within poorly characterized regions, or where annotation for activity is indirect (e.g. chromatin profiling) or out of the proper genomic context (e.g. transgenic assays). Importantly, the minimal promoter used does not display any obvious tissue- or enhancer-type bias, and the observed expression patterns often overlap with the ones of neighbouring genes. The basic principle of the strategy is analogous to an enhancer-trap; however, the sensor used in our approach has minimal impact on endogenous gene expression and therefore reveals regulatory activities without titrating them away from their natural target genes.
This regulatory sensor is carried in a Sleeping Beauty transposon, which can be distributed randomly in the mouse genome, by remobilisation in the male germline. Owing to the efficiency of this in vivo transposition system, we have recovered, identified and characterized a large number of insertions that provide a direct view of the regulatory activities associated with specific genomic regions. Furthermore, as the transposons used also carry a loxP site, the different lines can be used for in vivo chromosomal engineering, to generate mice with targeted deletions or duplications, or segmental aneuploidies [2, 42–44]. The local hopping behaviour of Sleeping Beauty makes each line a potential starting point to scan a region of interest: with our germline-specific transposase transgene, the remobilization rate ranges from 10 to 45%, depending on the starting site, and more than 15% of new insertions are within 1 Mb of the starting point. Thus, a research group with access to a limited number of cages can nonetheless set up a regional screen for its region of interest.
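The scale of such a regional screen can be estimated with simple arithmetic. The sketch below is illustrative only: it assumes that the remobilization rate is the fraction of screened offspring carrying a new insertion, and it uses 15% as a lower bound for the share of new insertions within 1 Mb. The function name and the figure of 100 offspring are hypothetical.

```python
def expected_insertions(offspring: int, remobilization_rate: float,
                        local_fraction: float) -> tuple:
    """Return (expected new insertions, expected insertions within 1 Mb).

    Assumption: remobilization_rate is the fraction of offspring that
    carry a new insertion; local_fraction is the share of those new
    insertions that land within 1 Mb of the starting site.
    """
    if not 0 <= remobilization_rate <= 1 or not 0 <= local_fraction <= 1:
        raise ValueError("rates must be fractions between 0 and 1")
    new_insertions = offspring * remobilization_rate
    return new_insertions, new_insertions * local_fraction


# Bracket the estimate with the lowest and highest rates quoted in the text.
for rate in (0.10, 0.45):
    total, local = expected_insertions(offspring=100,
                                       remobilization_rate=rate,
                                       local_fraction=0.15)
    print(f"rate={rate:.0%}: ~{total:.1f} new insertions, "
          f"at least ~{local:.1f} within 1 Mb")
```

Because the 15% figure is a lower bound, the number of local insertions computed this way is a conservative estimate.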
To provide simple and useful access to the expression patterns and the mouse insertion strains generated with this on-going project, we have designed the Transposon- and Recombinase-Associated Chromosomal Engineering Resource (TRACER) database. This new database is freely accessible at http://tracerdatabase.embl.de/. It constitutes a substantial improvement over the previous one that was established to display the data from a limited pilot screen. The new database comprises novel features that allow users to browse and perform refined searches of insertion sites by position and/or expression patterns. The dataset is also now much larger (4-fold increase, with about 1500 insertions in July 2012), and is growing steadily. This web-based database not only provides information on regulatory activities present along the mouse genome but also gives access to a large collection of mice for engineering chromosomal rearrangements in non-genic intervals.
As well as the external user interfaces described below, the TRACER database has internal interfaces restricted to contributing members and requiring login for authentication. These internal interfaces have all the LIMS (laboratory information management system) components required for uploading data, curation of lines and various administration purposes.
The main internal interface allows lab staff to add all the text annotation, and insert sequence and image files associated with a particular TRACER line. There is also a batch upload interface for multiple insert sequences. The backend code automatically cuts the sequence down to just the insert, and verifies that the mutagen tag is present and that the genomic sequence starts with ‘TA’. The batch sequence submission tool is automatically coupled to the UCSC BLAT service with standard parameters (http://genome.ucsc.edu/) to determine the best alignment and genomic location for each insert. When there are multiple good alignments, user intervention is possible to select the best genomic location. An input form is then populated for the aligned sequence along with any existing data for the line. A similar batch upload interface exists for the parsing of the expression image and annotation files. Internal users can also edit annotations for existing lines using a separate curation interface.
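The trimming and validation step can be sketched as follows. This is not the TRACER backend code, only a minimal illustration of the checks described above. The tag sequence is a made-up placeholder, not the actual transposon end sequence, and the function name is hypothetical.

```python
from typing import Optional

# Hypothetical placeholder for the "mutagen tag" (the transposon end that
# precedes genomic DNA in a sequencing read). NOT the real sequence.
MUTAGEN_TAG = "GGGTTTCCCAAA"


def extract_genomic_flank(read: str, tag: str = MUTAGEN_TAG) -> Optional[str]:
    """Return the genomic flank that follows the tag, or None if invalid.

    A read is accepted only if (1) the tag is present and (2) the sequence
    immediately after the tag starts with the TA dinucleotide.
    """
    # Normalise case and drop whitespace that may come from pasted sequence.
    cleaned = "".join(read.split()).upper()
    pos = cleaned.find(tag)
    if pos == -1:
        return None  # tag missing: the read cannot be trusted
    flank = cleaned[pos + len(tag):]
    if not flank.startswith("TA"):
        return None  # junction does not start with the expected TA
    return flank


print(extract_genomic_flank("ccc" + MUTAGEN_TAG.lower() + "tacgtacgt"))
```

A flank that passes both checks would then be the sequence submitted for alignment.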
Many of the interfaces utilise a controlled vocabulary of terms to populate the drop-down menus, reducing the number of typos in the database and preserving the integrity of the data stored in TRACER. An administrative database exists to edit these controlled vocabularies.
The external interface allows users to register interest in particular lines, or - if the user’s genomic region of interest is not yet covered - to request such a line for when it becomes available. These requests are captured in the database and matching lines are displayed for the curators so they can contact the requesting researcher. For user-defined regions of interest, new matching lines are automatically searched for every week, or when triggered through the curator interface.
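The matching of requested regions against newly mapped lines amounts to an interval lookup. The sketch below illustrates the idea under stated assumptions: the record types, field names and line identifiers are hypothetical and do not reflect the actual TRACER schema.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Insertion:
    line_id: str   # hypothetical line identifier
    chrom: str
    position: int  # genomic coordinate of the insertion


@dataclass(frozen=True)
class RegionRequest:
    requester: str
    chrom: str
    start: int
    end: int


def match_requests(requests: Iterable[RegionRequest],
                   insertions: Iterable[Insertion]) -> Dict[str, List[str]]:
    """Map each requester to the line ids that fall inside their region."""
    insertions = list(insertions)
    matches: Dict[str, List[str]] = {}
    for req in requests:
        hits = [ins.line_id for ins in insertions
                if ins.chrom == req.chrom
                and req.start <= ins.position <= req.end]
        if hits:
            matches.setdefault(req.requester, []).extend(hits)
    return matches


lines = [Insertion("line_A", "chr2", 1_500), Insertion("line_B", "chr5", 1_500)]
wishes = [RegionRequest("alice", "chr2", 1_000, 2_000)]
print(match_requests(wishes, lines))
```

Running such a function on a weekly schedule, or on demand, would mirror the two triggers described above.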
The internal identifier of the mouse line in the TRACER database.
The genomic position of the insertion (chr/position; based on MGSCv37/mm9 genome assembly).
The orientation of the loxP site in the transposon. “Plus” corresponds to the following orientation: centromere – 5′-ATAACTTCGTATAGCATACATTATACGAAGTTAT-3′ – telomere. For comparison, loxP sites targeted by the International Knockout Mouse Consortium in genes transcribed from the plus strand (http://www.knockoutmouse.org/about/targeting-strategies) have the same orientation as TRACER “plus” loxP. Depending on the specific transposon, the orientation of the other features (transposon ends, reporter gene) varies: they are indicated and represented in the expanded view available by clicking on the “expand” icon.
An icon and text, indicating whether expression analysis has been performed and whether LacZ reporter expression has been detected. The developmental stage(s) for which information is available are indicated in the next column. Expression assay is “positive” if the insertion showed LacZ staining at least at one of the stages assessed.
The status of the insertion, indicating whether animals carrying the insertion are available. Insertions that were identified in F0 embryos, that couldn’t be established from the founder or were discontinued, are labelled as “not maintained”. Insertions “available” for further use or analysis fall under three categories: “alive” (line established with mice available in small numbers), “cryopreserved” (either as embryos or sperm) and “new” (usually corresponding to a new insertion, with only the founder animal). The status of an insertion is dynamic: not all “new” insertions are established, and depending on circumstances, “alive” ones may become “cryopreserved” or “not maintained”.
Transposon type: most of the available lines harbour a simple regulatory sensor with a lacZ reporter and a single loxP site, in one or the other orientation relative to the transposon ends (SB8 and SB9). New transposons with additional features have been constructed (see Figure 2), and lines containing them are being established and will be added to the resource. Detailed maps and sequences of available transposons can be found on the TRACER website.
The final two columns display a checkbox to download the complete set of information available for an insertion, and an email link to indicate interest in a specific insertion. The toolbar buttons above the results table can be used to filter the search results, and to show only available lines and/or lines with expression data.
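The loxP orientation convention given above can be checked computationally. The sketch below uses the “plus” sequence quoted in the column description and derives the “minus” form as its reverse complement. It assumes that the input is the reference top strand written from centromere to telomere; the function names are hypothetical.

```python
from typing import Optional

# "Plus" loxP as quoted in the text (centromere 5' -> 3' telomere).
LOXP_PLUS = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# The two 13-bp arms of loxP are inverted repeats, so only the 8-bp spacer
# differs between the two orientations.
LOXP_MINUS = reverse_complement(LOXP_PLUS)


def loxp_orientation(top_strand: str) -> Optional[str]:
    """Return 'plus', 'minus', or None if no unambiguous loxP is found.

    Assumption: top_strand is the reference strand read from centromere
    to telomere.
    """
    seq = top_strand.upper()
    has_plus = LOXP_PLUS in seq
    has_minus = LOXP_MINUS in seq
    if has_plus and not has_minus:
        return "plus"
    if has_minus and not has_plus:
        return "minus"
    return None


print(loxp_orientation("GG" + LOXP_PLUS + "CC"))
```

Knowing the relative orientation of two loxP sites matters when planning a rearrangement, since it influences the outcome of recombination between them.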
Further details on a given insertion can be seen by clicking the expand icon next to each record (Figure 4B). The first section describes the genomic context of the insertion. It lists whether the insertion is located in a gene desert (a gene-free region larger than 500 kb), intergenic (less than 500 kb-long), intronic or exonic region, specifies the orientation of the reporter gene, and the parental insertion line from which the insertion was obtained. This section also contains a schematic of the transposon construct, the genomic environment and flanking genes in a snapshot from the Ensembl genome browser, along with links to view the insertion point in Ensembl or the UCSC genome browser.
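The four genomic context categories can be expressed as a small decision procedure. The sketch below is one possible reading of the definitions above, not the database's own implementation. The gene model, the function name and the handling of chromosome ends are assumptions, and a gap of exactly 500 kb, which the definitions leave open, is treated here as intergenic.

```python
from dataclasses import dataclass
from typing import List, Tuple

GENE_DESERT_MIN_BP = 500_000  # threshold quoted in the text


@dataclass
class Gene:
    start: int
    end: int
    exons: List[Tuple[int, int]]  # inclusive (start, end) pairs


def classify_insertion(position: int, genes: List[Gene],
                       chrom_length: int) -> str:
    """Classify a position relative to the genes on the same chromosome."""
    containing = [g for g in genes if g.start <= position <= g.end]
    if containing:
        # Exonic takes priority if any overlapping gene has an exon here.
        if any(s <= position <= e for g in containing for s, e in g.exons):
            return "exonic"
        return "intronic"
    # Nearest gene boundaries on each side; assumption: chromosome ends
    # act as boundaries when there is no gene on one side.
    left = max((g.end for g in genes if g.end < position), default=0)
    right = min((g.start for g in genes if g.start > position),
                default=chrom_length + 1)
    gene_free_bp = right - left - 1
    if gene_free_bp > GENE_DESERT_MIN_BP:
        return "gene desert"
    return "intergenic"


genes = [Gene(1_000, 5_000, [(1_000, 1_500), (4_000, 5_000)]),
         Gene(1_000_000, 1_010_000, [(1_000_000, 1_010_000)])]
print(classify_insertion(500_000, genes, chrom_length=2_000_000))
```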
The second section shows the LacZ expression patterns obtained for the insertion, when available. Mousing over each thumbnail image shows a zoomed-in, trackable high-resolution view of the image. In addition, the stage and viewpoint of the image are recorded along with annotations using the expression domain categories detailed above. One can switch from one image to another by clicking on the corresponding thumbnail.
The final section shows details regarding how the genomic position of the insertion was determined, such as the flanking sequence(s) obtained (trimmed to the TA dinucleotide duplicated upon Sleeping Beauty insertion), and where this sequence mapped to the genome using BLAT. When available, primers that have been used to genotype embryos and mice for this specific insertion are indicated.
The left hand panel of the expanded section contains an interface that displays lines with insertion points within 5 Mb (or a user-selected range) (Figure 4B). Users can select one or more of these lines, and open a new tab displaying these flanking lines. This feature is particularly useful to compare regulatory activities across large regions, and to delineate the extent of regulatory domains.
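The flanking-line lookup can be illustrated with a simple distance filter. The sketch below assumes a minimal record type with hypothetical field names and line identifiers; the 5 Mb default follows the text, and the window is a parameter so that a user-selected range can be substituted.

```python
from typing import Iterable, List, NamedTuple


class Insertion(NamedTuple):
    line_id: str   # hypothetical line identifier
    chrom: str
    position: int


def flanking_lines(target: Insertion, insertions: Iterable[Insertion],
                   window_bp: int = 5_000_000) -> List[Insertion]:
    """Return other insertions on the same chromosome within window_bp,
    sorted by genomic position."""
    nearby = (ins for ins in insertions
              if ins.line_id != target.line_id
              and ins.chrom == target.chrom
              and abs(ins.position - target.position) <= window_bp)
    return sorted(nearby, key=lambda ins: ins.position)


target = Insertion("line_A", "chr11", 50_000_000)
others = [Insertion("line_B", "chr11", 53_000_000),
          Insertion("line_C", "chr11", 60_000_000),
          Insertion("line_D", "chr3", 50_000_000)]
print([ins.line_id for ins in flanking_lines(target, others)])
```

Sorting the result by position makes it straightforward to compare expression patterns in genomic order across the region.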
Finally, the toolbar below the search interface allows data to be downloaded for the whole TRACER database, the search results, user selected lines or just the lines described in publications referring to the dataset. Additionally, all available images can be downloaded. Requests for higher resolution photos and other questions can be sent to email@example.com. Most LacZ stained embryos have been archived, albeit in limited numbers for each insertion, and may be made available upon request.
The introduction of a “regulatory sensor” in the genome provides a direct operational readout of the activities surrounding the insertion point that can contribute to gene expression. Similar enhancer-trap screens have been widely used in Drosophila and to some extent in zebrafish [55–58], providing information about genes and genomes, as well as a series of useful markers and tools. Their use in mice has been limited [59, 60], in part due to the low throughput of transgenesis, and technical difficulties of generating single-copy insertions. The development of robust and efficient in vivo transposition systems [2, 61–63], as shown here, or the use of lentiviral transgenesis, as recently described elsewhere, opens exciting new possibilities to conduct such screens in an efficient and affordable manner.
By querying the database for a gene or a region of interest, one can identify expression patterns and regulatory activities associated with that location and its surroundings. The observed activity may indicate possible developmental or tissue-specific regulation of genes, and shed light on their physiological roles in vivo (Figure 6A). However, we wish to emphasize that the regulatory sensor sometimes reflects only a subset of the expression domains of a given gene. Although the sensor responds accurately to influences from long-range remote enhancers, it is less likely to capture the input of promoter elements that have a limited range of action: tissue-restricted expression of the sensor may therefore represent a tissue-specific modulation of an otherwise broadly expressed gene; yet, this modulation may correspond to important biological functions.
Also, the expression pattern associated with an insertion does not necessarily imply that a corresponding enhancer lies nearby, as illustrated by the shared expression of distant insertions (Figure 6A,C; other examples in ). Instead, the sensor reports the collective input at a given position of both positive and negative regulatory elements. Accordingly, comparing the expression pattern of neighbouring insertions to each other and to known enhancer activities [21, 65] can reveal important regulatory features. These include the range of action of enhancers, the boundaries of expression domains, the presence of silencers or other repressive or insulating elements that modulate enhancer activity and cannot be obtained from other types of datasets and approaches. In essence, TRACER provides an operational view of the regulatory structure of the mammalian genome, and delineates the extent of the large regulatory landscapes that subdivide the genome into functional units. It constitutes a functional counterpart to views obtained by different methods; including, for example, Genome Regulatory Blocks that are delineated by the density of conserved non-coding elements and synteny conservation [66, 67], Topological Associated Domains defined by chromosomal interaction biases [68, 69], and Enhancer-Promoter Units that are revealed by clusters of coincident promoter-enhancer chromatin signatures .
The data present in TRACER identifies genomic positions where an inserted transgene will adopt a highly specific expression profile (Figure 6B). Transgenes that drive the expression of markers to label specific cells (such as fluorescent markers) or of effector genes (for example Cre recombinase) in defined cell-types or embryonic tissues have proven very useful to dissect biological and genetic processes. “Position-effects” (the action of endogenous regulatory elements on transgenes) are usually considered as a problem for transgenic experiments because they lead to partially unpredictable outcomes. With the information displayed in TRACER, one can instead exploit position effects, and select genomic sites that will convey an expression pattern of interest. Importantly, many of these sites are located far from genes, implying that their use would have less functional impact than a gene knock-in. The sensor integrates the inputs of both enhancers and silencers that are acting at its position: consequently, the observed pattern is often more restricted than the one driven by enhancer-only constructs or displayed by the neighbouring genes . Hence, retargeting positions identified in TRACER with a transgene of interest should provide a reliable method to create new tissue- and cell-type specific transgenes. This can be done by homologous recombination in mouse ES cells, but the rapid development of Zinc-Finger or TALE Nuclease-associated targeted transgenesis may offer more efficient alternatives [46, 71, 72].
In addition to maps of genomic “regulatory landscapes”, TRACER provides access to a large and growing collection of mice with different transposon insertions (around 200 in July 2012). Only a few insertions are likely to disrupt genes or key/highly conserved regulatory elements directly. Instead, these mice can be used for other purposes, and in particular for engineering aneuploidies and structural variants. Chromosomal aneuploidies are often found in patients suffering from developmental malformations and/or neuropsychiatric disorders. In some cases, a single-gene knockout can reproduce the phenotypes observed in human patients; however, for numerous other conditions, such as contiguous gene diseases, chromosomal duplications or rearrangements in non-coding intervals, gene-based alleles do not provide accurate models. Because Sleeping Beauty transposons frequently re-insert in the vicinity of their initial position, it is possible to use one insertion in a region of interest to generate additional local re-insertions. These insertions can be (re)combined owing to the associated loxP sites, to produce a series of rearrangements of this locus that model genomic alterations found in human patients, and help determine the causal elements or genes (Figure 6C). Such a use of the TRACER resource and GROMIT strategy can be particularly well suited for large gene clusters (e.g. proto-cadherins, KRAB-zinc finger genes, olfactory receptors) or gene-deserts associated with human pathologies, complementing the gene-centric resource provided by the International Knockout Mouse Consortium. Given the growing recognition of the biological importance of genomic structural variants for human diseases, we anticipate that TRACER will be a useful resource to rapidly engineer allelic series of structural variants in mouse orthologous intervals, helping to create novel models of human genomic disorders.
Owing to the dynamic nature of transposon elements, the resource present in TRACER will expand steadily with the number of users. Each lab using this transposon technology to investigate a region of interest by “local” hopping will produce a substantial number of by-products (~ 80% of the new insertions). Even if these insertions may not be useful for the producing lab, they can be of interest for others. TRACER is designed to serve as a central “virtual” repository to share those mice. Further information, including references, detailed maps and sequences of the different transposons and transgenes in use, and protocols for mapping of new insertions are available through the pages of the TRACER website.
To facilitate exchanges, the TRACER database incorporates several features and internal interfaces for contributing groups (automated insertion mapping, annotation and administration). In particular, the “User wish list” feature offers a simple way to “tag” newly generated mice of interest without a major investment or commitment from the producing labs.
The database is accessible at the web addresses:
UCSC Genome Browser: http://genome.ucsc.edu/index.html
The authors thank the members of the EMBL Laboratory Animal Resource, in particular Andrea Schulz, Silke Feller, Michaela Wesch and Klaus Schmitt; the development and maintenance of this mouse resource would not be possible without their dedication and constant support. We thank as well Manuela Borchert and Anne Hermelin for advice and design of the TRACER webpages. The TRACER database and resource is funded by EMBL. The production of some of the strains described in the database was made in the course of projects supported by the European Commission-FP7 (grant Health 223210/CISSTEM) and Human Frontier Science Program (grant RGY0081/2008-C) to FS.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.