Pseudogenes are considered disabled copies of functional genes that were once active in the ancestral genome. Their identification was relatively rare until the recent availability of a large number of fully sequenced and annotated genomes and improvements in detection algorithms [17–19]. Analysis of these genomes demonstrated that pseudogenes are much more common than previously thought and can represent a significant fraction of the genome ([5, 18, 19]; http://www.pseudogene.org/main.php). As a result, the coding potential of genomes has been shown to be substantially lower than originally predicted. For example, the human genome contains 16,326 pseudogenes, and the Escherichia coli K-12 genome, once thought to possess only a few pseudogenes, has been shown to harbor 134 inactivated genes. Mycobacterial species are no exception. M. tuberculosis H37Rv contains 278 inactivated genes (http://www.pseudogene.org/cgi-bin/db-gen.cgi?type=Prokaryote), and the recently sequenced M. ulcerans genome has 727 pseudogenes (BuruList Web Server: http://genolist.pasteur.fr/BuruList/). The case of M. leprae is the most dramatic, with over 1100 pseudogenes documented [4, 5, 9], the largest number of any bacterial genome sequenced to date. These data strongly suggest that genome down-sizing through the accumulation of pseudogenes, as well as gene loss, has resulted in the very specialized requirements for M. leprae growth.
Although the precise mechanism responsible for the formation of this large number of pseudogenes in M. leprae is unclear, several possible mechanisms have been proposed. It has been suggested that the loss of dnaQ-mediated proofreading activities of DNA polymerase III, together with large-scale rearrangements and deletions arising from homologous recombination events, may have contributed to this accumulation of pseudogenes [4, 5]. The loss of sigma factors and two-component systems has also been proposed as a possible mechanism of M. leprae pseudogenization. The dynamics of this reductive process have recently been studied by reconstructing the gene content of the last common ancestor of M. leprae and its closest relative, M. tuberculosis, and comparing it with the present M. leprae genome. Data from this study suggest that the loss of ancestral genes deprived M. leprae's ancestor of functional genes and contributed to its divergence from M. tuberculosis, and that pseudogenization events appear to be recent, gradual evolutionary events in the M. leprae lineage (within the last 20 million years).
Pseudogene accumulation might promote adaptive microevolution, driving the transition from a free-living to a mutualistic lifestyle [1, 2], from multiple hosts to specific hosts, and ultimately to specific host cells. Therefore, pseudogenization of M. leprae's sigma factors and stress response genes, which limits its response to environmental stress conditions, may have contributed at least in part to its adaptive evolution and to its extremely specialized niche within peripheral macrophages [24, 25] and Schwann cells of peripheral nerves in humans.
In general, pseudogenes are considered to be 'junk' DNA sequences that are in the process of being removed from the genome. However, recently we and others have demonstrated the presence of a small number of pseudogene transcripts in M. leprae [11, 12] and other bacterial species [27, 28]. In addition, others have found that transcribed pseudogenes can be functional.
In the present study, further characterization of the overall pseudogene transcriptional profile of M. leprae in nu/nu mouse foot pad granulomatous tissue by global DNA array and RT-PCR analyses demonstrated that M. leprae possesses not only the highest number of pseudogenes per genome but also the highest rate of bacterial pseudogene transcription documented to date. There was no apparent bias for transcription of pseudogenes in M. leprae based on chromosomal location or functional gene category. Although the highest percentage of transcribed pseudogenes was found in functional category V (hypothetical proteins), this finding was not surprising, as this category contains the largest percentage of pseudogenes in the genome. Many pseudogenes belong to gene families that are large in close relatives such as M. tuberculosis but are simplified during the loss of redundancy that takes place after niche specialization. Results of the present study demonstrated that a large number of these degenerated ORFs, which may no longer code for their appropriate functions, were expressed in M. leprae, consuming transcriptional machinery, metabolic resources and energy without potential benefit to the organism. These direct and indirect costs have previously been suggested to select against the expression of pseudogenes in M. leprae through the erosion of sequences involved in transcription initiation. Therefore, even though a large number of M. leprae pseudogenes are transcriptionally active, approximately 60% of M. leprae's pseudogenes are transcriptionally silent, presumably by this or similar mechanisms.
In silico analysis of transcribed pseudogenes suggested potential mechanisms for their transcription. Their positioning within gene clusters (operon-like organizations) or downstream of transcribed ORFs, along with the paucity of intrinsic terminators between functional ORFs and transcribed pseudogenes, implies that several pseudogenes are transcribed by read-through. These data support a previous study which demonstrated that ~74% of M. leprae ORFs lacked detectable intrinsic transcriptional terminators. An exception was found in the present study: the transcriptional pair ML0180c-ML0179c (pseudogene), which contains a strong terminator sequence (ΔG = -38.4) within the ML0180c coding region, was found to be transcribed as a single transcript. This raises the question of why the terminator is not functional. Previous work by our group has shown that terminators do not function if they are inside coding regions. There could be various reasons for this, most prominently the presence of ribosomes or the formation of antitermination complexes. In this case, the terminator is inside the pseudogene coding region, and the factor(s) that prevent termination inside coding regions could come into play. Sequences upstream and downstream of terminators have also been shown to be important in some cases. Any of these could explain the lack of terminator function. It must also be noted that ΔG is an important, but not the sole, indicator of terminator efficiency. In fact, our work has also shown that most terminators in M. leprae have a ΔG lower than this value.
The present study also demonstrated that rho (ML1132) and ndk (ML1469c), a nucleoside diphosphate kinase associated with its activity, were among the 1353 genes expressed. However, to date nothing is known about rho-dependent transcript termination in M. leprae, and therefore the significance of this for pseudogene expression is unknown. In addition, co-transcription of genes of unrelated function has been shown in intracellular species that have undergone massive genome reduction under low selection strength, such as Buchnera, where, after the elimination of DNA segments that included promoter regions, two unrelated genes ended up physically linked and were shown by microarray analysis to be co-transcribed. Thus, imperfect regulatory mechanisms in which promoter-less ORFs or pseudogenes are unnecessarily expressed may not be uncommon in species under low selection strength, such as those experiencing episodes of genetic drift and small population sizes.
However, not all M. leprae pseudogenes appear to rely on read-through as a mechanism of transcription. Putative promoters were identified in silico in the upstream regions of M. leprae pseudogenes. When 10 of these were tested for promoter activity in a promoterless E. coli reporter system, all were positive. Therefore, while selection against the expression of pseudogenes through the erosion of sequences involved in transcription initiation appears to be an effective mechanism for "silencing" M. leprae pseudogenes, the presence of functional promoters contributes to pseudogene transcription in M. leprae.
Prokaryotic mRNAs generally contain within their 5'-UTRs an SD sequence that serves as a ribosome-binding site. The loss of a functional SD sequence impairs translation and therefore results in a reduction or loss of protein production. Recently it has been reported that the SD sequences of M. leprae pseudogenes are highly degraded or degenerate, suggesting that translation is impaired in nonfunctional open reading frames (pseudogenes) in this pathogen. This potentially reduces the metabolic investment in faulty proteins because, although pseudogenes can persist for long time periods in the genome, they would be effectively "silenced". The present study confirmed these results and further demonstrated that, although they have lower ribosomal binding strength than ORFs, transcribed pseudogenes have higher ribosomal binding strength than non-transcribed pseudogenes. These data therefore strongly suggest that some transcribed pseudogenes are actually translated in M. leprae. To test this hypothesis, the promoter, SD (strong ribosomal binding strength), start codon and partial coding region of the pyrR (ML0531) pseudogene were fused to the gfp gene in a promoterless reporter plasmid lacking an SD site, and the construct was transformed into E. coli. Results of this preliminary experiment suggested that the pyrR SD site initiated ribosomal binding and resulted in the translation of the pyrR-gfp fusion protein product, yielding the green fluorescent phenotype. Thus, although the results of this study indicate that most pseudogenes have either no recognizable SD or weak SD sequences for binding to the anti-SD sequence of the 3' region of the 16S rRNA, some of the transcribed pseudogenes have intact ribosome-binding sequences of similar strength to the orthologs in M. tuberculosis.
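The idea of grading SD sequences by their complementarity to the anti-SD region can be illustrated with a minimal Python sketch. It is not the binding-strength measure used in this study: it assumes the canonical SD core AGGAGG and scores an upstream window simply by the longest contiguous stretch of that core it contains, ignoring spacing to the start codon and base-pairing free energy. The function name and the example windows are hypothetical.

```python
def longest_sd_match(upstream, motif="AGGAGG"):
    """Length of the longest contiguous piece of `motif` found in `upstream`.

    Crude proxy only: the motif default (canonical SD core) is an assumption,
    and no free energy or spacing to the start codon is considered.
    """
    seq = upstream.upper().replace("U", "T")  # accept RNA or DNA input
    # Try the longest pieces of the motif first, then progressively shorter ones.
    for length in range(len(motif), 0, -1):
        for start in range(len(motif) - length + 1):
            if motif[start:start + length] in seq:
                return length
    return 0


# Hypothetical upstream windows, not taken from the M. leprae genome.
for window in ("TTAGGAGGTTACC", "TTAGGATTTACC", "TTTTCCCC"):
    print(window, longest_sd_match(window))
```

A thermodynamic score against the actual 3' end of the 16S rRNA would be the closer analogue of ribosomal binding strength; the contiguous-match count is used here only because it can be computed without external tools.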
In addition, the current study demonstrated that the majority of transcribed pseudogenes lack traditional prokaryotic translational start codons. It has been shown that alteration of start codons results in loss of translational efficiency [32, 33]. Even though the lack of these sequences in the majority of M. leprae pseudogene transcripts appears to be an effective mechanism for translational "silencing", this has not yet been experimentally confirmed.
In-frame stop codons (the elementary property that distinguishes a pseudogene from a functioning gene) were present in 95% of transcribed pseudogenes, whether or not they contained start codons. Therefore, if translation of transcribed pseudogenes initiates, a truncated protein product should result from the majority of M. leprae pseudogenes. In rare instances, the protein fragment is still functional, as bad codons can also be bypassed or edited at the level of mRNA by recoding mechanisms. Recoding is the reprogramming of mRNA translation by localized alterations in the standard translational rules, and recoding products can play critical cellular roles. Typically three classes of recoding are known: 1) frameshift recoding; 2) bypass (hopping) recoding; and 3) codon redefinition involving site-specific recognition (usually, but not limited to, a stop codon). Recoding is utilized in the expression of a minority of genes in probably all organisms and has been documented in M. avium (selenocysteine incorporation at the stop codon (UGA) to yield formate dehydrogenase; http://recode.genetics.utah.edu/display.cfm#fdh_s_pro_mavi). To date recoding has not been documented in M. leprae or its close relative M. tuberculosis. However, if recoding does occur in M. leprae, it is unlikely that transcripts would be recoded to yield full-length sequences when multiple stop codons occur in a single coding sequence. It is estimated that 80% of transcribed pseudogenes contain at least 3 stop codons within their sequence and that 90% of these pseudogenes encode < 50% of the predicted full-length protein when compared to the M. tuberculosis homolog, owing to deletion mutations. Therefore, it is predicted that, if translated, these sequences will result in truncated proteins.
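The reasoning about in-frame stop codons and truncated products can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the pipeline used in this study: the sequence is assumed to be supplied already in the reading frame of the ancestral gene, the three standard stop codons are used, and the function names and toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(cds):
    """Split a sequence into codons in frame 0, dropping any trailing partial codon."""
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA input
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def in_frame_stops(cds):
    """0-based codon indices of premature in-frame stops (the final codon is ignored)."""
    codons = split_codons(cds)
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def truncated_fraction(cds):
    """Fraction of the full-length product made before the first premature stop."""
    codons = split_codons(cds)
    if codons and codons[-1] in STOP_CODONS:
        codons = codons[:-1]  # the terminal stop is not part of the product
    if not codons:
        return 1.0
    for i, codon in enumerate(codons):
        if codon in STOP_CODONS:
            return i / len(codons)
    return 1.0


# Toy sequences, not real M. leprae loci.
disrupted = "ATGGCTTAAGGCTGAAAATAG"  # ATG GCT TAA GGC TGA AAA TAG
intact = "ATGGCTGGCTAG"              # ATG GCT GGC TAG
print(in_frame_stops(disrupted), truncated_fraction(disrupted))
print(in_frame_stops(intact), truncated_fraction(intact))
```

The fraction here is measured against the pseudogene's own length; comparing against the M. tuberculosis homolog, as in the estimate above, would additionally require an alignment to account for deletions.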
Using the analysis of non-synonymous to synonymous substitutions as a measure of the potential functionality of pseudogenes, we showed that only one third of these genes had Ka/Ks ratios similar to those of functional genes, regardless of whether they were transcribed or not. As explained above, this is an upper limit, because part of the analyzed sequence evolution corresponds to the functional M. tuberculosis orthologs taken as reference and because the pseudogenization process could be recent for some genes, so that their Ka/Ks ratios would still be close to normal. Therefore, although the number of pseudogenes for which unambiguous Ka/Ks ratios could be obtained was small, and at least one of these with a low Ka/Ks ratio was translated, these data suggest that most transcribed pseudogenes are in the process of degradation. Additional support for these conclusions is that, even though protein expression data have demonstrated the presence of > 300 proteins in protein extracts from armadillo-derived M. leprae, no pseudogene products were identified [14, 15].
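The distinction underlying a Ka/Ks ratio, between substitutions that change the encoded amino acid and those that do not, can be made concrete with a short Python sketch. It is only the first step toward such a ratio and not the method used in this study: it assumes two gap-free, codon-aligned sequences and the standard codon assignments, classifies only codons that differ at a single position, and does not normalize by the number of synonymous and non-synonymous sites or correct for multiple substitutions. All names and sequences are hypothetical.

```python
from itertools import product

# Standard codon assignments, listed in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_differences(seq_a, seq_b):
    """Count synonymous and non-synonymous single-base codon differences.

    Codons differing at two or three positions, or containing ambiguous
    bases, are counted as "skipped" instead of being resolved by pathway
    averaging as a full Ka/Ks method would do.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be codon-aligned and of equal length")
    counts = {"synonymous": 0, "nonsynonymous": 0, "skipped": 0}
    for i in range(0, len(seq_a), 3):
        a, b = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if a == b:
            continue
        n_diff = sum(x != y for x, y in zip(a, b))
        if n_diff != 1 or a not in CODON_TABLE or b not in CODON_TABLE:
            counts["skipped"] += 1
        elif CODON_TABLE[a] == CODON_TABLE[b]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


# Toy alignment: GCT->GCC keeps Ala, AAA->AGA changes Lys to Arg, CTT->TTA differs twice.
print(classify_differences("ATGGCTAAACTT", "ATGGCCAGATTA"))
```

A sequence under relaxed constraint is expected to accumulate non-synonymous differences at a rate approaching that of synonymous ones, which is why a ratio close to that of functional genes is read as a sign of recent or incomplete pseudogenization.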