Open Access

Peer-reviewed

Research Article

Mechanisms and Evolutionary Patterns of Mammalian and Avian Dosage Compensation

  • Philippe Julien, 
  • David Brawand, 
  • Magali Soumillon, 
  • Anamaria Necsulea, 
  • Angélica Liechti, 
  • Frédéric Schütz, 
  • Tasman Daish, 
  • Frank Grützner, 
  • Henrik Kaessmann

  • Published: May 15, 2012
  • //doi.org/10.1371/journal.pbio.1001328

Abstract

As a result of sex chromosome differentiation from ancestral autosomes, male mammalian cells only contain one X chromosome. It has long been hypothesized that X-linked gene expression levels have become doubled in males to restore the original transcriptional output, and that the resulting X overexpression in females then drove the evolution of X inactivation [XCI]. However, this model has never been directly tested and patterns and mechanisms of dosage compensation across different mammals and birds generally remain little understood. Here we trace the evolution of dosage compensation using extensive transcriptome data from males and females representing all major mammalian lineages and birds. Our analyses suggest that the X has become globally upregulated in marsupials, whereas we do not detect a global upregulation of this chromosome in placental mammals. However, we find that a subset of autosomal genes interacting with X-linked genes have become downregulated in placentals upon the emergence of sex chromosomes. Thus, different driving forces may underlie the evolution of XCI and the highly efficient equilibration of X expression levels between the sexes observed for both of these lineages. In the egg-laying monotremes and birds, which have partially homologous sex chromosome systems, partial upregulation of the X [Z in birds] evolved but is largely restricted to the heterogametic sex, which provides an explanation for the partially sex-biased X [Z] expression and lack of global inactivation mechanisms in these lineages. Our findings suggest that dosage reductions imposed by sex chromosome differentiation events in amniotes were resolved in strikingly different ways.

Author Summary

Mammalian sex chromosomes [the X and Y] evolved from an ordinary pair of ancestral somatic chromosomes [the proto-sex chromosomes]. The process that led to emergence of distinct sex chromosomes involved the degeneration of the Y chromosome, leaving males with only one copy of most proto-sex chromosomal genes on their single X chromosome. It has remained unclear whether mechanisms evolved that compensate for this dosage reduction. Here we trace the evolution of sex chromosomal expression levels in all major mammalian lineages and in birds. We find that the X has become globally upregulated in response to the dosage reduction in marsupials, whereas in placental mammals, genes resident on autosomal [non-sex] chromosomes that interact with X-linked genes have instead become downregulated. These mechanisms restore ancestral gene expression balances and also presumably drove the evolution of secondary compensation mechanisms [i.e., female X-inactivation] in these mammalian lineages. In egg-laying mammals and birds, sex chromosomes have become partially upregulated specifically in the heterogametic sex, i.e., in male monotremes [which are XY] and female birds [which are WZ]. This probably explains why the evolution of inactivation mechanisms in the homogametic sexes in these lineages [XX and ZZ, respectively] was not necessary. Our findings suggest that gene dosage alterations associated with the emergence of sex chromosome systems can be compensated in various different ways.

Citation: Julien P, Brawand D, Soumillon M, Necsulea A, Liechti A, Schütz F, et al. [2012] Mechanisms and Evolutionary Patterns of Mammalian and Avian Dosage Compensation. PLoS Biol 10[5]: e1001328. //doi.org/10.1371/journal.pbio.1001328

Academic Editor: Nick H. Barton, University of Edinburgh, United Kingdom

Received: August 9, 2011; Accepted: March 30, 2012; Published: May 15, 2012

Copyright: © 2012 Julien et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This research was supported by grants from the European Research Council [Starting Independent Grant] and the Swiss National Science Foundation to HK. AN was supported by a long-term FEBS postdoctoral fellowship. FG is an ARC Australian Research Fellow. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: RNA-seq, RNA sequencing; RPKM, reads per kilobase of exon model per million mapped reads; XCI, X chromosome inactivation; XCR, X-conserved region

Introduction

In mammals and birds, sex is determined by pairs of heteromorphic sex chromosomes that differentiated from ancestral autosomes [1]. All mammals evolved sex chromosomes with male heterogamety [XY system], but different sets of ancestral autosomes evolved into sex chromosomes in therian [placental/marsupial] and monotreme mammals [Figure 1]. Thus, placental mammals [eutherians] and marsupials share the same X and Y, whereas the multiple X and Y chromosomes of the egg-laying monotremes are distinct and partially homologous to the sex chromosomes of birds [2]–[4], where females are heterogametic [ZW system].

Figure 1. Median male versus female expression levels of mammalian X-linked and avian Z-linked genes in five somatic tissues.

Top: Median male to female [M∶F] gene expression level ratios and 95% confidence intervals for five somatic tissues derived from nine mammals and one bird. M∶F ratio calculations are based on genes expressed in both sexes [RPKM>0]. Values are plotted on a log2 scale to allow for linear and symmetrical patterns [e.g., same distances for two-fold higher expression levels in males or females, respectively]. Statistically significant deviations of M∶F ratios from key reference values [orange/blue boxes]: 0.5 [log2 ratio of −1]; 1 [log2 ratio of 0]; and 2 [log2 ratio of 1], as assessed by one-sample Wilcoxon signed rank tests [Bonferroni corrected p …

… 0.75. This threshold was based on the distribution of this index shown in Figure 5B and separates the two distinct populations of genes evident in this plot.

Transcription Modules and X-Autosome Protein Interaction Analyses

In a previous analysis of the data used here [30], we identified groups of genes that show concerted shifts of gene expression levels in subsets of samples [so-called transcription modules]. We then selected transcription modules that showed significant enrichments for X-linked genes and a decreased expression in eutherians [identifiers 421, 618, and 634] or therians [identifiers 507, 521, and 563]. In these modules, we could thus identify 40 X-linked and 413 autosomal genes whose expression levels decreased at the same time in the common ancestor of therians or eutherians [i.e., soon after sex chromosome origination]. We then retrieved protein–protein interaction data for human and mouse from version 8.3 of the STRING database [51] and identified protein interaction partners for all genes in our set of 5,997 1∶1 orthologs for which protein interaction data were available [3,758 in humans and 3,498 in mouse] [Table 1]. Together, these data allowed us to define two types of protein–protein interaction gene sets. The type 1 set contained all X-linked genes whose expression levels dropped in the common therian/eutherian ancestor and all autosomal genes that functionally interact with them at the protein level [24 X-linked genes and 79 autosomal interactors in humans; 19 X-linked genes and 61 autosomal interactors in mouse]. The type 2 set contained all X-linked genes whose expression levels did not drop in the common therian/eutherian ancestor and all autosomal genes that functionally interact with them at the protein level [72 X-linked genes and 391 autosomal interactors in humans; 76 X-linked genes and 315 autosomal interactors in mouse]. We then assessed the proportions of autosomal interaction partners that became downregulated in the therian/eutherian ancestor in the two types of gene sets, which revealed a significant excess of autosomal downregulation in the type 1 gene set [see Table 1 and main text for details].
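
The construction of the two gene-set types can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the toy gene identifiers, and the interaction list are hypothetical, and an autosomal gene that interacts with X-linked genes of both kinds is simply counted in both sets, which is an assumption. The sketch only computes the two proportions; the statistical comparison is left out because the test is specified in Table 1 and the main text rather than here.

```python
def autosomal_interactors(x_genes, interactions, autosomal_genes):
    """Autosomal genes with at least one protein-level partner among x_genes."""
    partners = set()
    for a, b in interactions:  # interactions are unordered pairs
        if a in x_genes and b in autosomal_genes:
            partners.add(b)
        if b in x_genes and a in autosomal_genes:
            partners.add(a)
    return partners


def downregulated_fraction(partners, downregulated):
    """Share of interaction partners that were themselves downregulated."""
    return len(partners & downregulated) / len(partners) if partners else 0.0


# Hypothetical toy data; the gene names are placeholders, not real identifiers.
x_dropped = {"X1", "X2"}      # X-linked, expression dropped (type 1 seeds)
x_stable = {"X3"}             # X-linked, expression did not drop (type 2 seeds)
autosomal = {"A1", "A2", "A3", "A4"}
downregulated = {"A1", "A2"}  # autosomal genes downregulated in the ancestor
interactions = [("X1", "A1"), ("A2", "X2"), ("X2", "A3"),
                ("X3", "A3"), ("X3", "A4")]

type1 = autosomal_interactors(x_dropped, interactions, autosomal)
type2 = autosomal_interactors(x_stable, interactions, autosomal)
frac1 = downregulated_fraction(type1, downregulated)
frac2 = downregulated_fraction(type2, downregulated)
print(frac1, frac2)
```

In this toy example the type 1 interactors show the higher downregulated fraction, which is the direction of the excess reported above.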

Patterns of Intrachromosomal Duplications after Sex Chromosome Origination

Mammalian gene duplication data were retrieved from the Ensembl database [release 57]. Using a modification of a previous bioinformatics pipeline [2], we identified intronless retroposed gene copies. We removed these retrocopies from the Ensembl gene duplication data, because we considered them unlikely to have contributed to X dosage compensation [e.g., many retrogenes are not functional, do not preserve ancestral expression patterns, and/or do not originate from the chromosome on which their ancestral precursor genes are located]. Using Ensembl phylogenetic dating information [49], we then identified, for each branch leading to humans, all distinct paralogy groups with at least one duplication event on that branch. Next, we extracted those paralogy groups for which most of the branch-specific duplication events were intrachromosomal [i.e., >50% of the genes currently being located on the same human chromosome] and then computed, for each branch, the ratios of the number of predominantly X-linked and autosomal paralogy groups, normalized by the number of genes on the current human X chromosome and autosomes, respectively [Figure S14]. The ratios of the median protein sequence identity for gene duplicates [based on pairwise identity values extracted from the Ensembl database] on the X or autosomes in the respective paralogy groups were also calculated for each branch [Figure S14].
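
The >50% criterion and the normalized ratio can be sketched in Python as follows. This is an illustration under stated assumptions, not the original pipeline: each paralogy group is represented only by the list of human chromosomes carrying its genes, the function names and input values are hypothetical, and the normalization is read as groups per gene on the X divided by groups per gene on the autosomes.

```python
from collections import Counter

SEX_CHROMOSOMES = {"X", "Y"}


def dominant_chromosome(chromosomes):
    """Return the chromosome carrying >50% of the group's genes, else None."""
    chrom, count = Counter(chromosomes).most_common(1)[0]
    return chrom if count / len(chromosomes) > 0.5 else None


def normalized_x_to_autosome_ratio(groups, n_x_genes, n_autosomal_genes):
    """Ratio of predominantly X-linked to predominantly autosomal groups,
    each count normalized by the number of genes on X or autosomes."""
    dominant = [dominant_chromosome(g) for g in groups]
    x_groups = sum(1 for c in dominant if c == "X")
    a_groups = sum(1 for c in dominant
                   if c is not None and c not in SEX_CHROMOSOMES)
    return (x_groups / n_x_genes) / (a_groups / n_autosomal_genes)


# Hypothetical paralogy groups for one branch (chromosome of each member gene).
groups = [["X", "X", "X"], ["X", "X", "7"], ["1", "1"],
          ["2", "2", "2", "5"], ["3", "8"]]  # last group has no majority
ratio = normalized_x_to_autosome_ratio(groups, n_x_genes=100,
                                       n_autosomal_genes=1000)
print(ratio)
```

A ratio above 1 would indicate that, per gene, intrachromosomal duplications on that branch were more frequent on the X than on the autosomes.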

Old Versus Recent Genes

For all evolutionary analyses, we used the set of 5,997 1∶1 orthologous genes described above. These genes represent “old” genes that were already present in the common amniote ancestor and therefore were already present on the proto-sex chromosomes and proto-autosomes. We also performed separate analyses for the remaining genes [termed “recent” in the main text for simplification], which are thus expected to be enriched for genes that emerged more recently in amniotes through gene duplication or other origination mechanisms, although this set potentially also contains ancient paralogous gene copies for which 1∶1 orthologous relationships cannot be unambiguously determined. To specifically assess the number of genes that originated by gene duplication since the therian sex chromosome origination on the lineage leading to humans, we extracted, from the gene duplication data described in the previous section, those genes that belong to gene families that experienced at least one duplication event since the separation of the monotreme and therian lineages [sex chromosomes are thought to have originated at some point in the common ancestor of therian mammals, i.e., after the monotreme/therian split] [2],[3]. This analysis shows that 40% of genes in the “recent” set of genes on the human X chromosome are part of families that experienced a duplication event at some point since the divergence of therians and monotremes.

Assessment of Technical Noise

Due to stochastic variation in the RNA-seq procedure, the observed read coverage for a gene may not directly correspond to the read coverage this gene should theoretically have based on its actual expression level in the sample. The extent of the effect of this stochastic variation in read coverage is expected to be negatively correlated with the actual read coverage of a gene [i.e., genes with lower read coverage are more affected by the stochastic variation inherent in the RNA-seq procedure].

To assess the technical [stochastic] variation in our data, we first performed simulation-based analyses. Specifically, we generated a set of 600 hypothetical genes with an expected actual read coverage ranging from 1 to 600 [this range corresponds to the observed range of median number of reads in our biological samples], resulting in a universe of 180,300 reads. We then performed 1,000 resampling analyses in which 180,300 reads were distributed among the 600 genes with probabilities proportional to the expected actual read coverage of each gene. For each gene in each resampling set, we computed the variation between the simulated value and the theoretical one using the following formula: [|t−s|/t]*100, where t and s represent the theoretical and simulated numbers of reads, respectively. For each gene, we then computed the median variation value from the 1,000 simulated values and plotted this variation as a function of the theoretical actual number of reads [Figure S16, left]. Consistent with the expectation, this plot shows that low read coverage leads to a high impact of technical variation, whereas increasing read coverage gradually reduces this impact.
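
The resampling procedure can be sketched in Python using only the standard library. This is a minimal illustration rather than the code used for Figure S16: the function name, the seed, and the use of `random.choices` for the weighted draws are assumptions, and the example call runs only 10 resamplings for speed where the analysis described above used 1,000.

```python
import random
from collections import Counter
from itertools import accumulate
from statistics import median


def simulate_variation(n_genes=600, n_resamples=1000, seed=0):
    """Median of |t - s| / t * 100 per gene across resamplings."""
    rng = random.Random(seed)
    genes = list(range(1, n_genes + 1))    # gene i has theoretical coverage t = i
    total_reads = sum(genes)               # 600 * 601 / 2 = 180,300 for 600 genes
    cum_weights = list(accumulate(genes))  # draw probability proportional to t
    deviations = {t: [] for t in genes}
    for _ in range(n_resamples):
        # Distribute all reads among the genes in one weighted draw.
        counts = Counter(rng.choices(genes, cum_weights=cum_weights,
                                     k=total_reads))
        for t in genes:
            s = counts.get(t, 0)           # simulated number of reads
            deviations[t].append(abs(t - s) / t * 100)
    return {t: median(values) for t, values in deviations.items()}


# Reduced number of resamplings so the sketch runs in a few seconds.
variation = simulate_variation(n_resamples=10)
print(variation[5], variation[500])
```

Because the deviation is expressed relative to t, genes with low theoretical coverage show a much larger percentage variation than highly covered genes, which is the trend described for Figure S16.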

Our different biological samples have median read coverage that ranges from 28 reads to 512 reads for X/Z-linked genes and from 47 to 536 reads for autosomal genes. Our simulated data suggest that the variation expected for these medians ranges from approximately 3%–12% for X[Z]-linked genes [median of this variation: 7%] and from approximately 3%–9% for autosomal genes [median variation: 5.6%] [Figure S16, right]. Notably, the specific values of the variation for the eutherian data are very similar [X: 7.1%; autosome: 5.8%]. Overall, these results suggest that technical variation is relatively low in our assessments of median gene expression levels.

In addition to these simulation-based analyses, we also assessed the extent of technical variation by comparing X[Z]∶AA ratios among technical RNA-seq data replicates [Figure S17]. This analysis shows that median X[Z]∶AA ratios are very similar and statistically indistinguishable between replicates; thus, consistent with the simulation-based analysis, it further supports the notion that the technical variance in our data and its impact on the various expression level estimates are low overall.

Supporting Information

Figure S1.

Median male versus female expression levels of mammalian X-linked and avian Z-linked genes in five somatic tissues. Median male to female gene expression level ratios for expressed genes are shown for five somatic tissues derived from nine mammals and one bird. Note that values are plotted on a log2 scale to allow for linear and symmetrical patterns. Specifically, male and female expression values were compared for the therian XCR [see Figure 1 for ratios based on entire X], platypus X5, and chicken Z chromosome. Numbers of eutherian XCR genes considered: 209 [human], 193 [chimp], 205 [gorilla], 207 [orang], 212 [macaque], 212 [mouse]. Statistically significant deviations from the reference values [0.5 [log2 ratio of −1]; 1 [log2 ratio of 0]; and 2 [log2 ratio of 1]], as assessed by one-sample Wilcoxon signed rank tests [Benjamini-Hochberg corrected p
