Université de Lyon, Université Lyon 1, Centre de Génétique et de Physiologie Moléculaire et Cellulaire (CGPhiMC), CNRS UMR5534, F-69622 Lyon, France

Université de Lyon, INSA-Lyon, INRIA, Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), CNRS UMR5205, F-69621 Lyon, France

Laboratory of Biological Modeling, NIDDK, National Institutes of Health, Bethesda, MD 20892, USA

INSERM, Centre Cavaillès, Ecole Normale Supérieure, F-75005 Paris, France

Abstract

Background

A number of studies have established that stochasticity in gene expression may play an important role in many biological phenomena. This therefore calls for further investigations to identify the molecular mechanisms at stake, in order to understand and manipulate cell-to-cell variability. In this work, we explored the role played by chromatin dynamics in the regulation of stochastic gene expression in higher eukaryotic cells.

Results

For this purpose, we generated isogenic chicken-cell populations expressing a fluorescent reporter integrated as a single copy per clone. Although the clones differed only in the genetic locus at which the reporter was inserted, they showed markedly different fluorescence distributions, revealing different levels of stochastic gene expression. Use of chromatin-modifying agents showed that direct manipulation of chromatin dynamics had a marked effect on the extent of stochastic gene expression. To better understand the molecular mechanism involved in these phenomena, we fitted these data to a two-state model describing the opening/closing process of the chromatin. We found that the differences between clones seemed to be due mainly to the duration of the closed state, and that the agents we used seemed to act mainly on the opening probability.

Conclusions

In this study, we report biological experiments combined with computational modeling, highlighting the importance of chromatin dynamics in stochastic gene expression. This work sheds new light on the mechanisms of gene expression in higher eukaryotic cells, and argues in favor of relatively slow dynamics with long (hours to days) periods in a quiet state.

Background

Although the importance of stochasticity in gene expression was anticipated more than three decades ago

Many studies have shown that the average expression level of a gene depends strongly on its genomic location

Initially conducted in prokaryotes

In many situations, from prokaryotes to eukaryotes, simple mathematical models describing the transcriptional dynamics as a two-state process have been shown to account effectively for the stochastic expression of a gene

Recently, using a short-lived luciferase protein, Suter

In a preliminary study, our group showed, using isogenic cell populations expressing a fluorescent reporter, that modification of chromatin marks, using chromatin-modifying agents such as 5-azacytidine (5-AzaC) and trichostatin A (TSA), induced significant effects on mean fluorescence intensity (MFI) and normalized variance (NV; that is, the variance normalized by the square of the mean)

To assess the possible influence of chromatin-opening/closing dynamics on the stochasticity of gene expression, the next step was to combine biological experiments with a modeling analysis. For that purpose, we generated a series of clonal isogenic cell populations from chicken erythrocyte progenitors (6C2 cells). These populations were stably transfected with a unique copy of a reporter-gene coding for the red fluorescent protein mCherry, but the reporter was inserted at different chromosomal positions in each clone (Figure


**Experimental strategy used for assessing the role of chromatin environment on stochastic gene expression**. After generation of cellular clones expressing the fluorescent reporter _{on }_{off}

Our current study supports the view that expression dynamics is strongly driven by short and infrequent transcriptional bursts, as previously described in other models, including mammalian models. However, the major advance of this work is that, whereas the duration and intensity of bursts did not show strong clone-to-clone differences, the time between bursts was found to depend strongly on genomic location and was broadly affected by drug treatments that affect chromatin. Hence, the position-dependent opening dynamics of chromatin emerges as a key determinant of the stochasticity in gene expression.

Results

We generated a series of clones stably transfected with the

**Table S1. Identification by splinkerette PCR of the mCherry genomic insertion sites for six 6C2 cellular clones**.



**Exploration of model parameters to explain the observed stochastic gene expression for six cellular clones**. **(A) **Relationship between normalized variance (NV) and mean fluorescence intensity (MFI) for six cellular clones (C1 to C17) stably transfected with a unique copy of the fluorescent reporter **(B) **Distributions of the possible chromatin dynamics. For each clone, all 1,087 possible couples of (1/_{on}_{off}_{off}_{on}

An explanation of these observations comes from a previous preliminary study, in which we investigated whether chromatin dynamics are involved in these observed differences

Thus, for this purpose in the current study, we fitted these data to a two-state model of gene expression, and evaluated to what extent chromatin dynamics act on stochastic gene expression. Under the assumption that all parameters but those describing the dynamics of chromatin would be identical in all the clones, we performed an iterative screening of model parameters. This allowed us to find these common parameters, and to characterize the position-specific dynamics of chromatin for each individual clone (Figure

Description of the model

The choice of the model used to analyze our biological data was crucial. Two models are classically used to describe transcriptional stochasticity: 1) a Poisson model, in which the gene has, at each instant, a given chance to produce an mRNA,

Because flow cytometry quantifies protein fluorescence, the model must describe the expression process up to the protein level (including mRNA and protein production and degradation rates) and requires an additional parameter to convert protein quantity into fluorescence intensity (Figure _{on }_{off }_{on }_{off }^{-3}/min (mRNA half-life of 7 hours and 4 minutes) and ^{-4}/min (protein half-life of 65 hours and 47 minutes). The sensitivity of our results with regard to uncertainty in these experimentally determined values will be discussed later. These values are consistent with average mRNA and protein half-lives previously measured in mammalian cells (9 and 46 hours, respectively)
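The correspondence between the half-lives quoted above and first-order degradation rates can be checked with a short calculation. The sketch below assumes simple exponential decay (rate = ln 2 / half-life); the function name is ours and purely illustrative.

```python
import math

def degradation_rate_per_min(hours, minutes=0):
    """First-order decay rate (per minute) for a half-life given in hours and minutes."""
    half_life_min = 60 * hours + minutes
    return math.log(2) / half_life_min

mrna_rate = degradation_rate_per_min(7, 4)       # mRNA half-life: 7 h 4 min
protein_rate = degradation_rate_per_min(65, 47)  # protein half-life: 65 h 47 min
print(f"mRNA: {mrna_rate:.2e}/min, protein: {protein_rate:.2e}/min")
```

Under this assumption, the two rates fall in the 10^{-3}/min and 10^{-4}/min ranges respectively, in line with the orders of magnitude given in the text.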

**Figure S1. Determination of the mCherry reporter mRNA and protein half-lives**.


Several methods can be used to find such a parameter set. In particular, there are various optimization methods available, such as simulated annealing. However, because the model-experiment comparisons in our study involved stochastic simulations, the objective functions that have to be minimized (that is, some distance measure between predictions and observations) are only estimated up to a certain error level. Although small, this error level makes most optimization algorithms inadequate. Indeed, these algorithms rely on estimating the gradient or Hessian of the objective function, based on a finite difference procedure (that is, evaluating small variations in the objective function resulting from small variations in its parameters). In a context where successive estimations of the objective function, even for the same parameters, may display random variations, these optimization algorithms are clearly doomed to failure. Overcoming this issue would require both running extremely long and computationally intensive simulations to minimize the error, and using coarse variation steps in the gradient-estimation procedure, which could result in numerical instabilities during the optimization.
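The difficulty described above can be illustrated on a toy problem. The sketch below is not the objective function of this study: it uses a hypothetical quadratic objective observed with Gaussian estimation noise (standard deviation 0.01, an arbitrary choice) and compares forward-difference gradient estimates obtained with a small and a coarse step.

```python
import random

def noisy_objective(x, rng, noise_sd=0.01):
    """Toy objective (x - 1)^2 observed with additive Gaussian estimation noise."""
    return (x - 1.0) ** 2 + rng.gauss(0.0, noise_sd)

def finite_difference_gradient(x, step, rng):
    """Forward-difference gradient estimate built from two noisy evaluations."""
    return (noisy_objective(x + step, rng) - noisy_objective(x, rng)) / step

rng = random.Random(0)
true_gradient = 2.0 * (3.0 - 1.0)  # exact derivative of (x - 1)^2 at x = 3

def mean_abs_error(step, trials=500):
    """Average absolute error of the gradient estimate at x = 3."""
    errors = [abs(finite_difference_gradient(3.0, step, rng) - true_gradient)
              for _ in range(trials)]
    return sum(errors) / trials

small_step_error = mean_abs_error(1e-4)  # dominated by noise divided by the step
coarse_step_error = mean_abs_error(1.0)  # dominated by the bias of a coarse step
print(small_step_error, coarse_step_error)
```

With a small step, the estimation noise is amplified by the division by the step size; with a coarse step, the noise is damped but the estimate becomes biased. Either way, the gradient fed to the optimizer is unreliable unless the noise itself is reduced by much longer simulations.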

For this reason, we decided to conduct a systematic parametric exploration, a procedure that does not require local smoothness of the objective function. In addition, a single evaluation of the objective function represents a heavy computational load, as it involves thousands of realizations of a Gillespie simulation, each followed over long periods of simulated time (see Methods). In this context, a systematic parametric exploration allows massive parallelization of the computations on a grid, whereas the sequential evaluation imposed by optimization algorithms would make an optimization-based approach prohibitively slow. However, because the systematic exploration still requires intensive computations, we used iterative screening of the model parameters to progressively reduce the parameter space that has to be simulated.
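As a minimal sketch of what a systematic exploration looks like in practice, the code below enumerates every combination of a few log-spaced parameter axes. The parameter names, ranges, and grid resolution are hypothetical placeholders, not the values used in this study.

```python
import itertools

def log_grid(low, high, points):
    """Return `points` values evenly spaced on a logarithmic scale from low to high."""
    ratio = (high / low) ** (1.0 / (points - 1))
    return [low * ratio ** i for i in range(points)]

# Hypothetical axes; the ranges explored in the study are not reproduced here.
grid_axes = {
    "transcription_rate": log_grid(1e-2, 1e1, 4),
    "translation_rate": log_grid(1e-2, 1e1, 4),
    "fluorescence_per_protein": log_grid(1e-3, 1e0, 4),
}

def parameter_sets(axes):
    """Yield one dictionary per point of the full Cartesian grid."""
    names = list(axes)
    for values in itertools.product(*(axes[name] for name in names)):
        yield dict(zip(names, values))

all_sets = list(parameter_sets(grid_axes))
print(len(all_sets), "parameter sets to evaluate")
```

Because each grid point is evaluated independently of the others, the list can be split across as many compute nodes as are available, which is not possible when each evaluation depends on the result of the previous one.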

This iterative screening was based on three steps in which we successively used analytical derivations on the model (step 1), additional experimental data (step 2), and finally, stochastic simulation (step 3). Thanks to these successive screenings, we were able to reduce by a factor of 30 the number of parameter sets to be simulated, thus making the problem computationally tractable. In the following sections, we describe the three screening steps and the results we obtained from them.

First screening of model parameters, based on mean and variance of fluorescence intensity

Mathematical derivations by Paulsson from the two-state model _{on}_{off }_{on }_{off}

We explored wide ranges of these parameters that included all biologically relevant values _{on }_{off }_{on }_{off }_{on}_{off}_{off}_{on}

The result of this first screening still produced more than 1,000 valid parameter sets, with the values of _{on }_{off }

Second screening of model parameters, based on response to treatments with chromatin-modifying agents

In order to reduce the ranges of solutions, we conducted additional experiments in which we modified the global dynamics of chromatin in both the cells and the model. We first treated three clones with the two chromatin-modifying agents TSA and 5-AzaC. As expected, TSA treatment, which leads to chromatin decondensation


**Exploration of model parameters based on treatments with chromatin-modifying agents**. **(A)** Evolution of mean fluorescence intensity following kinetics of treatment with trichostatin A (TSA; solid line) and 5-azacytidine (5-AzaC; dotted line) (0 to 48 hours) for three cellular clones. **(B)** Distributions of the plausible chromatin dynamics. For each clone, all 114 possible couples of (1/_{on}_{off}_{off}_{on} **(C)** This experiment was the same as for (B), except that the transcription rate (_{off}_{off}

Based on these additional data, we could then exclude all transcription-translation parameter sets that did not account for the observed increase in expression levels even if the chromatin was considered as constantly open (see Methods). It is important to emphasize that we made the assumption that the TSA and 5-AzaC treatments affected only the chromatin-dynamics parameters. Using this strategy, we were able to reject 86% of the parameter sets, thus we kept only 114 transcription-translation parameter sets for further analyses. Figure _{on }_{off}_{off }_{on }_{off }_{off }_{off }

Reformulating the sets of (1/_{off }_{on}_{off}_{on}_{off }

Third screening of model parameters, based on full distribution of fluorescence

To select the best parameter set from the 114 remaining sets, we simulated distributions of fluorescence corresponding to the remaining parameter sets, and compared them with the fluorescence distributions measured by flow cytometry. For each parameter set, we used a stochastic simulation algorithm (SSA)

Analyzing the comparison scores (distances) from the Kolmogorov-Smirnov test of the 114 parameter sets, we were able to identify the subsets of parameters, and therefore the corresponding chromatin dynamics, that were the best fit to the distributions measured by the flow cytometer (Figure


**Exploration of model parameters based on a comparison of fluorescence distributions and stochastic simulation algorithm (SSA) simulations**. **(A) **Distribution of parameter set scores. The lowest scores correspond to the better fits. These fits were obtained using values of **(B) **Distribution of chromatin dynamics ('mean burst size' and 'mean closed time'), obtained for the best parameter sets, after distribution comparisons for the six cellular clones. To compare with the possible chromatin dynamics presented in Figure 3B, this figure shows the chromatin dynamics obtained for the best parameter sets (black; score means between 0.07 and 0.107; see panel (A)) and the optimal parameter set for each clone (brown). **(C) **Illustration, for the six cellular clones, of the comparison between the mCherry fluorescence distributions measured by flow cytometry ('FACS'; solid line), and simulated fluorescence distributions ('Modeled'; dotted line) obtained with the best chromatin-dynamics parameter set. **(D) **One run of Gillespie SSA per clone showing the chromatin dynamics (opening and closing chromatin events are shown in black) for one virtual cell of the isogenic population distribution (see panel (C)). Consequences of chromatin open/closed dynamics on mRNA transcription and protein translation are shown in blue and in red respectively. Production (+) and degradation (-) evolutions of mRNAs and proteins are also indicated. (For illustration, Figure S2 (see Additional file

**Figure S2. Exploration of model parameters based on a comparison of fluorescence distributions and SSA simulations**. This figure is similar to the Figure


Figure

**Table S2. mCherry transcription rates and mRNA levels for six cellular clones of the 6C2 cell line**.


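Chromatin-dynamics parameters proposed for the six cellular clones.

| **Clone** | **Mean closed time, 1/k_{on} ^{a}** | **Mean burst size ^{b}** |
| --- | --- | --- |
| C1 | 756.7 | 50.9 |
| C3 | 1420.5 | 31.2 |
| C5 | 5197.6 | 118.9 |
| C7 | 3267.7 | 83.6 |
| C11 | 2271.8 | 82.8 |
| C17 | 882.7 | 30.0 |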

^{a}Mean closed times (1/_{on}

^{b}Mean burst sizes (_{off}

Chromatin dynamics at genomic insertion sites and sensitivity analysis

By combining biological experiments, analytical computations and stochastic simulations, we were able to estimate all the model parameters that best fit the measured flow-cytometry distribution for the different integration sites. We now used some of these parameters (that is,


**Inference of burst size and closed time from mean and normalized variance (NV) of protein levels**. **(A) **At steady states, using the best transcription-translation parameter set (**(B) **Using the same data and equation system as in panel (A), the mean burst size could be calculated from the protein mean and protein normalized variance (red grid). Note that grids of both panels are linked because each value pair (protein mean and NV) corresponds to a single value pair (mean burst size and mean closed time). For both parts, clones C1, C3, C5, C7, C11, and C17 are represented as blue points on the grid, and all axes are on a logarithmic scale.

Finally, we determined how the reported values (Table

Testing and validation of the model following a dynamic evolution of the chromatin state

To test the contribution of chromatin dynamics to stochastic gene expression and the quality of the parameter set we obtained, we used our model to simulate a situation in which the chromatin dynamics were profoundly modified. For this, we used the flow-cytometry data from the TSA-treated clones C5 and C11 (5-AzaC was not tested because it produced less intense effects). During TSA treatment, the distributions of fluorescence, reflecting the expression of the


**Model simulation of the perturbation of chromatin dynamics after trichostatin A (TSA) treatment**. **(A) **Effects of TSA-treatment kinetics on the mCherry fluorescence distributions for two cellular clones, C5 (red) and C11 (blue) measured by flow cytometry. **(B) **New chromatin dynamics (mean burst size (_{off}_{on}**(C) **Simulated mCherry fluorescence distribution evolution obtained for the best new chromatin dynamics (see panel (B)). (Insets) Evolutions of the distribution-comparison scores (comparisons between measured distributions after TSA treatment and the simulated distributions). **(D) **One run of the Gillespie SSA per clone showing the dynamics of the chromatin before and during 48 hours of TSA treatment (opening and closing chromatin events are shown in black) for one virtual cell of the isogenic population distributions (see panel (C)). Consequences of chromatin open/closed dynamics on mRNA transcription and protein translation are shown in blue and in red respectively. Production (+) and degradation (-) evolutions of mRNAs and proteins are also shown. The beginning of TSA treatment is indicated by a vertical blue line. (For illustration, Figure S3 (see Additional file

**Figure S3. Model simulation of the perturbation of chromatin dynamics by TSA treatment**. This figure is similar to the Figure


The evolution of the comparison score between the measured and simulated data (Figure

To illustrate the consequences of the new chromatin dynamics on the transcription and translation induced by the TSA treatment, Figure

Discussion

The importance of stochasticity of gene expression in many key cellular activities was appreciated many decades ago, and is now supported by strong experimental evidence

Analyzing stochastic expression of a stably integrated fluorescent reporter in six isogenic cell populations, differing only in their reporter integration site, this study provides new evidence suggesting that the local chromatin environment (reporter insertion site) influences stochastic gene expression. Our results are in agreement with previous studies on HIV gene expression, where it was shown that the existence of different fates for infected cells correlated with the virus-integration sites

Using a two-state model, we found that the observed NVs and MFIs for each clone alone were not sufficient to identify efficiently, for a specific chromatin environment, a restricted set of parameters that best explains the observed differences between the six clones. We thus used a more complex strategy exploiting the full distribution of fluorescence as measured by flow cytometry. By mixing analytical models, complementary experiments, and stochastic simulations, we progressively identified the parameters that best fit the flow-cytometry distributions. The final set of parameters we obtained was able to accurately reproduce the experimental data for all clones except the only bi-modal one, C7, for which the simulated distribution fitted only the higher-fluorescence mode. This bi-modal distribution observed for clone C7 could be due to: 1) specific chromatin dynamics related to the genomic insertion site of the reporter, or 2) a genetic mutation event affecting the reporter-gene integrity and resulting in two genetically distinct subpopulations. In the first case, if the transition rates between active and inactive states are extremely slow relative to transcript and protein degradation, each promoter state would be relatively stable, and this transcription regime could result in bi-modal protein expression

After selection of the best parameter sets and characterization of the chromatin dynamics for each clone, our work provided elements suggesting that the chromatin state is essentially dominated by the closed state, as previously shown

The results presented here also show how, using a two-state model and fluorescence distributions measured by flow cytometry, possible chromatin-dynamics parameters can be identified. In this study, the filtering of promoter activity by mRNA and protein dynamics allows inference of temporal information from a steady-state measurement (that is, fluorescence distributions). In this regard, the mRNA and protein half-lives are the components that define the range of timescales that can be assessed from the experiment. Using destabilized reporters _{off}_{on }_{off}

Finally, using our mathematical model, we simulated a situation in which the chromatin dynamics were directly modified by TSA. As expected, TSA treatment activated the mean reporter-gene expression

Our work suggests that the probability of chromatin entering an open state is a key determinant of gene expression in our system. A recent study in _{off }

Conclusions

In this study, we highlight the importance of the dynamics of chromatin in the control of cell-to-cell variability. Our results suggest that long periods of 'off' time (during which transcription does not occur) followed by brief periods of 'open' time (with strong transcriptional activity) can best explain the observed differences between clones in terms of stochastic gene expression. This paves the way for future studies exploring the role of chromatin dynamics at a more local scale.

Methods

Cell culture

All experiments were performed on 6C2 cells, a chicken erythroblast cell line transformed by the avian erythroblastosis virus (^{6 }cells per ml.

Generation of stably transfected clones

Stably transfected clones, expressing a fluorescent reporter, were obtained as previously described

Molecular and cellular characterization of clones

For each clone, the genomic reporter insertion sites were identified using a splinkerette PCR method as previously described

For characterization of clones and analysis of treatment effects (see below), flow-cytometry analyses were performed (FACSCanto II; Becton-Dickinson) on cells extemporaneously pelleted and resuspended in Dulbecco's phosphate-buffered saline 1× solution (Gibco-BRL). Each sample was analyzed using an acquisition of 50,000 events (gated on living cells), and the positive fluorescence threshold was fixed using non-transfected cells. Possible variability resulting from flow-cytometer calibration was taken into account by systematically analyzing flow-calibration particles (SPHERO™ Rainbow; Spherotech Inc., Lake Forest, IL, USA), as a calibration reference.

Non-transfected cells were used to measure 6C2 native autofluorescence, and the difference between the fluorescence of transfected and non-transfected ones was used as an indicator of the transgene activity (note that autofluorescence was also systematically added to the model's output to compute the distribution distance scores).

For each clone, two indicators were systematically used: the mean fluorescence intensity (MFI) and the normalized variance (NV; the variance divided by the square of the mean).
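As a minimal illustration of these two indicators, the sketch below computes them from a list of per-cell fluorescence values. The function name and the sample values are hypothetical, and the population variance (division by the number of cells) is an assumption on our part.

```python
def mfi_and_nv(fluorescence):
    """Return (mean, variance / mean^2) for a list of per-cell fluorescence values."""
    n = len(fluorescence)
    mean = sum(fluorescence) / n
    variance = sum((x - mean) ** 2 for x in fluorescence) / n  # population variance
    return mean, variance / mean ** 2

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up fluorescence values
mfi, nv = mfi_and_nv(values)
print(mfi, nv)
```

Because NV is dimensionless, it allows the width of the distributions to be compared between clones that differ widely in mean expression.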

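For a given cell, the measured fluorescence $F$ is the sum of the true reporter fluorescence $F_t$ and the autofluorescence $F_a$. Assuming that $F_t$ and $F_a$ are independent, we have:

$$E[F] = E[F_t] + E[F_a]$$

and

$$\mathrm{Var}[F] = \mathrm{Var}[F_t] + \mathrm{Var}[F_a]$$

Hence, with MFI and NV being the mean and normalized variance of the true fluorescence, we get:

$$\mathrm{MFI} = E[F] - E[F_a]$$

and

$$\mathrm{NV} = \frac{\mathrm{Var}[F] - \mathrm{Var}[F_a]}{\left(E[F] - E[F_a]\right)^2}$$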

Finally, to compare the theoretical distributions obtained from simulations (which only included the reporter fluorescence) with those obtained from experiments (which also included the autofluorescence), the model's output was first combined with the experimental autofluorescence. This was carried out by summing each simulation result with the value of a randomly selected cell from the autofluorescence distribution. The resulting distribution was the convolution between the theoretical and the autofluorescence distributions, and was then compared with the experimental distributions using a Kolmogorov-Smirnov test.
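The comparison procedure described in this paragraph can be sketched as follows. This is an illustrative implementation under our own assumptions: the data are synthetic, the function names are hypothetical, and the score is taken to be the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical cumulative distributions).

```python
import bisect
import random

def add_autofluorescence(simulated, autofluorescence, rng):
    """Add to each simulated cell the value of a randomly chosen non-transfected cell."""
    return [s + rng.choice(autofluorescence) for s in simulated]

def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: maximum gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:  # the maximum is reached at one of the observed values
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

rng = random.Random(1)
# Synthetic stand-ins for the three data sets (not experimental values).
autofluorescence = [rng.gauss(50.0, 5.0) for _ in range(1000)]
simulated = [rng.expovariate(1 / 200.0) for _ in range(1000)]
observed = [rng.expovariate(1 / 200.0) + rng.gauss(50.0, 5.0) for _ in range(1000)]

score = ks_distance(add_autofluorescence(simulated, autofluorescence, rng), observed)
print(score)
```

A score of 0 means that the two empirical distributions coincide and a score of 1 means that they do not overlap at all, so lower scores indicate better fits.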

Determination of

To determine the ^{™ }III First-Strand Synthesis System for RT-PCR (Invitrogen Inc.) in the presence of random hexamers. Quantification of mRNA levels by real-time PCR was performed in 96-well plates using a real-time PCR system (LightCycler 480; Roche Diagnostics, Basel, Switzerland). The measurement was performed in a final volume of 10 µl of reaction mixture (containing 2.5 µl of cDNA template diluted 1 in 5), prepared using a commercial kit (Light Cycler 480 SYBR Green I Kit; Roche Diagnostics) in accordance with the manufacturer's instructions, and with the primer set at a final concentration of 0.5 µmol/l (mCher-For: CCACCTACAAGGCCAAGAA, mCher-Rev: ACTTGTACAGCTCGTCCATG). An internal standard curve was generated using serial dilutions (from 2000 to 0.02 fg/µl) of purified PCR product. The reactions were initiated by activation of

To determine the mCherry protein degradation rate, we used flow cytometry to measure the protein half-life after translation inactivation using cycloheximide treatment. C5 and C11 clones were treated in duplicate for 0, 16, and 24 hours with a final concentration of 100 µg/µl cycloheximide (C4859; Sigma-Aldrich), and for each time point, the fluorescence of the treated cells was measured by flow cytometry. The autofluorescence component was removed as explained earlier. The protein half-life was determined using exponential fit of the fluorescence mean decrease curve, similarly to the procedure used for determining the mRNA half-life.
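The exponential fit mentioned above can be sketched with a least-squares regression of the logarithm of the mean fluorescence against time. The fitting procedure actually used is not detailed in the text, so the log-linear approach is an assumption, and the data below are synthetic values generated with a known half-life rather than measurements.

```python
import math

def half_life_from_decay(times_h, fluorescence):
    """Fit log(fluorescence) against time by least squares; return the half-life in hours."""
    logs = [math.log(f) for f in fluorescence]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
             / sum((t - mean_t) ** 2 for t in times_h))
    return -math.log(2) / slope  # the slope is minus the decay rate

times_h = [0, 16, 24]  # time points of the cycloheximide treatment
# Synthetic, autofluorescence-corrected means decaying with a 65-hour half-life.
synthetic_means = [1000.0 * 0.5 ** (t / 65.0) for t in times_h]
estimated_half_life = half_life_from_decay(times_h, synthetic_means)
print(estimated_half_life)
```

On noise-free synthetic data the known half-life is recovered; with real measurements, the duplicates at each time point would simply be added as extra points in the regression.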

Treatments with chromatin-modifying agents

To analyze the effect of chromatin state on the stochasticity of gene expression, clones were treated with TSA, a histone deacetylase inhibitor (P5026; Sigma-Aldrich), and 5-AzaC, an inhibitor of DNA methylation (A2385; Sigma-Aldrich). For each clone, kinetic treatment experiments were performed: clones were treated with 500 nmol/l TSA or 500 µmol/l 5-AzaC and analyzed at five time points (0, 8, 24, 32, and 48 hours). For each time point, 1 × 10^{6} cells (for 0, 8, and 24 hours) or 5 × 10^{5} cells (for 32 and 48 hours) were treated with the relevant drug and characterized by flow cytometry.

Model description

The two-state model of gene expression represents the chromatin activity as an 'on-off' process specified through the transition rates _{on }_{off }

where_{on }_{off }_{T }

This model can be simulated with the SSA (see below) to ascertain the behavior of single cells and eventually to compute the fluorescence distributions. It can also be analytically derived to compute the MFI and NV of large cell populations at steady state.

Analytical derivation of the model

Paulsson proposed an analytic expression of the mean quantity and NV of protein in the two-state model, as a function of chromatin-dynamics parameters and transcription-translation parameters

This equation can be used to express _{on }_{off }

Parametric exploration of the analytical model

Because the clonal populations differed only in their insertion points (that is, their chromatin-dynamics parameters), equation 3 enabled us to find the clone-specific parameters from MFI and NV (measured by flow cytometry) and the transcription-translation parameters

Exploring all values of _{on}_{off}_{on}

Comparison between the analytical model and the trichostatin A-treated clones

Equation 1 enabled us to compute the mean mRNA number (

Then, assuming that at

The exact values of

Note that this equation represents an extreme situation, not the exact TSA influence on chromatin.

Introducing equation 6 into the dynamics of equation 5, we were able to compute, for a given transcription-translation parameter set, the maximum rate of protein concentration increase, and thus the maximum increase of reporter fluorescence. For each parameter set, we compared the predicted fluorescence increase under the extreme condition of a fully open chromatin. We then rejected all parameter sets for which the protein number did not increase sufficiently rapidly to account for the fluorescence increase measured experimentally during TSA treatment.
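The rejection criterion can be sketched as follows, under assumptions of our own: with chromatin permanently open, the mean mRNA and protein numbers are taken to follow the linear equations dm/dt = rho - gamma·m and dp/dt = rho_p·m - gamma_p·p, which are solved in closed form (for gamma different from gamma_p). All names and numerical values other than the two half-lives are hypothetical.

```python
import math

def protein_with_open_chromatin(t, m0, p0, rho, gamma, rho_p, gamma_p):
    """Mean protein number at time t when chromatin stays open (requires gamma != gamma_p)."""
    m_inf = rho / gamma  # mRNA level approached under constant transcription
    decay_m, decay_p = math.exp(-gamma * t), math.exp(-gamma_p * t)
    return (p0 * decay_p
            + rho_p * m_inf * (1.0 - decay_p) / gamma_p
            + rho_p * (m0 - m_inf) * (decay_m - decay_p) / (gamma_p - gamma))

def can_explain_increase(params, mfi_before, mfi_after, t):
    """True if fully open chromatin could raise fluorescence from mfi_before to mfi_after."""
    p0 = mfi_before / params["k_fluo"]             # proteins before treatment
    m0 = p0 * params["gamma_p"] / params["rho_p"]  # mean mRNA at the initial steady state
    p_max = protein_with_open_chromatin(t, m0, p0, params["rho"], params["gamma"],
                                        params["rho_p"], params["gamma_p"])
    return params["k_fluo"] * p_max >= mfi_after

gamma = math.log(2) / 424     # mRNA half-life of 7 h 4 min
gamma_p = math.log(2) / 3947  # protein half-life of 65 h 47 min
fast = {"rho": 1.0, "gamma": gamma, "rho_p": 1.0, "gamma_p": gamma_p, "k_fluo": 0.01}
slow = dict(fast, rho=0.003)  # hypothetical set with a much lower transcription rate

# Hypothetical tripling of the mean fluorescence after 48 hours (2,880 minutes).
print(can_explain_increase(fast, 100.0, 300.0, 2880.0))
print(can_explain_increase(slow, 100.0, 300.0, 2880.0))
```

A parameter set such as `slow` cannot reach the target even in this most favorable case, so it would be discarded without running any stochastic simulation.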

Simulation of the model

The model can be simulated using an SSA, which is an exact continuous-time algorithm that enables simulation of chemical-reaction systems
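For illustration, a direct-method Gillespie simulation of a two-state model can be sketched as below. The reaction list (chromatin opening and closing, transcription in the open state, translation, and first-order degradation of mRNA and protein) follows the description of the model given in this article, but the parameter names and numerical values are hypothetical and were chosen only so that the example runs quickly.

```python
import random

def gillespie_two_state(params, t_end, rng):
    """Simulate one cell up to t_end; return (chromatin_open, mRNA, protein)."""
    t, is_open, mrna, protein = 0.0, False, 0, 0
    while True:
        propensities = [
            0.0 if is_open else params["k_on"],   # 0: chromatin opening
            params["k_off"] if is_open else 0.0,  # 1: chromatin closing
            params["rho"] if is_open else 0.0,    # 2: transcription (open state only)
            params["gamma"] * mrna,               # 3: mRNA degradation
            params["rho_p"] * mrna,               # 4: translation
            params["gamma_p"] * protein,          # 5: protein degradation
        ]
        total = sum(propensities)
        t += rng.expovariate(total)  # waiting time before the next reaction
        if t > t_end:
            return is_open, mrna, protein
        # Pick the reaction with probability proportional to its propensity.
        reaction = rng.choices(range(6), weights=propensities)[0]
        if reaction == 0:
            is_open = True
        elif reaction == 1:
            is_open = False
        elif reaction == 2:
            mrna += 1
        elif reaction == 3:
            mrna -= 1
        elif reaction == 4:
            protein += 1
        else:
            protein -= 1

# Hypothetical rates (per minute), not the fitted values of this study.
example_params = {"k_on": 0.01, "k_off": 0.1, "rho": 1.0,
                  "gamma": 0.01, "rho_p": 0.05, "gamma_p": 0.005}
rng = random.Random(42)
cells = [gillespie_two_state(example_params, 2000.0, rng) for _ in range(20)]
print([protein for _, _, protein in cells])
```

Repeating the simulation for many virtual cells and converting the final protein numbers into fluorescence yields a simulated distribution that can be compared with the flow-cytometry data.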

Simulation of trichostatin A treatment in the model

Using the best parameter set, we simulated 50,000 cells of the two TSA-treated clones C5 and C11 for 30,000 minutes. The chromatin-dynamics parameters were then modified to account for the TSA treatment, and the two clones were simulated for a further 2,880 minutes (48 hours). For each clone, the simulated distributions were computed after 8, 24, 32 and 48 hours, and compared with the experimental distributions using a Kolmogorov-Smirnov test. The best chromatin-dynamics parameters (

(from equation 1), we can use this analytical value to simplify the parametric exploration.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

JV, GK, AC, GB, and OG conceived and designed the research. JV performed all the biological experiments. GK performed all the computational analyses and model simulations. AC provided assistance for the modeling analyses. EV and VM provided technical assistance for the biological experiments. CMP and JJK advised on the design and interpretation of the experiments. JV, GK, AC, GB, and OG wrote the paper. OG and GB co-supervised the project. All authors have read and approved the final manuscript.

Acknowledgements

We thank François Chatelain, Alexandra Fuchs, and Manuel Théry for helpful discussions and support during the early stages of the project. We are grateful to Denis Ressnikoff of the platform Centre Commun de Quantimétrie de Lyon (CCQ) for flow-cytometry cell-sorting assistance. We thank the Centre de Calcul de l'Institut National de Physique Nucléaire et de Physique des Particules de Lyon (CC-IN2P3), and especially Pascal Calvat, for their computing resources. We also thank the interns who worked on this project: Mathieu Gineste, Yoann Ménière, Charles Rocabert, and Balthazar Rouberol. We thank the Andras Paldi group from the Généthon for the constructive and useful discussions on chromatin and stochasticity of gene expression.

This work was supported by funding from the Institut Rhônalpin des Systèmes Complexes (IXXI) and from the Réseau National des Systèmes Complexes (RNSC). Part of the project was supported by an ANR grant (ANR 2011 BSV6 014 01). JV is supported by a CNRS post-doctoral grant and GK is a PhD Fellow from the Region Rhône Alpes and INRIA.