Regulatory regions of the genome

The human genome has about 3 billion base pairs of DNA, but only a small fraction contains protein-coding information. The roles of most other regions are unknown (some may help maintain the structure of the genome), but a significant fraction consists of regulatory regions that direct how genetic information is carried through the stages of gene expression.

We focus on two hallmarks of regulatory regions:

  1. They have high chromatin accessibility.
  2. They can direct transcription, both at their target promoters and at the regulatory regions themselves.

We combine DNase hypersensitivity, transposon tagmentation, and nascent RNA sequencing assays to measure these features and characterize regulatory regions, with a particular focus on the mechanisms of enhancer activation.

Research focuses

  • How genetic variation in human populations impacts gene expression: the hierarchical links from genetic variation to chromatin accessibility, enhancer activity, and target gene expression.
  • Identifying the regulatory regions of the cancer genome and exploring the mechanisms of non-coding driver mutations.


RNA in the cytoplasm

Transcription is only the first stage of an RNA's lifespan. The transcript is processed co-transcriptionally and post-transcriptionally: introns are spliced out and a poly-A tail is added. It is then exported from the nucleus to begin its cytoplasmic life, where it is translated into protein.

In the cytoplasm, various mechanisms are known to control translation. We are interested in novel features of RNA that affect how much of it is translated and thereby tune the expression level of specific genes.

One hypothesis is that RNA can be redistributed into subcellular structures such as processing bodies (PB) and stress granules (SG), where translation is inefficient. This redistribution may become more extensive under stress conditions, serving as a control mechanism. We have identified hundreds of transcripts that are redistributed during endoplasmic reticulum stress and are now studying how their expression is regulated.

Outstanding questions

  • Which RNA transcripts are the targets of intracellular redistribution in cell stress?
  • Is the redistribution the cause or the consequence of translational repression?
  • What are the sequence features (cis) and RNA-binding proteins (trans) regulating this?
  • Which other features of the RNA (e.g. poly-A tail length) contribute to this regulation?
  • What is the fate of the redistributed RNA: decay or recycling?


Single cell

Individual cells can differ from one another; in other words, there is cellular heterogeneity. However, most current methods for analyzing gene expression or the regulatory features of the genome are performed on populations of cells.

Drop-seq profiles of fetal ferret brains, in collaboration with the Bae and Walsh labs (Johnson et al., Nature, in press).

Many single-cell methods have emerged recently. One of the most powerful for single-cell gene expression profiling is Drop-seq (‘Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets’), developed by the McCarroll Lab at Harvard Medical School. This video illustrates how Drop-seq works.

We have adopted this new technology and are applying it to understand various aspects of gene expression.

Collaborative projects

  • Single-cell sequencing of the human retina to identify causal mutations in macular degeneration
  • Dissecting hepatocyte subpopulations in normal and pathologic liver tissues
  • Cortical brain development in model animals

New technology development

  • Haplotype phasing using single-molecule DNA sequencing.
  • Single-cell tagmentation assay for chromatin accessibility analysis.
  • Nascent RNA and mRNA sequencing in the same cell.