Mining 3D genome structure populations identifies major factors governing the stability of regulatory communities

2016-05-31 13:36:43

Nature Communications; 31 May 2016: DOI:10.1038/ncomms11549

Chao Dai, Wenyuan Li, Harianto Tjong, Shengli Hao, Yonggang Zhou, Qingjiao Li, Lin Chen, Bing Zhu, Frank Alber, Xianghong Jasmine Zhou


Three-dimensional (3D) genome structures vary from cell to cell even in an isogenic sample. Unlike protein structures, genome structures are highly plastic, posing a significant challenge for structure-function mapping. Here we report an approach to comprehensively identify 3D chromatin clusters that each occurs frequently across a population of genome structures, either deconvoluted from ensemble-averaged Hi-C data or from a collection of single-cell Hi-C data. Applying our method to a population of genome structures (at the macrodomain resolution) of lymphoblastoid cells, we identify an atlas of stable inter-chromosomal chromatin clusters. A large number of these clusters are enriched in binding of specific regulatory factors and are therefore defined as ‘Regulatory Communities.’ We reveal two major factors, centromere clustering and transcription factor binding, which significantly stabilize such communities. Finally, we show that the regulatory communities differ substantially from cell to cell, indicating that expression variability could be impacted by genome structures.


Genome-wide proximity ligation assays, such as Hi-C1 and its variants as well as ChIA-PET5 have significantly expanded our understanding of spatial genome organization. Yet our knowledge of how genome structures are linked to functions is still limited. For example, recent studies have uncovered functional roles for local chromatin looping interactions. However, few studies have tried to decipher the functional implications of inter-chromosomal associations, which are known to play important roles in gene regulation. We can cite three examples that demonstrate the importance of inter-chromosomal interactions in gene regulation: inter-chromosomal contacts are required for co-transcription of the multi-gene complex SAMD4A, TNFAIP2 and SLC6A5; the IFN-? gene on chromosome 10 is trans-activated by regulatory regions of the TH2 cytokine locus on chromosome 11; and INF-beta genes are trans-activated from regulatory regions in different chromosomes.

A challenge arises when using Hi-C data to infer the spatial genome organization linked to inter-chromosomal functional interactions. The chromatin contacts uncovered by Hi-C describe not a single genome conformation, but the average contact frequency over many different genome conformations in a population of cells. The spatial genome organization can vary dramatically from cell to cell, even within an isogenic sample. This variability is especially strong for inter-chromosomal contacts. Therefore, a key challenge for interpreting ensemble-averaged Hi-C data is to deconvolute the observed chromatin contacts into an ensemble of subsets of interactions, where each subset are compatible to co-occur in a single cell.

To address this challenge, we have recently developed a modelling approach that constructs a population of three-dimensional (3D) genome structures derived from and fully consistent with the Hi-C data. By embedding the genome structural model in 3D space and applying additional spatial constraints (for example, all chromosomes must lie within the nuclear volume, and no two chromosome fragments can overlap), it is possible to deconvolute the ensemble-based Hi-C data into a set of plausible structural states. In particular, our approach facilitates the inference of cooperative long-range interactions, because the presence of some chromatin contacts may induce structural features that may make some additional contacts in the same structure more probable and others less likely. In other words, the spatial constraints effectively restrict the conformational freedom of the chromosomes, allowing us to approximate the unobserved true structure population. We showed previously that the structure population determined by this method can reproduce remarkably well independent experimental data and many known structural features of the genome organization that were not directly evident in the Hi-C data.

To Read Full Article, Click Here

Key Words

Biological sciences | Bioinformatics | Molecular biology