January 16, 2025
KYOTO – An international team of researchers has taken an important step toward understanding how gene expression is controlled across the human genome. A new study has comprehensively analyzed “cis-regulatory elements” (CREs), which are the DNA sequences that regulate gene transcription. This work sheds light on how these elements contribute to cell-specific gene expression and how mutations within them may influence health and disease.
CREs, including enhancers and promoters, are essential for controlling when and where genes are turned on or off. While their importance is well established, studying their activity at a large scale has been a challenge. “The human genome contains a myriad of CREs, and mutations in these regions are thought to play a major role in human diseases and evolution,” explained Dr. Fumitaka Inoue, one of the co-first authors of the study. “However, it has been very difficult to comprehensively quantify their activity across the genome.”
To address this, the team used the lentivirus-based massively parallel reporter assay (lentiMPRA), a technology the authors had previously developed. This approach enables the simultaneous analysis of hundreds of thousands of CREs by tagging each one with unique DNA barcodes that track its activity. Applying lentiMPRA, the researchers examined as many as 680,000 candidate CREs in three widely used cell types: hepatocytes (cells from the liver), lymphocytes (a type of white blood cell), and induced pluripotent stem cells (a type of artificial stem cell made from a normal body cell).
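To make the barcode idea concrete, the following is a minimal sketch of how barcode counts can be turned into an activity score. It assumes a convention commonly used in MPRA analysis, the log2 ratio of RNA barcode counts to DNA barcode counts summed per element. The function name, the pseudocount, and the toy counts are hypothetical and are not taken from the study's own pipeline.

```python
import math
from collections import defaultdict


def element_activity(barcode_counts, pseudocount=1.0):
    """Aggregate per-barcode counts into one activity score per element.

    barcode_counts: iterable of (element_id, dna_count, rna_count),
    one tuple per barcode. Several barcodes may tag the same element.
    Returns {element_id: log2((RNA + pseudocount) / (DNA + pseudocount))}.
    """
    dna_totals = defaultdict(float)
    rna_totals = defaultdict(float)
    for element_id, dna, rna in barcode_counts:
        dna_totals[element_id] += dna
        rna_totals[element_id] += rna
    return {
        element_id: math.log2(
            (rna_totals[element_id] + pseudocount)
            / (dna_totals[element_id] + pseudocount)
        )
        for element_id in dna_totals
    }


# Hypothetical toy data: two candidate elements, two barcodes each.
toy_counts = [
    ("cre_A", 10, 40),
    ("cre_A", 5, 23),
    ("cre_B", 20, 9),
    ("cre_B", 11, 6),
]

activity = element_activity(toy_counts)
for element_id, score in sorted(activity.items()):
    print(element_id, round(score, 3))
```

In this toy example, a positive score means the element's barcodes are more abundant in RNA than in DNA, which would be read as regulatory activity. A negative score means the opposite.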
The study revealed several key insights. Across the three cell types, approximately 41.7% of the analyzed CREs exhibited activity. Promoters, which start gene transcription, showed a dependence on sequence orientation but were less specific to cell types. Enhancers, which boost gene transcription, were active regardless of their orientation and exhibited cell-type specificity. These findings highlight fundamental differences in how these two types of CREs function.
In the study, several machine learning models were developed to predict the regulatory activity of CREs from large-scale experimental data. Of these, MPRALegNet, a model trained on the vast lentiMPRA dataset, was found to be the most accurate and efficient at predicting the regulatory activity of any DNA sequence. Its predictions align closely with experimental results and, in some cases, match the measurements as well as experimental replicates match each other. The model was also able to identify important transcription factor binding motifs (short DNA sequences that determine CRE activity), providing insights into how specific factors drive cell-type-specific gene expression. For example, the study identified HNF4 and GATA motifs as crucial for activity in hepatocytes and lymphocytes, respectively.
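The comparison between model predictions and experimental replicates can be illustrated with a short sketch. It assumes Pearson correlation as the agreement measure, which is a common choice but is not confirmed here as the study's metric. All numbers below are invented toy values, not results from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical activity scores for five elements (toy values).
replicate_1 = [0.2, 1.5, -0.3, 2.1, 0.9]
replicate_2 = [0.4, 1.3, -0.1, 1.8, 1.2]
predicted = [0.3, 1.2, 0.0, 2.0, 1.0]

# Average the replicates to get one measured value per element.
measured_mean = [(a + b) / 2 for a, b in zip(replicate_1, replicate_2)]

r_replicates = pearson(replicate_1, replicate_2)  # experiment vs. experiment
r_model = pearson(predicted, measured_mean)       # model vs. experiment

print(f"replicate vs. replicate: {r_replicates:.3f}")
print(f"model vs. measurement:   {r_model:.3f}")
```

Agreement between replicates gives a practical ceiling set by experimental noise. A model whose correlation with the measurements approaches that value is predicting about as well as a repeat experiment would.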
By enabling the precise identification and quantification of enhancer activity, the study opens avenues for exploring the molecular mechanisms of human diseases. Future research will focus on applying this approach to study genetic polymorphisms, the variations in DNA sequence that contribute to individual differences and disease susceptibility.
“Recently, the nearly complete human genome has been sequenced, but much of its functional regions remain unknown. Our findings link DNA sequence information with its functional roles. We hope that these results will contribute to a deeper understanding of biological phenomena, including human diseases and evolution,” said Dr. Inoue.
This study also contributes a publicly accessible database of CRE activity to the ENCODE portal, providing a valuable resource for researchers worldwide. By integrating large-scale experimental data with machine learning, the work lays a foundation for future discoveries in genomics and personalized medicine. In addition, tools like lentiMPRA and MPRALegNet will better equip researchers to unravel the complexities of gene regulation and to explore the vast, uncharted territories of the human genome.
Writing: ThinkSCIENCE, Inc. (Tokyo, Japan)
Agarwal, V.*, Inoue, F.*, Schubach, M., Penzar, D., Martin, B. K., Dash, P. M., Keukeleire, P., Zhang, Z., Sohota, A., Zhao, J., Georgakopoulos-Soares, I., Noble, W. S., Yardımcı, G. G., Kulakovskiy, I. V., Kircher, M., Shendure, J., & Ahituv, N. (2025). Massively parallel characterization of transcriptional regulatory elements. Nature. DOI: 10.1038/s41586-024-08430-9
*These authors contributed equally to this work