Within Emerging Discipline Of Computational Biology, Yufeng Shen Seeks Discovery Of Novel Genetic Variants That Cause Disease
Yufeng Shen found his passion for science in childhood, and he developed a fascination with both math and physics as his education progressed. In an earlier generation, he would have needed to choose between divergent paths. Instead, he chased his calling within an important emerging discipline.
Shen, who was awarded tenure and promoted to the rank of Associate Professor in the Columbia Departments of Biomedical Informatics (DBMI) and Systems Biology this past summer, is among the earliest generation trained in computational biology. Using new methods, he answers long-standing questions that impact health. Specifically, his research has focused on discovering novel genetic variants that cause human diseases.
His current work focuses on developing new computational methods to interpret genome data, identifying genetic causes of human diseases by integrating multiple types of genomic data, and modeling immune cell populations. That research has led to important findings, including his work on the Deep Genetic Connection between Cancer and Developmental Disorders, published in Human Mutation.
Drawing on sequencing data from published studies of cancer and developmental disorders, Shen and his students identified a significant number of genes implicated in both types of disease.
“This project allows us to use the larger cancer data to inform analysis in genetic variations of developmental disorders, and to find new risk genes and new risk variants,” Shen said. “It also provides a new perspective on how to optimize care for kids with developmental disorders. There is probably two to three times more risk of developing cancer for kids with developmental disorders than otherwise healthy kids.”
Shen continues this research by sequencing and analyzing samples in collaboration with the NIH Gabriella Miller Kids First Pediatric Research Program, which is developing a large-scale data resource to help researchers uncover new insights into the biology of childhood cancer and structural birth defects.
Most human genes won’t suffer obvious negative impacts from the disruption of one copy (as there are two copies of each gene, one from each parent). However, about one quarter can lead to developmental disorders when one copy is disrupted. Those genes are considered haploinsufficient and are critically important to the genetic understanding of human diseases and to genomic medicine. But they are largely unidentified.
Shen developed a computational model (Episcore) that uses epigenomic data to predict other potential haploinsufficient genes. This method is currently being used in genetic studies of birth defects and autism.
“Genomic sequencing data and epigenomic data have typically been distinct research fields,” Shen said. “The integration of those fields is very promising, because we are getting much more data.”
Shen has worked with genomes since earning his Ph.D. in computational biology from the Baylor College of Medicine. He produced the first draft genome of a sea urchin, a model organism useful in developmental biology studies. Shen’s transition to human genetics occurred when he worked on the analysis of Dr. James Watson’s genome, the first human genome sequenced using next-generation technologies.
His interest in connecting genotype with phenotype, and in how to medically interpret the human genome, was piqued at Baylor. He brought that fascination to Columbia, which he first joined as a postdoc supervised by Dr. Itsik Pe’er, an associate professor in the Department of Computer Science. Pe’er proved to be a mentor to Shen in several areas, ranging from his state-of-the-art knowledge in human genetics and computer science to his rigorous approach to research, writing, and presentations.
Respect came from both sides of the collaboration.
“What impressed me the most about him as a scientist then and now is that he truly embraces interdisciplinarity and embodies the ideal conduct of a cross-disciplinary investigator,” Pe’er said. “His ability to delve into the most nuanced mathematical equations on the one end, then seamlessly shift focus to the biochemical details of a laboratory protocol or to the clinical diagnosis of a phenotype, allows him to build off this broad basis as he zooms out and reaches high-level conclusions.”
Shen joined the DBMI and Systems Biology departments as an assistant professor in 2011. Both Columbia and NewYork-Presbyterian Hospital have aided his work.
“There is a big component of clinical informatics that solves real problems at the hospital,” Shen said. “There are very impressive people working on hard-core computational science problems. Both are very helpful to my research. The aspect of solving real problems in medicine, as well as the clinics, is the root of my research. It keeps my research grounded and always keeps my sight on what’s really important, which is improving healthcare and how we treat patients.”
The newly tenured Shen plans to continue developing innovative computational approaches to integrate genomic data in studies of human disease. A paper currently under peer review focuses on a new method (MVP) that uses deep learning to improve predictions of pathogenicity, that is, whether a genetic variant causes disease. This work is supported by the eMERGE Grant, which focuses on linking genetics with electronic health records (EHR).
Looking forward to the coming years, Shen believes that important medical advances in the field could be on the horizon, and he is excited to put his passion and expertise to work in finding critical breakthroughs in the study of human diseases.