How Do You Conduct A Genetic Study Without Genetic Data?: Leveraging Electronic Health Records

Any patient who’s new to a hospital is asked to fill out an intake form listing the name, number, and relationship of their emergency contact. It’s a protocol all of us are familiar with. But what if that same routine information could be used in an entirely different way, to build a rich collection of genetic data for the large-scale study of disease heritability?

This is the question that intrigued Columbia University researchers Nicholas Tatonetti, PhD; Fernanda Polubriaginof, MD, PhD; David Vawdrey, PhD; Krzysztof Kiryluk, MD, MS; and their collaborators.

They suspected that answering it might revolutionize the way researchers conduct heritability studies. Instead of going through the laborious, time-consuming process of recruiting participants (which typically means finding families with twins) and collecting their data, researchers could tap into information that already exists but has not been used to its fullest potential.

“The idea was to leverage vast electronic health record (EHR) data to identify familial relationships and therefore be able to understand disease risk, heritability, and, in the future, provide better and safer patient care by leveraging this heretofore untapped resource,” says Vawdrey.

A background in working with patients, Polubriaginof says, piqued her interest in discovering new ways to deliver improved care using patients’ own data.

“I was very interested in studying disease risk for populations,” she says. “As a physician by training and a PhD candidate at Columbia, I had done work on high-risk breast cancer patients. I was looking for data, like family history data in electronic health records. But what I found was that this information was often incomplete or even nonexistent.”

This incomplete patient data, the researchers reasoned, could be pieced together with emergency contact data to create broader, inferred information sets.

“This new pedigree and family relationships data, when combined with what’s already available in the electronic health records, can be used to estimate heritability across almost all diseases at a fraction of the cost and much faster than previous approaches,” says Tatonetti, the Herbert Irving Assistant Professor of Biomedical Informatics at Columbia.

So he and his colleagues came up with one algorithm to deduce millions of family relationships and another to estimate the heritability of hundreds of traits, then applied those algorithms to the 5.5 million electronic health records of patients and their emergency contacts at NewYork-Presbyterian/Columbia University Irving Medical Center, NewYork-Presbyterian/Weill Cornell Medical Center, and Mount Sinai Health System.
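The article does not describe how the relationship algorithm works internally, so the following is only a minimal sketch of the general idea, under stated assumptions: emergency contacts are matched to existing patient records by name and phone number, each stated relationship is mirrored in the opposite direction, and patients who share a parent are inferred to be siblings. The function name `infer_relationships`, the `INVERSE` table, and the record layout are all hypothetical and are not taken from the published study.

```python
from collections import defaultdict

# Hypothetical inverse map (an assumption for illustration):
# if A lists B as "mother", then A is B's "child".
INVERSE = {
    "mother": "child",
    "father": "child",
    "parent": "child",
    "child": "parent",
    "sibling": "sibling",
    "spouse": "spouse",
}

PARENT_TERMS = {"mother", "father", "parent"}


def infer_relationships(contacts, patient_index):
    """Infer relationship edges from emergency-contact entries.

    contacts: iterable of (patient_id, contact_name, contact_phone, relationship)
    patient_index: dict mapping (lowercased name, phone) -> patient_id
    Returns a set of (a, relation, b) triples meaning "b is a's <relation>".
    """
    edges = set()
    for patient_id, name, phone, relation in contacts:
        contact_id = patient_index.get((name.lower(), phone))
        # Skip contacts who are not patients, self-references, and unknown terms.
        if contact_id is None or contact_id == patient_id or relation not in INVERSE:
            continue
        edges.add((patient_id, relation, contact_id))
        edges.add((contact_id, INVERSE[relation], patient_id))

    # Group children under each parent, whichever side stated the relationship.
    children_of = defaultdict(set)
    for a, relation, b in edges:
        if relation == "child":
            children_of[a].add(b)
        elif relation in PARENT_TERMS:
            children_of[b].add(a)

    # Two different people who share a parent are inferred to be siblings.
    for kids in children_of.values():
        for x in kids:
            for y in kids:
                if x != y:
                    edges.add((x, "sibling", y))
    return edges


if __name__ == "__main__":
    contacts = [
        ("p1", "Ann Lee", "555-0001", "mother"),
        ("p2", "Ann Lee", "555-0001", "mother"),
    ]
    index = {("ann lee", "555-0001"): "p3"}
    for edge in sorted(infer_relationships(contacts, index)):
        print(edge)
```

In this toy example neither patient lists the other, yet the sketch links them as siblings because both name the same mother. A real system would need far more careful record matching than an exact name-and-phone lookup.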

“We found a way of using existing data and conducting a genetic study without using genetic data,” says Polubriaginof.

The results were published in Cell in May 2018 by Tatonetti and 20 co-authors.

“If you can infer relationships, then you can create family trees on a scale that was before now impossible without extensive time and financial resources,” says Vawdrey. “We can do in minutes or hours what other research groups have tried to painstakingly collect in months and years.”

Impact with a purpose

What does all of this mean for healthcare delivery? How does it contribute to the field of precision medicine?

“From the hospital point of view, for why this is important, it boils down to wanting to give the best and safest patient care. It helps us understand if we’re doing a good job screening for high-risk diseases for high-risk populations,” says Vawdrey. “If I have a history of cancer, the computer could clue in my doctor to ask certain questions or order certain tests.”

It’s also enabling more patients to receive the targeted, data-informed, proactive care they deserve, which until now had been limited to merely a fraction of the population.

“Medical research has typically only focused on white males, which biases our knowledge, causing disparities in treatment outcomes,” says Tatonetti.

In a diverse setting like New York, however, the EHR database of patients is much more racially and ethnically inclusive.

“One of the main advantages of using the medical records of a large urban academic medical center is the diversity in our study subjects,” says Tatonetti. “Our study is the first large scale analysis of heritability in a diverse population and can be used to highlight these disparities for future research.”

Protection of patient privacy

Researchers made patient privacy a priority while conducting the study. Access to data was restricted and storage was secured.

“Authorization to access research data is strictly limited to the project personnel, and they have signed the medical center’s confidentiality agreement. Authentication is via user identifiers and passwords,” says Tatonetti. “All access is encrypted using SSH Secure Shell and other services mediated through SSH. Access to the computer is logged and audited.”

In addition, individually identifying information was stripped from the public reports.

“To protect against identifying specific, extreme cases, we only evaluate conditions for which there are at least 1,000 patients diagnosed,” explains Tatonetti. “We mask any counts that we publish when the value is less than or equal to 10. In addition, we break the connections between multiple conditions and individuals by re-assigning random identifiers for each condition. This prohibits identification by a unique set of diagnoses.”
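The three safeguards Tatonetti describes can be sketched in a few lines. This is an illustrative sketch only, not the study’s code: the thresholds (at least 1,000 patients per condition, masking counts of 10 or fewer) come from the quote above, while the function names, the data layout, the `"<=10"` mask string, and the use of `secrets.token_hex` for random identifiers are assumptions.

```python
import secrets

MIN_PATIENTS_PER_CONDITION = 1000  # threshold stated in the quote
MASK_THRESHOLD = 10                # counts at or below this are masked


def eligible_conditions(patients_by_condition):
    """Keep only conditions with at least 1,000 diagnosed patients."""
    return {
        condition: ids
        for condition, ids in patients_by_condition.items()
        if len(set(ids)) >= MIN_PATIENTS_PER_CONDITION
    }


def mask_count(count):
    """Return a masked placeholder for small counts, else the count itself."""
    return "<=10" if count <= MASK_THRESHOLD else count


def reassign_ids(patients_by_condition):
    """Give every patient a fresh random identifier within each condition.

    The mapping back to the real ID is discarded, so the same person carries
    unrelated identifiers in different conditions and cannot be singled out
    by a unique combination of diagnoses.
    """
    return {
        condition: [secrets.token_hex(8) for _ in set(ids)]
        for condition, ids in patients_by_condition.items()
    }


if __name__ == "__main__":
    data = {
        "asthma": [f"patient{i}" for i in range(1200)],
        "rare_condition": ["patient1", "patient2"],
    }
    kept = eligible_conditions(data)
    print(sorted(kept))                    # only conditions meeting the threshold
    print(mask_count(7), mask_count(250))
    print(len(reassign_ids(kept)["asthma"]))
```

Drawing a new identifier per condition, rather than one per patient, is what breaks the link across diagnoses.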

Strength in collaboration

By working across not only Columbia departments but also health networks, the team was able to collaborate with researchers from a wide range of backgrounds.

“Everyone actively participated in this study and gave us a lot of insight, in terms of improving the method, thinking about how to use these relationships, and validating our research,” says Polubriaginof.

Their study also lays the groundwork for other researchers looking to expand upon their discoveries, applying similar methods and algorithms to different contexts.

“Our study presents a new way to generate hypotheses about the genetics of human disease that wasn’t before available. As these data and methods become more widely used, researchers around the world will have the ability to use their expertise to re-evaluate our data, identifying important questions that may otherwise have been missed,” says Tatonetti.

Opportunities down the road

In the future, these relationships could be used in other research studies, including clinically focused ones, by applying EHR-derived phenotypes to home in on specific high-risk diseases.

“The diseases we tested were based on simple billing codes in EHRs. We have the capability of producing much more reliable phenotypes using special computational algorithms,” says Kiryluk.

One application he’s currently pursuing is kidney disease. “Our algorithm has excellent diagnostic properties, to predict kidney disease with high fidelity. The output of the algorithm can be used to estimate heritability for different forms.”

Kiryluk continues: “If you think about the fact that you have a large fraction of individuals within EHR linked with pedigrees and these individuals also have genetic data … then you could think about testing for co-segregation of EHR-derived phenotypes with specific genetic markers. That would be a really nice application of this method.”

Ultimately, it’s about “opening new avenues for research,” as Polubriaginof puts it.

Vawdrey echoes: “We want to be breaking ground and discovering impact. We’re discovering things that people have never done before and we think they help us drive impact and provide better and safer patient care, and also open the door to future research opportunities.”

The field of Health Information Technology is booming — but are there enough people to meet the demand?

For skilled professionals working in clinical care, business and management, information technology, and other related fields who are interested in taking their career to the next level or breaking into a new one, Health IT represents an untapped opportunity.

Busy professionals aren’t always in the position to pursue a master’s degree or PhD in informatics, or might not be certain they want to. That’s why the Department of Biomedical Informatics at Columbia University created the Certification of Professional Achievement in Health IT as an expert, immersive program that offers students an effective alternate path.

“For me, Columbia’s HIT Certification was the perfect first step into the world of Health IT,” says Paul Terwilliger, an MD who wanted to change career paths. “It helped me determine that HIT was the direction I wanted to go in.” Paul now works as an Epic Clinical Analyst in upstate New York.

Through a blended online and in-person experience, the two-semester part-time program is flexible enough to accommodate the lives of busy working professionals, but rigorous enough to ensure they’ll hit the ground running upon completion. The instruction is interactive, innovative, and conducted by world-class professors and industry experts.

The HIT Certification teaches not only a conceptual framework and foundation, but also practical workforce skills that emphasize hands-on training, team-based problem-solving, and real-world application. “So much of the day-to-day life of working in healthcare and technology is not published in textbooks or in research papers or taught in seminars,” explains alumnus Bishoy Luka, who works as a Pharmacy Director.

“The field of Health IT is continually changing because both healthcare and technology are always in a state of flux,” explains Program Director Virginia Lorenzi. “Understanding how to integrate the old with the new and use interdisciplinary thinking,” she adds, “is the key to keeping apace and advancing the field.”

For Luka, the experience was life-changing. “This program allowed me to reinvent myself as a pharmacist,” he says. “There is no question it helped me become an indispensable part of the healthcare team, giving me a better understanding of how other disciplines interact with each other and how pharmacy can harness available technological resources.”

Gina Maman had no healthcare experience when she joined the program. After a 13-year career hiatus to raise her children, the program enabled her to update her skill set, and its supportive alumni network helped her successfully land in a high-demand job market. Today, she works at the Columbia University Medical Center’s Physicians & Surgeons Office of Development, where her training is a daily asset.

By meeting in person at monthly intervals, HIT students are able to be more fully engaged than they would be in programs that are strictly online or offer few opportunities for face-to-face interaction. At the in-person sessions, students work together in a highly interactive classroom setting, benefiting from meetings with numerous faculty, field experts, and the program’s supportive alumni community.

And because the students only meet monthly, the program attracts students from outside the city and state. Terrie Hamlin, now Vice President of Nursing Informatics for a school-based EHR vendor, traveled to class from New Hampshire. She felt it was worth the trip. “The wealth of knowledge obtained from the Certificate of Professional Achievement in Health Information Technology program at Columbia is truly amazing. It helped me achieve professional goals I didn’t know I had.”

Trainees come from myriad backgrounds, including technical, clinical, business, legal, and sales. Some trainees are already in the field of Health IT and simply seek formal training to enhance their career opportunities. Diana Kohlberg explained her motivations for enrolling: “I took the Columbia University Health IT program to bridge my gap in knowledge of Health IT that was not covered during my MPH program.” Diana now works in Health IT innovation.

According to Virginia Lorenzi, “The program’s diversity is key to its success. We use a training method called Team-Based Learning (TBL). Students work in interdisciplinary teams to complete quizzes, assignments, and large-scale projects. The students learn from each other’s strengths and develop important soft skills such as interdisciplinary communication and teamwork, which we believe are essential for this field. They also develop innovative ideas by blending their perspectives.”

Alumna Audrey Akins is employed as a principal trainer on Epic EMR software for a consulting company. She provided insight on her team-based learning experience: “The group projects were key in gaining clarity from my peers’ perspectives on how they saw HIT. Some were medical clinicians, while others were in advanced data IT roles. The projects enabled us to work cooperatively and lead by leaning on each other’s strengths to succeed.”

For some students, the Certification Program is just the beginning of their educational pursuits in informatics.

Wilson Ramos joined the program with a background in engineering and no healthcare experience. After completing the certification program, he decided he wanted to study the field in greater depth, so he applied and was accepted to the Master of Arts degree program in Biomedical Informatics at Columbia University. After completing his master’s degree, Ramos went on to work for Columbia University’s International Center for AIDS Care and Treatment Program (ICAP), an organization devoted to “delivering transformative solutions to meet the health needs of individuals around the world.” Through his work at ICAP, he traveled to several countries across Africa and used his informatics knowledge and skills to make an international impact.

But for most students, the Certification Program alone is enough to open doors to new opportunities.

Craig Budzynski worked in home healthcare management but wanted a change. He was able to change his career trajectory thanks to the Certification Program. Now he works as a Business Intelligence Architect. “The Columbia HIT program opened my mind and career to the vast arena of technology in healthcare. I would not be where I am right now without the program, and I owe a lot of the happiness in my current career to the program and the passionate instructors.”

Interested in enrolling in the HIT program? Applications are accepted each Spring for the next Fall’s cohort.
Want to learn more? Sign up for the next open house event.

Alum Mary Regina Boland Connects the Dots Between Climate Data and Disease

This post is part of the People of DBMI series.

If you’re thinking about having a baby, you might want to try to steer clear of giving birth in the autumn.

At least in New York City, where Mary Regina Boland, PhD, and her Columbia University colleagues studied the ties between birth month and disease, finding that October and November births are most strongly correlated with increased risk of neurological, reproductive, and respiratory illnesses.

Today an assistant professor of informatics at the University of Pennsylvania, Boland conducted the research as part of her dissertation as a PhD candidate in Columbia’s Department of Biomedical Informatics. She and her team examined more than 1.7 million patient records at Columbia University Medical Center, looking at climate and seasonality as one environmental factor that plays an important role in pregnancy conditions.

The birth-month study is one in a string of studies Boland undertook at Columbia looking at how clinical data like dental and medical records are connected. Also as a DBMI student, she found that periodontitis, or gum infection, is linked to, interestingly enough, diabetes, hypertension, and prostate inflammation.

Boland earned her MA from DBMI in 2012, her MPhil in 2016, and her PhD in 2017.

Throughout her career, Boland has been dedicated to assessing, engaging with, and making sense of data. She’s curious about the effects of different types of environmental exposures on not only prenatal and perinatal development, but also other health outcomes.

In July, Boland published a study of climate impact on hospital performance metrics, which found that, even after adjusting for socioeconomic factors like income, exposure to colder climates resulted in higher 30-day mortality rates. The research gathered scores from more than 4,500 hospitals in over 2,300 counties across the nation.

The challenge now is to apply the data, going beyond mere collection and understanding to use them for actual treatment. It’s about “closing the loop,” as Boland puts it. “In informatics, you try to build models to understand latent factors,” she says, and design methods for integration.

Research like Boland’s could be key to devising better health, as the field of biomedical informatics works increasingly on the consumer end to develop health tracking and analysis systems. And such evaluations will be better, more robust, and more holistic, taking into account an individual’s entire environment.

“We’re going to have a better assessment of things that contribute to disease risk,” Boland says. “As we continue to move toward the future, there’s going to be less of an emphasis on genetics being the way to solve everything.”