Class of 2021 PhD Profile: Tian Kang

Tian Kang didn’t know the full scope of the biomedical informatics environment before beginning her DBMI journey as a master’s student in 2014, but a first-semester course enlightened her and set her on a path for the next several years at Columbia.

Taught by professor Chunhua Weng, the course “Symbolic Methods” introduced Tian to natural language processing and electronic data in healthcare. Work on a small project around clinical trial eligibility fascinated her, and would eventually help direct her PhD journey. She also learned about the fields of clinical informatics and clinical research informatics around that time, and she transferred to a clinical research informatics track to continue this work.

That clinical trial eligibility project that originally fascinated Tian provided even greater motivation when she converted it into an AMIA presentation. She enjoyed the research, recognized how many other interesting topics she could study and how broad the faculty’s expertise was, and applied for the DBMI PhD program.

Associate professor Noémie Elhadad also helped shape Tian’s future by advising her in NLP research. NLP tools would provide an important foundation for her dissertation work, even if it took her a while to narrow down her focus when she began her PhD journey. She was still working on the clinical trial eligibility project during her first PhD year, but it was around her second that the concept of evidence-based medicine started to take shape.

“Evidence-based medicine is a concept that encourages clinical practitioners to apply the best available medical evidence from research to their clinical practice in addition to all the knowledge they learned at school, their practice, and patient care,” she said. “It is integrating their knowledge, patient data, patient values and the best available evidence that has been proven in clinical research.”

While that may sound obvious in theory, it is daunting in practice. The number of published studies can be overwhelming for researchers, let alone for clinicians who need to present to their patients. Tian’s work contributes an automated approach that facilitates this process, using natural language processing to extract key information from the literature and build a more efficient base for searching for evidence.

“During practice, doctors don’t have much time to search for literature online,” she said. “Doctors need to consider effectiveness, safety profile, side effects. This type of synthesis is very time consuming, and it’s basically unrealistic to conduct during practice.”

She continued to work on this project with Weng, who was there to provide assistance but also gave Tian the room and structure to grow as an independent researcher.

“She has been very supportive, not only as a PhD student, but with a lot of ideas I have,” Tian said. “We had many discussions about the benefits and disadvantages of each, and she has been instrumental in connecting me with other resources, professors and researchers. She encouraged me to become an independent researcher, to plan and schedule my research from the beginning to the end. That helped me grow fully.”

“Tian has been laser-focused on clinical NLP research immediately after she joined DBMI in 2014 as a master’s degree student,” Weng said. “She is passionate towards this fundamental research area in Biomedical Informatics. She has overcome various challenges in her research, including the scarcity of quality annotated corpora, the lack of standard knowledge representations, and the rapidly changing landscape for machine learning and deep learning to name a few.”

“In her dissertation, Tian advanced NLP research for extracting medical evidence from the public PubMed database,” Weng added. “Tian’s vision for medical evidence computing is expansive and bold. I am very proud of her growth into a mature informatician and NLP scientist. I wish her best of luck in the next steps of her career.”

Tian also credited professor Adler Perotte as an important faculty member who mentored her on machine learning for her doctoral dissertation. She valued the faculty culture of engaging in research discussions with any DBMI student, and she felt similarly about the camaraderie among her fellow students.

Her research eventually led to a successful dissertation entitled “Towards Unified Medical Evidence Computation from Literature for Evidence-based Medicine.” The abstract is below.

Evidence-based Medicine (EBM) is the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patient. Billions of dollars are spent annually on the conduct of randomized clinical trials (RCT), one type of experiments widely regarded as yielding the most valuable evidence. Yet, the number of studies is growing exponentially, and most experiment results are published only as text-based articles in the medical journal, causing difficulties for both practitioners and researchers in searching, synthesizing, and ultimately, translating the best available evidence to the patient care. To address the problem, I aim to develop a unified information extraction framework for medical evidence, and build novel computational approaches based upon it to make evidence from research more accessible in Evidence-based Medicine. In this dissertation, I (i) present a unified conceptual model and coordinated workflow for evidence representation, (ii) develop open-source NLP tools for supporting EBM tasks (evidence extraction, retrieval, and synthesis), (iii) develop a medical evidence base to cater various information needs, and (iv) present a new machine reading comprehension model for answering clinical questions.

While the pandemic impacted her final year at Columbia, it didn’t impact her love for New York City. She is joining the NYC-based Tempus Labs Inc as a Machine Learning Specialist. She will continue to focus on natural language processing for improving healthcare in domains like oncology and cardiology.