Seeking the cellular mechanisms of disease, with help from machine learning
Caroline Uhler’s research blends machine learning and statistics with biology to better understand gene regulation, health, and disease. Despite this lofty mission, Uhler remains dedicated to her original career passion: teaching. “The students at MIT are amazing,” says Uhler. “That’s what makes it so fun to work here.”
Uhler recently received tenure in the Department of Electrical Engineering and Computer Science. She is also an associate member of the Broad Institute of MIT and Harvard, and a researcher at both the MIT Institute for Data, Systems, and Society and the Laboratory for Information and Decision Systems.
Growing up along Lake Zurich in Switzerland, Uhler knew early on she wanted to teach. After high school, she spent a year gaining classroom experience, and she didn’t discriminate by subject. “I taught Latin, German, math, and biology,” she says. But by year’s end, she found she enjoyed teaching math and biology most. So she enrolled at ETH Zurich to study those subjects and earn a master’s in education that would allow her to become a full-time high school teacher.
But Uhler’s plans changed, thanks to a class taught by Bernd Sturmfels, a visiting professor from the University of California at Berkeley. “He taught a course called algebraic statistics for computational biology,” says Uhler. The course title alone may sound like a mouthful, but to Uhler, the class was an elegant link between her passions for math and biology. “It basically connected everything that I liked in one course,” she recalls.
Algebraic statistics provided Uhler with a unique set of tools for representing the mathematics of complex biological systems. She was so intrigued she decided to postpone her dreams of teaching and pursue a PhD in statistics.
Uhler enrolled at UC Berkeley, completing her dissertation with Sturmfels as her advisor. “I loved it,” Uhler says of her time at Berkeley, where she dove deeper into the nexus of math and biology using algebra and statistics. “Berkeley was very open in the sense that you can take all kinds of courses,” she says, “and really pursue your diverse research interests early on. It was a great experience.”
Much of her work was theoretical, attempting to answer questions about network models in statistics. But toward the end of her PhD, her questions took a more applied turn. “I got really interested in causality and gene regulation: how can we learn something about what is going on in the cell?” Uhler says gene regulation provides ample opportunities to apply causal analysis, because changes in one gene can have cascading effects on the expression of genes downstream.
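To make the idea of cascading effects concrete, here is a minimal Python sketch. It is not drawn from Uhler's research. It assumes a hypothetical chain of genes A → B → C plus an unrelated gene D, with invented edge weights and baseline levels, and a noise-free linear model in which each gene's expression is its baseline plus a weighted sum of its parents. Setting A to zero, a stand-in for a knockout, changes B and C but leaves D untouched.

```python
# Hypothetical regulatory graph: A -> B -> C, with D unconnected.
# All gene names, weights, and baselines are invented for illustration.
EDGES = {("A", "B"): 0.8, ("B", "C"): -0.5}
ORDER = ["A", "B", "C", "D"]  # a topological order of the graph
BASELINE = {"A": 1.0, "B": 0.2, "C": 2.0, "D": 1.5}


def expression(interventions=None):
    """Return noise-free expression levels under a linear structural model."""
    interventions = interventions or {}
    levels = {}
    for gene in ORDER:
        if gene in interventions:
            # An intervention overrides the gene's usual regulation.
            levels[gene] = interventions[gene]
            continue
        from_parents = sum(
            weight * levels[parent]
            for (parent, child), weight in EDGES.items()
            if child == gene
        )
        levels[gene] = BASELINE[gene] + from_parents
    return levels


observed = expression()
knockout = expression({"A": 0.0})  # force gene A to zero

# Genes whose level differs between the two settings.
changed = [g for g in ORDER if abs(observed[g] - knockout[g]) > 1e-12]
print(changed)
```

Real gene regulatory networks are noisy, nonlinear, and far larger, and the graph itself usually has to be inferred from data, which is where the causal analysis Uhler describes comes in.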
She carried these causality questions forward to MIT, where she accepted a role as assistant professor in 2015. Her first impressions of the Institute? “The place was very collaborative and a hub for machine learning and genomics,” says Uhler. “I was excited to find a place with so many people working in my field. Here, everyone wants to discuss research. It’s just really, really fun.”
The Broad Institute, which uses genomics to better understand the genetic basis of disease and seek solutions, has also been a good fit for Uhler’s academic interests and her cooperative approach to research. The Broad announced last month that Uhler will co-direct its new Eric and Wendy Schmidt Center, which will promote interdisciplinary research between the data and life sciences.
Uhler now works to synthesize two distinct types of genomic information: sequencing and the 3D packing of DNA. The nucleus of each cell in a person’s body contains an identical sequence of DNA, but the physical arrangement of that DNA — how it kinks and winds — varies among cell types. “In understanding gene regulation, it’s becoming clear that the packing of the DNA matters very much,” says Uhler. “If some genes in the DNA are not used, you can just close them off and pack them very densely. But if you have other genes that you need often in a particular cell, you’ll have them open and maybe even close together so they can be co-regulated.”
Understanding the interplay between the genetic code and the 3D packing of the DNA could help reveal how a particular disease affects the body on a cellular level, and it could help point to targeted treatments. To achieve this synthesis, Uhler develops machine-learning methods, particularly ones based on autoencoders, that integrate sequencing data and packing data to generate a representation of a cell. “You can represent the data in a space where the two modalities are integrated,” says Uhler. “It’s a question I’m very excited about because of its importance in biology as well as my background in mathematics. It’s an interesting packing problem.”
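For readers unfamiliar with autoencoders, the following Python sketch shows the general idea in its simplest form. It is an illustration under strong assumptions, not the method used in Uhler's group: the data are invented, each toy "cell" has two sequencing-like features and two packing-like features driven by one hidden state, and the autoencoder is linear with a single latent number per cell. The encoder compresses all four features into that shared number, and the decoder tries to rebuild both modalities from it.

```python
# Toy cells: one hidden state t drives both modalities.
# The first two features stand in for sequencing data and the last two
# for packing data. All values are invented for illustration.
HIDDEN = [-1.0, -0.5, 0.0, 0.5, 1.0]
CELLS = [[t, 2.0 * t, -t, 0.5 * t] for t in HIDDEN]


def encode(w, v):
    """Map a cell's concatenated features to a single shared latent value."""
    return sum(wi * vi for wi, vi in zip(w, v))


def decode(u, z):
    """Rebuild all four features from the latent value."""
    return [ui * z for ui in u]


def mean_loss(w, u):
    """Mean squared reconstruction error over all cells."""
    total = 0.0
    for v in CELLS:
        recon = decode(u, encode(w, v))
        total += sum((vi - ri) ** 2 for vi, ri in zip(v, recon))
    return total / len(CELLS)


def train(steps=2000, lr=0.02):
    """Fit encoder weights w and decoder weights u by gradient descent."""
    w = [0.1] * 4
    u = [0.1] * 4
    n = len(CELLS)
    for _ in range(steps):
        grad_w = [0.0] * 4
        grad_u = [0.0] * 4
        for v in CELLS:
            z = encode(w, v)
            resid = [vi - ui * z for vi, ui in zip(v, u)]
            # The loss gradient with respect to z is -2 * back.
            back = sum(ui * ri for ui, ri in zip(u, resid))
            for i in range(4):
                grad_u[i] += -2.0 * resid[i] * z / n
                grad_w[i] += -2.0 * back * v[i] / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        u = [ui - lr * gi for ui, gi in zip(u, grad_u)]
    return w, u


initial_loss = mean_loss([0.1] * 4, [0.1] * 4)
w, u = train()
final_loss = mean_loss(w, u)
latent_codes = [encode(w, v) for v in CELLS]  # one integrated value per cell
print(initial_loss, final_loss)
```

Because both modalities in this toy data depend on the same hidden state, a single latent value is enough to reconstruct them. Practical systems use deep, nonlinear networks and much richer data, but the goal of a shared representation is the same.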
Recently, Uhler has focused on one disease in particular. Her research group co-authored a paper that uses autoencoders and causal networks to identify drugs that could be repurposed to fight Covid-19. The approach could help pinpoint drug candidates to be tested in clinical trials, and it is adaptable to other diseases where detailed gene expression data are available.
Research accomplishments aside, Uhler hasn’t relinquished her earliest career aspiration to be a teacher and mentor. In fact, teaching has become one of her most cherished roles at MIT. “The students are incredible,” says Uhler, highlighting their intellectual curiosity. “You can just go up to the whiteboard and start a conversation about research. Everyone is so driven to learn and cares so deeply.”