Dealing with the limitations of our noisy world
Tamara Broderick first set foot on MIT’s campus when she was a high school student, as a participant in the inaugural Women’s Technology Program. The monthlong summer academic experience gives young women a hands-on introduction to engineering and computer science.
What is the probability that she would return to MIT years later, this time as a faculty member?
That’s a question Broderick could probably answer quantitatively using Bayesian inference, a statistical approach to probability that tries to quantify uncertainty by continuously updating one’s assumptions as new data are obtained.
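To make the idea of "continuously updating one's assumptions" concrete, here is a minimal sketch of a textbook Bayesian update, offered as an illustration and not as anything taken from Broderick's own work. It assumes a Beta prior over an unknown success probability and a made-up sequence of yes/no observations; the function names and the data are hypothetical.

```python
from fractions import Fraction


def update_beta(alpha, beta, observations):
    """Return the Beta(alpha, beta) parameters after each 0/1 observation.

    A success (1) adds one to alpha; a failure (0) adds one to beta.
    """
    history = [(alpha, beta)]
    for outcome in observations:
        if outcome:
            alpha += 1
        else:
            beta += 1
        history.append((alpha, beta))
    return history


def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return Fraction(alpha, alpha + beta)


def posterior_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution (smaller = more certain)."""
    total = alpha + beta
    return Fraction(alpha * beta, total * total * (total + 1))


# Hypothetical data: 1 = success, 0 = failure.
observations = [1, 0, 1, 1, 1, 0, 1, 1]

# Beta(1, 1) is a uniform prior: every success probability is equally plausible.
history = update_beta(1, 1, observations)

for step, (a, b) in enumerate(history):
    mean = float(posterior_mean(a, b))
    var = float(posterior_variance(a, b))
    print(f"after {step} observations: mean={mean:.3f}, variance={var:.4f}")
```

Each new observation shifts the estimate, and the variance reported alongside it is the "how well do we know it" part: in this example it shrinks as the evidence accumulates.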
In her lab at MIT, the newly tenured associate professor in the Department of Electrical Engineering and Computer Science (EECS) uses Bayesian inference to quantify uncertainty and measure the robustness of data analysis techniques.
“I’ve always been really interested in understanding not just ‘What do we know from data analysis,’ but ‘How well do we know it?’” says Broderick, who is also a member of the Laboratory for Information and Decision Systems and the Institute for Data, Systems, and Society. “The reality is that we live in a noisy world, and we can’t always get exactly the data that we want. How do we learn from data but at the same time recognize that there are limitations and deal appropriately with them?”
Broadly, her focus is on helping people understand the confines of the statistical tools available to them and, sometimes, working with them to craft better tools for a particular situation.
For instance, her group recently collaborated with oceanographers to develop a machine-learning model that can make more accurate predictions about ocean currents. In another project, she and others worked with degenerative disease specialists on a tool that helps severely motor-impaired individuals use a computer’s graphical user interface by manipulating a single switch.
A common thread woven through her work is an emphasis on collaboration.
“Working in data analysis, you get to hang out in everybody’s backyard, so to speak. You really can’t get bored because you can always be learning about some other field and thinking about how we can apply machine learning there,” she says.
Hanging out in many academic “backyards” is especially appealing to Broderick, who struggled even from a young age to narrow down her interests.
A math mindset
Growing up in a suburb of Cleveland, Ohio, Broderick had an interest in math for as long as she can remember. She recalls being fascinated by the idea of what would happen if you kept adding a number to itself, starting with 1+1=2 and then 2+2=4.
“I was maybe 5 years old, so I didn’t know what ‘powers of two’ were or anything like that. I was just really into math,” she says.
Her father recognized her interest in the subject and enrolled her in a Johns Hopkins program called the Center for Talented Youth, which gave Broderick the opportunity to take three-week summer classes on a range of subjects, from astronomy to number theory to computer science.
Later, in high school, she conducted astrophysics research with a postdoc at Case Western Reserve University. In the summer of 2002, she spent four weeks at MIT as a member of the first class of the Women’s Technology Program.
She especially enjoyed the freedom offered by the program, and its focus on using intuition and ingenuity to achieve high-level goals. For instance, the cohort was tasked with building a device with LEGOs that they could use to biopsy a grape suspended in Jell-O.
The program showed her how much creativity is involved in engineering and computer science, and piqued her interest in pursuing an academic career.
“But when I got into college at Princeton, I could not decide — math, physics, computer science — they all seemed super-cool. I wanted to do all of it,” she says.
She settled on pursuing an undergraduate math degree but took all the physics and computer science courses she could cram into her schedule.
Digging into data analysis
After receiving a Marshall Scholarship, Broderick spent two years at Cambridge University in the United Kingdom, earning a master of advanced study in mathematics and a master of philosophy in physics.
In the U.K., she took a number of statistics and data analysis classes, including her first class on Bayesian data analysis in the field of machine learning.
It was a transformative experience, she recalls.
“During my time in the U.K., I realized that I really like solving real-world problems that matter to people, and Bayesian inference was being used in some of the most important problems out there,” she says.
Back in the U.S., Broderick headed to the University of California at Berkeley, where she joined the lab of Professor Michael I. Jordan as a grad student. She earned a PhD in statistics with a focus on Bayesian data analysis.
She decided to pursue a career in academia and was drawn to MIT by the collaborative nature of the EECS department and by how passionate and friendly her would-be colleagues were.
Her first impressions panned out, and Broderick says she has found a community at MIT that helps her be creative and explore hard, impactful problems with wide-ranging applications.
“I’ve been lucky to work with a really amazing set of students and postdocs in my lab — brilliant and hard-working people whose hearts are in the right place,” she says.
One of her team’s recent projects involves a collaboration with an economist who studies the use of microcredit, or the lending of small amounts of money at very low interest rates, in impoverished areas.
The goal of microcredit programs is to raise people out of poverty. Economists run randomized controlled trials in which some villages in a region receive microcredit and others do not. They want to generalize the study results, predicting the expected outcome if microcredit were offered in other villages outside of their study.
But Broderick and her collaborators have found that results of some microcredit studies can be very brittle. Removing one or a few data points from the dataset can completely change the results. One issue is that researchers often use empirical averages, where a few very high or low data points can skew the results.
Using machine learning, she and her collaborators developed a method that can determine how many data points must be dropped to change the substantive conclusion of the study. With their tool, a scientist can see how brittle the results are.
“Sometimes dropping a very small fraction of data can change the major results of a data analysis, and then we might worry how far those conclusions generalize to new scenarios. Are there ways we can flag that for people? That is what we are getting at with this work,” she explains.
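The brittleness of an empirical average can be seen in a small sketch. This is not the method Broderick's group developed, which the article does not detail; it is a direct calculation that only works for a simple mean, and the function name and the numbers are invented for illustration. For a positive average, removing the largest values first is the quickest way to push the sum to zero or below, so counting those removals gives the smallest number of points that would flip the sign of the result.

```python
def min_drops_to_flip_mean(values):
    """Smallest number of points to remove so a positive mean becomes <= 0.

    Returns None if no proper subset of removals can flip the sign.
    Dropping the largest values first is optimal for a plain average,
    because it lowers the remaining sum as fast as possible.
    """
    total = sum(values)
    if total <= 0:
        raise ValueError("expected a sample with a positive mean")

    dropped = 0
    for value in sorted(values, reverse=True):
        total -= value
        dropped += 1
        # At least one point must remain for the mean to be defined.
        if dropped < len(values) and total <= 0:
            return dropped
    return None


# Hypothetical change in profit for 20 households (made-up numbers).
profit_change = [-4, -3, -3, -2, -2, -1, -1, 0, 0, 0,
                 0, 1, 1, 1, 1, 2, 2, 2, 3, 60]

mean = sum(profit_change) / len(profit_change)
drops = min_drops_to_flip_mean(profit_change)

print(f"mean with all data: {mean:.2f}")
print(f"points to drop to flip the sign: {drops} of {len(profit_change)}")
```

In this made-up sample, the average looks clearly positive, yet it depends entirely on a single household. A conclusion that hinges on so small a fraction of the data is the kind of result such a check is meant to flag.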
At the same time, she is continuing to collaborate with researchers in a range of fields, such as genetics, to understand the pros and cons of different machine-learning techniques and other data analysis tools.
Happy trails
Exploration is what drives Broderick as a researcher, and it also fuels one of her passions outside the lab. She and her husband enjoy collecting patches they earn by hiking all the trails in a park or trail system.
“I think my hobby really combines my interests of being outdoors and spreadsheets,” she says. “With these hiking patches, you have to explore everything and then you see areas you wouldn’t normally see. It is adventurous, in that way.”
They’ve discovered some amazing hikes they would never have known about, and have also embarked on more than a few “total disaster hikes,” she says. But each hike, whether a hidden gem or an overgrown mess, offers its own rewards.
And just like in her research, curiosity, open-mindedness, and a passion for problem-solving have never led her astray.