Offered under: MAS.S60, 6.S985
Term(s): Spring only
Level: Graduate
Units: 12
Prerequisite: 6.390 or equivalent
Instructors: Paul Liang (MAS and EECS), Dimitris Bertsimas (MIT Sloan), Sang-Gook Kim (MechE), Jinhua Zhao (DUSP)
Artificial Intelligence (AI) holds great promise to enhance digital productivity, physical interactions, overall well-being, and the human experience. To realize this impact, AI systems will need to be grounded in many real-world data modalities, moving beyond language-only systems to holistically integrate vision, audio, sensors, medical data, music, art, smell, taste, and more. This course introduces the principles of multimodal AI: systems that process many modalities at once, connecting language and images, music and art, sensing and actuation, and more. We will cover AI methods to (1) represent and fuse heterogeneous and interconnected data sources, (2) align data across different views, (3) reason over multiple steps with many modalities, (4) generate new multimodal content, (5) transfer knowledge from high-resource to low-resource data, and (6) quantify the principles of multimodal AI for safe, ethical, and human-aligned deployment.