Mira Murati: Multimodality and the Modern AI Product Stack

Multimodality changes the interface

When models can accept and produce multiple modalities (text, images, audio), the product becomes an interface layer that coordinates inputs/outputs, tool calls, and user expectations. Capability is no longer a single benchmark score; it’s end-to-end task success.

Reliability as an engineering problem

Shipping assistants at scale requires managing hallucinations, latency, cost, and policy. Many improvements come from evaluation, careful post-training, and product guardrails rather than architecture alone.

Connections

For image-generation roots, see DALL·E. For assistant shaping methods, see RLHF and Post-Training. For the broader organizational timeline, see OpenAI.