Multimodality changes the interface
When models can accept and produce multiple modalities (text, images, audio), the product becomes an interface layer that coordinates inputs/outputs, tool calls, and user expectations. Capability is no longer a single benchmark score; it’s end-to-end task success.
Reliability as an engineering problem
Shipping assistants at scale requires managing hallucinations, latency, cost, and policy. Many improvements come from evaluation, careful post-training, and product guardrails rather than architecture alone.
Connections
For image-generation roots, see DALL·E. For assistant shaping methods, see RLHF and Post-Training. For the broader organizational timeline, see OpenAI.