Microsoft has introduced MAI-Image-2, its latest in-house text-to-image model designed to generate more realistic and usable visuals for creative work. The release marks the company’s second major step in building its own image-generation technology, following MAI-Image-1.
Microsoft is positioning MAI-Image-2 as a tool built for real-world creative workflows rather than just visual experiments. According to the company, the model focuses heavily on photorealism and usability.
“MAI-Image-2 is built for creatives who want images that feel like they exist in the world, with natural light, accurate skin tones, environments that feel lived-in,” Microsoft AI wrote in a blog post. The company adds that this approach is meant to reduce editing time, allowing creators to “spend less time fixing in post-production and more time making.”
Fixing a long-standing AI problem: text in images
One of the biggest upgrades in MAI-Image-2 is its ability to generate readable, consistent text within images, a known weak point for many AI models.
Microsoft says the model can reliably render in-image text for posters, infographics, slides, and diagrams, with “little lost between direction and creation.” This improvement opens up more practical use cases, especially for designers and marketers who rely on text-heavy visuals.
Unlike earlier models that focused heavily on technical benchmarks, MAI-Image-2 was developed with feedback from photographers, designers, and visual storytellers.
Microsoft says these contributors helped identify key gaps in existing tools, particularly around realism, text accuracy, and the ability to create complex or cinematic scenes. The model is also designed to handle more imaginative outputs, including surreal concepts and highly detailed compositions.
Climbing the leaderboard but not leading yet
MAI-Image-2 has quickly climbed the rankings on Arena.ai. Microsoft says the model has pushed its MAI family into the top three globally among text-to-image labs.
However, the Arena.ai rankings show it still trails competitors such as Google’s Gemini models and OpenAI’s GPT-Image systems. Even so, the jump is notable: MAI-Image-1 debuted at a much lower level, making this a clear step forward for Microsoft’s in-house AI efforts.
“Our team has been pushing immensely hard for this release, and we are now among the top models out there,” said Mustafa Suleyman, CEO of Microsoft AI, in a post on X.
Rolling out across Microsoft’s ecosystem
MAI-Image-2 is already being integrated into Microsoft’s product lineup. Users can test it in the MAI Playground today, while rollout has begun for Copilot and Bing Image Creator.
Enterprise access is also underway, with select customers able to use the model via API. Wider developer availability is expected soon through Foundry.
This release lands as Microsoft doubles down on building its own AI models in-house, rather than relying solely on its close partnership with OpenAI, which previously powered much of Bing and Copilot’s image generation. It also follows a leadership shift: Suleyman moved in November 2025 to lead the AI Superintelligence team full-time, and this is the first public model released since that change.
“Really proud of our progress on models and products – stay tuned for new releases and come join us on our Superintelligence mission,” Suleyman added in his X post.


