New Apple model combines vision understanding and image generation with impressive results

Apple researchers have published a study on Manzano, a multimodal model that combines visual understanding and text-to-image generation while significantly reducing the performance and quality trade-offs seen in current implementations. Here are the details.