SDXL 1.0: Better quality, more functionality, and easier generation on “the largest open image model” to date
Stable Diffusion XL improves on previous versions in both quality and functionality. Perhaps most important are (i) the model’s ability to generate richer, higher-quality images with greater photorealism, and (ii) reduced reliance on elaborate prompting, which lets users generate complex and creative images from simpler prompts. Both can be seen below, in images created from the prompt “child eating ice cream in park.”
The chart below, published by Stability AI, shows SDXL winning over 80% of votes in a human-evaluated test comparing its outputs against those of previous versions. This is consistent with discussions across multiple forums over the past few weeks comparing SDXL to previous versions. The paper also reports user preference evaluations in which SDXL outperforms Midjourney v5.1 in four of six image generation categories.
Source: SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
The SDXL paper describes the core architectural changes that enable these improvements: a larger neural network for the base model; a combination of OpenCLIP ViT-bigG and CLIP ViT-L as text encoders; fine-tuning to better support a broader range of (non-square) aspect ratios; and a second refinement stage to further improve image fidelity.
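To make the dual text encoder idea concrete, here is a minimal sketch of combining two encoders' per-token outputs by concatenating them along the channel axis, which is how we understand the SDXL paper to describe it. The encoder outputs are stand-ins rather than real model calls, and the 77-token length and the 768 and 1280 embedding widths are assumptions based on the published CLIP ViT-L and OpenCLIP ViT-bigG models, not figures from this post.

```python
# Illustrative only: stand-in data, not real encoder outputs.
SEQ_LEN = 77              # assumption: CLIP-style 77-token context window
CLIP_VIT_L_DIM = 768      # assumption: CLIP ViT-L text embedding width
OPENCLIP_BIGG_DIM = 1280  # assumption: OpenCLIP ViT-bigG text embedding width


def fake_encoder_output(seq_len, dim, fill):
    """Hypothetical stand-in for a text encoder: one `dim`-wide vector per token."""
    return [[fill] * dim for _ in range(seq_len)]


def concat_channels(a, b):
    """Join two per-token embedding sequences along the channel axis."""
    if len(a) != len(b):
        raise ValueError("both encoders must produce the same number of tokens")
    return [token_a + token_b for token_a, token_b in zip(a, b)]


clip_l = fake_encoder_output(SEQ_LEN, CLIP_VIT_L_DIM, 0.0)
big_g = fake_encoder_output(SEQ_LEN, OPENCLIP_BIGG_DIM, 1.0)
context = concat_channels(clip_l, big_g)

# Each token now carries 768 + 1280 channels of text conditioning.
print(len(context), len(context[0]))
```

Under these assumptions, the denoising network is conditioned on a wider context vector per token than it would get from either encoder alone.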
With these enhancements, the SDXL 1.0 base model’s UNet has 2.6B parameters, compared to 860M in 1.5 and 865M in 2.1. This makes SDXL roughly three times the size of its predecessors, and correspondingly more expensive to run because of its larger hardware footprint. The greater size and cost of the model are likely to make SDXL 1.0 initially appealing mainly to highly quality-sensitive use cases. At the same time, we believe that the quality and openness of SDXL, combined with the ecosystem of technologies forming around it, have the potential to unblock adoption of GenAI image generation for a broad range of new organizations and use cases, including gaming, marketing asset creation, and entertainment.
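The size comparison above can be checked with quick arithmetic. The sketch below uses only the parameter counts quoted in this post and estimates the memory needed just to hold the weights. The 2-bytes-per-parameter figure assumes 16-bit weights; it is our assumption, not a measured number, and it ignores activations, text encoders, and the VAE.

```python
# Parameter counts as quoted in the text above.
PARAMS = {
    "SD 1.5": 860e6,
    "SD 2.1": 865e6,
    "SDXL 1.0 base": 2.6e9,
}

BYTES_PER_PARAM_FP16 = 2  # assumption: weights stored as 16-bit floats


def size_ratio(model, baseline):
    """How many times more parameters `model` has than `baseline`."""
    return PARAMS[model] / PARAMS[baseline]


def weight_memory_gib(model, bytes_per_param=BYTES_PER_PARAM_FP16):
    """Rough memory needed to hold the weights alone, in GiB."""
    return PARAMS[model] * bytes_per_param / 2**30


if __name__ == "__main__":
    for name in PARAMS:
        ratio = size_ratio("SDXL 1.0 base", name)
        print(f"{name}: {weight_memory_gib(name):.2f} GiB of weights, "
              f"SDXL is {ratio:.2f}x this size")
```

By this estimate SDXL comes out at about three times the parameter count of either predecessor, with a proportionally larger weight footprint, which is the source of the higher hardware requirement.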
Image Generation on OctoAI
Starting with the announcement of our accelerated Stable Diffusion 2.1 model earlier this year, OctoAI has been actively expanding its toolkit of features and services for developers building image generation applications. Shortly after our launch, we added the ability to load different styles/checkpoints into Stable Diffusion 1.5, as well as the Automatic1111 Web UI for Stable Diffusion, so developers can easily experiment with different LoRAs, checkpoints, and extensions. We also added Stable Diffusion on AWS Inferentia2 (private preview) and Stable Diffusion fine-tuning on OctoAI (private preview). OctoAI’s earliest adopters, Civitai and Extropolis AI, both innovative trailblazers bringing image generation to a broad audience, are a testament to OctoAI’s image generation strengths. And we’re building on this today with the addition of SDXL 1.0 on OctoAI.
Get Started with SDXL on OctoAI today!
You can take SDXL 1.0 for a spin today with a free trial on OctoAI.
You’re also welcome to join us on our Discord server to engage with the team and community, and to share your creative images. We look forward to hearing from you on our channels!