Forget DALL-E 2. Stable Diffusion is the new hotness.
Startup Stability AI this week announced the beta release of a new open-source text-to-image AI for researchers that has no public-figure filter
A new text-to-image AI system called Stable Diffusion is generating all the buzz this week. Opened for beta access to researchers by London- and Los Altos-based startup Stability AI, Stable Diffusion is a step forward in speed and quality, and unlike OpenAI’s DALL-E 2, it doesn’t filter out public figures.
The code for Stable Diffusion is available on GitHub, and the model card is available on Hugging Face. The model weights are currently only available to academics, research institutions, and independent researchers.
“This model builds on the work of many excellent researchers and we look forward to the positive effect of this and similar models on society and science in the coming years as they are used by billions worldwide,” said Emad Mostaque, CEO of Stability AI.
Of course, the looser filtering raises a number of responsible-AI questions and potential biases. For example, the model was trained primarily on English descriptions, and it was not trained to produce factual or true representations of people or events.
Its limitations:
It does not achieve perfect photorealism
It cannot render legible text
It does not perform well on more difficult tasks that involve “compositionality”
Faces and people may not be generated properly
It was trained mainly with English captions
The auto-encoding part of the model is lossy
The model was trained on a dataset that contains adult material and is not fit for product use without additional safety mechanisms and considerations
Here are some of my favorite images so far on r/StableDiffusion:
Resources: