Generative Models by Stability AI
Posted on June 27, 2023
News
June 22, 2023
- We are releasing two new diffusion models for research purposes:
  - SD-XL 0.9-base: The base model was trained on a variety of aspect ratios on images with resolution 1024^2. The base model uses OpenCLIP-ViT/G and CLIP-ViT/L for text encoding, whereas the refiner model only uses the OpenCLIP model.
  - SD-XL 0.9-refiner: The refiner has been trained to denoise small noise levels of high-quality data and as such is not expected to work as a text-to-image model; instead, it should only be used as an image-to-image model.
If you would like to access these models for your research, please apply using one of the following links: SDXL-0.9-Base model, and SDXL-0.9-Refiner. This means that you can apply via either of the two links, and if you are granted access, you can access both models. Please log in to your HuggingFace account with your organization email to request access.
We plan to do a full release soon (July).
The codebase
General Philosophy
Modularity is king. This repo implements a config-driven approach where we build and combine submodules by calling instantiate_from_config() on objects defined in yaml configs. See configs/ for many examples.
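For readers unfamiliar with this pattern, here is a minimal sketch of how such config-driven instantiation typically works. The helper names mirror the repo's utilities, but the exact signatures shown are illustrative, and the target in the usage example is chosen only so the snippet runs standalone:

```python
import importlib

def get_obj_from_str(path: str):
    # Resolve a dotted path like "sgm.models.diffusion.DiffusionEngine"
    # to the class object it names.
    module, cls = path.rsplit(".", 1)
    return getattr(importlib.import_module(module), cls)

def instantiate_from_config(config: dict):
    # A config entry is a mapping with a "target" (dotted class path)
    # and optional "params" (constructor keyword arguments).
    return get_obj_from_str(config["target"])(**config.get("params", {}))

# Usage with a yaml-derived dict (target chosen only for illustration):
cfg = {"target": "collections.OrderedDict", "params": {}}
obj = instantiate_from_config(cfg)
print(type(obj))  # <class 'collections.OrderedDict'>
```

The payoff of this design is that swapping a submodule (an encoder, a sampler, a loss weighting) means editing a yaml entry rather than subclassing.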
Changelog from the old ldm codebase
For training, we use pytorch-lightning, but it should be easy to use other training wrappers around the base modules. The core diffusion model class (formerly LatentDiffusion, now DiffusionEngine) has been cleaned up:

- No more extensive subclassing! We now handle all types of conditioning inputs (vectors, sequences and spatial conditionings, and all combinations thereof) in a single class: GeneralConditioner, see sgm/modules/encoders/modules.py.
- We separate guiders (such as classifier-free guidance, see sgm/modules/diffusionmodules/guiders.py) from the samplers (sgm/modules/diffusionmodules/sampling.py), and the samplers are independent of the model.
- We adopt the "denoiser framework" for both training and inference (the most notable change is probably the option to train continuous-time models):
  - Discrete-time models (denoisers) are simply a special case of continuous-time models (denoisers); see sgm/modules/diffusionmodules/denoiser.py.
  - The following features are now independent: weighting of the diffusion loss function (sgm/modules/diffusionmodules/denoiser_weighting.py), preconditioning of the network (sgm/modules/diffusionmodules/denoiser_scaling.py), and sampling of noise levels during training (sgm/modules/diffusionmodules/sigma_sampling.py); a sketch of the last of these follows this list.
- Autoencoding models have also been cleaned up.
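To make the "discrete is a special case of continuous" point concrete, here is a minimal sketch of the noise-level (sigma) sampling abstraction referenced above. The class names, distribution parameters, and the idea of a fixed sigma table are illustrative assumptions, not the repo's exact implementation:

```python
import torch

class ContinuousSigmaSampler:
    """Sample noise levels from a continuous (log-normal) distribution."""
    def __init__(self, loc: float = -1.2, scale: float = 1.2):
        self.loc, self.scale = loc, scale

    def __call__(self, n: int) -> torch.Tensor:
        # log(sigma) ~ N(loc, scale^2): sigmas cover a continuum of noise levels
        return torch.exp(self.loc + self.scale * torch.randn(n))

class DiscreteSigmaSampler:
    """Sample noise levels from a fixed table (e.g. 1000 DDPM-style levels)."""
    def __init__(self, sigmas: torch.Tensor):
        self.sigmas = sigmas

    def __call__(self, n: int) -> torch.Tensor:
        # Drawing uniformly from a finite sigma table is just a degenerate
        # noise-level distribution: the discrete special case.
        idx = torch.randint(0, len(self.sigmas), (n,))
        return self.sigmas[idx]

# Either sampler plugs into the same training step: the denoiser only ever
# sees a batch of sigmas, never "timesteps", so the two regimes unify.
```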
Installation:
1. Clone the repo
```
git clone git@github.com:Stability-AI/generative-models.git
cd generative-models
```
2. Setting up the virtualenv
This assumes you have navigated to the generative-models root after cloning it.
NOTE: This is tested under python3.8 and python3.10. For other python versions, you might encounter version conflicts.
PyTorch 1.13
```
# install required packages from pypi
python3 -m venv .pt1
source .pt1/bin/activate
pip3 install wheel
pip3 install -r requirements_pt13.txt
```
PyTorch 2.0
```
# install required packages from pypi
python3 -m venv .pt2
source .pt2/bin/activate
pip3 install wheel
pip3 install -r requirements_pt2.txt
```
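After either install, an optional sanity check (a generic command, not part of the repo) confirms the expected PyTorch version is importable in the active virtualenv:

```
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```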
Inference:
We provide a streamlit demo for text-to-image and image-to-image sampling in scripts/demo/sampling.py. The following models are currently supported:
- SD-XL 0.9-base
- SD-XL 0.9-refiner
Weights for SDXL:
If you would like to access these models for your research, please apply using one of the following links: SDXL-0.9-Base model, and SDXL-0.9-Refiner. This means that you can apply via either of the two links, and if you are granted access, you can access both models. Please log in to your HuggingFace account with your organization email to request access.
After obtaining the weights, place them into checkpoints/.
Next, start the demo using:
```
streamlit run scripts/demo/sampling.py --server.port <your_port>
```
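For example, with port 8501 (streamlit's default; any free port works), the demo becomes reachable in a browser at http://localhost:8501:

```
streamlit run scripts/demo/sampling.py --server.port 8501
```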
Invisible Watermark Detection
Images generated with our code use the invisible-watermark library to embed an invisible watermark into the model output. We also provide a script to easily detect that watermark. Please note that this watermark is not the same as in previous Stable Diffusion 1.x/2.x versions.
To run the script you need to either have a working installation as above or
try an experimental import using only a minimal amount of packages:
```
python -m venv .detect
source .detect/bin/activate
pip install "numpy>=1.17" "PyWavelets>=1.1.1" "opencv-python>=4.1.0.25"
pip install --no-deps invisible-watermark
```
The script is then usable in the following ways (don't forget to activate your virtual environment beforehand):
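Assuming the detection script lives at scripts/demo/detect.py (the path is inferred from the demo layout above; check scripts/demo/ in the repo), invocation would look like:

```
# test a single image (script path is an assumption, see above)
python scripts/demo/detect.py <your filename here>
# test all images in a folder
python scripts/demo/detect.py <your folder name here>/*
```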