Diffusion models

Diffusion models are generative models: they are trained to produce new data that resembles the data they were trained on. They work by progressively destroying training data through repeated additions of Gaussian noise (the forward process) and then learning to recover the original data by reversing this noising process (the reverse process).

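The core of this idea fits in a few lines of code. Below is a minimal sketch in Python/NumPy of the forward (noise-adding) process in its closed form; the schedule values and names are illustrative assumptions, not those of any particular published model. Given a clean sample x0 and a step t, it returns a noisier sample xt together with the exact noise that was added, which is what the network is later trained to predict.

```python
import numpy as np

# Illustrative diffusion forward process (toy values, not from a real model).
T = 1000                              # number of noising steps
betas = np.linspace(1e-4, 0.02, T)    # linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)       # cumulative products, often written alpha-bar_t

def add_noise(x0, t, rng=np.random.default_rng(0)):
    """Sample x_t from q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps                    # a denoising network learns to predict eps from (xt, t)

# Example: a tiny "image" noised at increasing timesteps.
x0 = np.ones(16)
for t in (0, 100, 500, 999):
    xt, _ = add_noise(x0, t)
    print(t, round(float(np.abs(xt - x0).mean()), 3))
```

Training then consists of repeatedly picking a random t, noising a real sample this way, and teaching a neural network to recover the added noise; generation runs that learned reversal step by step, starting from pure Gaussian noise.
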
What is Stable Diffusion?

Stable Diffusion is an algorithm developed by CompVis (the Computer Vision research group at Ludwig Maximilian University of Munich) and sponsored primarily by Stability AI, a startup that aims to be the driving force behind a grass-roots, open-source AI revolution. The algorithm itself builds on ideas from OpenAI's DALL-E 2, Google's Imagen and other image generation models, with many optimisations on top.

Stability AI employs Katherine Crowson as its lead coder. If you haven't heard the name, Katherine has been one of the main driving forces behind the AI art explosion of the last year and a half. She was the first to combine VQGAN with OpenAI's CLIP, and she went on to develop the CLIP-guided diffusion method underpinning Disco Diffusion, NightCafe and various other AI image generation websites.

Stable Diffusion differs from those algorithms in that it is not CLIP-guided. Instead, a version of CLIP is frozen and embedded into the generation algorithm itself: the text encoder is used only to turn the prompt into conditioning embeddings, and its weights are never updated. This idea, borrowed from Imagen, makes Stable Diffusion a lot faster than its CLIP-guided ancestors (see the sketch below).

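To make the "frozen CLIP" idea concrete, here is a minimal sketch assuming the Hugging Face transformers library and the openai/clip-vit-large-patch14 checkpoint (the text encoder used by the first Stable Diffusion releases). The encoder is never trained; it is only used to turn a prompt into per-token embeddings that condition the denoising network, for instance through cross-attention.

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

# Load the CLIP text encoder and freeze it: no gradients ever flow through it.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
text_encoder.requires_grad_(False)

# Turn a prompt into fixed-length token ids, then into per-token embeddings.
tokens = tokenizer("a photograph of an astronaut riding a horse",
                   padding="max_length", max_length=77, return_tensors="pt")
with torch.no_grad():
    text_embeddings = text_encoder(tokens.input_ids).last_hidden_state

print(text_embeddings.shape)  # torch.Size([1, 77, 768]): conditioning for the denoiser
```
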
Incredibly, compared with DALL-E 2 and Imagen, the Stable Diffusion model is a lot smaller. While DALL-E 2 has around 3.5 billion parameters and Imagen has 4.6 billion, the first Stable Diffusion model has just 890 million, which means it uses far less VRAM and can actually be run on consumer-grade graphics cards.

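That small footprint is what puts the model within reach of ordinary hardware. As an illustration, assuming the Hugging Face diffusers library and the public CompVis/stable-diffusion-v1-4 checkpoint, loading the weights in half precision keeps memory usage low enough for a typical consumer GPU:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the ~890M-parameter model in half precision and move it to the GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

image = pipe("a watercolour painting of a lighthouse at dawn").images[0]
image.save("lighthouse.png")
```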