August 30, 2023
InvokeAI for Stable Diffusion
I researched Stable Diffusion (SD) models by browsing multiple Reddit discussions covering the models themselves, their usage, and improvements in the technology. The sources were relevant and offered a range of perspectives, with broad consensus on the key aspects of the technology and numerous suggestions for using SD models effectively. However, users' opinions and experiences differed in places, leaving some uncertainty about best practices.
Stable Diffusion Models and Customization
Finding and Downloading SD Models
Distilled Stable Diffusion and Speed Improvements
Community Resources and Getting Started with Stable Diffusion
Training and Fine-tuning Stable Diffusion Models
OpenAI's Consistency Technique for One-shot Image Generation
InvokeAI for Enhanced User Experience and Features
Research
"Introducing Consistency: OpenAI has released the code for its new one-shot image generation technique. Unlike Diffusion, which requires multiple steps of Gaussian noise removal, this method can produce realistic images in a single step. This enables real-time AI image creation from natural language"
- OpenAI has released the code for its new one-shot image generation technique called Consistency on the r/StableDiffusion subreddit.
- It can produce realistic images in a single step without requiring multiple steps of Gaussian noise removal.
- The method enables real-time AI image creation from natural language (a conceptual sketch of the single-step idea follows this list).
- Github link to the Consistency code: https://github.com/openai/consistency_models
- Paper link to Consistency technique: https://arxiv.org/abs/2303.01469
- A user on the r/StableDiffusion subreddit asks how to use Consistency in the AUTOMATIC1111 webui.
- User u/235k points to a well-maintained fork of the webui that is actively looking for contributors, since the owner originally built it for personal use.
- User u/ztchatl suggests that Consistency would not work with Stable Diffusion as it is a completely different approach. According to the user, it is a continuation of DALL-E.
- User u/pooysyncky notes that, by today's standards, there are no large pretrained consistency models beyond ImageNet-scale and toy models. They also believe that convincing someone to spend six or seven figures on training and then release an open model could be a challenge.
- User u/gazyskinquesierrier posts a link through which people can check the open pull requests for AUTOMATIC1111/stable-diffusion-webui.
- User u/rodjjo introduces a new project called diffusion-expert and mentions that he intends to add lots of features to it.
- There is a discussion among users about the need to move away from AUTOMATIC1111 as the default standard and to consider finding a good fork to move to.
- The AUTOMATIC1111 webui has 63k Github stars, which is considered insane by user u/Necropierre.
- There is a comment thread between users u/maslumbriel and u/237OPS discussing the possibility of using React to generate waifus.
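For intuition, here is a minimal conceptual sketch of the difference the post describes: classic diffusion removes Gaussian noise over many steps, while a consistency model maps noise to an image in one network evaluation. The denoiser and consistency_fn callables are hypothetical stand-ins, not the actual API of openai/consistency_models.

```python
import torch

def diffusion_sample(denoiser, shape, sigmas):
    # Classic diffusion: start from pure noise and remove Gaussian noise
    # over many steps, re-noising to progressively lower levels.
    x = torch.randn(shape) * sigmas[0]
    for sigma, next_sigma in zip(sigmas[:-1], sigmas[1:]):
        x0 = denoiser(x, sigma)                   # estimate of the clean image
        x = x0 + torch.randn(shape) * next_sigma  # step down to the next noise level
    return denoiser(x, sigmas[-1])                # final clean estimate

def consistency_sample(consistency_fn, shape, sigma_max):
    # Consistency models: a single network evaluation maps pure noise
    # directly to a sample, enabling real-time generation.
    x = torch.randn(shape) * sigma_max
    return consistency_fn(x, sigma_max)
```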
"How do I generate art like this with Guided Diffusion, been fascinated by one artist but haven't found any good collections or tutorials on how this is done?"
- Relevant: True
- Importance: 7
Notes:
- Text to Image (TTI) art and how to generate it with Guided Diffusion is the topic of the webpage.
- A user with experience in AI art gave some recommendations for generating art:
- Suggested using Google Colab with the TensorFlow and Keras frameworks for generating TTI art.
- Recommended using the VQGAN+CLIP notebook as a starting point before using the Diffusion notebook.
- Visions of Chaos software is suggested for those with a good Nvidia GPU.
- Only about 1 in 10 outputs is worth saving; roughly 90% of what the AIs produce is unusable.
- A user asked how to prevent watermark-style “shutterstock” text from appearing in generations; another user posted links to other Google Colabs and non-Colab web apps built on CLIP-guided diffusion systems (a conceptual sketch of CLIP guidance follows this list).
- Centipede Diffusion is mentioned as a program that generates garbled text over the image, under it, or on the artwork subject.
- Users also posted links to YouTube video tutorials, one about VQGAN and another about ruDALL-E.
- One user mentioned that the SelfieWiz app on the app store is the best AI filter app.
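For readers wondering what "CLIP-guided diffusion" means in the notebooks above, here is a rough sketch of a single guidance step: the gradient of CLIP image-text similarity nudges the sample toward the prompt. The denoiser and clip_model objects are hypothetical stand-ins; real notebooks differ in many details.

```python
import torch

def clip_guidance_step(x, t, denoiser, clip_model, text_embed, guidance_scale=100.0):
    # One CLIP-guided diffusion step: steer the denoiser's estimate of the
    # clean image toward higher CLIP similarity with the text prompt.
    x = x.detach().requires_grad_(True)
    x0_pred = denoiser(x, t)                        # denoiser's clean-image estimate
    image_embed = clip_model.encode_image(x0_pred)  # CLIP embedding of the estimate
    sim = torch.cosine_similarity(image_embed, text_embed, dim=-1).sum()
    grad = torch.autograd.grad(sim, x)[0]           # direction that raises similarity
    return x.detach() + guidance_scale * grad       # nudge the sample toward the prompt
```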
"High Fidelity Image Generation Using Diffusion Models"
Not used in article
"Introduction to Diffusion Models for Image Generation - LearnOpenCV"
Not used in article
"Dreamer's Guide to Getting Started w/ Stable Diffusion!"
- /r/StableDiffusion is a community for AI art generated with Stable Diffusion.
- The Stable Diffusion Discord is a place for the community to discuss AI art.
- The community resources listed on the webpage are purely community-operated and are not endorsed, vetted, or provided by Stability AI.
- Rules for posting on the subreddit include that all posts must be Stable Diffusion related, be respectful, and follow Reddit’s content policy.
- DreamStudio is a website for Stable Diffusion AI art generation. New users can get 200 free credits to spend on the site.
- Tips and Tricks for DreamStudio are provided on the webpage.
- Guides for local installation of Stable Diffusion include Stable Diffusion Installation Guides, Stable Diffusion Basujindal Installation Guides, and Easy Stable Diffusion UI. Simple instructions for installing the CompVis repo of Stable Diffusion are also provided on the website.
- Information is provided on how to run Stable Diffusion on different types of machines. Stable Diffusion requires a GPU with 4GB+ VRAM to run locally. Nvidia cards are recommended and are the only cards officially supported; AMD and Apple M1 chip support is available unofficially. Intel-based Macs currently do not work with Stable Diffusion.
- Information about the NSFW filter for Stable Diffusion is provided. DreamStudio does not currently allow disabling the NSFW filter; only local or remote installs can remove it.
- Different resources for prompt development are provided, including Prompt Modification Studies, Prompthero, and Krea.
- Tips for prompt development include researching, finding well-known artists, researching certain painting techniques, sculpting prompts to represent desired concepts and including descriptive details.
- A list of style guides is provided for prompt development.
- Information is provided on how to remove the safety filter from diffusers (a minimal sketch follows this list).
- Information is also provided on how to use Stable Diffusion, including how to generate high-resolution images, the maximum prompt length, and how to get solid results.
- The webpage includes a FAQ with additional information about running Stable Diffusion.
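As one concrete illustration of local generation and filter removal, here is a minimal sketch using the Hugging Face diffusers library; this is only one of several local-install routes the guide covers, and it assumes an Nvidia GPU with enough VRAM.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load SD 1.5 in half precision. Passing safety_checker=None disables the
# NSFW filter, which, as the guide notes, is only possible on local/remote installs.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    safety_checker=None,
)
pipe = pipe.to("cuda")  # Nvidia GPUs with 4GB+ VRAM are the officially supported path

image = pipe("a castle on a cliff at sunset, oil painting").images[0]
image.save("castle.png")
```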
"How to run Stable Diffusion to make awesome AI-generated art"
Not used in article
"How to use Stable Diffusion to create AI-generated images"
Not used in article
"How to train a Stable Diffusion"
- Relevant: True
- Importance: 7
Notes:
- Stable Diffusion (SD) uses natural language processing (NLP).
- Users ask if SD can speak or generate code like GPT-3. It seems that if people want to isolate the NLP part of SD, they’re basically playing with GPT-2/3.
- SD is a latent diffusion model conditioned on CLIP text embeddings (which one commenter loosely likens to GPT-2/3). Parts of the neural network are interleaved with cross-attention layers that take in the encoded prompt.
- Stable Diffusion treats the seeded random noise as an extremely noisy version of the image described by the prompt.
- “Latent” means the model is denoising a compressed representation of the image, one that captures its qualitative aspects better than the raw pixel grid; at the end, a decoder converts the latent back into an actual image.
- To train SD, it’s necessary to collect thousands of relevant images.
- A commenter shared a post on training an unconditional diffusion model with minimal coding.
- Discussions also mention textual inversion, which teaches the model a new token from a handful of example images, and a tutorial for fine-tuning SD on a specific dataset (a textual-inversion loading sketch follows this list).
- A user notes that there are no known pipelines to validate the output of a text generator that exist within the generation process.
- It’s claimed that discriminators of human language and expression would be too difficult compared to a visual image.
- Some users shared links to articles and YouTube videos relevant to the topic.
- There are discussions about a word to video model and text-based GAN models.
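As one concrete way to try textual inversion without training anything, here is a minimal sketch that loads a pre-trained embedding through the Hugging Face diffusers library; the sd-concepts-library/cat-toy concept and its <cat-toy> placeholder token are used purely as an example.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load a textual-inversion embedding: a new token vector learned from a
# handful of example images, with the base model left untouched.
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# The learned placeholder token can now be used like any other word.
image = pipe("a <cat-toy> sitting on a beach").images[0]
image.save("cat_toy.png")
```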
"GitHub - invoke-ai/InvokeAI: InvokeAI is a leading creative engine for ..."
Not used in article
"Getting Started With Stable Diffusion: A Guide For Creators - jonstokes.com"
Not used in article
"Noob's Guide to Using Automatic1111's WebUI"
Not used in article
"New Distilled Stable Diffusion with 20x speed-up (from 5.9 s to 0.9s) to be presented at NeurIPS by Stability AI"
- The webpage is a Reddit post, with a URL that points to a thread discussion about a new technique called “Distilled Stable Diffusion.”
- The post title refers to the presentation of the technique at the NeurIPS conference.
- One user mentions how time-consuming generating a single image with Stable Diffusion can be, expressing amazement at the 20x speed-up that Distilled Stable Diffusion reportedly offers (a step-count timing sketch follows this list).
- Some users offer recommendations on optimizing Stable Diffusion with better hardware or Google Colab.
- A user shares their experience with Google Colab, saying it works well for generating images, and links a repo showing how to use Stable Diffusion on Colab.
- A few users discuss the speed differences of image generation using Stable Diffusion with different GPU cards. One user even shares a time breakdown of how long it took for them to generate a batch of 80 images at different sizes.
- The same user also provides a link to a YouTube video that explains the concept of Distilled Stable Diffusion.
- Other users show excitement about the development of Distilled Stable Diffusion.
- Finally, the post author provides links to the paper on Distilled Stable Diffusion as well as the page of its author, Chenlin Meng.
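Distillation trains a new student model rather than simply running the base model with fewer steps, so the sketch below does not reproduce the distilled model's quality or its 0.9s figure; it only illustrates, via the Hugging Face diffusers library, how sampling latency scales with step count.

```python
import time
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of an astronaut riding a horse"
pipe(prompt, num_inference_steps=5)  # warm-up run so one-time setup isn't timed

for steps in (50, 20, 8):
    start = time.time()
    pipe(prompt, num_inference_steps=steps)
    print(f"{steps} steps: {time.time() - start:.2f}s")  # latency scales with step count
```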
"Stable Diffusion Algorithms: A Comprehensive Guide"
Not used in article
"Introduction to Diffusion Models for Machine Learning - AssemblyAI"
Not used in article
"Understanding Stable Diffusion from "Scratch" | Binxu Wang"
Not used in article
"Diffusion Models: A Practical Guide | Scale AI"
Not used in article
"Blog post "How Diffusion Models Can Achieve Seemingly Arbitrarily Large Compression Ratios""
Not used in article
"ELi5: What are SD models, and where to find them"
- A Stable Diffusion model is a database of numbers that provides descriptions to the AI on how to produce images.
- Given the limited Video RAM, the model is restricted to roughly 890 million parameters.
- The original vanilla / base SD 1.5 and 2.1 models generate mediocre images.
- Custom models were trained on top of the vanilla/base SD 1.5 and SD 2.1 models to improve image quality.
- To produce custom models, a large set of desirable images is used to train the model.
- Mixed/combined/merged models were created to generate images with a variety of styles and genres.
- Besides checkpoint models, other types of models are LoRA, embedding, etc.
- LoRA is a small, specialized database that contains descriptions of a single concept or subject.
- Textual Inversion can be used to train an embedding on photos of one's own face, enabling generation of images of oneself in different situations and costumes.
- SD models come in two formats, .ckpt and .safetensors, with .safetensors being the newer format that loads faster and doesn’t contain Python code.
- Most models are now released in .safetensor format.
- Many custom checkpoint models come in different file formats or sizes: fp16 (half-precision floating-point, 2 bytes) and fp32 (full-precision floating-point, 4 bytes).
- Most models only need the smaller (usually 2 GiB) fp16 version.
- To use different SD models, download and install them into the program's designated folder/directory (a loading sketch follows this list).
- The two most popular places to find and download models are CivitAI and Hugging Face.
- CivitAI provides examples of images that a model can produce with a prompt that is shown by clicking on the (i) icon in the bottom right corner of a generated image.
- Prompt data can be copied to the clipboard by clicking “Copy Generation Data” to test on Automatic1111.
- Other places to find more sample prompts are mage.space, lexica.art, openart.ai, and playgroundai.com.
- Most sites use their own custom model, so users may not be able to reproduce the image on their local setup.
- A list of popular models is provided.
- The webpage outlines the difference between SD 1.5 and SD 2.1 models, recommends sticking with one or two models to get familiar with them, and stresses the importance of experimenting.
- Insights and opinions of other Reddit users interested in Stable Diffusion models are shared throughout the thread.
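As a minimal illustration of the formats discussed above, here is a sketch that loads an fp16 .safetensors checkpoint and layers a LoRA on top, using the Hugging Face diffusers library; both file paths are hypothetical.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a custom checkpoint from a single .safetensors file in half precision
# (the ~2 GiB fp16 variant is all most users need for inference).
pipe = StableDiffusionPipeline.from_single_file(
    "models/my_custom_model.safetensors",  # hypothetical path
    torch_dtype=torch.float16,
).to("cuda")

# Layer a LoRA on top: a small add-on trained on a single concept or subject.
pipe.load_lora_weights("models/my_style_lora.safetensors")  # hypothetical path

image = pipe("portrait of a knight, detailed armor").images[0]
image.save("knight.png")
```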
"Invokeai vs. automatic1111 ?"
Not used in article
"AI in HR: The good, the bad, and the ugly - eFront Blog"
Not used in article
"InvokeAI 2.1 Release - Inpainting, Advanced Prompt Syntax, & More"
- The InvokeAI 2.1 release is discussed on a webpage that links a YouTube video.
- The update includes features such as inpainting, advanced prompt syntax, prompt blending, model switching, text masking, outcropping, and cross-attention control with prompt-term weighting.
- The interface is available through a browser and supports seed/subseed blending.
- Commenters mention that InvokeAI’s WebUI is more responsive than AUTOMATIC1111’s.
- The command line interface is available for batch processing.
- Some users mention that the InvokeAI release has fewer bugs and better support for Mac machines.
- There is a link to InvokeAI’s Github page, which provides detailed documentation and links to other resources such as a Discord channel for troubleshooting.
- A user offered help with installing the software through a tutorial.
- InvokeAI’s outpainting feature is mentioned as upcoming.
- The software development strategy is thoughtful, with features intended to fit into what will become a “core” workflow for certain professions.
- Users suggest additional features such as the ability to use a separate negative prompt text box and a dropdown box for selecting preset prompts.
- Model switching is available under settings in the InvokeAI UI.
- The installation process is generally smooth, but some users had issues installing dependencies and running the Colab notebook.
- The GFPGAN and CodeFormer models need to be downloaded manually and placed in specific folders.
- A user suggests copying models.yaml.example to models.yaml to fix an error (see the sketch after this list).
- A user suggests checking the home/[username]/opt/InvokeAI/configs/models.yaml file to diagnose another installation error.
- Overall, users report a positive experience with InvokeAI, with some comparing it favorably to AUTOMATIC1111 for its ease of use and solid feature set.
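The models.yaml fix suggested in the thread amounts to restoring the config file from the shipped example. A minimal sketch, assuming the install path mentioned above:

```python
import shutil
from pathlib import Path

# InvokeAI ships a models.yaml.example; if configs/models.yaml itself is
# missing or broken, copying the example over it is the suggested fix.
cfg = Path.home() / "opt" / "InvokeAI" / "configs"  # path from the thread
shutil.copyfile(cfg / "models.yaml.example", cfg / "models.yaml")
```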
"List of arguments for InvokeAI"
Not used in article
💭 Looking into
An overview of the successes and limitations of Stable Diffusion models
💭 Looking into
A description of how InvokeAI can be used to run Stable Diffusion