My first steps with Stable Diffusion in Python led me to test different models. I test models from HuggingFace and, recently, from Civitai. Models come either pretrained or as LoRAs; I prefer the latter for their flexibility and the possibility of merging several of them into one pipeline. I'm still far from getting the results I want, but I've only been at it for two days or so.
Two parameters are very important to understand:
Guidance scale – this parameter controls how closely the generated image adheres to the text prompt.
– Higher values (e.g., 7.5 or 15.0) make the image more closely match the prompt, but may result in less creative or diverse outputs.
– Lower values (e.g., 1.0 or 2.0) allow for more creative freedom but may produce images less related to the prompt.
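Under the hood, the guidance scale works through classifier-free guidance: the model predicts the noise twice, once with the prompt and once without it, and the scale blends the two predictions. A tiny sketch of that blend with plain numbers (the function name and inputs are mine, purely illustrative):

```python
# Classifier-free guidance blend: the guidance scale extrapolates from the
# unconditional prediction toward the prompt-conditioned one.
def cfg_blend(uncond_pred, cond_pred, guidance_scale):
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)

# A scale of 1.0 just reproduces the conditional prediction; higher values
# push further toward (and past) it, which is why high scales follow the
# prompt more literally.
print(cfg_blend(0.0, 1.0, 1.0))  # -> 1.0
print(cfg_blend(0.0, 1.0, 7.5))  # -> 7.5
```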
Inference steps – this parameter determines the number of denoising steps in the diffusion process.
– More steps (e.g., 50 or 100) generally result in higher quality images but take longer to generate.
– Fewer steps (e.g., 20 or 30) are faster but may produce lower quality or less detailed images.
So far I have used three schedulers:
LCMScheduler (Latent Consistency Model Scheduler):
- Designed for faster inference with fewer steps.
- Can produce good quality images with as few as 4-8 inference steps.
- Trades some quality for significantly faster generation times.
EulerAncestralDiscreteScheduler:
- Based on the Euler method with ancestral sampling.
- Often produces high-quality results with a good balance of detail and coherence.
- Generally requires more steps than LCM but fewer than some other schedulers.
DPMSolverMultistepScheduler:
- A dedicated high-order solver for the diffusion ODE (DPM here stands for diffusion probabilistic models, not dynamic programming).
- Can produce high-quality results in fewer steps compared to some other schedulers.
- Often provides a good balance between speed and quality.
Other schedulers that I have read can be good options are: DDIMScheduler (Denoising Diffusion Implicit Models), PNDMScheduler (Pseudo Numerical Methods for Diffusion Models), and UniPCMultistepScheduler.
The choice of scheduler can significantly impact both the speed of generation and the quality of the output. LCM is often the fastest but may sacrifice some quality, while schedulers like EulerAncestral and DPMSolver often provide a good balance. The best choice depends on your specific use case and the model you’re using.
Another thing I had a problem with was the aspect ratio. By default, images were generated as squares (512 x 512 or 1024 x 1024), but in some Reddit threads I found a (probably outdated) list of resolutions:
SD 1.5 1:1 512*512
SD 1.5 3:2 768*512
XL 1:1 1024*1024
XL 3:2 1216*832
XL 4:3 1152*896
XL 16:9 1344*768
XL 21:9 1536*640
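The SDXL pairs in this list seem to follow a pattern: every side is a multiple of 64, and the total pixel count stays close to the 1024 x 1024 budget the model was trained around. A quick sanity check (the dict below just restates the list above):

```python
# SDXL resolutions from the list above: each side should be a multiple
# of 64, and the pixel count should stay near 1024*1024.
SDXL_RESOLUTIONS = {
    "1:1": (1024, 1024),
    "3:2": (1216, 832),
    "4:3": (1152, 896),
    "16:9": (1344, 768),
    "21:9": (1536, 640),
}

for ratio, (w, h) in SDXL_RESOLUTIONS.items():
    assert w % 64 == 0 and h % 64 == 0
    print(f"{ratio}: {w}x{h} -> {w * h / (1024 * 1024):.2f} of the square budget")
```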
Probably with time I will get a similar list for Stable Diffusion 3 (if the recent problems with tripods get fixed).
Next, I will probably test ComfyUI, though for my tests I don't really need the UI.