I’ve always loved what the art community has dubbed A.I. (Artificial Intelligence) art. I’m sure you’ve seen some of the images seemingly created from nothing but a set of words. I decided to dive into A.I. art myself and am having a ton of fun with it. Below is my experience using Disco Diffusion, specifically trying to figure out which settings change what the program generates.
Full disclosure: I am not a scientist, nor am I an A.I. expert, and I have less than a month of experience using A.I. art tools. I am having a lot of fun, though!
The main thing a user starts with when creating A.I. art is the “prompt,” the string of words that describes the kind of image you want. Some people freely share their prompts, and others don’t. Prompt sharing is something of a hot topic, because it’s possible to take someone else’s prompt, use the default settings, and get some really cool images. In my opinion, there is no right or wrong here. Some artists, in every medium, freely share all of their techniques; others prefer to keep them proprietary. This debate has been going on for a LONG time and will continue for every new artistic technique. To me, these programs are just another tool in the artistic toolbox.
For this experiment, the first prompt I used was “forest”, and I provided a starting image. I wanted to guide the A.I. toward a general composition with a blue sky, green trees, and brown dirt. I wanted a more realistic image, like a photograph of a forest. Of course, the A.I. doesn’t know that, and “forest” is a super vague description, but I wanted to see what would happen. Here is the starting image.
I kept the settings relatively close to the defaults, so they shouldn’t have had a big effect on the results. I reused the seed generated for the first image in every subsequent image so the results would be comparable. The main settings were clip_guidance_scale: 10000, tv_scale: 250, init_scale: 500, steps: 300, skip_steps: 50, seed: 3826651605, and width/height: 1024/576. Here is what I got as my first image.
Not bad for a children’s book or a concept illustration, and I really like it, but it’s definitely not what I was looking for. If I wanted a more realistic image, I was going to have to get more descriptive. I changed my prompt to “landscape pine trees photograph” and kept all of the other settings the same. Here is what I ended up with.
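To keep track of which knobs I changed between runs, it helps to write the settings down as data. The sketch below is a hypothetical Python dict holding the values listed above; the key names follow the labels I used in this post and may not match the notebook’s own variable names. It also computes two things I believe Disco Diffusion does with these numbers, which I haven’t verified: skip_steps skips that many of the diffusion steps (so the starting image is only partly turned into noise), and the width and height get rounded down to multiples of 64.

```python
# Settings from the first run, stored as a plain dict.
# Hypothetical layout: these are not the notebook's own variable names.
base_settings = {
    "clip_guidance_scale": 10000,
    "tv_scale": 250,
    "init_scale": 500,
    "steps": 300,
    "skip_steps": 50,
    "seed": 3826651605,
    "width_height": (1024, 576),
}


def steps_actually_run(settings):
    """Diffusion steps left after skipping (assumes skip_steps come off the total)."""
    return settings["steps"] - settings["skip_steps"]


def snapped_size(settings, multiple=64):
    """Round width and height down to a multiple of 64 (assumed notebook behavior)."""
    w, h = settings["width_height"]
    return (w // multiple) * multiple, (h // multiple) * multiple


if __name__ == "__main__":
    run = steps_actually_run(base_settings)
    frac = base_settings["skip_steps"] / base_settings["steps"]
    print(f"steps run: {run} of {base_settings['steps']} ({frac:.1%} skipped)")
    print("output size:", snapped_size(base_settings))
```

With these values, 250 of the 300 steps run, and 1024x576 is already a multiple of 64 in both directions, so nothing would be trimmed.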
Well, this just looks like some fake trees you would buy at the local craft store. Definitely not what I was looking for, but a little bit closer to what I wanted.
At this point, I wondered whether the order of the words, or writing them as a sentence, made a difference. Some people say the prompt should be a sentence describing what you want. That doesn’t make a lot of sense to me, because I think the A.I. acts like a search engine: it breaks the sentence down, looks for the prominent words, and filters out most of the ones that don’t matter to the search. To be fair, I don’t know whether that is actually the case. So my next prompt was “a landscape photograph of pine trees”, which gave me this.
The depth of field seems to have changed a little, but that could just be the algorithm handling the noise in the image slightly differently this round. It’s not a significant enough change for me to say that writing a sentence is better or worse than just using keywords.
Another thing you can do is weight specific keywords, essentially telling the A.I. what you want to emphasize in the image. For the next prompt, I used “landscape:1, photograph:1, pine trees:2”, indicating that I wanted the pine trees to be the most important thing in the image while giving the words landscape and photograph equal weight. Here is what I got.
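As far as I can tell, a weight is just a number after the last colon of a prompt entry. The sketch below imitates that with a small parse function modeled on my reading of how Disco Diffusion parses prompts; it is an assumption, not the notebook’s actual code, and the fallback for a non-numeric suffix is my own addition. If the parser really only splits on the last colon, then a single comma-separated string like mine becomes one prompt with one weight, while listing each phrase as its own entry gives each phrase its own weight.

```python
def parse_prompt(prompt):
    """Split 'text:weight' on the LAST colon; default weight is 1.0.

    Modeled on my understanding of Disco Diffusion's prompt parsing
    (an assumption, not copied from the notebook). URLs are not handled.
    """
    text, sep, weight = prompt.rpartition(":")
    if not sep:
        # No colon at all: the whole string is the prompt, weight 1.
        return prompt, 1.0
    try:
        return text, float(weight)
    except ValueError:
        # My own fallback: a colon with no number after it keeps the text intact.
        return prompt, 1.0


one_string = ["landscape:1, photograph:1, pine trees:2"]
separate = ["landscape:1", "photograph:1", "pine trees:2"]

if __name__ == "__main__":
    for label, prompts in [("one string", one_string), ("separate entries", separate)]:
        print(label, [parse_prompt(p) for p in prompts])
```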
Not a big difference here from the previous two prompts. Maybe I didn’t emphasize the pine trees enough? The next time, I tried “landscape:1, photograph:1, pine trees:5” and got this.
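Here is some quick arithmetic on what “more emphasis” might mean, assuming the weights are divided by their total before they are used (my understanding of how several prompts get combined, not something I verified). Under that assumption, bumping pine trees from 2 to 5 moves its share from one half to roughly 71 percent. It also suggests that if the whole comma-separated string counts as a single prompt, its weight always works out to 1, no matter what number is on the end.

```python
def normalized_shares(weights):
    """Divide each weight by the absolute value of the sum (assumed behavior)."""
    total = sum(weights)
    if abs(total) < 1e-3:
        raise ValueError("weights must not sum to zero")
    return [w / abs(total) for w in weights]


if __name__ == "__main__":
    print(normalized_shares([1, 1, 2]))  # landscape, photograph, pine trees:2
    print(normalized_shares([1, 1, 5]))  # landscape, photograph, pine trees:5
    print(normalized_shares([2]))        # one prompt carrying weight 2
    print(normalized_shares([5]))        # one prompt carrying weight 5
```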
I guess it thought it was making the trees important enough already! Still not what I was looking for. At this point, I started to wonder what types of images the A.I. was pulling up as references for these words. Without getting too deep into the details, the A.I. is “trained” on a specific set of images. If I could search that set using my keywords, I might be able to see what the A.I. is “thinking” about or “looking at” when I use those words. The closest reference I found was https://knn.laion.ai. This may not be exactly the set of images the model was trained on, but it’s what I could find at the time.
Once I found that, I entered my prompts into the search bar to see what was returned. Using the existing prompts, I was getting a lot of illustrations or advertising images that weren’t landscape photographs. By throwing different search terms into that site, I came up with “pine forest mountains” which returned quite a few photographs close to what I was looking for. Here’s what I got back when I used that as a prompt.
This is pretty close to the composition I was looking for, but the image itself is terrible. I decided to deviate from my existing searches a bit and name a specific artist as a reference.
This seems to be a pretty big factor in the type of image the A.I. generates. When you include an artist’s name in your prompt, I believe it gives the A.I. a smaller, more specific set of images to reference. Instead of drawing on a large selection of images that vary widely in style, the artist’s name narrows that down to one specific style, assuming the artist has a consistent style and a lot of images online.
Another highly recommended phrase to add to prompts is “trending on artstation”. ArtStation is a website where digital artists post their work, and it hosts a wide variety of digital art. This also leads me to believe that when you use that phrase, the images the A.I. references won’t be advertisements or clip art, which don’t make good references.
Because of the above, my next search term was “pine forest mountains, painted by ivan shishkin” which gave me this.
This is definitely closer to the composition I was looking for, but it is very painterly. My goal was to make the image less like a painting and more like a photograph. That is proving pretty hard to do, though, because it feels like the A.I. is crappy at composing images from photos (photo-bashing) but pretty good at “painting” images. After searching for a bit, I found another artist who painted some pretty realistic landscapes. I tried “pine forest mountains, sky photo, asher brown durand” and got this.
A lot closer to my initial composition and what I was looking for, but still pretty painterly. This is understandable because the image is based on the work of a painter.
Another suggestion I had read about is to always reference at least two artists in the prompt. This time, I also added the word realism to the prompt to try to steer it toward paintings that were obviously more realistic. So I tried “realism landscape, sunset, asher brown durand, ivan shishkin” and got this.
Despite looking a bit desolate, and like a bomb went off, I really like this. The sun is setting behind the trees, the sky has a nice transition, etc. It’s getting away from my original vision, but at this point, I like what it’s churning out so I really don’t care that much. Ha ha ha!
Trying to get back on track with a different approach, I removed the painters and added landscape photographers instead. Unfortunately, when I searched for a landscape photographer’s name, I usually got a profile photo of the photographer. That wasn’t going to work, so I added some words to hopefully get their work rather than their profile pic. My next prompt was “pine forest mountains, sky photo, thomas morse landscape photography, Massimo Pelagagge landscape photography, Marcin Sobas landscape photography” and I got this.
Ugh! I really like the clouds and sky here, but the trees are terrible. One step forward and two steps back I guess. This just reinforces my thinking about the A.I. being terrible at photo-bashing, which is totally understandable.
After that attempt and more searching on my own, I decided to cut down the prompt a bit and get more specific with “alaska forest photo, sky photo, matt Payne landscape photo” and here is what I got.
The sky is terrible here, and while the trees look realistic, I hate the composition and overall look. At this point, I gave up on trying for photorealism. I’m just not skilled enough in any aspect of this yet, but I will keep researching.
I decided to go back to a previous prompt and adjust the settings to see if I could give the image a little less contrast. I used “realism landscape, sunset, asher brown durand, ivan shishkin” again, but set clip_guidance_scale super low, to 200. That was clearly not enough to create much detail in the image, because I got this.
Using the same prompt, I upped the clip_guidance_scale to 1000 and got this.
The overall image did not get to a point I liked, especially the sky here. I again used the same prompt, but upped the clip_guidance_scale to 2500 and got this.
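Since I was changing only one number between these renders, the runs amount to a small parameter sweep: copy the settings, keep the seed and prompt fixed, and vary clip_guidance_scale. The sketch below only builds the list of run configurations as hypothetical Python dicts (the key names are my own labels, and only values mentioned in this post are filled in); the actual rendering still happens in the Disco Diffusion notebook.

```python
import copy

# Hypothetical base settings for the "realism landscape, sunset" runs.
base = {
    "prompt": "realism landscape, sunset, asher brown durand, ivan shishkin",
    "clip_guidance_scale": 10000,
    "seed": 3826651605,  # same seed for every run so only one thing changes
}


def sweep(settings, key, values):
    """Return one settings copy per value, changing only `key`."""
    runs = []
    for value in values:
        run = copy.deepcopy(settings)  # leave the base settings untouched
        run[key] = value
        runs.append(run)
    return runs


runs = sweep(base, "clip_guidance_scale", [200, 1000, 2500])

if __name__ == "__main__":
    for r in runs:
        print(r["seed"], r["clip_guidance_scale"])
```

Keeping the seed fixed is what makes the three images comparable; if the seed changed too, I couldn’t tell whether a difference came from the guidance scale or from the random starting noise.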
This was as close as I could get to something I felt was visually pleasing given my initial image, and it’s a good starting point for working in Photoshop. Photoshop lets me make small adjustments here and there that would take far too long if I had to render a whole new image for each one. This is not the photorealistic image I was initially trying to achieve, but right now I don’t have the skills, either with the settings or with choosing the right prompts, to get there. But it sure is fun trying!
Continuing on, I wanted to see what the A.I. would come up with on its own, without an initial image directing the composition. I left all of the settings the same and used the same prompt, removing only the initial image and the skip steps. Here is what the A.I. spit out.
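In terms of settings, this run is the previous one with two things taken out: the init image and the skipped steps. My understanding is that skipping steps only makes sense when there is an image to start from, so both go together. A minimal sketch using a hypothetical settings dict; the key names and the file name forest_init.png are made up for illustration.

```python
def without_init(settings):
    """Copy settings with the init image removed and skip_steps set to 0."""
    run = dict(settings)  # shallow copy is enough for these flat values
    run["init_image"] = None
    run["skip_steps"] = 0
    return run


with_init = {
    "prompt": "realism landscape, sunset, asher brown durand, ivan shishkin",
    "init_image": "forest_init.png",  # hypothetical file name
    "clip_guidance_scale": 2500,
    "steps": 300,
    "skip_steps": 50,
    "seed": 3826651605,
}

from_scratch = without_init(with_init)

if __name__ == "__main__":
    print(from_scratch["init_image"], from_scratch["skip_steps"])
```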
Wow! I really like this! There is quite a bit of noise, the contrast is high, and there are a few other issues, but I really like what this created. Like I said in the beginning, you can get some awesome results just by using prompts, which is why some people are reluctant to give them out.
From this point forward, I kept modifying different settings to try to remove some noise, lower the contrast, etc., but nothing made any significant changes that really stood out. I felt small changes like those could be made more easily in any photo editing program, without having to tweak then render, tweak then render, again and again. So that is where I decided to stop for the day.
Hopefully this article gave you a little better understanding of Disco Diffusion and the changes prompts can make in an image. I am constantly learning something new every day and hope you do as well. Thank you for reading!
James