Use the Structure of an Image to Guide Gen AI Output
My focus in this recent series of articles on Behance has been the practical and ethical application of generative AI from a photographer's perspective, centered on the photographer's creative input and decision making. The technology I focus on is Adobe Firefly.
Building off Your Creative Work Using Structure References
I am a photographer who is also a frustrated illustrator; simply put, I can't draw my way out of a paper bag. I have great respect for those who can make amazing photographs and paint or draw beautiful artwork, like Victoria White.
So, when the new Structure Reference feature went live in Adobe Firefly, I was bolstered by the content I saw others creating, even from simple pencil sketches or line art. What does it do? Well, it helps you create new imagery using your own imagery as the baseline: generative matching, if you will.
Using a simple slider, you control how closely Firefly sticks to your original image as it riffs on variations based on your text prompt. Let's take a look at a couple of examples.
From Misty Marsh to Spooky Ghost Story, and Back Again
This is an image I'm very proud of. So much so, it's in my current photo exhibition. But I wondered, using Structure Reference, what else could this become?
I started off with a basic prompt: A Ghost Story. The first set of results is always based on the prompt and what Firefly considers appropriate in terms of content type: photo or art.
But as soon as that first batch of images is displayed, I get access to the full control panel in Firefly, including Structure Reference. I can choose to start with a reference image from the gallery (figure 1). But I wanted as much of my vision as possible in these new creations, so I chose to upload my photograph (figure 2). Before I can make that first upload in a web session, I have to agree to the conditions of use (figure 3). Then, after I upload the Structure Reference and tap Generate again, Firefly uses my image as the context for the text prompt (figure 4).
As much as I was impressed by the default art type suggested by Firefly, it was critical to me that the results be more photographic (there's that human element, guiding this new creative process). Making one important but easy decision (changing the Content Type from Art to Photo) significantly changed my results, while keeping both the spirit of the prompt and the major visual elements of my reference image.
You might be asking, "What major elements? Your photo was of a marsh, with trees and grasses and mist. This is an attic!"
And you'd be right to ask. This image is a result of the Strength slider for the reference image. Notice in figure 4 (above) that the strength is about 50%. At that setting, the slider tells Firefly to treat the structure as a loose guide. Instead of angled trees, we get old timbers on the left of the image, baskets of dried grasses instead of a field, and mist floating through the window around the ghost. That slider is a simple but exceptionally powerful tool.
When I shift the slider all the way to the right, I'm telling Firefly to stick closely to the original image while keeping the text prompt relevant to the output. The level of accuracy and detail - in my opinion - is staggering. Yet, at the same time, I'm being presented with options I might never have thought of: moving to black and white, for example.
Sticking to the Plan
After the above exercise, I decided to see what Firefly could produce with a text prompt that was in keeping with the actual subject matter. I wrote: a spooky marsh with trees and fog in the background, setting the Structure Strength slider to 100 (all the way to the right). Note how the foreground trees and even the background were pretty faithfully rendered. It did add more water to the scene, but I liked the look.
Lastly, I went for an illustrative variation. I did not change the prompt, but I selected several visual effects and altered the color and tone controls.
Chicken Coop - Style Variations on a Theme
In this last exercise, I chose a less complex image, The Great Escape.
For this image I wanted to stay very close to the original image, but experiment with styles and add a bit more to the story. My prompt was chickens in a barn with a view of a lake.
I started with Photo as my Content Type, this image as my Structure reference (100%) and chose an old master acrylic and oil style. Like Structure, Effects also has a strength slider. For these examples I left the slider at the default of 50%.
I was delighted when Firefly isolated two of the chickens and converted them to a framed painting, while leaving the third chicken photo-realistic. This was totally unexpected, but I loved it. Are the results perfect? No. As the human in this creative process, I have to recognize the flaws (one chicken has too many toes) and decide whether to fix them or start over.
Then, by merely changing the content type to Art, I had a completely different style representation. Switching the style to more of a watercolor effect gave me this ink and watercolor version, where not just the look, but the color scheme from the style was applied.
Changing from Photo to Art Type
Choosing a 3D style generated some very fun results.
Changing the style reference to a 3D/claymation style
From Stick Figures to Concept Art
You may recall, near the beginning of this article, I referred to my lacklustre drawing skills. Case in point:
I know, I know... avert your eyes if you need to, but this is the sad state of affairs when it comes to me creating a hand-rendered illustration. However, even a sketch this primitive can be used as a structure reference in Firefly.
I altered my prompts slightly and also applied different styles as my sketch "evolved."
Using the most basic of prompts - peaceful campsite - Firefly still recognized major elements from the structure reference: mountains, a tent, a big tree and a waterfall.
Throughout the examples, the general shape of the tent is maintained; not a dome style common today, but a more traditional A-frame style. Even the mountain peaks echo the line art pretty well.
I was even able to get a couple of decent photo-realistic images, although I must confess, I used Generative Fill to add in the hiker in the last image. My stick-figure person was beyond the ken of Firefly, producing a wide variety of weirdness, especially in photos.
My prompts:
Peaceful campsite (too short - Firefly will nag you about brevity)
Peaceful campsite with mountains
Peaceful campsite with mountains in a magical forest and green grass during summer
Peaceful campsite with mountains in the winter
Sidebar - I created these examples using Firefly in a browser on my iPad, while watching Survivor. You can tell where my head was at...
The Many Faces of Me
Another fun project, made possible only by Structure Reference, was riffing on my own portrait. Using a simple prompt and multiple Styles, I created more than a dozen versions of a recent photo of me. Then I pulled them into Adobe Express and made the short video you see below.
To Infinity...and Beyond!
The combinations and possibilities are, quite literally, endless. But none of these results would exist without:
A) my original photograph or sketch
B) my creative input
C) my discernment regarding the direction and appropriateness of the generated content
I want to leave you with this: while Firefly created some amazing starter art, it would have created nothing without my intervention. It can only produce content at my behest, and only delivers better results as I refine my inputs, be it the text prompt, Structure or Style reference.
Don't run away from the technology. Don't ignore it. It's not going away. Instead, spend the time to learn what it can do and what its limitations are. Don't let what you could have done...haunt you later in life...