Preface: AdVenture Media is hosting a digital advertising summit in NYC on November 10th. It’s called DOLAH, and the event landing page is here. At this very moment, I should be preparing DOLAH content, but I am so damn excited about this topic that I’m making an exception. Please do me a favor: check out DOLAH and consider joining us in person or on the live stream.
AI is a funny buzzword. For the last 40 years—going back to around the time of the original Terminator movie—we’ve talked about AI as if it’s something in the distant future. As Ben Thompson of Stratechery often says, every time an AI product becomes real, we stop calling it AI.
Go back in time and talk to someone from 40 years ago. A technological optimist might have fantasized about an AI that retains all of the world’s information and can answer any of your questions immediately.
Oh, you mean Google Search?
…or perhaps, an AI that knows where you’re headed and can help you avoid traffic, tolls, or road closures along the way.
Oh, you mean Waze?
… or perhaps, an AI that builds your very own Blockbuster video store for you, stocked with your favorite classics and personalized recommendations based on your interests.
Oh, you mean Netflix? We have the same thing for music too; it’s called Spotify.
We don’t think of these things as AI; I’m not sure why. One theory is that the UI of these products hides the flaws and the optimization taking place in the background. So long as Google Search eventually gives us our answer, or we finally settle for that Netflix rerun, we quickly forget any lackluster or irrelevant suggestions that we passed over in the process.
But a picture is worth a thousand words. The popularization of text-to-image AI has made AI incredibly tangible for even the most passive bystanders. For the first time ever, we can all see the AI working, learning, and optimizing. Like a peanut gallery, we can point to its flaws and ogle at its wonder.
I’ve been obsessed with these text-to-image products and believe they’re a true harbinger of how all of us will work with technology in the future.
Nat Friedman was an executive at Microsoft and the CEO of GitHub. During a recent interview with Ben Thompson and Daniel Gross, he mentioned the light-bulb moment when his entire perspective on AI changed.
Engineers at GitHub were attempting to build an AI to assist programmers in writing Python code. Programmers often have two or three pages of code in their heads as they work; the time it takes to dump all of those ideas from their heads onto a computer screen is really just busy work.
Busy work is a textbook use case for AI.
…what if we can build an AI that makes it easier and/or faster for a programmer to get their ideas written out as code?
Friedman described the early beta as either “spooky or kooky.” It was spooky when the AI could accurately predict what the programmer was attempting to write.
But when it was wrong, it was really wrong. Not even close. Kooky.
And as it goes with many machine learning programs—especially in the early stages—it was wrong a lot more often than it was correct.
Friedman found that the kooky results often frustrated the human programmers and caused them to distrust the potential of the AI altogether. The programmers felt as if they were delegating tasks to an incompetent junior programmer, which eventually wears on you and forces you to think: screw it, I’ll just manually write this out myself!
Daniel Gross, who previously headed up machine learning projects at Apple, offered a poignant comparison:
“[it’s more like] a schizophrenic software engineer that’s sometimes brilliant and sometimes actually writes Japanese instead of Python.”
Python code is either correct or incorrect; there is no almost. If it’s incorrect, it shouldn’t matter whether it’s missing a single colon or written entirely in Japanese. Wrong is wrong.
It’s easy to see how a human programmer might become frustrated or lose interest. However, even if the AI could generate the correct result some of the time, it could still be useful. (And in theory, its success rate should improve over time, even if by a small margin.)
No AI will ever be correct 100% of the time; therefore, we should level our expectations and understand that when it’s wrong, it might be kooky wrong… and that’s OK.
Friedman realized that the programmers were having a dialogue with the AI as if they were sitting on opposite sides of the table: they would tell the AI to do something, and more often than not, it would do that thing incorrectly.
He later realized that the relationship between the two was fundamentally broken. Humans shouldn’t be on opposite sides of the table from the AI: they should be working together as co-pilots.
The Copilot idea is the opposite. There’s a little robot sitting on your shoulder, you’re on the same side of the table, you’re looking at the same thing, and when it can, it’s trying to help out automatically. That turned out to be the right user interface.
The GitHub engineers used this logic to build a useful AI. They called it Copilot.
Then it really wasn’t until February of the next year that we had the head exploding moment when we realized this is a product, this is exactly how it should work… So now, it’s very obvious. It seems like the most obvious product and a way to build, but at the time, lots of smart people were wandering in the dark looking for the answer.
This happened pretty recently, by the way. Copilot was released in June of 2021, and the smartest people at GitHub and Microsoft took a while to crack the code.
We’re currently living through an exciting technological revolution, and we’re all still figuring it out together.
The Copilot example is on one end of the extreme. For an AI to be useful in writing code, it needs to generate perfect code—not every time, but at least some of the time.
If 50% of the time, it produced kooky results, and the other 50% of the time it produced almost correct results that still included typos, then it would still have a 100% fail rate. If the AI produced perfect code just 10% of the time, then it would have a 10% pass rate.
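To make that pass/fail arithmetic concrete, here is a minimal Python sketch. The outcome labels ("kooky", "almost", "perfect") and the `pass_rate` helper are my own illustrative assumptions; they are not how Copilot is actually evaluated.

```python
def pass_rate(outcomes):
    """Fraction of attempts that are fully correct.

    Hypothetical scoring rule for illustration: only "perfect" passes,
    so an "almost" correct result fails just like a "kooky" one.
    """
    return sum(1 for outcome in outcomes if outcome == "perfect") / len(outcomes)


# Scenario 1: half kooky, half almost correct. Nothing passes.
scenario_1 = ["kooky"] * 50 + ["almost"] * 50

# Scenario 2: 10 perfect results out of 100 attempts.
scenario_2 = ["perfect"] * 10 + ["kooky"] * 45 + ["almost"] * 45

print(pass_rate(scenario_1))  # 0.0
print(pass_rate(scenario_2))  # 0.1
```

Under this rule, an "almost" result earns no partial credit, which is exactly why code is such a demanding use case.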
Art is a different story. A piece of art—whether an image, music, or poetry—does not need to be perfect and is not judged as pass/fail. Art can have flaws; it can be good enough. The range of possibilities for AI-generated art, then, is massive.
Three primary text-to-image AI products are dominating the headlines today: DALL-E, Midjourney, and Stable Diffusion. DALL-E was released in January 2021 by an organization called OpenAI. Midjourney and Stable Diffusion were born in the summer of 2022; they’re still babies.
This image, taken from the DALL-E Wikipedia page, was produced from the text prompt:
…Teddy bears working on new AI research underwater with 1990s technology
Not perfect, but … holy cow!
I downloaded Midjourney six days ago. I had no idea what I was doing. I also do not have a background in design or photography and suddenly found myself staring at a blank canvas completely at a loss for words.
Midjourney is managed through a platform called Discord, an instant messaging platform popular with gamers. The Midjourney product feels like a giant chatroom where you can simply watch artists suggest sophisticated prompts to the AI and edit their visions in real-time. It’s fascinating.
Separate from the chatrooms for designing are the chatrooms reserved for open discussion. This is where artists collaborate on how to work best with the AI. Here is a thread where artists share their suggestions on generating realistic images of tornados:
I lost myself in an endless stream of prompts and forums and began picking up on a few themes. Artists would use descriptive context clues to describe a vision and then iterate or optimize the image with multiple variants. The results would often be random, but by changing their prompt or generating similar variants of a previous iteration, they would quickly find something that they were satisfied with.
Inspiration struck, and I wanted to generate an image of my favorite author, Kurt Vonnegut, sitting in one of my favorite places, the Adirondack mountains. As a complete newbie, it took nearly 30 minutes of revisions and Discord research to properly translate my vision into a prompt.
… Kurt Vonnegut sitting in an Adirondack Chair on a dock in the Adirondack mountains at golden hour, his back to the camera, feet on the dock, looking over at a group of ducks on the right side of the scene
This is, more or less, what I was envisioning. I was instantly hooked.
Throughout the week, I was constantly thinking of images to create; it was clear that I was becoming more efficient with my prompts and the details I was feeding into it. My experimentation included the following:
… teenagers on a rollercoaster taking a selfie with a GoPro camera
… Malcolm Gladwell playing chess in a kosher restaurant
And perhaps more symbolic:
… a hopeful phoenix rising above ominous ashes, dystopian and symbolic
It is tempting to jump to crazy extremes about what this technology might do: Whoa! Is this the beginning of the end of graphic design?!?
Text-to-image AI is another AI that has the potential to sit on our shoulders and help us go about our lives with more creativity and/or productivity, but it won’t necessarily replace an entire industry of jobs tomorrow.
But I would agree that this does change the game. This technology will not only alter how we work with digital imagery; it is a harbinger of how marketers should view and work with any artificial intelligence technology.
We are on the precipice of a technological renaissance. Tools like DALL-E, Midjourney, and Stable Diffusion are putting powerful AI in the hands of anyone, making beautiful artwork and creativity accessible to those who would have been deemed unartistic and uncreative up until this very moment!
This isn’t just democratizing for the end-users; it’s also democratizing for visionaries, creators, and entrepreneurs. DALL-E, Midjourney, and Stable Diffusion don’t just come from three distinct organizations; they come from three distinct organizations that were born separate from the likes of Google, Meta, Amazon, Apple, Microsoft, or anything loosely related to Elon Musk.
In the aforementioned Stratechery interview, Ben Thompson highlighted the significance of Stable Diffusion as the third player to successfully enter the space:
That was the big change. People had to wake up to the fact that it could be democratized. Stable Diffusion might end up amounting to nothing, but it will be one of the greatest products ever just because it will have changed so many people’s minds about what was possible.
Daniel Gross added:
Stable Diffusion is that woman in the 1984 ad who just smashed the screen and everyone realized, “Wait a minute. This is for the taking for everyone.”
In a separate article titled The AI Unbundling, Thompson describes the steps, or value chain, that it takes for an idea to turn into something that is consumed by an end user. The value chain includes five parts: Creation, Substantiation, Duplication, Distribution, and Consumption.
Technology has the power to bundle or unbundle different components of the value chain.
Take, for example, newspapers. An editor Creates an idea for a story; they use a typewriter to Substantiate that story into a written format; a printing press allows for Duplication; a paper boy (or girl!) uses a bicycle or an automobile to Distribute that paper to the final consumer.
The internet bundled much of this together. Twitter is a platform that allows anyone to become a writer and Substantiate (write) their ideas, which can be instantly Duplicated, Distributed, and Consumed by anyone with access to the internet.
The bottleneck still exists between Creation and Substantiation. For someone to gain a following on Twitter, for example, they need to have great ideas and be a wordsmith who can craft those ideas into 280 characters. These are two different skills, and having one does not guarantee the other.
With digital imagery, a successful artist needs to have both great ideas and proficiency with a tool like Photoshop to Substantiate those ideas into reality.
There is no shortage of great ideas, whether artwork, app design, or solutions to climate change; the challenge is Substantiation.
AI can help fill the gap. In some cases, it can do the Substantiation itself (as is the case with Copilot). In other cases, it can aid the transition between Creation and Substantiation.
I’ll be the first to admit that my Midjourney artwork is mediocre at best; however, was it successful in translating my idea into something visual? Absolutely!
My wife happens to be an extremely talented baker—she designs and decorates sugar cookies (check her out @aubscanbake). It’s currently spooky season, and we turned to Midjourney for inspiration in turning everyone’s favorite Disney witches into delightful desserts.
… Hocus Pocus Sanderson sisters as sugar cookies
(She hasn’t had the time to make these yet. I’ll update the post if she does.)
Is Midjourney perfect? No. Is it helpful? Yes!
Such is the story with many AI tools!
If you’ve followed my writing or read my book, you’ll know that I am a technological optimist. I have long argued that marketers have a responsibility to learn how to work with AI instead of simply expecting it to work for them.
Before you can determine if a particular solution is useful or not, you must first learn about the technology with an open mind and consider the possibility that an eventual failure might be your fault, and not necessarily that of the AI.
Nat Friedman’s Copilot analogy—AI should work with us, not for us—and Ben Thompson’s description of AI’s role in bridging the gap between Idea Creation and Idea Substantiation are two frameworks that can apply throughout the marketing community and beyond.
For example, Google’s latest ad products obfuscate search term data. Marketers hate this, but I am not at all surprised by this trend.
Marketers running Google Ad campaigns often cite two primary use cases for search term data transparency:
The latter use-case is rooted in fear of kooky results!
In my opinion, an ad click is pass/fail, like Copilot: it either contributes to a future conversion event or it does not. Any ad click that does not contribute to a conversion, regardless of how kooky that search query may have been, is an abject failure. The AI will learn and improve over time, and the only thing that should matter is the final result.
Practically speaking, Google’s decision-makers might be afraid that a handful of kooky search term results will cause you to turn off your Performance Max campaigns or take your budget out of Google Ads entirely, even in cases where your campaigns are generating profitable results. (This is just my opinion, of course. Google would never publicly comment on such an issue.)
I'm not suggesting that marketers are better off without search term data, nor do I think that the final result (performance) is really all that matters. I also wish I had that data!
However, I do recognize Google's position and understand that this might be one of several contributing reasons why this data is obfuscated (other reasons might include privacy, the massive costs associated with cleaning, storing, and presenting that data, etc).
If Text-to-Image AI can help break down those barriers and help marketers understand more of the nuances behind AI as a whole, then it will show progress and be a win for everybody involved.
Now, this really changes everything. Creative development has always been a struggle for many digital marketers, and I’d argue that good creative matters much more now than it has at any point over the last 15 years.
Substantiation has always been our primary struggle. A creative strategist might have a great idea, but it takes a great graphic designer to bring that idea to life. Even if you have access to both of those things, these people still need to be able to work well together and communicate effectively. The strategist must properly convey their vision in a way the designer understands; direct feedback is crucial.
If you’ve ever filled either of these roles, you’ll know that the back-and-forth communication can be frustrating. What’s more, it’s incredibly time-consuming.
… What if an AI could quickly translate the vision of a creative, which could then be handed to a graphic designer to turn into reality?
We have a client selling tennis equipment; I'd like to create a new set of ads under the theme "The future of tennis!"
I envisioned a tennis court in the middle of a futuristic city, with players using our client’s equipment. Seems simple, right?
But I have a specific aesthetic in my mind, so I went to Midjourney. After just 10 minutes, I was able to illustrate my vision:
(Note: This is not even close to a final product; this simply illustrates my vision for what the ad should look and feel like.)
When you read “a tennis court in the middle of a futuristic city,” maybe this was what you had in mind; maybe not. In my experience working with graphic designers, the first Substantiated draft is rarely close to what you, the Creator of the idea, originally had in mind.
That wasn’t the case with Midjourney either! Here’s how Midjourney’s AI interpreted my original prompts, which had to be altered and optimized with my feedback over time:
By the last image, you can see my final vision coming into view, but it took a few weird twists and turns through soccer stadiums, the Star Wars universe, and what looks like the inside of a server room before we actually got there.
A human graphic designer may have followed a similar path, and I wouldn’t blame them for it.
The difference is that I could arrive at my final image within 10 minutes. If I had used a graphic designer for this part of the process, it likely would have taken several days and several frustrating conversations.
I sent the image of my vision (along with a brief write-up) to one of our graphic designers a few days ago. They understood the direction, were thankful for the help, and were several steps ahead in turning that vision into a great advertisement.
Thanks to the buzz and sudden accessibility of this technology, mass adoption is spreading like a virus. Those who are already artistically inclined will rise above the rest and create the most stunning images, while tons of newbies like me will become addicted to the idea of creating anything. Many will find new ways to use this technology to creatively solve problems irking their daily lives, as I have already seen over the last week.
There will be debates and pushback, and someone is bound to get in trouble for generating something off-color or offensive, but I am confident that this technology will be a net-positive for all of us. Time will tell.
In the meantime, I’m too excited about all of this to sit still. This morning, I woke several hours before my alarm, and after tossing around a bit, I staggered out of bed into the kitchen. I opened the shades and glanced outside at our red canna lilies, remembering that I need to cut them down before the weather gets too cold.
I poured a coffee, opened my laptop, and started Substantiating all of my excitement into this blog post. The scene looked something like this:
… Man sitting at kitchen table, hovering over laptop while typing a blog post at sunrise, coffee mug on the table, in the background through the large window is a patio with red flowers blooming
I downloaded Midjourney just six days ago. This is only the beginning…