Prompt engineering your boss
Veith Röthlingshöfer
Posted on December 21, 2023
Our designer started playing around with Midjourney to generate caricature stickers of our team in Octomind’s app trademark colors during one of his breaks. These were intended to be a fun addition to our Slackmojis, FigJam stickers, and Graphite’s meme collection.
We usually lean into the useful side of generative AI when working on our AI-powered UI testing tool. But hey, it's the festive season, so why not have some fun with it and share it with you guys.
(DISCLAIMER: At Octomind, we pride ourselves on having a friendly work environment where we get along pretty informally. This might not be the ideal Christmas present for your boss!)
Here are a few examples to give you an idea of what I’m talking about:
Our designer would upload a reference image and provide small prompts like “<image link> <core emotion>, cartoon sticker style, mint green and dark lavender”. He would then iterate through numerous attempts until he arrived at something that looks about right. Some of the iterations were downright terrifying, but that's all part of the process.
I, for one, went through several iterations before arriving at a sticker that somewhat(?) resembles me. It created gems like these:
Side note: Midjourney has a penchant for generating plants and other random items in the background. I suppose it's because the embedding distance from "lavender" to "plant" is quite short.
All in all, it’s been a lot of fun, so we thought our boss, the co-founder & CTO, might enjoy it too.
What do you say, Daniel R.? Are you up for some caricature stickers of yourself?
He turned out to be a tough nut to crack. Whatever Midjourney generated, it never resembled him at all. Daniel became known as ‘the Unmidjourneyable’.
After playing around with it for a bit on my train ride home from our Christmas party, I found a workflow that produced decent results. It took several tools and a few little tricks, which I’ll share with you here.
Midjourney vs. Dall-E 3
We use Midjourney to generate octopus imagery within our brand style. Midjourney’s typical style usually gets us closer to visual consistency. OpenAI’s Dall-E 3, however, is a lot better at understanding complex prompts than Midjourney is. So I knew that I would probably want to start there.
Simply combining the image and prompt in Dall-E wasn't producing the results I wanted. The model seemed to struggle with the requested "mint green and dark lavender color palette" more than Midjourney did.
I also had to be a bit careful with my wording here. GPT doesn’t want you to generate images of other people and refuses such prompts.
Style transfer with Dall-E
First, I needed a way to generate something in a style similar to what we wanted. We already had a few example stickers where Midjourney worked well (shout out to my colleague Fabian’s face for being very prompt-able!), and I already knew how to transfer a style from a private project.
You can ask ChatGPT to analyze an image and generate a detailed description of its art style. I then used that description as the art style guideline in a second image generation step with Dall-E. This technique also allows you to keep the style relatively consistent between new chats and image generations!
The prompt I used was:
Me: Describe the style of this image. Do not focus on the actual content, only on the style, colors and other artistic elements. Name some art styles and artists that fit this image. Be as detailed as possible.
Response: The image uploaded features a caricature-style illustration with exaggerated facial features commonly found in modern caricature art. The subject's head is larger in comparison to the body, emphasizing the facial expression, which in this case is one of anger or annoyance. The use of bold, dark outlines to define the shapes is reminiscent of comic book art, particularly the styles found in graphic novels or mature-themed comics.
The colors are vivid but not overly bright, suggesting a more subdued palette that enhances the serious tone of the expression. The use of shading and highlights on the face gives the image a three-dimensional feel, adding to the impact of the expression.
This style of illustration might be influenced by artists like Mort Drucker or Tom Richmond, known for their work in MAD Magazine, where such exaggerated features and expressions are a hallmark. However, it also contains elements seen in the works of street artists who often employ bold lines and vivid colors to create impactful imagery.
In terms of art styles, it aligns with Pop Art in its boldness and accessibility, as well as with Expressionism in its focus on representing emotional experience rather than physical reality. The style does not fit directly into the classic fine art categories but has a contemporary edge that is often seen in digital art, graphic design, and illustration.
Next, read the response carefully and adjust it as desired. The model sometimes sneaks in phrases you do not want to have in your style description. In this case, we want to use it as a style guide later, so I replaced “The image uploaded features […]” with “The style should resemble […]”. Additionally, I didn’t want Daniel’s sticker version to be angry, so I replaced “[...] in this case is one of anger or annoyance” with “[...] in this case is one of hype and excitement”.
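If you do this more than once, the manual search-and-replace can be scripted. Here is a minimal sketch in Python, assuming the generated description is available as a plain string. The helper name `adapt_style_description` is made up for this example, and the `generated` text is an abbreviated stand-in for the full response above.

```python
def adapt_style_description(description: str, replacements: dict[str, str]) -> str:
    """Apply manual wording fixes to a generated style description."""
    for old, new in replacements.items():
        if old not in description:
            # The model words its answer differently on every run,
            # so fail loudly instead of silently skipping a fix.
            raise ValueError(f"phrase not found: {old!r}")
        description = description.replace(old, new)
    return description


# Abbreviated stand-in for the full style description (assumption).
generated = (
    "The image uploaded features a caricature-style illustration with "
    "exaggerated facial features. The facial expression in this case is "
    "one of anger or annoyance."
)

style = adapt_style_description(
    generated,
    {
        "The image uploaded features": "The style should resemble",
        "in this case is one of anger or annoyance":
            "in this case is one of hype and excitement",
    },
)
print(style)
```

Raising an error on a missing phrase is a deliberate choice: you still have to read every response yourself, and the script only saves you the typing.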
I added some context to the following prompt template to generate our sticker:
You are an expert in creating amazing caricature sticker designs based on a textual description of a person and a desired style. Generate a sticker based on the description: <add description> And the style: <add style>
I will refer to this as “the template” from now on. Next, I had to make sure that the sticker looks similar to our co-founder.
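Filling the template is plain string formatting. The sketch below shows one way to do it in Python. The template text is the one from this post, while `TEMPLATE`, `build_sticker_prompt` and the example inputs are names and values I made up for illustration.

```python
TEMPLATE = (
    "You are an expert in creating amazing caricature sticker designs based on "
    "a textual description of a person and a desired style. "
    "Generate a sticker based on the description: {description} "
    "And the style: {style}"
)


def build_sticker_prompt(description: str, style: str) -> str:
    """Fill the template with a person description and a style description."""
    # Collapse line breaks and repeated spaces so that multi-paragraph
    # descriptions end up as one continuous prompt.
    description = " ".join(description.split())
    style = " ".join(style.split())
    if not description or not style:
        raise ValueError("both description and style are required")
    return TEMPLATE.format(description=description, style=style)


prompt = build_sticker_prompt(
    "A man in his late 30s\nwith short ash brown hair.",
    "The style should resemble a caricature-style illustration.",
)
print(prompt)
```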
Playing police
The only way to interact with either image generation model is through a textual description of what you want to see. Prompts like “make it look like the photo” don’t work that well, as seen above, and I am not good enough at describing a person’s looks to trust myself to produce an accurate and reproducible description.
I thought: “Who is better at drawing someone based on a description than a police sketch artist?”
Using the following prompt, I fed Daniel’s image to ChatGPT to get a description:
Me: You are an expert police sketch artist. You know everything about describing people to recreate a face from a textual description. Generate the perfect description for this man. Go over each aspect that makes him unique, describing as many individual physical features as possible. Use terms that set each feature in relation to how that feature looks on other people. Do not focus on the background. Make sure you could recreate this man based on the description you provide.
Response: The individual in the image is a male appearing to be in his mid-30s to early 40s. He has a fair complexion with a warm undertone, and his face shape can be classified as oval, with a slightly elongated chin that gives a hint of a heart-shaped contour. His hair is short and styled in a casual, slightly tousled manner, exhibiting a peppered ash brown color, indicative of the onset of graying primarily at the temples.
His forehead is broad and visibly unlined, leading to well-arched, medium-thickness eyebrows that are neatly groomed, slightly lighter than his hair color, suggesting they may lighten further with age. His eyes are almond-shaped, with a noticeable palpebral fold, and the irises are a cool hue, possibly blue or grey, framed by average-length lashes that do not appear particularly thick.
The nose is straight with a narrow bridge and a rounded, average-sized nasal tip, showing no signs of deviation or prominent bumps. The nostrils are symmetrical and of moderate size. His cheeks are relatively flat, with a subtle hint of cheekbones just below the mid-eye level. The man has a gentle smile, which slightly elevates the cheeks without revealing the teeth. His lips are of medium thickness with the upper lip slightly thinner than the lower, and the lip color is a healthy pink, in harmony with his overall complexion.
His ears are of average size, proportionate to his face, with the lobes attached and no prominent helix or antihelix features. His jawline is defined, though not sharply angular, leading to a well-proportioned chin that complements his face without being overly prominent.
He wears no visible jewelry or piercings and his attire suggests a practical outdoors orientation - he's wearing a dark olive green hooded jacket with the brand "Columbia" visible on the right side of the chest in white lettering, indicating a preference for functional, perhaps outdoor or athletic clothing.
The overall impression is of a man who takes care in his appearance without it being overly groomed or styled, presenting a natural and approachable demeanor.
Then, I used this description and the template to generate a first sticker version:
This is already closer than anything we managed to do with the picture and a short prompt, but still not really recognizable. I’d need a feedback loop!
Continuing in the chat where I generated the first description, I asked it to analyze the generated image, the photo, and the description, and then to update the description to clear up any misconceptions. But GPT-4 got confused about which image was which and just generated a new description of the generated image.
I needed a new trick.
Playing professor
Actually, I’ll pull not only one, but two tricks out of my hat.
It was very obvious that GPT was confusing which image I was referencing at what point. I was referring to them as “the first” and “the second”, or “the photo” and “the illustration”, but that wasn’t working, especially over the course of several interactions.
The first hack was to combine the photo and the illustration into one image using the open-source image editing tool GIMP.
Now, I could refer to them as “the photo on the left” and “the illustration on the right”, which really seemed to reduce the confusion.
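I did this step by hand in GIMP, but the composite itself is simple enough to script. The following is a rough sketch in Python that assumes both images are already decoded into rows of RGB tuples. Loading and saving real files would need an imaging library, which is left out here, and `side_by_side` is a name invented for this example.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]

WHITE: Pixel = (255, 255, 255)


def side_by_side(left: Image, right: Image, fill: Pixel = WHITE) -> Image:
    """Place two images next to each other, padding the shorter one at the bottom."""
    left_width = len(left[0]) if left else 0
    right_width = len(right[0]) if right else 0
    height = max(len(left), len(right))
    combined: Image = []
    for y in range(height):
        # Rows below the end of the shorter image are filled with the fill color.
        left_row = left[y] if y < len(left) else [fill] * left_width
        right_row = right[y] if y < len(right) else [fill] * right_width
        combined.append(list(left_row) + list(right_row))
    return combined


# Tiny dummy images: a 2x2 "photo" and a 3x3 "illustration".
photo = [[(0, 0, 0)] * 2 for _ in range(2)]
illustration = [[(200, 180, 255)] * 3 for _ in range(3)]

combined = side_by_side(photo, illustration)
print(len(combined), "rows,", len(combined[0]), "columns")
```

The photo always ends up on the left and the illustration on the right, which is what lets the prompt refer to them by position.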
The second trick is to adjust the context of who's describing the image and generating the prompt. A police sketch artist may be great at drawing a person based on a description, but is not necessarily good at generating that description in the first place. So I turned it into a very critical professor instead, which produced a way more detailed result:
Me: You are an expert police sketch artist. You are currently giving a lecture on how to describe people so that a police sketch artist can accurately reproduce their face. To teach people, you are letting them perform the following task: "Write a description of a photo of a man that I have given you. Another police sketch artist will draw the man based on your description. I will criticize and improve your description after we have seen the resulting illustration". You are now criticizing the description one of your students generated. The description he generated is: .
Response: The description provided by the student is quite detailed and paints a clear picture of the individual in question. However, to refine it for increased accuracy in a police sketch, we can focus on a few key areas:
Facial Structure: The description mentions an oval face shape with a hint of a heart-shaped contour due to an elongated chin. To improve, we could specify the ratio of forehead to lower face to better capture the balance of facial features.
Eyes: While 'almond-shaped' is a good start, we could provide the relative distance between the eyes and the exact shape and orientation of the eyelids.
Nose: Instead of 'straight with a narrow bridge,' we might describe the exact length of the nose in relation to the width of the face and the shape of the nasal tip more precisely.
Mouth and Smile: The description of the smile is good, but the shape of the mouth when smiling can vastly change the perception of the face. Is the smile wide, does it extend beyond the eye line, etc.
Hair: More specifics on the distribution of graying and the texture of the hair would be helpful.
Complexion: We can clarify 'fair with a warm undertone' by giving examples or comparing with common complexion types.
Clothing: The description is adequate, but the position of the jacket zipper, the hood, and how the clothing sits on the body could help.
Here's an improved description: ... a slightly better description ...
It is generally a good idea to let LLMs break down the task into several steps. In this response, it did provide some very good criticism, but didn’t actually apply it in the improved description. The result using this description looked like the one below. The image was generated in the same chat as the previous version, without supplying the style instructions again.
So, I asked it to apply the criticism and generate a more detailed description.
Response: The subject is a Caucasian male estimated to be in his late 30s. His complexion is fair with a neutral undertone, leaning slightly towards a peachy warmth. The face shape is an even blend of oval with a subtle hint of heart-shaped, primarily due to the slightly wider forehead and a gently tapered chin which is softly rounded at the tip, avoiding sharpness.
His hair is cut short, with a texture that suggests a thickness in the strands. The color is a rich ash brown with a prominent sprinkling of gray, more concentrated around the temples and blending into the natural color towards the crown. There is a subtle side parting on his left, and the hair follows a natural growth pattern with a slight elevation at the front, giving a casual, tousled appearance.
The forehead is broad, occupying a third of the facial length, smooth, with no visible furrows or frown lines. His eyebrows are natural in shape with a soft arch, groomed but not sculpted, slightly lighter than the hair on his head, hinting at a sandy brown shade. They are of medium thickness, neither bushy nor sparse.
Eyes are of a size that balances with the other facial features, set at a moderate distance from each other. The shape is a classic almond, with upper eyelids having a prominent but not heavy palpebral fold. The eyes are reminiscent of the color of the summer sky, a clear blue with a grayish tint. The lashes are of average density and length, framing the eyes in a way that is noticeable but not striking.
The nose is well-proportioned to his face, straight, with a bridge that is of moderate width. The tip is rounded and slightly bulbous, which gives character without dominating his facial features. The nostrils are symmetrical and proportionate to the size of the nose.
His cheeks are smooth, with a very subtle definition of cheekbones aligned with the center of the eyes. They do not protrude but give structure to his face. The cheeks transition into a gentle smile that lifts the area modestly, causing a small crinkle at the outer corners of the eyes, which adds a friendly warmth to his demeanor. His lips are naturally contoured, the upper lip being slightly thinner than the full, more pronounced lower lip. Both have a healthy blush pink tone, and the smile does not stretch broadly across the face but is enough to show a hint of dimples.
The ears are in proportion with the facial size, with the tops aligning with the brow line and the bottoms with the base of the nose. The lobes are attached, and there are no distinctive markings or piercings.
His jawline is defined without being overly angular, leading to a chin that complements his oval face without becoming a focal point. There is a slight indentation below the lower lip, providing a subtle division between the chin and the rest of the face.
The attire, a dark olive green hooded jacket with a visible 'Columbia' logo on the right chest, suggests an affinity for outdoor wear. The jacket is worn in a relaxed manner, zipped to mid-chest, with the hood resting flat against the back. The fabric appears to be a durable, water-resistant material suitable for outdoor activities.
Overall, the man has a relaxed, congenial appearance, with well-maintained but not overly styled features, conveying an impression of practicality and an easygoing nature.
Now, this reads like something usable! Another good thing about this process is that you can repeat it until you are happy (someone should automate that, cough, cough).
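For the curious, the loop could look roughly like the Python sketch below. It is not something I actually ran for this post. All five callables are hypothetical placeholders for "describe the photo", "generate a sticker", "criticize and rewrite the description" and "decide whether we are happy", and the stand-in functions at the bottom only exist so that the sketch runs without any model access.

```python
from typing import Callable


def refine_description(
    photo: str,
    describe: Callable[[str], str],
    generate: Callable[[str], str],
    critique: Callable[[str, str, str], str],
    is_good_enough: Callable[[str, str], bool],
    max_rounds: int = 5,
) -> tuple[str, str]:
    """Repeat describe -> generate -> critique until the result is accepted."""
    description = describe(photo)
    illustration = generate(description)
    for _ in range(max_rounds):
        if is_good_enough(photo, illustration):
            break
        # The "professor" step: compare photo and illustration,
        # then return an improved description.
        description = critique(photo, illustration, description)
        illustration = generate(description)
    return description, illustration


# Stand-in callables (assumptions) that mimic the shape of the real steps.
def fake_describe(photo: str) -> str:
    return "a man"


def fake_generate(description: str) -> str:
    return f"sticker of {description}"


def fake_critique(photo: str, illustration: str, description: str) -> str:
    return description + ", more detail"


def fake_judge(photo: str, illustration: str) -> bool:
    return illustration.count("more detail") >= 2


description, illustration = refine_description(
    "photo.png", fake_describe, fake_generate, fake_critique, fake_judge
)
print(description)
```

The `max_rounds` cap matters in practice: in my experience so far, "until you are happy" was a human judgment, and an automated judge may never be satisfied.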
The final push
Using this last description, I got the following image from Dall-E:
It looks relatively recognizable, but is not as exaggerated as the original sticker, and it also lacks the mint green and dark lavender color palette. And as it turns out, Daniel is Barney Stinson’s long lost half-brother!
But we already know that Midjourney is quite good at molding images into caricatures and following a color scheme. And by this point, the complexity of generating the sticker has been greatly reduced.
At the beginning, Midjourney received a photo and needed to abstract it into a sticker and a caricature, change the displayed emotion, turn it into an illustration, and switch to an unrealistic color palette. Now, it only needs to exaggerate the emotions and switch the color palette.
And it did a great job! Feeding this image into Midjourney resulted in what I was going for.
I used the prompt “<image url> exaggerated hype and excitement, mint green and dark lavender color palette, sticker design, caricature --iw 1.75”.
I played with the image weight parameter --iw a bit, and found that 1.75 strikes a balance between sticking to the source material (--iw 2) and allowing the changes I described in the text prompt (--iw 0).
I also found that it helps to not only specify “mint green and dark lavender”, but to explicitly add the words “color palette”, which significantly reduces the amount of random plants in the background.
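If you want to try several image weights in a row, a tiny helper that assembles the prompt string saves some typing. This is only a sketch in Python: `build_midjourney_prompt` is a made-up name, and the allowed range of 0 to 2 is simply the range I played with above, not something the code checks against Midjourney itself.

```python
def build_midjourney_prompt(
    image_url: str, parts: list[str], image_weight: float = 1.75
) -> str:
    """Join an image URL, comma-separated text parts and an --iw flag."""
    # Range taken from the experiments described in this post (assumption).
    if not 0 <= image_weight <= 2:
        raise ValueError("image_weight is expected to be between 0 and 2")
    text = ", ".join(part.strip() for part in parts if part.strip())
    # The "g" format drops trailing zeros, so 2.0 becomes "2".
    return f"{image_url} {text} --iw {image_weight:g}"


prompt = build_midjourney_prompt(
    "<image url>",
    [
        "exaggerated hype and excitement",
        "mint green and dark lavender color palette",
        "sticker design",
        "caricature",
    ],
)
print(prompt)
```

Called with the default weight, this reproduces the prompt I used for Daniel's sticker.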
The only thing left to do is to cut out the sticker from the background with GIMP. Even if the background is not uniformly colored, it’s pretty easy thanks to the white outline of the person.
Voilà!
I added a little quote for more pizzazz.
Prompt engineering your family
With the holiday season approaching fast, I couldn't help but think this might be a unique gift idea. When I introduced the recipe to my team, one of my colleagues immediately began prompting her family for stickers as a Christmas gift.
Why not share this with people who supported Octomind in 2023?
I created this blog post to share with Octomind friends, users and fans. This is our special, genAI way of saying: thank you.
Veith Röthlingshöfer
ML engineer at Octomind