Scenario 5: Avatars
In the Stock Photos chapter, I introduced the img2img method, which can solve some stock photo issues and is also useful in other scenarios.
This chapter focuses on avatar prompts using that method. We'll also keep expanding our prompt structure to make it more complete.
Important Notes
Before introducing img2img applications, I want to emphasize:
- Don't upload your own photos to public Discord Midjourney servers! This exposes them to all server members. I recommend using the Midjourney bot instead.
- Also, images generated by non-Pro members are visible to all members. You can delete the images after generating your avatar. See the Basics section if you don't know how to use the bot or delete images.
- I won't elaborate on the img2img process here. Check the Basics and Stock Photos chapters if you need a refresher.
3D Cartoon Avatars
First, cartoon avatars. A few caveats up front:
- I've looked at almost all Chinese and English avatar generation tutorials, tried them myself, and discussed with the Midjourney community. My understanding is that with V5's current capabilities, no matter how you tweak the prompt, randomly generating an image very similar to the original is mostly luck. Even using the techniques I'll share, you just increase the probability. If you know a reliable method to generate highly similar avatars, please share via the GitHub issues - I'll credit you by name and share it with everyone.
- During your learning process, if the generated images don't resemble the original, don't be discouraged. This is normal.
- You can use the methods I share to create avatars that capture the spirit of the original, but they definitely won't be highly similar.
In the prompt, include the original image link (ID photos or simple backgrounds work best, giving a higher success rate). Then design the rest of the prompt using my framework:
| Question | Prompt | Explanation |
| --- | --- | --- |
| What is the type? | Portraits / Avatar | For an ID photo you can include "portrait" or "avatar". |
| What is the subject? | smiling cute boy, undercut hairstyle | Optional. You can leave this blank at first, then add descriptive words if the output doesn't match: gender, appearance, hairstyle, accessories (glasses, earrings), expression. Focus on distinctive traits; getting these right means higher similarity. |
| What is the background? | white background | I kept the white background like an ID photo, but you could add real backgrounds such as restaurants. |
| What is the composition? | null | We already provided the image, so no need to specify. |
| What lens? | soft focus | A soft focus lens gives a sharp yet soft, dream-like effect, often used for portraits. I added it to soften the image, but you can omit it. |
| What is the style? | 3d render, Pixar style | Since we want a 3D avatar, I added "3d render" and my preferred Pixar style. |
| Parameters | --iw 2 | iw sets the image-vs-text weight; higher values favor the image. Details in the Advanced Parameters chapter. |
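Assembled from the framework above, the full prompt might look like this (a sketch — `{img url}` is a placeholder for your uploaded photo's link, and the subject words should be adjusted to your own features):

```
{img url} avatar, smiling cute boy, undercut hairstyle, white background, soft focus, 3d render, Pixar style --iw 2
```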
Finally, three additional techniques:
- If the generated images don't match your provided image, pick the closest one, click V (Variation) to have the model generate more variations, pick the next closest, and repeat until you get a good match.
- Strangely, if the above doesn't work and the outputs still don't match, try adding "wear glasses" to the prompt. For me this makes the result match much better. If your original photo has glasses and you don't want them in the output, try --no glasses.
- Use multiple parameters together - I'll expand on this in Technique 8.
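As an example of the second technique, the tweak might look like this (hypothetical wording — swap in descriptors for your own photo):

```
{img url} avatar, smiling cute boy, wear glasses, undercut hairstyle, white background, soft focus, 3d render, Pixar style --iw 2
```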
At the end of this chapter I have an example using my ID photo.
Tip 7: Using Multiple Parameters
When generating avatars with img2img, I found the issue was that the text weight outweighed the image weight, so the outputs didn't match the original. In V5, --iw caps the image weight at 2. So I tried the --s parameter as well, and it improved results a lot.
If the image still doesn't match, try adding --s 200 alongside --iw 2. Note there is no comma between parameters. I've found that adding --s makes the result much more similar; my guess is that using --s and --iw together reduces the text weight.
Higher --s values make the output more stylized and abstract, so if it still doesn't match, try increasing --s, e.g. to 500.
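Concretely, the parameter section of the avatar prompt would go from `--iw 2` to something like this (note the space, not a comma, between parameters):

```
{img url} avatar, smiling cute boy, undercut hairstyle, white background, soft focus, 3d render, Pixar style --iw 2 --s 200
```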
I want to demonstrate that combining parameters can have a synergistic effect, further enhancing the model's capabilities. Consider potential combinations when new parameters are introduced.
Anime Avatars
The modifications are mainly to the image style:
| Question | Prompt | Explanation |
| --- | --- | --- |
| What is the type? | Portraits / Avatar | Same as 3D Cartoon Avatars |
| What is the subject? | smiling cute boy, undercut hairstyle | Same as 3D Cartoon Avatars |
| What is the background? | white background | Same as 3D Cartoon Avatars |
| What is the composition? | null | Same as 3D Cartoon Avatars |
| What lens? | null | Omitting the soft focus lens since this is anime style |
| What is the style? | anime, Studio Ghibli | The target is an anime avatar, so I added anime and the Studio Ghibli style |
| Parameters | --iw 2 --s 500 | Don't comma-separate when using multiple parameters |
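Assembled, the anime version might read (again a sketch — adjust the subject words to your own photo):

```
{img url} avatar, smiling cute boy, undercut hairstyle, white background, anime, Studio Ghibli --iw 2 --s 500
```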
Cyberpunk Avatars
One of my favorite styles - just modify the style and background:
| Question | Prompt | Explanation |
| --- | --- | --- |
| What is the subject? | cyberpunk robot face, holographic VR glasses, holographic cyberpunk clothing | Added descriptors for the face, VR glasses, and cyberpunk clothing |
| What is the background? | neon-lit cityscape background | Added a neon cityscape for a cyberpunk vibe |
| What is the style? | Cyberpunk, by Josan Gonzalez | Added the cyberpunk style and one of my favorite cyberpunk artists, Josan Gonzalez |
Here are the avatars I generated. To be honest, I don't have many distinctive features or good looks, so the outputs all look vaguely Southeast Asian. Midjourney still doesn't seem great with Asian faces!
Tip 8: Modifying Images Using Seed
Note: I think this technique has potential but is currently unreliable in Midjourney. The official help docs also mention seeds are very unstable in V5. See my Midjourney FAQ chapter.
You may encounter situations like:
- You input a prompt and get 4 images
- One image looks decent but the rest unsatisfactory, so you tweak the prompt
- But now all the new outputs are unsatisfactory, which is frustrating
- You wish you could modify one of the initial outputs
In theory you should be able to provide a seed from initial outputs for the model to further modify.
For example, with the cyberpunk avatar I first generate images using the original prompt. Then in Discord, click the emoji button in the top right of the message (Image 1 below), type "envelope" in the text box (Image 2), and click the envelope emoji (Image 3). This makes the bot send you the seed number.
Then I modify the prompt to change the background to Chinatown. Note:
- The new prompt includes the original, only the background portion is changed.
- Add the seed parameter.
Original prompt:
{img url} avatar, cyberpunk robot face, holographic VR glasses, holographic cyberpunk clothing, neon-lit cityscape background, Cyberpunk, by Josan Gonzalez --s 500 --iw 1
New prompt (replace seed with your actual number):
{img url} avatar, cyberpunk robot face, holographic VR glasses, holographic cyberpunk clothing, China Town background, Cyberpunk, by Josan Gonzalez --s 500 --iw 1 --seed 758242567
Here are the results: the background did change, but the face also changed a bit!
The effect isn't great, but I think it's worth exploring as this could improve iterative refinement:
Tip 9: The Mysterious Blend Feature
To be honest, I hesitate to call this a technique - it's a very unstable Midjourney feature. But it's quite important so I'll introduce it.
It's simple to use - in Discord, type /blend then click the blend menu:
Your input will change to this:
Click the boxes to select two images from your computer, then hit enter to generate the blend.
Then Midjourney generates this awesome result, with a fused Iron Man on the left and a fused Buzz Lightyear on the right:
I was shocked when I first saw this - it's like fusing monsters together in Yu-Gi-Oh! Unfortunately it's very unstable and only works if multiple unknown conditions are met.
I wanted to use this for avatars but found blending my own photos with other styles rarely works well. It seems best with celebrity photos, likely because Midjourney has been trained extensively on those.
But I think it has great potential for easily creating unique avatars by blending your photo with something else. Unfortunately it's not very usable yet.
It also has many other applications which I'll cover later.