Scenario 1: Stock Photo
I have rewritten the text prompt section of this document three times, and I am still not satisfied with it. In the first two drafts, I tried to teach you how to write Midjourney text prompts using a few general-purpose templates. After many attempts, however, I realized that:
- Stock photos carry a lot of information. Templates can get you an acceptable image, but rarely a fully satisfying one.
- Creating a high-quality image takes more than prompt engineering; it also takes some aesthetic judgment. Different scenarios call for different parameters, so a fixed template can be too rigid.
So instead of handing you so-called universal templates, I want to show you how to write a good text prompt for each scenario, because the information you need to give the model changes from one scenario to the next.
I also believe that knowing why matters far more than knowing how. Once you understand why a prompt works, you can break down the requirements of a new scenario yourself instead of blindly applying templates.
Lastly, there is a wide variety of image scenarios, and I will do my best to share the ones you can use in your daily work. I hope AI becomes something that makes your work more efficient, not a toy you play with once and throw away.
Let's start with the stock photo scenario, where Midjourney V5 brings a significant improvement.
What is a stock photo?
A stock photo is a photograph available from a stock image library. You can usually find these photos on stock image websites; they are typically taken by photographers or designers, and because they are copyrighted, many of them require payment before you can use them.
Most stock photo users are design firms and advertising agencies. You have probably seen plenty of these images, such as the classic handshake photo:
I believe AI-generated images have already had a significant impact on stock image libraries, and Midjourney V5 now covers most of my own stock photo needs.
Tip 1: Imitating
As with learning to draw, I think the best way to learn image prompts is not to start from templates but to imitate: pick a real photo, or an image someone else generated, and try to reproduce it. If your English is not good, you can write in Chinese first and have ChatGPT translate it. After imitating a few images, you will gradually get a feel for how to create similar ones.
Take the handshake image above as an example. Let's look carefully at the elements in it:
- The main subject is two hands shaking, and they appear to belong to two Asian men.
- Both men are wearing suits.
- The background looks like the main entrance of an office building, and the two men seem to be shaking hands to say goodbye. The background is blurred, either intentionally or as a result of the camera's shallow depth of field.
Summarizing the key information:
- Subjects: Two Asian men in suits shaking hands to say goodbye
- Scene: Entrance of an office building
- Image style: Stock photo, taken with a camera, blurred background
With this, we can try to write a prompt (if you don't feel confident about your English, try using a translation tool):
stock photo of two Asian men in suits shaking hands,say goodbye in front of the main entrance of the office building,taken with Canon
Here is the result generated by Midjourney:
Hmm, this doesn't look like what we expected. Don't panic; this is normal when you first start using Midjourney. The key is to keep trying.
Let's analyze why Midjourney produced this image:
- The main subject we want is the handshake, not the two people, but the prompt never says where to focus.
- Naming a camera ("taken with Canon") does not seem to blur the background. It acts more like an image style, as in the old-photograph look of image 4.
Let's adjust the prompt, adding keywords for the focus and background blur:
stock photo of two Asian men in suits shaking hands,say goodbye in front of the main entrance of the office building, focus on two hands, taken with Canon, background bokeh
The new results are much better. Images 1 and 4 basically meet the requirements, and images 2 and 3 could also work after cropping. Note that Midjourney still struggles with hands: in images 2 and 4, one person has six fingers! I expect this to improve over time.
Let's summarize the prompt structure:
- Part 1 (red): Describes the main subject
- Part 2 (blue): Describes the background/environment
- Part 3 (yellow): Specifies the image focus
- Part 4 (green): Describes the image style or special requirements
Summarize it like this, and you have a prompt template!
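To make the four-part structure concrete, here is a minimal Python sketch that assembles a prompt from those parts. The function `build_prompt` and its parameter names are hypothetical labels for the red, blue, yellow, and green parts; Midjourney only ever receives the final comma-separated string, and nothing here talks to Midjourney itself.

```python
def build_prompt(subject: str, scene: str, focus: str = "", style: str = "") -> str:
    """Join the non-empty prompt parts with ", " in template order.

    subject -> part 1 (main subject)
    scene   -> part 2 (background / environment)
    focus   -> part 3 (image focus), optional
    style   -> part 4 (image style or special requirements), optional
    """
    parts = [subject, scene, focus, style]
    # Skip parts that are empty or whitespace-only, so optional parts can be omitted.
    return ", ".join(p.strip() for p in parts if p.strip())


if __name__ == "__main__":
    print(build_prompt(
        subject="stock photo of two Asian men in suits shaking hands",
        scene="say goodbye in front of the main entrance of the office building",
        focus="focus on two hands",
        style="taken with Canon, background bokeh",
    ))
```

With the values from the second attempt, this reproduces that prompt, except that it always puts a space after each comma. Because empty parts are skipped, the same helper also works for a scenario with no special focus.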
Tip 2: Experimenting
The example above also demonstrates a second technique: experimenting.
When you get unexpected results, don't panic. Analyze what went wrong, then adjust the prompt systematically, and be patient. There is one part of the prompt above that you may not have noticed: the words "stock photo" at the beginning. What happens if you remove them?
two Asian men in suits shaking hands,say goodbye in front of the main entrance of the office building, focus on two hands, taken with Canon, background bokeh
The results still meet the requirements, and this time the finger counts are correct. This suggests that "stock photo" has little effect on the model, at least for this prompt.
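The "remove one part and compare" experiment can be made systematic. The sketch below assumes you keep the prompt as a list of comma-separated parts and produces one variant per part with that part dropped, so you can run each variant and see which words actually change the result. The helper name `ablation_variants` is hypothetical, splitting "stock photo" off as its own part is a simplification of the original "stock photo of ..." wording, and you still have to paste each variant into Midjourney and judge the images yourself.

```python
def ablation_variants(parts: list[str]) -> list[tuple[str, str]]:
    """Return (removed_part, prompt_without_it) for every part of the prompt."""
    variants = []
    for i, removed in enumerate(parts):
        # Keep every part except the one at index i, in the original order.
        remaining = parts[:i] + parts[i + 1:]
        variants.append((removed, ", ".join(remaining)))
    return variants


if __name__ == "__main__":
    parts = [
        "stock photo",
        "two Asian men in suits shaking hands",
        "say goodbye in front of the main entrance of the office building",
        "focus on two hands",
        "taken with Canon",
        "background bokeh",
    ]
    for removed, prompt in ablation_variants(parts):
        print(f"without {removed!r}:\n  {prompt}")
```

The first variant printed is the "stock photo"-free prompt tried above; the others let you test each remaining part the same way. A list of tuples is used rather than a dict so that two identical parts would not overwrite each other.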
Tip 3: Using Image2Image
There is a very powerful technique for stock photos. At first I didn't want to teach it, because it could seriously hurt stock photo sites.
But in the spirit of tool neutrality, and since this technique can be useful in many cases (like generating avatars), I think it's still worth sharing.
When using stock photos, you may encounter these issues:
- The photo is copyrighted and cannot be used commercially, or it requires payment.
- Some photos have been used so widely that they are instantly recognizable as stock photos.
- The content generally fits, but the details don't. For example, you might want the two Asian men shaking hands to be a man and a woman, one of whom is African American.
The best way to address these issues is to have the AI modify the original photo, using the Image2Image (or Blend) feature. I later saw a similar tutorial in a book and learned that people call this technique "priming"; international users seem to call it Image2Image or img2img. The steps are:
- Send the stock photo you want to modify to the Midjourney bot. I'll use the handshake photo as an example.
- Right-click the image to copy its URL, then paste the URL into the prompt box.
- Add a space after the URL.
- Then describe the changes you want, for example making one hand African American and the other an Asian woman's:
one African-American hand and one Asian woman hand
Here is the result. Note that the prompt says nothing about suits or the background; it only asks for one African American hand and one Asian woman's hand:
Apart from the six-finger issue, this is a very efficient technique, isn't it? In my experience, though, the blend feature (which I'll cover later) works better for merging two images than for combining an image with text, so expect to need some patience and experimentation.
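The image-prompt steps above boil down to one rule: the image URL comes first, followed by a space and the text describing your changes. As a minimal sketch, the hypothetical helper `build_image_prompt` below builds that string. Rejecting anything that is not an http(s) link is an assumption of this sketch (the prompt needs a web link to the image), and `https://example.com/handshake.jpg` is only a placeholder for the URL you copied in the second step.

```python
from urllib.parse import urlparse


def build_image_prompt(image_url: str, text: str) -> str:
    """Put the image URL first, then one space, then the text prompt."""
    url = image_url.strip()
    parsed = urlparse(url)
    # Assumption of this sketch: only a web (http/https) link makes sense here.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an http(s) image URL, got {image_url!r}")
    return f"{url} {text.strip()}"


if __name__ == "__main__":
    # Placeholder URL: substitute the link copied from your uploaded photo.
    print(build_image_prompt(
        "https://example.com/handshake.jpg",
        "one African-American hand and one Asian woman hand",
    ))
```

Keeping the URL check in one place means a pasted file name or a truncated link fails loudly before you spend a generation on it.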