Generate images with GPT Image
2025-04-23 · via OpenAI Developers

In this cookbook, you’ll learn how to use GPT Image, our new large language model with image generation capabilities.

This model has world knowledge and can draw on this broad understanding of the world when generating images. It is also much better at following instructions and producing photorealistic images than our previous-generation image models, DALL·E 2 and DALL·E 3.

To learn more about image generation, refer to our guide.

%pip install pillow openai -U
import base64
import os
from openai import OpenAI
from PIL import Image
from io import BytesIO
from IPython.display import Image as IPImage, display
client = OpenAI()
# Set your API key if not set globally
#client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
# Create imgs/ folder
folder_path = "imgs"
os.makedirs(folder_path, exist_ok=True)

GPT Image 1 is great at instruction-following, meaning you can prompt the model to generate images with very detailed instructions.

prompt1 = """
Render a realistic image of this character:
Blobby Alien Character Spec Name: Glorptak (or nickname: "Glorp")
Visual Appearance Body Shape: Amorphous and gelatinous. Overall silhouette resembles a teardrop or melting marshmallow, shifting slightly over time. Can squish and elongate when emotional or startled.
Material Texture: Semi-translucent, bio-luminescent goo with a jelly-like wobble. Surface occasionally ripples when communicating or moving quickly.
Color Palette:
- Base: Iridescent lavender or seafoam green
- Accents: Subsurface glowing veins of neon pink, electric blue, or golden yellow
- Mood-based color shifts (anger = dark red, joy = bright aqua, fear = pale gray)
Facial Features:
- Eyes: 3–5 asymmetrical floating orbs inside the blob that rotate or blink independently
- Mouth: Optional—appears as a rippling crescent on the surface when speaking or emoting
- No visible nose or ears; uses vibration-sensitive receptors embedded in goo
- Limbs: None by default, but can extrude pseudopods (tentacle-like limbs) when needed for interaction or locomotion. Can manifest temporary feet or hands.
Movement & Behavior Locomotion:
- Slides, bounces, and rolls.
- Can stick to walls and ceilings via suction. When scared, may flatten and ooze away quickly.
Mannerisms:
- Constant wiggling or wobbling even at rest
- Leaves harmless glowing slime trails
- Tends to absorb nearby small objects temporarily out of curiosity
"""

img_path1 = "imgs/glorptak.jpg"
# Generate the image
result1 = client.images.generate(
    model="gpt-image-1",
    prompt=prompt1,
    size="1024x1024"
)
# Save the image to a file and resize/compress for smaller files
image_base64 = result1.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

# Adjust this if you want a high-quality Glorptak
image = Image.open(BytesIO(image_bytes))
image = image.resize((300, 300), Image.LANCZOS)
image.save(img_path1, format="JPEG", quality=80, optimize=True)
# Show the result
display(IPImage(img_path1))
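The example above shrinks the image to 300x300 and re-encodes it as JPEG before saving. If you would rather keep the full-resolution output, the decoded bytes can be written to disk as they are. The sketch below is a minimal, standard-library-only helper; the name save_b64_image is hypothetical, and it assumes the payload is the base64 string found in result1.data[0].b64_json. Nothing is converted, so the file extension should match the format the API returned (assumed here to be PNG when no output_format is requested).

```python
import base64
import binascii
from pathlib import Path


def save_b64_image(b64_data: str, path: str) -> int:
    """Decode a base64 image payload and write the raw bytes to `path`.

    Returns the number of bytes written. No resizing or re-encoding is done.
    """
    try:
        image_bytes = base64.b64decode(b64_data, validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is not valid base64") from exc

    target = Path(path)
    # Create the parent folder (for example imgs/) if it does not exist yet
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(image_bytes)
    return len(image_bytes)


# Hypothetical usage with the result of the request above:
# save_b64_image(result1.data[0].b64_json, "imgs/glorptak_full.png")
```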

Customize the output

You can customize the following output properties:

  • Quality can be low, medium, high or auto (default value)
  • Size can be 1024x1024 (square), 1536x1024 (landscape), 1024x1536 (portrait) or auto (default)
  • You can adjust the compression level (from 0-100%) for JPEG and WEBP formats
  • You can choose to generate an image with a transparent background (only available for PNG or WEBP)
prompt2 = "generate a portrait, pixel-art style, of a grey tabby cat dressed as a blond woman on a dark background."
img_path2 = "imgs/cat_portrait_pixel.jpg"
# Generate the image
result2 = client.images.generate(
    model="gpt-image-1",
    prompt=prompt2,
    quality="low",
    output_compression=50,
    output_format="jpeg",
    size="1024x1536"
)
# Save the image to a file and resize/compress for smaller files
image_base64 = result2.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

image = Image.open(BytesIO(image_bytes))
image = image.resize((250, 375), Image.LANCZOS)
image.save(img_path2, format="JPEG", quality=80, optimize=True)
# Show the result
display(IPImage(img_path2))
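The output properties listed above can be checked locally before a request is sent, which gives a clearer error than a rejected API call. The following is a minimal sketch under that assumption: the helper name build_output_options and the constant names are hypothetical, and the allowed values are simply copied from the list above rather than read from the API.

```python
from typing import Any, Dict, Optional

# Values taken from the list of output properties above
ALLOWED_QUALITIES = {"low", "medium", "high", "auto"}
ALLOWED_SIZES = {"1024x1024", "1536x1024", "1024x1536", "auto"}
COMPRESSIBLE_FORMATS = {"jpeg", "webp"}


def build_output_options(
    quality: str = "auto",
    size: str = "auto",
    output_format: Optional[str] = None,
    output_compression: Optional[int] = None,
) -> Dict[str, Any]:
    """Validate output options and return them as keyword arguments."""
    if quality not in ALLOWED_QUALITIES:
        raise ValueError(f"unsupported quality: {quality!r}")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size!r}")

    options: Dict[str, Any] = {"quality": quality, "size": size}
    if output_format is not None:
        options["output_format"] = output_format
    if output_compression is not None:
        # Compression is only meaningful for the lossy formats
        if output_format not in COMPRESSIBLE_FORMATS:
            raise ValueError("compression applies only to JPEG and WEBP output")
        if not 0 <= output_compression <= 100:
            raise ValueError("output_compression must be between 0 and 100")
        options["output_compression"] = output_compression
    return options


# Same settings as the cat portrait request above
options = build_output_options(
    quality="low", size="1024x1536", output_format="jpeg", output_compression=50
)
print(options)

# Hypothetical usage:
# client.images.generate(model="gpt-image-1", prompt=prompt2, **options)
```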

Transparent background

You can set the background property to request a transparent background explicitly. If your prompt already asks for a transparent background, the model will produce one by default, as in the example below.

prompt3 = "generate a pixel-art style picture of a green bucket hat with a pink quill on a transparent background."
img_path3 = "imgs/hat.png"
result3 = client.images.generate(
    model="gpt-image-1",
    prompt=prompt3,
    quality="low",
    output_format="png",
    size="1024x1024"
)
# Save the image to a file and resize/compress for smaller files
image_base64 = result3.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

image = Image.open(BytesIO(image_bytes))
image = image.resize((250, 250), Image.LANCZOS)
image.save(img_path3, format="PNG")
# Show the result
display(IPImage(img_path3))
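The request above relies on the prompt wording to get transparency. If you prefer to be explicit, the background property can be passed as well. The sketch below only assembles the keyword arguments: the helper name is hypothetical, it assumes the API accepts background="transparent", and it enforces the PNG/WEBP restriction stated earlier.

```python
from typing import Dict

# Formats that can carry transparency, per the list of output properties
TRANSPARENT_FORMATS = {"png", "webp"}


def transparent_background_options(output_format: str = "png") -> Dict[str, str]:
    """Return keyword arguments requesting a transparent background."""
    fmt = output_format.lower()
    if fmt not in TRANSPARENT_FORMATS:
        raise ValueError(
            f"{output_format!r} cannot store transparency; use png or webp"
        )
    return {"background": "transparent", "output_format": fmt}


options = transparent_background_options("png")
print(options)

# Hypothetical usage:
# client.images.generate(
#     model="gpt-image-1", prompt=prompt3, quality="low",
#     size="1024x1024", **options
# )
```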

GPT Image can also accept image inputs, and use them to create new images. You can also provide a mask if you don’t want the model to change a specific part of the input image.

You can use a maximum of 10 input images, and if you use a mask, it will be applied to the first image provided in the image array.

prompt_edit = """
Combine the images of the cat and the hat to show the cat wearing the hat while being perched in a tree, still in pixel-art style.
"""
img_path_edit = "imgs/cat_with_hat.jpg"
img1 = open(img_path2, "rb")
img2 = open(img_path3, "rb")
# Generate the new image
result_edit = client.images.edit(
    model="gpt-image-1",
    image=[img1, img2],
    prompt=prompt_edit,
    size="1024x1536"
)
# Save the image to a file and resize/compress for smaller files
image_base64 = result_edit.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

image = Image.open(BytesIO(image_bytes))
image = image.resize((250, 375), Image.LANCZOS)
image.save(img_path_edit, format="JPEG", quality=80, optimize=True)
# Show the result
display(IPImage(img_path_edit))
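The example above opens the input files without closing them. One way to tidy this up, while also enforcing the limit of 10 input images mentioned earlier, is to open the files through contextlib.ExitStack. This is a sketch only: open_input_images and MAX_INPUT_IMAGES are hypothetical names, and the limit is taken from the text above rather than queried from the API.

```python
from contextlib import ExitStack
from typing import BinaryIO, List, Sequence

MAX_INPUT_IMAGES = 10  # limit stated in the text above


def open_input_images(stack: ExitStack, paths: Sequence[str]) -> List[BinaryIO]:
    """Open each path in binary mode and register it for cleanup on `stack`."""
    if not paths:
        raise ValueError("at least one input image is required")
    if len(paths) > MAX_INPUT_IMAGES:
        raise ValueError(
            f"got {len(paths)} images, but at most {MAX_INPUT_IMAGES} are allowed"
        )
    # The first file in the list is the one a mask would apply to
    return [stack.enter_context(open(path, "rb")) for path in paths]


# Hypothetical usage; every file is closed when the with-block exits:
# with ExitStack() as stack:
#     images = open_input_images(stack, [img_path2, img_path3])
#     result_edit = client.images.edit(
#         model="gpt-image-1", image=images, prompt=prompt_edit, size="1024x1536"
#     )
```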

You can also provide a mask along with your input images (if there are several, the mask is applied to the first one) so that only the part of the input image not covered by the mask is edited. Note that the model may still change some parts of the image inside the mask, although it will try to avoid doing so.

Important note: the mask should contain an alpha channel. If you’re generating it manually, for example using image editing software, make sure you include this alpha channel.
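Since a mask without an alpha channel is easy to produce by accident, it can help to inspect the file before sending it. The sketch below reads the colour type from the PNG header using only the standard library; the function name is hypothetical. It checks the declared colour type only, so a palette-based PNG that stores transparency in a separate tRNS chunk is reported as having no alpha channel.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# PNG colour types that include a full alpha channel:
# 4 = grayscale + alpha, 6 = RGB + alpha
ALPHA_COLOR_TYPES = {4, 6}


def png_has_alpha_channel(data: bytes) -> bool:
    """Return True if the PNG header declares an alpha channel.

    `data` must contain at least the first 26 bytes of the file.
    """
    if len(data) < 26 or not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    if data[12:16] != b"IHDR":
        raise ValueError("PNG is missing its IHDR header chunk")
    # IHDR layout: width (4 bytes), height (4), bit depth (1), colour type (1), ...
    _width, _height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return color_type in ALPHA_COLOR_TYPES


# Hypothetical usage on a mask file:
# with open("imgs/mask_alpha.png", "rb") as f:
#     print(png_has_alpha_channel(f.read(26)))
```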

Generating the mask

For this example, we’ll use our model to generate the mask automatically for us. The mask might not be exact, but it will be enough for our purposes. If you need to have an exact mask, feel free to use an image segmentation model.

img_path_mask = "imgs/mask.png"
prompt_mask = "generate a mask delimiting the entire character in the picture, using white where the character is and black for the background. Return an image in the same size as the input image."
img_input = open(img_path1, "rb")

# Generate the mask
result_mask = client.images.edit(
    model="gpt-image-1",
    image=img_input, 
    prompt=prompt_mask
)
# Save the image to a file and resize/compress for smaller files
image_base64 = result_mask.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

image = Image.open(BytesIO(image_bytes))
image = image.resize((300, 300), Image.LANCZOS)
image.save(img_path_mask, format="PNG")
# Show the mask
display(IPImage(img_path_mask))

Creating an alpha channel

This step is optional. Use it if you want to turn a black-and-white image into a mask with an alpha channel that can be used with the Image Edit API.

# 1. Load your black & white mask as a grayscale image
mask = Image.open(img_path_mask).convert("L")

# 2. Convert it to RGBA so it has space for an alpha channel
mask_rgba = mask.convert("RGBA")

# 3. Then use the mask itself to fill that alpha channel
mask_rgba.putalpha(mask)

# 4. Convert the mask into bytes
buf = BytesIO()
mask_rgba.save(buf, format="PNG")
mask_bytes = buf.getvalue()
# Save the resulting file
img_path_mask_alpha = "imgs/mask_alpha.png"
with open(img_path_mask_alpha, "wb") as f:
    f.write(mask_bytes)
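To see what the conversion above does to each pixel, the mapping can be written out by hand. The pure-Python sketch below mirrors convert("RGBA") followed by putalpha(mask) for a single grayscale value: the gray level fills the colour channels and is reused as the alpha value, so white pixels end up fully opaque and black pixels fully transparent. In the Image Edit API, the fully transparent areas of a mask are the ones the model is allowed to change. The function name and the optional threshold parameter are hypothetical; the threshold is only an assumption about how one might clean up a generated mask that contains intermediate grays.

```python
from typing import Optional, Tuple

RGBA = Tuple[int, int, int, int]


def gray_to_mask_pixel(value: int, threshold: Optional[int] = None) -> RGBA:
    """Map one 8-bit grayscale value to an RGBA mask pixel.

    Mirrors mask.convert("RGBA") followed by putalpha(mask): the gray level
    fills R, G, and B and is reused as alpha. If `threshold` is given, the
    value is first snapped to pure black or pure white.
    """
    if not 0 <= value <= 255:
        raise ValueError("grayscale value must be in 0..255")
    if threshold is not None:
        value = 255 if value >= threshold else 0
    return (value, value, value, value)


print(gray_to_mask_pixel(255))  # white: fully opaque
print(gray_to_mask_pixel(0))    # black: fully transparent
print(gray_to_mask_pixel(200, threshold=128))  # light gray snapped to white
```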

Editing with the mask

When using a mask, we still need to prompt the model with a description of the entire resulting image, not just the area that is masked.

prompt_mask_edit = "A strange character on a colorful galaxy background, with lots of stars and planets."
mask = open(img_path_mask_alpha, "rb")
result_mask_edit = client.images.edit(
    model="gpt-image-1",         
    prompt=prompt_mask_edit,
    image=img_input,
    mask=mask,
    size="1024x1024"
)
# Display result

img_path_mask_edit = "imgs/mask_edit.jpg"  # .jpg extension to match the JPEG format saved below

image_base64 = result_mask_edit.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

image = Image.open(BytesIO(image_bytes))
image = image.resize((300, 300), Image.LANCZOS)
image.save(img_path_mask_edit, format="JPEG", quality=80, optimize=True)
    
display(IPImage(img_path_mask_edit))

In this cookbook, we’ve seen how to use our new image generation model, GPT Image, to generate new images from scratch or from reference images. We’ve also covered how to create a mask with an alpha channel and apply it to an input image to guide the edit even further.

Feel free to use this as a starting point to explore other use cases, and if you’re looking for some inspiration, check out the image gallery in our docs.

Happy building!