qwen-image

The qwen image api (Qwen-VL-Max) is a frontier vision-language model by Alibaba. It excels at high-resolution OCR, precise visual grounding with bounding boxes, and complex video analysis, outperforming GPT-4o in mathematical reasoning.

￥ 0.0315

￥ 0.035

text

image

￥ 0.0315

￥ 0.035

text

image

Core Features of the Qwen Image API

The qwen image api provides specialized tools for high-resolution document extraction and visual reasoning.

Precise Visual Grounding

Outputs exact bounding box coordinates for objects, enabling advanced visual search and UI automation.

realism, a young scholar with glasses, wearing a tweed blazer, sits in a grand, ancient library. Sunlight streams through a massive arched window, illuminating dust motes dancing in the air. An open book rests on her lap as she looks up thoughtfully. Warm and cozy atmosphere, light academia aesthetic, narrative lighting, photorealistic.

Prompt

After

Precise Visual Grounding

Outputs exact bounding box coordinates for objects, enabling advanced visual search and UI automation.

Prompt

After

20+ Minute Video Analysis

Analyzes long-duration video through dynamic sampling for temporal event detection and summarization.

realism, a young woman sitting alone in a laundromat at midnight, wearing headphones, staring at the rotating dryer drum, neon reflections on the glass, a subtle expression of nostalgia on her face

Prompt

After

20+ Minute Video Analysis

Analyzes long-duration video through dynamic sampling for temporal event detection and summarization.

realism, a young woman sitting alone in a laundromat at midnight, wearing headphones, staring at the rotating dryer drum, neon reflections on the glass, a subtle expression of nostalgia on her face

Prompt

After

Complex Chart Reasoning

Interprets graphs, tables, and mathematical formulas with state-of-the-art accuracy on MathVista.

A glamorous woman with a sharp bob haircut and dark lipstick. She is dressed in a stunning black and gold sequined flapper dress with long pearls. She leans against a gilded Art Deco bar, with a jazz band softly blurred in the background. Sophisticated, low-key lighting creates a luxurious and intimate mood, Great Gatsby era, glamorous, geometric patterns.

Prompt

After

Complex Chart Reasoning

Interprets graphs, tables, and mathematical formulas with state-of-the-art accuracy on MathVista.

Prompt

After

Native High-Res OCR

Preserves clarity for small text in complex layouts, outperforming standard LLMs on dense document extraction.

A man in a suit is standing in front of the window, looking at the bright moon outside the window. The man is holding a yellowed paper with handwritten words on it: "A lantern moon climbs through the silver night, Unfurling quiet dreams across the sky, Each star a whispered promise wrapped in light, That dawn will bloom, though darkness wanders by." There is a cute cat on the windowsill.

Prompt

After

Native High-Res OCR

Preserves clarity for small text in complex layouts, outperforming standard LLMs on dense document extraction.

Prompt

After

几分钟内用 qwen image 开始构建

按以下简单步骤注册账户、获取额度，并通过 GPT Proto 向 qwen image 发送 API 请求。

创建免费 GPT Proto 账户即可开始，随时可为团队创建组织。

充值

余额可在平台全部模型（含 qwen image）使用，灵活试验与扩展。

生成 API 密钥

在控制台创建 API 密钥，向 qwen image 发起请求时用于鉴权。

发起首次 API 调用

使用 API 密钥与示例代码，通过 GPT Proto 向 qwen image 发送请求，即刻获得 AI 结果。

获取 API 密钥

Qwen Image API: Frequently Asked Questions

How does the qwen image api handle high-res documents?

Unlike models that downscale images, the qwen image api supports variable input resolutions. This allows the qwen api to preserve the sharpness of small fonts and intricate details in architectural drawings or dense academic papers, leading to superior OCR accuracy compared to fixed-grid models.

Can the qwen api process long-form video content?

Yes, the qwen image api is capable of analyzing videos longer than 20 minutes. It uses dynamic frame sampling to perform temporal reasoning, allowing users to ask questions about specific events or patterns occurring across long durations of footage.

Is the qwen image api compatible with OpenAI SDKs?

Absolutely. We provide an OpenAI-compatible endpoint for the qwen image api. You can use your existing Python or Node.js OpenAI client by simply updating the base URL and setting the model name to qwen image, making integration effortless.

What is the pricing structure for the qwen image api?

The qwen image api is highly cost-effective, with both input and output priced at approximately $2.80 per 1M tokens. This is significantly more affordable than Claude 3.5 Sonnet, especially for high-volume tasks like bulk document extraction or large-scale OCR.

Does the qwen api support visual grounding coordinates?

Yes, the qwen image api can output precise normalized coordinates [ymin, xmin, ymax, xmax] for objects it detects. This makes the qwen api perfect for building visual search engines, automated safety auditing, or UI interaction tools.

Is data sent to the qwen image api used for training?

No. Data submitted through the GPTProto qwen image api is not used for model training. We prioritize enterprise-grade privacy, ensuring that your images and prompts remain confidential and secure at all times.

Related Scenarios

Polybuzz AI online gratis

Step into a dynamic polybuzz of interactive virtual companions, featuring seamless voice acting and personalized roleplay.

Image to Sketch Converter

Use our powerful AI sketch generator as your go-to image to sketch converter. Effortlessly capture delicate pencil strokes, facial features, and landscape textures.

Retouch Photo Online

Restore old photos, eliminate background clutter, and conceal skin defects instantly using our advanced retouch AI.

Historical Timelines

Create beautiful historical timelines from complex historical events using our advanced AI timeline maker.

Core Features of the Qwen Image API

Precise Visual Grounding

Precise Visual Grounding

20+ Minute Video Analysis

20+ Minute Video Analysis

Complex Chart Reasoning

Complex Chart Reasoning

Native High-Res OCR

Native High-Res OCR

几分钟内用 qwen image 开始构建

创建免费 GPT Proto 账户即可开始，随时可为团队创建组织。

余额可在平台全部模型（含 qwen image）使用，灵活试验与扩展。

在控制台创建 API 密钥，向 qwen image 发起请求时用于鉴权。

使用 API 密钥与示例代码，通过 GPT Proto 向 qwen image 发送请求，即刻获得 AI 结果。

Qwen Image API: Frequently Asked Questions

How does the qwen image api handle high-res documents?

Can the qwen api process long-form video content?

Is the qwen image api compatible with OpenAI SDKs?

What is the pricing structure for the qwen image api?

Does the qwen api support visual grounding coordinates?

Is data sent to the qwen image api used for training?

Related Scenarios

Polybuzz AI online gratis

Image to Sketch Converter

Retouch Photo Online

Historical Timelines

Further Reading

Qwen Image Edit: Optimize Models on Any GPU

Qwen Image Edit: Optimize Models on Any GPU

Meet Qwen 3: Alibaba's latest Open-Source AI Model Series

Qwen 2.5 32b: The Ultimate Local AI Sweet Spot