Name: WAN 2.5 (Image-to-Video)
Author: WAN Video

Question 1

What resolution and clip length does WAN 2.5 image-to-video support?

Accepted Answer

WAN 2.5 generates video at up to 1080p resolution, with clips up to 10 seconds long. The model also supports 480p and 720p output, and lower resolutions generate faster and cost less, making them useful for draft reviews and concept tests where 1080p is not yet needed. The 10-second limit is sufficient for the majority of social and paid-media placements.
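
The limits in this answer can be encoded as a small pre-flight check before a job is submitted. This is only a sketch: `validate_request` and its parameter names are hypothetical and are not part of any WAN 2.5 or Masonry API. Only the resolution list and the 10-second cap come from the answer above.

```python
# Limits taken from the answer above; the function and its names are hypothetical.
SUPPORTED_RESOLUTIONS = ("480p", "720p", "1080p")
MAX_CLIP_SECONDS = 10


def validate_request(resolution: str, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the settings fit the stated limits."""
    problems = []
    if resolution not in SUPPORTED_RESOLUTIONS:
        problems.append(f"unsupported resolution {resolution!r}")
    if not 0 < duration_seconds <= MAX_CLIP_SECONDS:
        problems.append(f"duration must be between 0 and {MAX_CLIP_SECONDS} seconds")
    return problems
```

For example, `validate_request("720p", 5)` returns an empty list, while a 12-second request would be flagged before any generation cost is incurred.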

Question 2

How does audio and lip-sync work in WAN 2.5 image-to-video?

Accepted Answer

The image-to-video endpoint works from an audio track you provide rather than generating sound itself. Upload a voiceover (WAV or MP3, typically 3–30 seconds) and WAN 2.5 matches the subject's lip-sync and pacing to it, with natural mouth movement and expressions, so a still image of a portrait subject or spokesperson becomes a talking-head clip. If you don't supply audio, it animates the image as a silent motion clip.

Question 3

What languages does the WAN 2.5 lip-sync support?

Accepted Answer

Because the lip-sync follows the voiceover track you upload, it isn't limited to a fixed set of languages. Supply audio in English, Spanish, Chinese, Russian, or another language and WAN 2.5 animates the mouth to match it. That makes localization straightforward: swap the audio track, regenerate, and the animation follows.

Question 4

Can I upload my own audio to drive lip-sync in WAN 2.5?

Accepted Answer

Yes. This is how audio works on the image-to-video endpoint. You upload your own audio file (WAV or MP3, typically 3–30 seconds) and WAN 2.5 drives the lip-sync and pacing of the animation to match it. This is ideal when you have an approved voiceover recording and need the visual animation to follow it precisely, a common requirement in professional ad production.
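
The format and length guidance above can be checked locally before uploading. The sketch below is an assumption about how a team might do that, not a description of what WAN 2.5 or Masonry validates. `check_voiceover` is a hypothetical helper, and because 3–30 seconds is described as typical rather than a hard limit, it is treated as a warning.

```python
import wave
from pathlib import Path

ALLOWED_EXTENSIONS = {".wav", ".mp3"}
TYPICAL_RANGE_SECONDS = (3.0, 30.0)  # "typically 3–30 seconds" per the answer above


def wav_duration_seconds(path: str) -> float:
    """Duration of a WAV file, computed from its frame count and sample rate."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_voiceover(path: str) -> list[str]:
    """Return warnings for a voiceover file; an empty list means nothing was flagged."""
    warnings = []
    extension = Path(path).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        warnings.append(f"{extension or 'no extension'}: expected WAV or MP3")
    elif extension == ".wav":
        duration = wav_duration_seconds(path)
        low, high = TYPICAL_RANGE_SECONDS
        if not low <= duration <= high:
            warnings.append(
                f"{duration:.1f}s is outside the typical {low:.0f}-{high:.0f}s range"
            )
    # MP3 duration is not checked: the Python standard library has no MP3 decoder.
    return warnings
```

Only the file extension is inspected for MP3 files; measuring their duration would need a third-party audio library.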

Question 5

Does WAN 2.5 support text-to-video as well as image-to-video?

Accepted Answer

WAN 2.5 i2v is the image-to-video variant of the WAN 2.5 model family, optimized for animating a still image into video. Alibaba's WAN model family does include text-to-video variants, but the model surfaced in Masonry is the image-to-video version. If you need text-to-video without a starting image, Kling 3.0 Standard or Pro on the same canvas are strong alternatives.

Question 6

What aspect ratios does WAN 2.5 image-to-video support?

Accepted Answer

Because WAN 2.5 image-to-video animates a still, the output aspect ratio follows your input image. Compose the source at 16:9 for horizontal placements, 9:16 for TikTok, Reels, and Stories, or 1:1 for square feed formats, and the clip matches. There's no separate aspect-ratio setting to manage.
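
Since the clip inherits the source image's shape, it can help to confirm the still's ratio before generating. The following is a minimal sketch using plain arithmetic on pixel dimensions. The function names are hypothetical, and the three ratios are simply the ones named in the answer above.

```python
from math import gcd

# Ratios named in the answer above, expressed as width / height.
COMMON_RATIOS = {"16:9": 16 / 9, "9:16": 9 / 16, "1:1": 1.0}


def reduced_ratio(width: int, height: int) -> str:
    """Exact aspect ratio of the source image in lowest terms."""
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


def closest_common_ratio(width: int, height: int) -> str:
    """Name of the common placement ratio nearest to the image's own ratio."""
    actual = width / height
    return min(COMMON_RATIOS, key=lambda name: abs(COMMON_RATIOS[name] - actual))
```

For a 1920×1080 source, `reduced_ratio(1920, 1080)` gives `"16:9"`. A still that reduces to something unusual can be recomposed or cropped to the intended placement ratio before it is animated.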

Question 7

Is WAN 2.5 open source, and what does that mean for commercial use?

Accepted Answer

No. Although Alibaba released earlier versions of the family (WAN 2.1 and WAN 2.2) as open-weight models under the permissive Apache 2.0 license, WAN 2.5 is a closed model available through an API rather than as downloadable weights, so it cannot be self-hosted. When you access WAN 2.5 through Masonry or another API provider, commercial use of the output follows that provider's terms; confirm them with your provider if large-scale commercial distribution is planned.

Question 8

How does WAN 2.5 compare to Kling 3.0 for image-to-video animation?

Accepted Answer

WAN 2.5 and Kling 3.0 are both strong image-to-video models with different strengths. WAN 2.5 handles subtle, physics-aware motion particularly well and can lip-sync to a voiceover you provide. Kling 3.0 generates its own native audio (in Chinese and English) and adds an AI Director multi-shot sequencing system if you need more than one camera cut in a single clip. The practical approach is to run both on the same brief in Masonry and compare results directly.

Question 9

How does WAN 2.5 compare to Runway Gen-4.5 for still-image animation?

Accepted Answer

Runway Gen-4.5 has a mature Motion Brush tool that lets you paint motion onto specific areas of a frame with precision, and is noted for subject consistency across longer sequences. WAN 2.5's strengths are clean, physics-aware still-to-motion and built-in lip-sync to a voiceover you provide, useful when talking-head output matters. For maximum per-element motion control, Runway's Motion Brush remains a strong alternative to evaluate; the practical move is to compare both on the same still.

Question 10

What kind of input image works best with WAN 2.5?

Accepted Answer

Clean, well-composed images with clear subjects and deliberate lighting produce the most coherent results. Product packshots, lifestyle photography, and campaign hero images (the kind of polished stills a commercial creative team already produces) translate well to WAN 2.5 animation. Heavily cropped images, low-resolution sources, and images with complex, busy backgrounds can introduce motion artifacts; a still composed at 1:1 or 16:9 gives the model the most to work with.

Question 11

Can WAN 2.5 generate motion for product labels and text within the frame?

Accepted Answer

WAN 2.5 can animate scenes containing text and labels (a product bottle with a label, a sign, a branded package) with the text remaining legible through the motion. That said, for clips where text rendering and sharp label detail at large scale are the priority (e.g., a high-resolution product packshot reveal for a large-format display), Kling 3.0's native-4K mode and noted text-rendering performance may be the stronger choice.

Question 12

How does WAN 2.5 fit into a Masonry multi-model workflow?

Accepted Answer

WAN 2.5 is the natural animation step after image generation in Masonry. A common workflow generates the product or scene on an image model (Flux, Ideogram, or a fine-tuned model), brings the still into WAN 2.5 to add motion (and, for talking-head shots, lip-sync to a provided voiceover), and delivers a finished social clip, all without leaving the Masonry canvas. Because Masonry keeps both steps in the same project, the still and the animated clip stay linked, and comparing different animation approaches on the same source image is straightforward.

Question 13

What is the WAN model family and where does WAN 2.5 sit in it?

Accepted Answer

WAN (Wan Video) is Alibaba's series of video generation models, developed by the Tongyi Wanxiang team. The early versions (WAN 2.1 and 2.2) were released as open-weight models under Apache 2.0; WAN 2.5 moved to a closed, API-only release and added synchronized audio to the family along with stronger motion. The series has continued with 2.6 and 2.7 iterations adding further capabilities such as first-and-last-frame interpolation. The version surfaced in Masonry is WAN 2.5 i2v, the image-to-video variant, which animates a still and can lip-sync to a voiceover you provide.

Question 14

What is WAN 2.5 (Image-to-Video)?

Accepted Answer

WAN 2.5 (Image-to-Video) is an AI video generation model from WAN Video. It is available inside Masonry, the AI creative agent that teams use to produce marketing, product, and brand videos.

Question 15

How does my team use WAN 2.5 (Image-to-Video) in Masonry?

Accepted Answer

Open a Masonry canvas, pick WAN 2.5 (Image-to-Video) from the model selector, add the still image you want to animate, and describe the video you need: a product shot, an ad creative, a social post. Masonry generates it; you can then refine, edit, and combine WAN 2.5 (Image-to-Video) with other models in one workspace.

Question 16

Is WAN 2.5 (Image-to-Video) free to try?

Accepted Answer

Yes. You can start generating videos with WAN 2.5 (Image-to-Video) on Masonry's free tier, then scale up with higher limits and priority processing as your team grows.

Question 17

How do I write good prompts for WAN 2.5 (Image-to-Video)?

Accepted Answer

Start from a clean, well-composed image, then describe one clear motion, such as a push-in, a slow pan, or a single element moving. Subtle motion reads as more believable than dramatic action. See the prompt gallery on this page for real WAN 2.5 (Image-to-Video) prompts you can copy and adapt.

Question 18

Who makes WAN 2.5 (Image-to-Video)?

Accepted Answer

WAN 2.5 (Image-to-Video) is built by WAN Video. Inside Masonry it runs alongside 50+ image and video models, so your team can pick the right one for each brief without switching tools.

Question 19

Can I see examples made with WAN 2.5 (Image-to-Video)?

Accepted Answer

Yes. The prompt gallery on this page shows real videos teams have generated with WAN 2.5 (Image-to-Video) in Masonry, each paired with the exact prompt, which you can copy and adapt for your own brand.

WAN 2.5 (Image-to-Video)

About WAN 2.5 (Image-to-Video)

Why teams choose WAN 2.5 (Image-to-Video)

What WAN 2.5 (Image-to-Video) can do

Flexible Resolution up to 1080p

Image-to-Video with Subject Preservation

Coherent, Physics-Aware Motion

Prompt-Guided Motion Direction

Audio-Driven Lip-Sync

Where teams reach for WAN 2.5 (Image-to-Video)

What sets WAN 2.5 (Image-to-Video) apart

Image-to-Video Asset Animation

Coherent, Believable Motion

Lip-Sync to Your Own Audio
