- Resolution: Up to 1080p
- Clip length: Up to 10s
- Lip-sync: From your audio
- Aspect ratio: Follows input
About WAN 2.5 (Image-to-Video)
WAN 2.5 is Alibaba's image-to-video model, released in September 2025, and its defining capability for commercial creative teams is turning a finished still into believable, physics-aware motion. Give it a well-composed still image and a motion prompt, and it produces a clip up to 10 seconds at up to 1080p that animates your scene (parallax drift, light shifts, steam, a camera push-in) while preserving the subject, composition, and mood of the source image. For talking-head and spokesperson shots, it can also lip-sync the animation to a voiceover track you upload. Unlike Alibaba's earlier open-weight releases (WAN 2.1 and 2.2, both Apache 2.0), WAN 2.5 is a closed model accessed through an API rather than self-hosted, so teams use it through a provider without managing their own GPU infrastructure.
For creative teams, the practical value of WAN 2.5 is the still-to-video handoff. Product photographers produce strong packshots; key visual artists produce strong hero images. WAN 2.5 turns those finished assets into video without a reshoot. A product reveal clip with light shifting across the label and a soft camera push-in can be generated from a single photograph in one pass. In Masonry, WAN 2.5 sits on the same canvas as image models, so the workflow from still generation to animated social clip stays inside one project, and the motion output is ready to drop into a feed or presentation without additional production.
Why teams choose WAN 2.5 (Image-to-Video)
WAN 2.5 is the right model when you already have a strong still image and need to put it in motion without a new production. It handles subtle, physics-aware motion (parallax, light shifts, atmospheric movement) particularly well, and for talking-head shots it can lip-sync the animation to a voiceover track you provide. Against Kling 3.0 image-to-video, WAN 2.5 is a strong pick for clean still-to-motion work with flexible 480p–1080p output. Against Runway Gen-4.5, it's a straightforward, lower-overhead path from a finished still to an animated clip. In Masonry, WAN 2.5 is the natural companion to image generation models. Generate the still on a dedicated image model, then animate it here.
What WAN 2.5 (Image-to-Video) can do
The capabilities that set WAN 2.5 (Image-to-Video) apart and earn its place in a brief
Flexible Resolution up to 1080p
Outputs at 480p, 720p, or 1080p with clips up to 10 seconds. Generate fast, low-cost drafts at lower resolution, then step up to 1080p for the final, all from the same model.
Image-to-Video with Subject Preservation
Animates still images into video while preserving the source subject, composition, and lighting. This is the foundational capability for turning existing product photos, campaign stills, and key visuals into motion content without reshooting.
Coherent, Physics-Aware Motion
Adds believable natural movement (parallax drift, light shifts, steam, camera push-ins, subtle element animation) without distorting or drifting from the source image, so brand assets stay looking intentional rather than generated.
Prompt-Guided Motion Direction
Accepts detailed text prompts describing the specific motion, atmosphere, and camera behavior you want applied to the image, giving you creative control over how the scene comes to life rather than letting the model decide.
Audio-Driven Lip-Sync
Provide a voiceover track (WAV or MP3, via supported API providers) and WAN 2.5 animates the subject's mouth and pacing to match it. Because it follows the audio you supply, it works across languages. Useful for spokesperson clips, product explainers, and talking-head social content without a reshoot.
Where teams reach for WAN 2.5 (Image-to-Video)
- Product photo to reveal clip: Animate a packshot or product photograph into a commercial-quality reveal, with light shifting across the label and a soft camera push-in, without booking a video production day.
- Key visual to social video: Turn a campaign hero image into a moving social clip for Instagram, TikTok, or Reels feeds, adding parallax drift or atmospheric motion to extend the life of existing creative assets.
- Spokesperson and talking-head clips with lip-sync: Animate a spokesperson image into a lip-synced talking-head clip, with natural mouth movements and expressions driven by a voiceover track you upload. Useful for product explainers, announcements, and social ads.
- Multilingual product video localization: Localize the same animated clip by swapping the uploaded voiceover (English, Spanish, Chinese, Russian, and more) so the lip-sync follows each track, cutting dubbing turnaround for global campaigns.
- Lifestyle and ambient brand video: Bring lifestyle photography to life with subtle motion (steam rising from a cup, light shifting through leaves, fabric moving in wind) for brand content that reads as video without requiring a camera crew.
- E-commerce product animation: Produce short animated product clips for e-commerce PDPs, marketplaces, and shopping ads by animating existing product photography, giving static catalog assets a video format without a new shoot.
- Social storyboarding and concept tests: Generate animated versions of key-visual concepts quickly to test motion direction and creative approach before committing to a full production, using image assets that already exist.
- Brand video repurposing from image archives: Work through existing photography archives and animate selected images into video content for seasonal campaigns, anniversary content, or channel launches, treating the image library as a raw video asset bank.
What sets WAN 2.5 (Image-to-Video) apart
The strengths teams reach for, shown on real renders.

Image-to-Video Asset Animation
Turns existing product photos, campaign stills, and key visuals into polished video clips, preserving the original composition and subject without a reshoot.

Coherent, Believable Motion
Adds natural movement (parallax drift, light shifts, steam, subtle camera push-ins) without distorting the source image, keeping brand assets looking intentional.

Lip-Sync to Your Own Audio
Upload a voiceover track and WAN 2.5 animates the subject's mouth and timing to match it, turning a portrait or spokesperson still into a talking-head clip without a reshoot or separate animation step.
Frequently asked questions
What teams need to know about creating with WAN 2.5 (Image-to-Video) in Masonry
What resolution and clip length does WAN 2.5 image-to-video support?
WAN 2.5 generates video at up to 1080p resolution, with clips up to 10 seconds long. The model also supports 480p and 720p output, and lower resolutions generate faster and cost less, making them useful for draft reviews and concept tests where 1080p is not yet needed. The 10-second limit is sufficient for the majority of social and paid-media placements.
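As a rough illustration of the draft-then-final approach, the sketch below checks a clip request against the limits stated above (480p, 720p, or 1080p; up to 10 seconds) and then promotes an approved draft to 1080p. The `ClipSettings` class and its field names are hypothetical, invented for this example; they are not part of any WAN 2.5 or Masonry API.

```python
from dataclasses import dataclass, replace

# Limits taken from the text above; the names below are illustrative only.
SUPPORTED_RESOLUTIONS = ("480p", "720p", "1080p")
MAX_CLIP_SECONDS = 10


@dataclass(frozen=True)
class ClipSettings:
    """Hypothetical container for one image-to-video request."""
    resolution: str
    duration_seconds: float

    def validate(self) -> None:
        """Raise ValueError if the settings fall outside the stated limits."""
        if self.resolution not in SUPPORTED_RESOLUTIONS:
            raise ValueError(f"unsupported resolution: {self.resolution}")
        if not 0 < self.duration_seconds <= MAX_CLIP_SECONDS:
            raise ValueError(f"duration must be in (0, {MAX_CLIP_SECONDS}] seconds")


def promote_to_final(draft: ClipSettings) -> ClipSettings:
    """Return the same request at 1080p once a draft has been approved."""
    final = replace(draft, resolution="1080p")
    final.validate()
    return final


# Start with a cheap 480p draft, then reuse the same settings at 1080p.
draft = ClipSettings(resolution="480p", duration_seconds=8)
draft.validate()
final = promote_to_final(draft)
print(final.resolution, final.duration_seconds)
```

Keeping the draft and final settings in one immutable object means only the resolution changes between passes, so the approved draft and the final render share the same clip length.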
How does audio and lip-sync work in WAN 2.5 image-to-video?
The image-to-video endpoint works from an audio track you provide rather than generating sound itself. Upload a voiceover (WAV or MP3, typically 3–30 seconds) and WAN 2.5 drives the subject's lip-sync and pacing to match it, with natural mouth movement and expressions, so a portrait or spokesperson still becomes a talking-head clip. If you don't supply audio, it animates the image as a silent motion clip.
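A minimal pre-upload check for a voiceover file might look like the sketch below. It assumes the constraints described above (WAV or MP3, roughly 3 to 30 seconds) and treats the duration range as a guideline rather than a documented hard limit. The function names are hypothetical, and only WAV duration is measured because the Python standard library has no MP3 decoder.

```python
import wave
from pathlib import Path

ALLOWED_SUFFIXES = {".wav", ".mp3"}   # formats named in the text above
MIN_SECONDS, MAX_SECONDS = 3.0, 30.0  # the "typical" range, not a hard spec


def wav_duration_seconds(path: Path) -> float:
    """Duration of a WAV file, computed from frame count and sample rate."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_voiceover(path: Path) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    problems = []
    suffix = path.suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported format: {suffix or 'none'}")
    elif suffix == ".wav":
        # MP3 duration needs a third-party decoder, so only WAV is measured here.
        duration = wav_duration_seconds(path)
        if not MIN_SECONDS <= duration <= MAX_SECONDS:
            problems.append(f"duration {duration:.1f}s is outside 3-30s")
    return problems
```

Calling `check_voiceover(Path("voiceover.wav"))` on an approved recording before upload catches the most common mistakes (wrong container, clip too short or too long) without spending a generation on them.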
What languages does the WAN 2.5 lip-sync support?
Because the lip-sync follows the voiceover track you upload, it isn't limited to a fixed set of languages. Supply audio in English, Spanish, Chinese, Russian, or another language and WAN 2.5 animates the mouth to match it. That makes localization straightforward: swap the audio track, regenerate, and the animation follows.
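The localization pattern can be sketched as one render job per voiceover, with the still and the motion prompt held constant. Everything in this example is an assumption for illustration: `LipSyncJob`, `localization_jobs`, the file names, and the language codes are hypothetical and do not correspond to a real WAN 2.5 or Masonry payload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LipSyncJob:
    """Hypothetical description of one render; not a real API payload."""
    image_path: str
    motion_prompt: str
    audio_path: str
    language: str


def localization_jobs(image_path, motion_prompt, voiceovers):
    """Create one job per language, reusing the same still and motion prompt."""
    return [
        LipSyncJob(image_path, motion_prompt, audio_path, language)
        for language, audio_path in sorted(voiceovers.items())
    ]


# Same still, same motion prompt; only the voiceover changes per language.
jobs = localization_jobs(
    "spokesperson.png",
    "Subtle head movement, soft camera push-in",
    {"en": "vo_en.wav", "es": "vo_es.wav", "zh": "vo_zh.wav"},
)
for job in jobs:
    print(job.language, job.audio_path)
```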
Can I upload my own audio to drive lip-sync in WAN 2.5?
Yes. This is how audio works on the image-to-video endpoint. You upload your own audio file (WAV or MP3, typically 3–30 seconds) and WAN 2.5 drives the lip-sync and pacing of the animation to match it. This is ideal when you have an approved voiceover recording and need the visual animation to follow it precisely, a common requirement in professional ad production.
Does WAN 2.5 support text-to-video as well as image-to-video?
WAN 2.5 i2v is specifically the image-to-video variant of the WAN 2.5 model family. It is optimized for animating a still image input into video. Alibaba's WAN model family does include text-to-video variants, but the i2v model surfaced in Masonry is the image-to-video version. If you need text-to-video without a starting image, Kling 3.0 Standard or Pro on the same canvas are strong alternatives.
What aspect ratios does WAN 2.5 image-to-video support?
Because WAN 2.5 image-to-video animates a still, the output aspect ratio follows your input image. Compose the source at 16:9 for horizontal placements, 9:16 for TikTok, Reels, and Stories, or 1:1 for square feed formats, and the clip matches. There's no separate aspect-ratio setting to manage.
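Since the clip inherits the shape of the source image, it can help to check a still's ratio before generating. The sketch below reduces pixel dimensions to the simplest ratio and maps it to the placements named above. The `aspect_ratio` helper and the `PLACEMENTS` table are illustrative assumptions, not part of any tool described here.

```python
from math import gcd


def aspect_ratio(width: int, height: int) -> str:
    """Reduce pixel dimensions to the simplest width:height ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


# Placements named in the text above, keyed by reduced ratio.
PLACEMENTS = {
    "16:9": "horizontal placements",
    "9:16": "TikTok, Reels, and Stories",
    "1:1": "square feed formats",
}

for size in [(1920, 1080), (1080, 1920), (1080, 1080)]:
    ratio = aspect_ratio(*size)
    print(size, ratio, PLACEMENTS.get(ratio, "other"))
```

A still that reduces to an unexpected ratio (for example 4:5) is a signal to recrop the source before animating, because there is no output setting to correct it afterward.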
Is WAN 2.5 open source, and what does that mean for commercial use?
No. Although Alibaba released earlier versions of the family (WAN 2.1 and WAN 2.2) as open-weight models under the permissive Apache 2.0 license, WAN 2.5 is a closed model available through an API rather than as downloadable weights, so it cannot be self-hosted. When you access WAN 2.5 through Masonry or another API provider, commercial use of the output follows that provider's terms; confirm them with your provider if large-scale commercial distribution is planned.
How does WAN 2.5 compare to Kling 3.0 for image-to-video animation?
WAN 2.5 and Kling 3.0 are both strong image-to-video models with different strengths. WAN 2.5 handles subtle, physics-aware motion particularly well and can lip-sync to a voiceover you provide. Kling 3.0 generates its own native audio (in Chinese and English) and adds an AI Director multi-shot sequencing system if you need more than one camera cut in a single clip. The practical approach is to run both on the same brief in Masonry and compare results directly.
How does WAN 2.5 compare to Runway Gen-4.5 for still-image animation?
Runway Gen-4.5 has a mature Motion Brush tool that lets you paint motion onto specific areas of a frame with precision, and is noted for subject consistency across longer sequences. WAN 2.5's strengths are clean, physics-aware still-to-motion and built-in lip-sync to a voiceover you provide, useful when talking-head output matters. For maximum per-element motion control, Runway's Motion Brush remains a strong alternative to evaluate; the practical move is to compare both on the same still.
What kind of input image works best with WAN 2.5?
Clean, well-composed images with clear subjects and deliberate lighting produce the most coherent results. Product packshots, lifestyle photography, and campaign hero images (the kind of polished stills a commercial creative team already produces) translate well to WAN 2.5 animation. Heavily cropped images, low-resolution sources, or images with complex busy backgrounds can introduce motion artifacts; a 1:1 or 16:9 composed still gives the model the most to work with.
Can WAN 2.5 generate motion for product labels and text within the frame?
WAN 2.5 can animate scenes containing text and labels (a product bottle with a label, a sign, a branded package) with the text remaining legible through the motion. That said, for clips where text rendering and sharp label detail at large scale are the priority (e.g., a high-resolution product packshot reveal for a large-format display), Kling 3.0's native-4K mode and noted text-rendering performance may be the stronger choice.
How does WAN 2.5 fit into a Masonry multi-model workflow?
WAN 2.5 is the natural animation step after image generation in Masonry. A common workflow generates the product or scene on an image model (Flux, Ideogram, or a fine-tuned model), brings the still into WAN 2.5 to add motion (and, for talking-head shots, lip-sync to a provided voiceover), and delivers a finished social clip, all without leaving the Masonry canvas. Because Masonry keeps both steps in the same project, the still and the animated clip stay linked, and comparing different animation approaches on the same source image is straightforward.
What is the WAN model family and where does WAN 2.5 sit in it?
WAN (Wan Video) is Alibaba's series of video generation models, developed by the Tongyi Wanxiang team. The early versions (WAN 2.1 and 2.2) were released as open-weight models under Apache 2.0; WAN 2.5 moved to a closed, API-only release and added synchronized audio to the family along with stronger motion. The series has continued with 2.6 and 2.7 iterations adding further capabilities such as first-and-last-frame interpolation. The version surfaced in Masonry is WAN 2.5 i2v, the image-to-video variant, which animates a still and can lip-sync to a voiceover you provide.
What is WAN 2.5 (Image-to-Video)?
WAN 2.5 (Image-to-Video) is an AI video generation model from Alibaba's WAN (Wan Video) family, available inside Masonry, the AI creative agent teams use to produce marketing, product, and brand videos.
How does my team use WAN 2.5 (Image-to-Video) in Masonry?
Open a Masonry canvas, pick WAN 2.5 (Image-to-Video) from the model selector, add the still image you want to animate, and describe the motion you need: a product reveal, an ad creative, a social post. Masonry generates the clip, then you refine, edit, and combine WAN 2.5 (Image-to-Video) with other models in one workspace.
Is WAN 2.5 (Image-to-Video) free to try?
Yes, you can start generating videos with WAN 2.5 (Image-to-Video) on Masonry's free tier, then scale up with higher limits and priority processing as your team grows.
How do I write good prompts for WAN 2.5 (Image-to-Video)?
Start from a clean, well-composed image, then describe one clear motion, such as a push-in, a slow pan, or a single element moving. Subtle motion reads more believable than dramatic action. See the prompt gallery on this page for real WAN 2.5 (Image-to-Video) prompts you can copy and adapt.
Who makes WAN 2.5 (Image-to-Video)?
WAN 2.5 (Image-to-Video) is built by Alibaba's Tongyi Wanxiang team as part of the WAN (Wan Video) model family. Inside Masonry it runs alongside 50+ image and video models, so your team can pick the right one for each brief without switching tools.
Can I see examples made with WAN 2.5 (Image-to-Video)?
Yes, the prompt gallery on this page shows real videos teams have generated with WAN 2.5 (Image-to-Video) in Masonry, each paired with the exact prompt you can copy and adapt for your own brand.