AI Models - Masonry | Nano Banana, FLUX, GPT Image, Imagen & More


Nano Banana 2

Preview of Gemini 3.1 Flash image generation optimized for price-performance balance with text-to-image and image mixing (supports up to 14 input images).

Text to Image, Remix, Inpaint, Outpaint, Style Transfer

GPT Image 2

OpenAI's GPT Image 2 with native reasoning, up to 4K output, and multi-image consistency across a batch.

Text to Image, Inpaint

Nano Banana Pro

Preview of Gemini 3 Pro image generation for text-to-image and image mixing (supports up to 14 input images).

Text to Image, Remix, Inpaint, Outpaint, Style Transfer

Gemini 3.1 Flash Lite Image

Nano Banana 2 Lite — Google's fastest, most cost-efficient Nano Banana image model. Near-real-time 1K text-to-image and image editing (supports up to 14 input images).

Text to Image, Remix, Inpaint, Outpaint, Style Transfer

Nano Banana

Fast Gemini 2.5 Flash image variant for text-to-image generation and image mixing (supports up to 3 input images).

Text to Image, Remix, Inpaint, Outpaint, Style Transfer

Seedance 1.5 Pro

ByteDance's latest and most powerful video model to date.

Text to Video, Image to Video

Seedream 5.0 Pro

ByteDance's flagship Seedream 5.0 Pro for structured image generation and region-precise editing, with multilingual typography and up to 10 reference images.

Text to Image, Remix, Style Transfer

Imagen 4

Google's flagship Imagen 4 model for high-quality image generation with improved text rendering.

Recraft V4 Vector

Recraft's V4 text-to-vector model. Generates true, editable SVG graphics — logos, icons, and brand assets that stay sharp at any resolution.

GPT Image 1.5

OpenAI's GPT Image 1.5 model for image generation and edits (supports up to 10 input images).

Text to Image, Remix

Kling 2.1

Kwaivgi's Kling v2.1 standard mode producing 720p 24fps video from a prompt and reference frame.

FLUX.2 Pro

Professional FLUX.2 model with higher quality, multi-image conditioning, and up to 4MP outputs.

Kling O3 Pro

Kuaishou's Kling O3 Pro for high-fidelity text-to-video and image-to-video generation with native audio and clips up to 15 seconds.

Text to Video, Image to Video

Kling Pro 2.1

Kwaivgi's Kling v2.1 pro mode offering 1080p 24fps output with optional end-frame guidance.

Kling v2.6 Pro

Kling v2.6 Pro Image-to-Video model with improved visual quality, motion consistency, and native audio generation support.

Seedance 2.0

ByteDance's Seedance 2.0 model via fal.ai for cinematic text-to-video and image-to-video generation with native audio.

Text to Video, Image to Video

FLUX Kontext Max

Advanced FLUX model for image generation and editing with reference image support for context and composition guidance.

Text to Image, Style Transfer

Gemini Omni Flash

Google's Gemini Omni Flash — fast, cost-efficient text-to-video, image-to-video, and input-video editing with native audio, delivered through Vertex AI.

Text to Video, Image to Video, Video Editing

Grok Imagine Image

xAI Grok Imagine text-to-image generation with aspect ratio and 1k/2k resolution controls.

Kling O1

Kling O1 first-frame-to-last-frame video generator with dual keyframe support for precise motion control and transitions.

Seedance 1 Lite

ByteDance's Seedance 1 Lite model for cost-effective prompt or image conditioned video generation.

Text to Video, Image to Video

Seedance 2.0 Fast

ByteDance's Seedance 2.0 fast endpoints via fal.ai, optimized for lower latency and cost.

Text to Video, Image to Video

Seedream 4.5

ByteDance's Seedream 4.5 model for high-quality text-to-image and image-to-image generation with improved spatial understanding and world knowledge, supporting up to 4K resolution.

Text to Image, Remix, Style Transfer

Veo 3.1 Fast

Veo 3.1 Fast Preview delivers rapid preview renders for text-to-video and image-to-video via Vertex AI.

Text to Video, Image to Video

WAN 2.5 (Image-to-Video)

WAN Video 2.5 image-to-video generation with 5–10s clips at 480p/720p/1080p.

Bria Embed Product

Bria's commercially licensed product-compositing model for placing product cutouts into scenes with controlled position, perspective, and natural lighting.

FLUX 1.1 Pro

Professional FLUX 1.1 model with enhanced quality and capabilities.

FLUX.2 Dev

Developer-focused FLUX.2 variant with lower latency and a go_fast toggle.

FLUX.2 Flex

Flexible FLUX.2 variant optimized for creative exploration with tunable steps and guidance.

FLUX.2 Klein 4B Base

Un-distilled FLUX.2 Klein 4B base model optimized for fine-tuning and multi-reference workflows.

Text to Image, Remix, Style Transfer

FLUX.2 Klein 9B Base

Un-distilled FLUX.2 Klein foundation model for flexible text-to-image and multi-reference workflows.

Text to Image, Remix, Style Transfer

Grok Imagine Image Edit

xAI Grok Imagine image editing: edit up to 3 reference images with a text prompt, aspect ratio, and 1k/2k resolution controls.

Grok Imagine Video 1.5

xAI Grok Imagine 1.5 image-to-video: animate a source image with a text prompt at 480p or 720p.

Ideogram V3 Quality

The highest-quality Ideogram v3 model. Ideogram v3 creates images with stunning realism, creative designs, and consistent styles.

Ideogram V3 Turbo

The fastest and cheapest Ideogram v3 model. Ideogram v3 creates images with stunning realism, creative designs, and consistent styles.

Ideogram V4

Ideogram's latest text-to-image model. Best-in-class text rendering for posters, logos, and signage, with fine detail and strong creative control.

Kling O1 Reference (Character Lock)

Kuaishou's Kling O1 reference-to-video: lock a character's identity from multiple reference images (visual DNA) and generate a clip from a prompt, via fal.ai.

Kling O3 Standard

Kuaishou's faster, lower-cost Kling O3 tier for text-to-video and image-to-video generation with native audio and clips up to 15 seconds.

Text to Video, Image to Video

Kling v1 Camera Director

Kuaishou's Kling v1 text-to-video with pre-baked cinematic camera templates (dolly/crane, orbit, pan, tilt, roll, zoom) via fal.ai. Camera control is a Kling v1-era feature; newer Kling tiers omit it.

Kling v1.5 Motion Brush

Kuaishou's Kling v1.5 Pro image-to-video with motion-brush trajectory pathing — paint per-region motion paths over a start image (dynamic masks + a static hold region) via fal.ai.

Kling v2.5 Turbo Pro

Kwaivgi's Kling v2.5 Turbo Pro model for prompt-based or image-guided video generation.

Text to Video, Image to Video

LTX-2.3

Lightricks' LTX-2.3 open-source video model: fast, low-cost text-to-video and image-to-video with native audio (6/8/10s at 1080p).

Text to Video, Image to Video

Masonry Magic Layers

Decomposes a single image into multiple editable RGBA layers (foreground, background, text, and individual elements) in one pass, so each piece can be moved and edited independently on the canvas.

Minimax Hailuo 02

Minimax's Hailuo 02 standard tier supporting 512p and 1080p output.

Text to Video, Image to Video

PixVerse C1

PixVerse C1 for cinematic text-to-video and image-to-video generation with native audio, 1–15 second clips, and output up to 1080p.

Text to Video, Image to Video

Qwen Image

High-quality text-to-image model from Qwen with support for multiple canvas dimensions and LoRA weights.

Qwen Image Edit Plus

Qwen's enhanced image editing model supporting multi-image conditioning and rich prompt controls.

Recraft V4

Recraft's V4 text-to-image model. Balanced composition and cohesive color for design and marketing assets; top-ranked on the Text-to-Image Arena.

Seedance 1 Pro

ByteDance's Seedance 1 Pro model via BytePlus ModelArk API with multi-shot narrative capabilities and cinematic aesthetics.

Text to Video, Image to Video

Seedream 4

ByteDance's Seedream 4 model for high-quality text-to-image and image-to-image generation with support for up to 4K resolution.

Text to Image, Remix, Style Transfer

Veo 3

Google DeepMind's Veo 3 text-to-video model delivered through Vertex AI.

Veo 3 Fast

Veo 3 Fast delivers rapid text-to-video renders optimized for iteration via Vertex AI.

Text to Video, Image to Video

Veo 3.1

Preview release of Veo 3.1 supporting enhanced text-to-video and image-to-video generation on Vertex AI.

Text to Video, Image to Video

Veo 3.1 Lite Preview

Veo 3.1 Lite Preview offers lightweight, cost-efficient text-to-video and image-to-video generation via Vertex AI.

Text to Video, Image to Video

WAN 2.7

Alibaba's WAN 2.7 video model: text-to-video and image-to-video with native audio, 2–15s clips at 720p/1080p.

Text to Video, Image to Video