Runway shipped Gen-4 on May 3 with five upgrades, the headliner being native audio-video generation — synchronized dialogue, ambient sound, and environmental effects produced in the same pass as the visuals. The update also adds UGC ad templates with vertical TikTok-ready output and a new API enabling hybrid multi-model pipelines (mix Gen-4 with Veo 3 or Seedance in a single workflow). Motion physics and prompt adherence both got meaningful bumps.
Between April 1 and 6, Alibaba released four Wan 2.7 models (text-to-video, image-to-video, reference-to-video with voice cloning, and instruction-based video editing), all under Apache 2.0, fully downloadable and self-hostable. The 27B MoE architecture (14B active) outputs 1080p video up to 15 seconds long with native audio sync, and the reference-to-video (R2V) model accepts up to five simultaneous reference inputs. API pricing starts at $0.10/sec via Together AI.
As of April 7, any Google account holder can generate Veo 3.1 video clips for free — 10 per month at 720p/8 seconds. AI Ultra subscribers get up to 1,000 generations. Google also shipped Veo 3.1 Lite for developers at $0.05/sec on Vertex AI, under half the cost of Veo 3.1 Fast. Custom music via Lyria 3 is bundled for Pro/Ultra tiers.
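For budgeting, the per-second prices quoted in these items can be turned into a rough batch estimate. The sketch below is a back-of-envelope calculator, not a billing tool: the dictionary labels and function names are made up for illustration, the Wan 2.7 figure is a "starts at" floor, and real invoices may include minimums, rounding, or resolution tiers that are not modeled here.

```python
# Per-second list prices quoted in the items above (USD).
# Labels are illustrative, not official model identifiers.
PRICE_PER_SEC = {
    "Wan 2.7 via Together AI (starting price)": 0.10,
    "Veo 3.1 Lite on Vertex AI": 0.05,
}


def clip_cost(price_per_sec: float, seconds: float, clips: int = 1) -> float:
    """Estimate the USD cost of `clips` clips, each `seconds` long."""
    if seconds < 0 or clips < 0:
        raise ValueError("seconds and clips must be non-negative")
    # Round to cents so floating-point noise does not leak into the result.
    return round(price_per_sec * seconds * clips, 2)


def compare(seconds: float, clips: int) -> dict[str, float]:
    """Estimate the same batch against every price in PRICE_PER_SEC."""
    return {
        name: clip_cost(price, seconds, clips)
        for name, price in PRICE_PER_SEC.items()
    }


if __name__ == "__main__":
    # Example batch: 100 clips of 8 seconds each.
    for name, cost in compare(seconds=8, clips=100).items():
        print(f"{name}: ${cost:.2f}")
```

At these list prices, a batch of one hundred 8-second clips works out to roughly $80 at the Wan 2.7 starting rate and $40 on Veo 3.1 Lite, before any platform-specific adjustments.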
Instagram's algorithm update, which rolled out between April 30 and May 4, now penalizes accounts that primarily post unoriginal content across all formats, not just Reels. The system evaluates accounts over a rolling month: if most output is reposted material, the account drops out of Explore and recommendation surfaces entirely. For AI video creators, this means original generation and meaningful transformation are now table stakes for distribution on the platform.
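Creators who want to self-audit against a rule like this can approximate it locally. The sketch below is only a reading of the description above, not Instagram's actual logic: it assumes "rolling month" means the last 30 days and "most" means more than half of posts, and the `Post` type and `is_mostly_unoriginal` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Post:
    posted_on: date
    is_original: bool  # your own judgment; the platform's classifier is unknown


def is_mostly_unoriginal(posts: list[Post], today: date, window_days: int = 30) -> bool:
    """Return True if more than half of the posts in the window are reposts.

    Assumptions (not confirmed by Instagram): a 30-day window and a
    simple majority threshold.
    """
    cutoff = today - timedelta(days=window_days)
    recent = [p for p in posts if cutoff < p.posted_on <= today]
    if not recent:
        return False  # nothing posted in the window, so nothing to flag
    reposts = sum(1 for p in recent if not p.is_original)
    return reposts > len(recent) / 2


if __name__ == "__main__":
    today = date(2026, 5, 10)
    history = [
        Post(today - timedelta(days=2), is_original=False),
        Post(today - timedelta(days=5), is_original=False),
        Post(today - timedelta(days=9), is_original=True),
        Post(today - timedelta(days=45), is_original=True),  # outside the window
    ]
    print(is_mostly_unoriginal(history, today))
```

In the example history, two of the three posts inside the window are reposts, so the check flags the account; the older original post falls outside the window and does not help.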
OpenAI's GPT Image 2.0 landed in ComfyUI via Partner Nodes in late April. The model plans compositions before generating — dense text, UI mockups, infographics, and manga panels render cleanly. Editing preserves structural integrity at up to 2K. The practical play: use it to generate text-heavy hero frames, then hand off to local models for upscaling, stylization, or video-from-image pipelines.
Kling 3.0's Motion Control transfers facial motion, hand gestures, body movement, and camera rhythm from a reference video to any still image. Element Binding locks facial identity to motion data for consistent characters across shots. It outputs native 4K with up to 6 camera cuts in a single 15-second generation, and shot-reverse-shot patterns are handled automatically.
ByteDance's Seedance 2.0 (February launch, now mature on CapCut and fal.ai) uses a unified Multimodal Diffusion Transformer that encodes text, image, audio, and video into a shared representation space. It accepts up to 12 reference assets per generation, outputs clips of 4 to 15 seconds at 2K with native stereo audio, and claims 90%+ usable output on first attempt. IP guardrails block recognizable likenesses, copyrighted characters, and brand impersonation at the model level.
X's creator revenue sharing pool has more than doubled for 2026 compared to 2025, with payouts tied to Verified Home Timeline impressions. However, as of March 3, posting AI-generated conflict footage without disclosure triggers a 90-day suspension from the program. AI video creators monetizing on X need visible disclosure on synthetic content to protect their revenue stream.