OpenAI officially killed the Sora 2 consumer app on April 26, and the API clock is ticking: it stops generating clips entirely on September 24, 2026. If you have any production workflows routing through Sora endpoints, that migration window is not as long as it looks — especially if you need time to QA a replacement.
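To make that window concrete, here is a minimal sketch of the date arithmetic. It uses only the September 24, 2026 shutdown date quoted above; the function names and the 30-day QA buffer are hypothetical, chosen for illustration.

```python
from datetime import date, timedelta

# API shutdown date as stated above.
SORA_API_SHUTDOWN = date(2026, 9, 24)

def days_until_shutdown(today: date) -> int:
    """Calendar days left before the API stops generating (negative once past)."""
    return (SORA_API_SHUTDOWN - today).days

def latest_qa_start(qa_days: int) -> date:
    """Last day QA on a replacement can begin and still finish by the shutdown."""
    return SORA_API_SHUTDOWN - timedelta(days=qa_days)

if __name__ == "__main__":
    print("Days left:", days_until_shutdown(date.today()))
    # Hypothetical 30-day QA buffer; substitute your own estimate.
    print("Start QA no later than:", latest_qa_start(30))
```

Subtracting the QA buffer first shows the real deadline for choosing a replacement, which falls well before the shutdown date itself.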
Google dropped Veo 3.1 with two API tiers: Lite at $0.05 per second of video output, and Standard at $0.40 per second with native synchronized audio. At the Lite price, programmatic AI video generation is cheap enough to prototype at real scale for the first time. Veo 3.1 also outputs native 4K at 3840x2160, with a reported 40–60 percent improvement in frame consistency over Veo 3.0.
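For a sense of what those per-second prices mean per batch, here is a minimal sketch of the arithmetic. It uses only the two prices quoted above; the function name, the 8-second clip length, and the batch size are illustrative assumptions and not part of any Google API.

```python
from decimal import Decimal

# Per-second prices as quoted above (USD per second of video output).
VEO_31_PRICES = {
    "lite": Decimal("0.05"),
    "standard": Decimal("0.40"),
}

def clip_cost(tier: str, seconds: int, clips: int = 1) -> Decimal:
    """Estimated cost in USD for `clips` clips of `seconds` seconds each."""
    return VEO_31_PRICES[tier] * seconds * clips

if __name__ == "__main__":
    # Hypothetical batch: 1,000 test clips of 8 seconds each.
    for tier in VEO_31_PRICES:
        print(f"{tier}: ${clip_cost(tier, 8, 1000)}")
```

At those prices, a thousand 8-second test clips come to $400 on Lite versus $3,200 on Standard, an eightfold difference.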
Alibaba's Wan 2.7 is now the only mainstream model where you set the opening frame and the closing frame and let the model generate the motion between them: true bracket-to-bracket control. It also accepts up to five reference videos simultaneously for character consistency across shots. It is available now via the WaveSpeedAI API and is Apache 2.0 licensed for commercial use.
The May 20 ComfyUI update adds official Wan 2.2 template workflows (T2V and I2V, both 5B and 14B sizes), a VRAM optimization for LTX-2.3 when guide_mask is active, Stable Audio 3.0 integration, and HiDream-O1 area conditioning for precise image control. The Wan 2.2 GGUF Q4_K quantization now runs the 14B model on 8–10GB VRAM, which opens it to a lot of mid-range cards.
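A rough back-of-the-envelope check on that VRAM figure can be sketched as follows. Two assumptions are not from the item above: that Q4_K averages roughly 4.5 bits per weight, and that the model can be treated as a single set of 14B weights. The estimate covers weights only and ignores activations, the text encoder, and the VAE.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9

if __name__ == "__main__":
    # Assumption: ~4.5 bits per weight for Q4_K; mixed-precision variants differ.
    q4 = weight_footprint_gb(14, 4.5)
    fp16 = weight_footprint_gb(14, 16)
    print(f"Q4_K weights: ~{q4:.1f} GB, FP16 weights: ~{fp16:.1f} GB")
```

Under those assumptions the quantized weights come to just under 8 GB, against 28 GB at FP16. That is consistent with the 8–10GB range once some working memory is added.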
Google's new multimodal Gemini Omni — which takes images, audio, video, and text as simultaneous inputs and generates video in a single reasoning pass — is live for Gemini app subscribers and rolling out free to YouTube Shorts and YouTube Create users right now. All output is SynthID watermarked. Developer API access is weeks away.
Current llm-stats arena scores: Kling v3 at 2127, Seedance 2.0 Fast at 1993, Alibaba HappyHorse 1.0 at 1962. Both Kling 3.0 and Seedance 2.0 generate audio and video in a single pass with phoneme-level lip sync — a workflow that used to require a separate pipeline stage. Seedance 2.0 has the strongest face fidelity numbers; Kling 3.0 leads on general motion.
TikTok's 2026 policy now uses C2PA Content Credentials to detect AI-generated visuals and audio automatically — it does not wait for creator self-disclosure. Deepfakes of real people without a visible label are prohibited outright; synthetic media featuring real private individuals is banned entirely, label or not. If TikTok's C2PA enforcement scales, expect Meta and YouTube to follow.
Lightricks' LTX-2.3, released March 5, is a 22B-parameter model with native 4K output at 50fps, stereo 24kHz audio, portrait-native vertical video training, and clips of up to 20 seconds. The ComfyUI v0.22.0 update specifically reduced peak VRAM usage for LTX-2.3, making it more accessible on prosumer hardware.
YouTube's 2026 policy update ties AI disclosure compliance to monetization eligibility: repeated non-disclosure triggers strikes that affect revenue. Current RPM benchmarks: $1–$9 per thousand views for long-form in high-value niches, and $0.03–$0.13 per thousand for Shorts. Creators using AI-driven content optimization are reporting 35–60 percent earnings increases without publishing more content.
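To translate those RPM ranges into dollars, here is a minimal sketch. It uses only the benchmark ranges quoted above; the function name and the view counts are hypothetical examples, and real RPM varies by niche and audience.

```python
from decimal import Decimal

# RPM ranges as quoted above, in USD per 1,000 views.
RPM_RANGES = {
    "long_form": (Decimal("1"), Decimal("9")),
    "shorts": (Decimal("0.03"), Decimal("0.13")),
}

def revenue_range(fmt: str, views: int) -> tuple[Decimal, Decimal]:
    """Low and high revenue estimates in USD for a given view count."""
    low, high = RPM_RANGES[fmt]
    thousands = Decimal(views) / 1000
    return thousands * low, thousands * high

if __name__ == "__main__":
    # Hypothetical view counts for comparison.
    print("Shorts, 1,000,000 views:", revenue_range("shorts", 1_000_000))
    print("Long-form, 100,000 views:", revenue_range("long_form", 100_000))
```

On these benchmarks, a million Shorts views earn $30 to $130, while 100,000 long-form views earn $100 to $900. The format matters more than raw view count.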
ByteDance's Seedance 2.0, out since February, generates synchronized audio and video together in one inference pass with phoneme-level lip sync — no separate audio model, no alignment step. That alone cuts a meaningful chunk of complexity out of character-driven short-form production workflows.