ComfyUI dropped v0.21.0 on May 11 and v0.21.1 two days later with native Flux2ImageNode, GrokImageEditNodeV2, ByteDance SeedreamNodeV2, an OpenAI Image node, and a Claude LLM node — all as first-class partner nodes with DynamicCombo and Autogrow UX. The release also adds high-quality Flux2 latent previews and support for Anima TE LoRA in Kohya format plus HiDream-O1-Image with fp8 dtype fixes.
Alibaba shipped the full Wan 2.7 suite between April 1 and 6: four models covering text-to-video, image-to-video, reference-to-video with voice cloning, and instruction-based video editing, all under Apache 2.0. The 27B-parameter MoE architecture (14B active) is available via API from $0.10/sec and can also run locally. Wan 3.0, targeting 60B params, 4K, and 30-second generation, is expected mid-2026.
YouTube's enforcement against low-quality AI content is accelerating hard. Over 4.7 billion lifetime views have been wiped, 35 million subscribers affected, and an estimated $10M in annual creator revenue lost. The key: YouTube isn't banning AI. Properly disclosed, quality AI content is still fully monetizable; it's the mass-produced, repetitive slop that's getting hit. Disclosure via the "altered or synthetic content" toggle in Studio is now mandatory.
Runway's Gen-4.5 now holds the top spot on the Artificial Analysis Text-to-Video benchmark at 1,247 Elo points, ahead of all competitors. It supports text-to-video and image-to-video generation of 2 to 10 seconds, with improved stylistic control and visual consistency. It is available across all paid plans at comparable pricing, with full API access that includes first-frame image input alongside text prompts.
Kuaishou's Kling 3.0 (launched Feb 5) brings multi-shot storyboarding — up to 6 shots in a single 15-second clip with per-shot control over duration, framing, camera movement, and narrative content. Native audio generation covers English, Chinese, Japanese, Korean, and Spanish with accent control. The Omni variant supports reference-based character consistency across scenes with voice cloning. Images now go up to 4K.
OpenAI's Sora app shut down April 26 and the API follows on September 24, 2026. The model was burning roughly $1M/day in compute with active users dropping from 1M to under 500K. Disney's rumored $1B investment never materialized — they reportedly learned of the shutdown less than an hour before the public announcement. If you still have Sora assets, download them now before permanent deletion.
NVIDIA's SANA-Video, accepted as an ICLR 2026 Oral, uses linear attention with a constant-memory KV cache to generate minute-long 720p video without VRAM use growing with clip length. With NVFP4 precision on an RTX 5090, inference for a 5-second 720p clip drops from 71s to 29s, roughly a 2.4x speedup. It supports both text-to-video and text+image-to-video. A real option for creators who want to keep generation local and fast.
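For readers wondering why linear attention keeps memory flat, here is a minimal pure-Python sketch of generic causal linear attention. It is not SANA-Video's code: the feature map (elu + 1), the class name `LinearAttentionState`, and the helper `quadratic_reference` are all illustrative assumptions. The idea is that instead of caching every past key and value, the model keeps one running matrix and one running vector whose sizes depend only on the head dimensions, not on how many tokens or frames have been processed.

```python
import math

def feature_map(x):
    # Assumed positive feature map: elu(x) + 1.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

class LinearAttentionState:
    """Hypothetical fixed-size cache: S = sum phi(k) v^T, z = sum phi(k)."""

    def __init__(self, d_k, d_v):
        self.S = [[0.0] * d_v for _ in range(d_k)]
        self.z = [0.0] * d_k

    def step(self, q, k, v):
        # Fold the new key/value into the running sums (size never grows).
        fk = feature_map(k)
        for i, fki in enumerate(fk):
            self.z[i] += fki
            for j, vj in enumerate(v):
                self.S[i][j] += fki * vj
        # Read out: phi(q)^T S, normalized by phi(q)^T z.
        fq = feature_map(q)
        denom = sum(a * b for a, b in zip(fq, self.z))
        return [
            sum(fq[i] * self.S[i][j] for i in range(len(fq))) / denom
            for j in range(len(v))
        ]

def quadratic_reference(qs, ks, vs):
    # Same kernel attention computed the slow way, revisiting every past
    # key and value at each step (memory grows with sequence length).
    outs = []
    for t, q in enumerate(qs):
        fq = feature_map(q)
        weights = [
            sum(a * b for a, b in zip(fq, feature_map(ks[s])))
            for s in range(t + 1)
        ]
        total = sum(weights)
        outs.append([
            sum(w * vs[s][j] for s, w in enumerate(weights)) / total
            for j in range(len(vs[0]))
        ])
    return outs
```

Calling `step` once per token gives the same outputs as `quadratic_reference` up to floating-point error, while `S` and `z` stay the same size however long the sequence gets. That is the property that lets clip length grow without the cache growing with it.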
ByteDance's Seedance 2.0 (released Feb 12) accepts text, image, audio, and video inputs in combination — up to 9 images, 3 video clips, and 3 audio clips as reference — and generates multi-shot videos up to 15 seconds with dual-channel synchronized audio. The standout feature is phoneme-level lip-sync in 8+ languages, making it the go-to for multilingual talking-head content.
Pika's 2.5 engine introduces Pikaffects — pre-set physics simulations you can apply to any object in frame. The update also adds automatic sound-effect generation matched to on-screen action (a car crash generates the crunch of metal) and near-zero flicker with professional-grade temporal consistency. Separately, Pika launched PikaStream 1.0 for real-time video chat with AI agents.