ByteDance unveiled Seedance 2.5 at its Volcano Engine FORCE conference this week. The headline specs are 30-second native 4K generation (genuinely native, not upscaled) with 10-bit colour depth and support for up to 50 simultaneous multimodal reference inputs, up from 12 in version 2.0. Audio is co-processed inside the same latent space as video, so sound effects and dialogue sync without a separate audio pass. The enterprise beta is live now; the public launch is targeted for early July, which means this week.
Kling 3.0's Director Mode lets you define up to six distinct shots in a single generation — custom mode takes your per-shot storyboard with specific camera angles and durations; auto mode reads your scene description and plans the sequence itself. Native audio sync covers all shots with lip-synced dialogue in five languages. At roughly $0.10 per second of output, this is currently the most affordable multi-shot pipeline on any commercial API.
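To make the custom-mode idea concrete, here is a minimal sketch of a per-shot storyboard and a cost estimate at the quoted rate. The `Shot` class, its field names, and `estimate_cost` are hypothetical and are not Kling's API; only the six-shot cap and the roughly $0.10 per second figure come from the item above.

```python
from dataclasses import dataclass

MAX_SHOTS = 6            # cap quoted for Director Mode
PRICE_PER_SECOND = 0.10  # approximate rate quoted above, in USD


@dataclass(frozen=True)
class Shot:
    """One storyboard entry (hypothetical structure, not the real request format)."""
    description: str
    camera: str
    duration_s: float


def estimate_cost(shots: list[Shot]) -> float:
    """Validate a storyboard and return the estimated cost of one generation."""
    if not 1 <= len(shots) <= MAX_SHOTS:
        raise ValueError(f"expected 1 to {MAX_SHOTS} shots, got {len(shots)}")
    if any(shot.duration_s <= 0 for shot in shots):
        raise ValueError("every shot needs a positive duration")
    return sum(shot.duration_s for shot in shots) * PRICE_PER_SECOND


storyboard = [
    Shot("Street at dusk, rain starting", "wide establishing", 4.0),
    Shot("Courier checks her phone", "close-up", 3.0),
    Shot("She runs for the tram", "over-the-shoulder tracking", 5.0),
]
total = estimate_cost(storyboard)
seconds = sum(shot.duration_s for shot in storyboard)
print(f"{len(storyboard)} shots, {seconds:.0f}s, about ${total:.2f}")
```

Writing the storyboard down as data before generating makes it easy to check shot count and budget up front, whatever the real request format turns out to be.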
YouTube's Shorts algorithm now requires roughly 70% average view duration in the first 30–60 minutes or distribution effectively stops. The quieter change: the platform now actively suppresses content that is too similar to what you have already posted or to what is already trending in your niche — recycled hooks and formats now cost channel-wide reach. Properly disclosed AI content carries no additional penalty; viewer response is the only metric that matters.
Black Forest Labs' FLUX.1 Kontext Dev (12B params, open weights under the FLUX.1 Non-Commercial License) now has native workflow templates in ComfyUI, enabling targeted text-driven edits (outfit swaps, background changes, lighting adjustments) across multiple successive iterations with minimal visual drift. The FP8 version runs in 20GB VRAM; the full model requires 32GB. Pro and Max tiers are available via API for commercial use without the Dev weights' license constraints.
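A rough back-of-the-envelope check on those VRAM figures: the sketch below computes only the memory taken by the 12B transformer weights at each precision. It assumes the full model is stored at 16 bits per parameter, which the item does not state. The remaining headroom would have to cover text encoders, the VAE, and activations, so treat this as an estimate rather than a measured breakdown.

```python
PARAMS = 12e9  # parameter count quoted above

# Bytes per parameter; bf16 for the "full" model is an assumption.
BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}

# VRAM figures quoted above, in GB.
QUOTED_VRAM_GB = {"fp8": 20, "bf16": 32}


def weight_gb(params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


for precision, nbytes in BYTES_PER_PARAM.items():
    weights = weight_gb(PARAMS, nbytes)
    headroom = QUOTED_VRAM_GB[precision] - weights
    print(f"{precision}: {weights:.0f} GB weights, {headroom:.0f} GB left for everything else")
```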
Runway Gen-4.5 sits atop the Artificial Analysis Video Arena leaderboard with an Elo score of 1,247, which puts it 21 points ahead of Google Veo 3.1 (1,226) and 41 ahead of Sora 2 Pro (1,206) in blind human evaluations. Under the standard Elo expected-score formula, the 21-point gap implies Runway wins roughly 53% of direct comparisons against Veo, and the 41-point gap implies about 56% against Sora. Strengths are physical realism, cinematic fidelity, and emotional expressiveness in characters.
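The win-rate figures can be reproduced from the scores with the classic Elo expected-score curve. This is a sketch that assumes the leaderboard uses the conventional base-10 logistic with a 400-point scale; the arena's exact rating method is not stated here, so the function and its `scale` parameter are illustrative.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of head-to-head wins for A over B under classic Elo."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Leaderboard scores quoted above.
runway, veo, sora = 1247, 1226, 1206

vs_veo = elo_expected_score(runway, veo)    # 21-point gap
vs_sora = elo_expected_score(runway, sora)  # 41-point gap

print(f"Runway vs Veo 3.1:    {vs_veo:.0%}")
print(f"Runway vs Sora 2 Pro: {vs_sora:.0%}")
```

Both results are close to a coin flip, which is a useful reminder that a 20 to 40 point Elo lead is a modest edge rather than a decisive one.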
Alibaba's Wan 2.7, released April 22, brings instruction-based video editing to the open-source stack: describe a change in natural language and the model edits the existing clip rather than regenerating from scratch. A 9-grid multi-image layout locks character consistency across up to nine reference images. The model runs on an RTX 4090 and is licensed under Apache 2.0, so it is fully commercial with no API fees.
TikTok now uses C2PA Content Credentials to detect synthetic media automatically, even when creators do not self-label. Getting caught unlabeled triggers a four-tier penalty ladder from warning to permanent ban. The platform has confirmed that proactively labeling costs far less reach than getting flagged retroactively — and with automated detection now active, the margin for error is effectively zero.
YouTube's July 2025 policy requiring "genuine creative value" for AI-generated content to monetize is now in active enforcement at scale. Pure TTS narration, auto-slideshows, and repurposed footage without human input are demonetization triggers. Creators who added face intros or personal voiceovers maintained monetization; fully automated channels are being hit systematically. YouTube Shorts RPM runs $3–7 per 1,000 views for qualifying content — but the automation-only approach is now a liability.
Community consensus across multiple production roundups has landed on one structural conclusion: locking character, lighting, and framing as reference images before running any video model is no longer optional. The economics at $0.10/second of output only work at an 80%+ first-try success rate, and that rate requires the storyboard layer. Prompt chaining — where each clip prompt inherits visual tokens from the previous shot — is now widely confirmed as the most reliable multi-shot consistency technique.
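The success-rate argument is easy to see with a little arithmetic. The sketch below assumes every attempt is billed at the full rate and that each attempt succeeds independently with the same probability, so the expected number of attempts per usable clip is 1 divided by the success rate. The function name and the sample rates are illustrative, not taken from any roundup.

```python
PRICE_PER_SECOND = 0.10  # quoted output price, in USD


def cost_per_usable_second(price_per_second: float, first_try_success: float) -> float:
    """Effective price per second once failed, billed attempts are counted."""
    if not 0.0 < first_try_success <= 1.0:
        raise ValueError("success rate must be in (0, 1]")
    # Expected attempts per usable clip is 1 / p for independent retries.
    return price_per_second / first_try_success


for rate in (0.9, 0.8, 0.5, 0.3):
    effective = cost_per_usable_second(PRICE_PER_SECOND, rate)
    print(f"{rate:.0%} success: ${effective:.3f} per usable second")
```

Under these assumptions, an 80% success rate adds 25% to the sticker price, while a 50% rate doubles it.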