Runway’s new video-generating AI, Gen-3, offers improved controls

June 17, 2024 ndowd

The race to high-quality, AI-generated videos is heating up.

On Monday, Runway, a company building generative AI tools geared toward film and image content creators, unveiled Gen-3 Alpha. The company’s latest AI model generates video clips from text descriptions and still images. Runway says the model delivers a “major” improvement in generation speed and fidelity over Runway’s previous flagship video model, Gen-2, as well as fine-grained controls over the structure, style and motion of the videos that it creates.

Gen-3 will be available in the coming days for Runway subscribers, including enterprise customers and companies in Runway’s creative partners program.

“Gen-3 Alpha excels at generating expressive human characters with a wide range of actions, gestures and emotions,” Runway wrote in a post on its blog. “It was designed to interpret a wide range of styles and cinematic terminology [and enable] imaginative transitions and precise key-framing of elements in the scene.”

Gen-3 Alpha has its limitations, including the fact that its footage maxes out at 10 seconds. However, Runway co-founder Anastasis Germanidis promises that Gen-3 is only the first — and smallest — of several video-generating models to come in a next-gen model family trained on upgraded infrastructure.

“The model can struggle with complex character and object interactions, and generations don’t always follow the laws of physics precisely,” Germanidis told TechCrunch this morning in an interview. “This initial rollout will support 5- and 10-second high-resolution generations, with noticeably faster generation times than Gen-2. A 5-second clip takes 45 seconds to generate, and a 10-second clip takes 90 seconds to generate.”

Gen-3 Alpha, like all video-generating models, was trained on a vast number of examples of videos — and images — so it could “learn” the patterns in these examples to generate new clips. Where did the training data come from? Runway wouldn’t say. Few generative AI vendors volunteer such information these days, partly because they see training data as a competitive advantage and thus keep it and info relating to it close to the chest.

“We have an in-house research team that oversees all of our training and we use curated, internal data sets to train our models,” Germanidis said. He left it at that.

A sample from Runway’s Gen-3 model. Note that the blurriness and low resolution come from a video-to-GIF conversion tool TechCrunch used, not from Gen-3.

Training data details are also a potential source of IP-related lawsuits if the vendor trained on public data, including copyrighted data from the web — and so another disincentive to reveal much. Several cases making their way through the courts reject vendors’ fair use training data defenses, arguing that generative AI tools replicate artists’ styles without the artists’ permission and let users generate new works resembling artists’ originals for which artists receive no payment.

Runway addressed the copyright issue somewhat, saying that it consulted with artists in developing the model. (Which artists? Not clear.) That mirrors what Germanidis told me during a fireside chat at TechCrunch’s Disrupt conference in 2023:

“We’re working closely with artists to figure out what the best approaches are to address this,” he said. “We’re exploring various data partnerships to be able to further grow … and build the next generation of models.”

Runway also says that it plans to release Gen-3 with a new set of safeguards, including a moderation system to block attempts to generate videos from copyrighted images and content that violates Runway’s terms of service. Also in the works is a provenance system, compatible with the C2PA standard (which is backed by Microsoft, Adobe, OpenAI and others), to identify that videos came from Gen-3.

“Our new and improved in-house visual and text moderation system employs automatic oversight to filter out inappropriate or harmful content,” Germanidis said. “C2PA authentication verifies the provenance and authenticity of the media created with all Gen-3 models. As model capabilities and the ability to generate high-fidelity content increases, we will continue to invest significantly on our alignment and safety efforts.”

Runway has also revealed that it’s partnered and collaborated with “leading entertainment and media organizations” to create custom versions of Gen-3 that allow for more “stylistically controlled” and consistent characters, targeting “specific artistic and narrative requirements.” The company adds: “This means that the characters, backgrounds, and elements generated can maintain a coherent appearance and behavior across various scenes.”

A major unsolved problem with video-generating models is control — i.e. getting a model to generate consistent video aligned with a creator’s artistic intentions. As my colleague Devin Coldewey recently wrote, simple matters in traditional filmmaking, like choosing a color in a character’s clothing, require workarounds with generative models because each shot is created independently of the others. Sometimes not even workarounds do the trick — leaving extensive manual work for editors.

Runway has raised over $236.5 million from investors including Google (with whom it has cloud compute credits) and Nvidia, as well as VCs such as Amplify Partners, Felicis and Coatue. The company has aligned itself closely with the creative industry as its investments in generative AI tech grow. Runway operates Runway Studios, an entertainment division that serves as a production partner for enterprise clientele, and hosts the AI Film Festival, one of the first events dedicated to showcasing films produced wholly or in part by AI.

But the competition is getting fiercer.

Generative AI startup Luma last week announced Dream Machine, a video generator that’s gone viral for its aptitude at animating memes. And just a couple of months ago, Adobe revealed that it’s developing its own video-generating model trained on content in its Adobe Stock media library.

Elsewhere, there are incumbents like OpenAI’s Sora, which remains tightly gated, but which OpenAI has been seeding with marketing agencies and indie and Hollywood film directors. (OpenAI CTO Mira Murati was in attendance at the 2024 Cannes Film Festival.) This year’s Tribeca Festival, which also has a partnership with Runway to curate movies made using AI tools, featured short films produced with Sora by directors who were given early access.

Google’s also put its video-generating model, Veo, in the hands of select creators, including Donald Glover (AKA Childish Gambino) and his creative agency Gilga, as it works to bring Veo into products like YouTube Shorts.

However the various collaborations shake out, one thing’s becoming clear: generative AI video tools threaten to upend the film and TV industry as we know it.

Filmmaker Tyler Perry recently said that he suspended a planned $800 million expansion of his production studio after seeing what Sora could do. Joe Russo, the director of tentpole Marvel films like “Avengers: Endgame,” predicts that within a year, AI will be able to create a fully fledged movie.

A 2024 study commissioned by the Animation Guild, a union representing Hollywood animators and cartoonists, found that 75% of film production companies that have adopted AI reduced, consolidated or eliminated jobs after incorporating the tech. The study also estimates that by 2026, more than 100,000 U.S. entertainment jobs will be disrupted by generative AI.

It’ll take some seriously strong labor protections to ensure that video-generating tools don’t follow in the footsteps of other generative AI tech and lead to steep declines in the demand for creative work.
