China’s generative video race heats up

February 6, 2024 ndowd

On Monday, Tencent, the Chinese internet giant known for its video gaming empire and chat app WeChat, unveiled a new version of its open source video generation model DynamiCrafter on GitHub. It’s a reminder that some of China’s largest tech firms have been quietly ramping up efforts to make a dent in the text- and image-to-video space.

Like other generative video tools on the market, DynamiCrafter uses the diffusion method to turn captions and still images into seconds-long videos. Inspired by diffusion in physics, where particles spread from areas of high concentration to areas of low concentration, diffusion models in machine learning are trained by gradually adding noise to real data and learning to reverse that process, which lets them turn random noise into complex, realistic output.
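For readers who want to see the idea in miniature, here is a rough Python sketch of the noising half of a diffusion model, the part that gradually buries clean data in random noise. It is not DynamiCrafter's code: the function names, the linear noise schedule, and its start and end values are illustrative assumptions borrowed from common textbook setups, and the learned denoising network that runs the process in reverse is left out entirely.

```python
import math
import random

# All names and schedule values below are illustrative assumptions,
# not taken from DynamiCrafter or any other model named in this article.

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, rising linearly from beta_start to beta_end."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """Running product of (1 - beta): the share of original signal left at each step."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Jump straight to step t: scale the clean data down and mix in Gaussian noise."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]

if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    pixels = [0.0, 0.25, 0.5, 0.75, 1.0]  # toy stand-in for one row of an image
    for t in (0, 499, 999):
        noisy = add_noise(pixels, t, alpha_bars, rng)
        print(t, round(alpha_bars[t], 4), [round(v, 2) for v in noisy])
```

At the first step the toy "pixels" are almost untouched; by the last step nearly all of the original signal is gone. A real model is trained to undo that damage one step at a time, which is what allows it to start from pure noise and end up with an image or a video frame.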

The second generation of DynamiCrafter is churning out videos at a pixel resolution of 640×1024, an upgrade from its initial release in October that featured 320×512 videos, or four times as many pixels per frame. An academic paper published by the team behind DynamiCrafter notes that its technology differs from that of competitors in that it broadens the applicability of image animation techniques to “more general visual content.”

“The key idea is to utilize the motion prior of text-to-video diffusion models by incorporating the image into the generative process as guidance,” says the paper. “Traditional” techniques, in comparison, “mainly focus on animating natural scenes with stochastic dynamics (e.g. clouds and fluid) or domain-specific motions (e.g. human hair or body motions).”

In a demo that compares DynamiCrafter, Stable Video Diffusion (launched in November), and the recently hyped-up Pika Labs, the Tencent model’s output appears slightly more animated than the others’. The chosen samples inevitably favor DynamiCrafter, and none of the models, after my initial few tries, leaves the impression that AI will soon be able to produce full-fledged movies.

Nonetheless, generative video is widely seen as the next focal point in the AI race following the boom in generative text and images. It’s thus no surprise that startups and tech incumbents are pouring resources into the field, and China is no exception. Aside from Tencent, TikTok’s parent ByteDance, Baidu and Alibaba have each released their own video diffusion models.

ByteDance and Baidu have both posted demos of their models, MagicVideo and UniVG, on GitHub, though neither appears to be available to the public yet. Like Tencent, Alibaba has made its video generation model VGen open source, a strategy that’s increasingly popular among Chinese tech firms hoping to reach the global developer community.
