Former Snap AI chief launches Higgsfield to take on OpenAI’s Sora video generator

April 3, 2024 ndowd

OpenAI captivated the tech world a few months back with a generative AI model, Sora, that turns scene descriptions into original videos — no cameras or film crews required. But Sora has so far been tightly gated, and the firm seems to be aiming it toward well-funded creatives like Hollywood directors — not hobbyists or small-time marketers, necessarily.

Alex Mashrabov, the former head of generative AI at Snap, sensed an opportunity. So he launched Higgsfield AI, an AI-powered video creation and editing platform designed for more tailored, personalized applications.

Powered by a custom text-to-video model, Higgsfield’s first app, Diffuse, can generate videos from scratch or take a selfie and generate a clip starring that person.

“Our target audience is creators of all types,” Mashrabov told TechCrunch in an interview, “from regular users who want to create fun content with their friends to social content creators looking to try a new content format to social media marketers who want their brand to stand out.”

Mashrabov came to Snap by way of AI Factory, his previous startup, which Snap acquired in 2020 for $166 million. While at Snap, Mashrabov helped to build products like AR effects and filters for Snapchat, including Cameos, as well as Snapchat’s controversial My AI chatbot.

Higgsfield — which Mashrabov co-launched with Yerzat Dulat, an AI researcher specializing in generative video, several months ago — offers a curated set of pre-generated clips, a tool to upload reference media (i.e. images and videos) and a prompt editor that lets users describe the characters, actions and scenes they wish to depict. Using Diffuse, users can insert themselves directly into an AI-generated scene, or have their digital likeness mimic things — like dance moves — captured in other videos.

Image Credits: Higgsfield

“Our model supports highly realistic movements and expressions,” Mashrabov said. “We’re pioneering ‘world models’ for consumers, which will allow us to build best-in-class video generation and editing with a great level of control.”

Higgsfield isn’t the only generative video startup going head to head with OpenAI. Runway was one of the first on the scene, and its tools continue to improve. There’s also Haiper, which has the backing of two DeepMind alums and over $13M in venture cash.

Mashrabov argues that Diffuse will stand out thanks to its mobile-first, social-forward go-to-market strategy.

“By prioritizing iOS and Android apps instead of desktop workflows, we enable creators to create compelling social media content anytime and anywhere,” Mashrabov said. “Indeed, by building on mobile, we’re able to prioritize ease of use and consumer-friendly features from day one.”

Higgsfield is also running lean. Mashrabov says that the generative models underpinning the platform were developed by a 16-person team in less than nine months and trained on a cluster of 32 GPUs. (32 GPUs might sound like a lot, but considering OpenAI uses tens of thousands, it’s not really.) And Higgsfield has only raised $8 million to date, the bulk of which came from a recent seed funding tranche led by Menlo Ventures.

To stay one step ahead of rivals, Higgsfield plans to put the seed cash toward building an improved video editor that’ll let users modify characters and objects in videos, and toward training more powerful video generation models specifically for social media use cases. In fact, Mashrabov sees social media (and social media marketing) as Higgsfield’s principal money-making niche.

While Diffuse is currently free to use, Mashrabov envisions a future where marketers pay some sort of fee or subscription for premium features, or for volume or large-scale campaigns.

“We believe Higgsfield unlocks an incredible level of realism and content production use cases for social media marketers,” he said. “We constantly hear from CMOs and creative directors that they need to optimize content production budgets and shorten timelines while still delivering impactful content. So we believe video generative AI solutions will be a core solution in helping them to achieve it.”

Of course, Higgsfield isn’t immune from the broader challenges facing generative AI startups.

It’s well-established that generative AI models like the kind powering Diffuse can “regurgitate” training data. Why’s that problematic? Well, if the models were trained on copyrighted content without permission or some sort of licensing agreement in place, those models’ users could unwittingly generate a copyright-infringing work — exposing them to lawsuits.

Mashrabov wouldn’t reveal the source of Higgsfield’s training data (other than to say it comes from “multiple publicly available” places), and also wouldn’t say whether Higgsfield would retain user data to train future models, which might not sit right with some business customers. He did note that Diffuse users can request that their data be deleted at any time through the app.

Digital “cloning” platforms like Higgsfield are also ripe for abuse, as the wildfire spread of deepfakes on social media in recent months has shown.

In a similar vein, Higgsfield could make it easier to steal creators’ content. For instance, one need only upload a video of someone’s choreography to generate a video of themselves performing that same choreography.

I asked Mashrabov about what safeguards or protections Higgsfield might be using to attempt to prevent abuse, and — while he wouldn’t go into specifics — he claimed that the platform employs a mix of automated and manual moderation.

“We’ve decided to gradually roll out the product and test in select markets first, so that we can monitor where there’s the potential for abuse and evolve the product as necessary,” Mashrabov added.

We’ll have to wait and see how well that works in practice.
