AI agents and chatbots are becoming more common in applications, but their capabilities have largely been limited to text. Lemon Slice, a company specializing in digital avatar creation, aims to integrate video into these interactions through its new diffusion model, which can generate digital avatars from just one image.
Known as Lemon Slice-2, this model can produce a digital avatar that operates using a knowledge base, allowing it to fulfill various roles for an AI agent, such as handling customer inquiries, assisting with academic tasks, or serving as a mental health support agent.
Co-founder Lina Colucci explained, “When generative AI was emerging, my co-founders and I experimented with various video models, recognizing early on that video would evolve to be interactive. The engaging aspect of platforms such as ChatGPT stemmed from their interactivity, and we aim to bring that same interactive dimension to video.”
According to Lemon Slice, their 20-billion-parameter model can stream videos at 20 frames per second using just one GPU. The company offers this model via an API and an embeddable widget, allowing businesses to integrate it into their websites with minimal code. Once an avatar is generated, its background, style, and look can be customized at any time.
Beyond human-like representations, the company is also developing the capability to create non-human characters for diverse applications. For the voice generation of these avatars, the startup leverages ElevenLabs’ technology.
Established in 2024 by Lina Colucci, Sidney Primas, and Andrew Weitz, Lemon Slice believes its proprietary general-purpose diffusion model—a generative model that reconstructs new data by reversing a noise process learned from training data—for avatar creation will distinguish it in the market.
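The diffusion technique described above can be illustrated with a toy numerical sketch. This is not Lemon Slice's code: the linear noise schedule, the scalar data point, and the "oracle" noise predictor are all illustrative stand-ins (a real model replaces the oracle with a trained neural network), but the loop shows the reverse process the definition refers to: start from pure noise and denoise step by step.

```python
import math
import random

random.seed(0)

# Forward process: noise is added over T steps according to a schedule.
T = 50
betas = [1e-4 + (0.05 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = []
prod = 1.0
for a in alphas:
    prod *= a
    alpha_bars.append(prod)

target = 0.5  # stands in for a "clean" data point (e.g., a pixel value)

def predict_noise(x, t):
    # Hypothetical oracle: derives the exact noise that would map `target`
    # to x under the forward process. A trained network eps_theta(x, t)
    # plays this role in a real diffusion model.
    return (x - math.sqrt(alpha_bars[t]) * target) / math.sqrt(1.0 - alpha_bars[t])

# Reverse (generation) process: start from Gaussian noise, denoise stepwise.
x = random.gauss(0.0, 1.0)
for t in reversed(range(T)):
    eps = predict_noise(x, t)
    # DDPM-style mean update: remove the predicted noise for this step.
    x = (x - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps) / math.sqrt(alphas[t])
    if t > 0:
        # Re-inject a small amount of noise at every step except the last.
        x += math.sqrt(betas[t]) * random.gauss(0.0, 1.0)

print(round(x, 4))  # recovers the clean value: 0.5
```

With a perfect noise predictor the loop recovers the clean data exactly; a trained model only approximates it, which is why sample quality hinges on training data and scale.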
Colucci commented, “Current avatar solutions I’ve encountered tend to detract from the product experience. They often appear unsettling and rigid, providing a brief moment of realism before becoming ‘uncanny’ and uncomfortable upon interaction. The primary barrier to widespread avatar adoption has been their insufficient quality.”
To support this initiative, the company announced on Tuesday that it secured $10.5 million in seed funding. Key investors include Matrix Partners, Y Combinator, Dropbox co-founder Arash Ferdowsi, former Twitch CEO Emmett Shear, and The Chainsmokers.
The company states it has implemented safeguards to prevent unauthorized cloning of faces or voices, and it employs large language models for content moderation.
While Lemon Slice did not disclose the specific organizations utilizing its technology, it indicated that the model is being applied in various sectors such as education, language acquisition, e-commerce, and corporate training.
The startup contends with strong competition from video generation firms such as D-ID, HeyGen, and Synthesia, along with other digital avatar developers like Genies, Soul Machines, Praktika, and AvatarOS.
Ilya Sukhar, a partner at Matrix, believes avatars will prove valuable in video-centric environments, citing the preference for visual learning on platforms like YouTube over extensive text. He highlighted Lemon Slice’s technical expertise and proprietary approach as key advantages over rivals.
He elaborated, “The team is highly technical, with a proven history of delivering machine learning products, not merely demonstrations or research. While many competitors focus on niche applications or specific industries, Lemon Slice adopts the generalized ‘bitter lesson’ scaling methodology—prioritizing data and compute—which has been successful across other AI domains.”
Jared Friedman of Y Combinator suggests that Lemon Slice’s diffusion-style model enables it to create diverse avatars, unlike some startups that concentrate solely on human-like or game-character avatars.
He stated, “I believe Lemon Slice is unique in its fundamental machine learning strategy, which has the potential to conquer the uncanny valley and pass the avatar Turing test. Their training involves a video diffusion transformer, similar to models like Veo 3 or Sora. As an end-to-end general-purpose model, its potential for improvement is limitless, unlike others that plateau below photorealism. Moreover, it supports both human and non-human faces, requiring only a single image to generate a new face.”
The startup, currently employing eight individuals, intends to allocate its newly raised capital towards recruiting engineering and go-to-market personnel, as well as covering the computational expenses for model training.