While AI has made video creation widely accessible, many AI video generation tools still lack audio support. Mirelo is developing AI that automatically generates soundtracks aligned with video content.
Earlier this year, the Berlin-based startup launched Mirelo SFX v1.5, an AI model designed to analyze videos and add synchronized sound effects (SFX).
This innovation captured the interest of venture capitalists anticipating a generative AI transformation in the gaming sector. TechCrunch exclusively reported that the two-year-old German startup secured a $41 million seed funding round, co-led by Index Ventures and Andreessen Horowitz.
The fresh funding will help Mirelo strengthen its competitive position in this nascent market. While Mirelo was operating in stealth with limited resources, major corporations like Sony and Tencent introduced their own video-to-SFX models. China’s Kuaishou-backed Kling AI and ElevenLabs, another a16z-backed company, launched similar offerings as well.
Although Mirelo distinguishes itself with a more specialized approach, outperforming these established models over the long term will require a larger team. Mirelo CEO and co-founder CJ Simon-Gabriel told TechCrunch that the startup expects its current team of 10 to “double or even triple” in size by the end of next year.
The new hires will bolster Mirelo’s research and development, as well as its product and go-to-market efforts. Simon-Gabriel said the startup has made its models available on Fal.ai and Replicate, and he expects API usage to be its primary revenue source in the near term. At the same time, Mirelo is building out its creator workspace, Mirelo Studio, which is intended to evolve into a comprehensive professional platform.
In preparation for scaling, Mirelo and its investors are getting ahead of the training data issues that have affected other generative AI firms. Georgia Stevenson, who led the investment for Index, said that Mirelo’s models are trained on both public and commercially acquired sound libraries, and that the company is setting up revenue-sharing agreements to uphold artists’ rights.
While this tension is intrinsic to generative AI tools, Mirelo is not currently replacing musicians or sound designers. The startup operates on a freemium model, with a recommended creator plan priced at €20/month (around $23.50), and primarily aims to serve hobbyists and prosumers who want to add audio to their AI-generated videos.
Simon-Gabriel believes that creators cannot fully leverage the new capabilities of AI video without incorporating audio.
He quoted George Lucas, stating, “Sound is 50% of the movie-going experience. It’s not an overstatement. If anything, it’s an understatement. Identical visuals can convey vastly different atmospheres based solely on the accompanying sound and music.”
Simon-Gabriel and co-founder Florian Wenzel, both accomplished AI researchers and musicians, plan to introduce AI music generation in the future. However, Simon-Gabriel noted that Mirelo is currently observing greater demand for sound effects, partly due to less extensive research in this area compared to other AI domains.
He commented, “It’s simpler to establish a strong competitive advantage here and then leverage it.”
This strategy has the potential to yield substantial returns for Mirelo. While Simon-Gabriel declined to disclose the new valuation, he confirmed it had grown “very significantly” since the startup’s undisclosed pre-seed round. That initial round was led by Berlin’s Atlantic, which also participated in the latest round, bringing Mirelo’s total capital raised to $44 million and covering its funding needs.
The startup is also backed by angel investors, including Mistral CEO Arthur Mensch, Hugging Face Chief Science Officer Thomas Wolf, and Fal.ai co-founder Burkay Gur, who enhance its technological credibility and could open new opportunities.
Nevertheless, the team recognizes that AI-generated videos will likely not remain silent indefinitely.
For example, Gemini’s video generator now integrates soundtracks utilizing DeepMind’s Veo 3.1 video-to-audio model. This development, however, seems to affirm Simon-Gabriel’s perspective. He remarked, “Now, people are beginning to understand, ‘Perhaps we should include sound.’ But of course, you absolutely should. It’s akin to the transition from silent films to talkies, isn’t it? It makes a considerable impact!”