On Thursday, Google released an upgraded version of its Gemini Deep Research agent, now powered by its new flagship foundation model, Gemini 3 Pro.
Beyond generating research reports, the updated agent lets developers build Google's research capabilities into their own applications via the company's newly launched Interactions API, part of Google's effort to give developers more control in the emerging era of agentic AI.
The updated Gemini Deep Research agent is built to synthesize large volumes of data and handle long-context inputs. Google points to customer tasks ranging from due-diligence work to drug-toxicity safety investigations.
Google also announced plans to integrate the research agent into several of its own services, including Google Search, Google Finance, the Gemini app, and NotebookLM, a step toward a future in which AI agents, rather than people, do the information gathering.
Google says Deep Research benefits from Gemini 3 Pro being its "most factual" model to date, engineered to hallucinate less during complex, long-running tasks.
AI hallucinations, in which large language models fabricate information, are an especially serious problem for long-horizon agentic tasks that chain many autonomous decisions together. The more decision points an agent passes through, the greater the risk that a single hallucinated choice corrupts the final result.
To back up its claims, Google also introduced a new benchmark, somewhat prosaically named DeepSearchQA, which evaluates agents on complex, multi-step information-retrieval tasks. Google has open-sourced the benchmark.
Deep Research was also tested on Humanity's Last Exam, a more evocatively named independent general-knowledge benchmark of highly specialized questions, and on BrowseComp, a benchmark designed for browser-based agentic tasks.
Predictably, Google's latest agent beat its rivals on both its own benchmark and Humanity's Last Exam. Even so, OpenAI's GPT-5 Pro came remarkably close across the board, and slightly edged out Google on BrowseComp.
Those benchmark comparisons aged quickly, because on the very same day OpenAI unveiled its eagerly awaited GPT-5.2, codenamed Garlic. OpenAI claims the new model beats its competitors, Google in particular, across a range of standard benchmarks, including OpenAI's own internal evaluations.
The timing is telling: knowing that "Garlic" was imminent, Google evidently chose the same moment to release major AI news of its own.