Google’s New AI API is a Big Deal, But There’s a Catch


According to VentureBeat, Google DeepMind launched the public beta of its Interactions API last week, a major shift from the old “stateless” completion model to a new “stateful” architecture where Google’s servers retain conversation history, tool outputs, and agent “thoughts.” This new system, available now in Google AI Studio, supports models like Gemini 3.0 Pro Preview and Gemini 2.5 Flash and Pro, and introduces a built-in agent called Gemini Deep Research for long-horizon tasks. Crucially, it features Background Execution to avoid HTTP timeouts and native support for the open Model Context Protocol (MCP) for calling external tools. The API uses a tiered data retention policy, with a Free Tier keeping history for 1 day and a Paid Tier retaining it for 55 days to enable implicit caching and cost savings. This move follows OpenAI’s own shift nine months ago with its Responses API, setting up a key philosophical divergence in how tech giants handle the infrastructure for advanced AI agents.


The Remote Compute Shift

Here’s the thing: for years, building with LLMs has been a bit of a clumsy hack. You send a giant blob of text, get a response, and then you, the developer, have to manage that ever-growing blob of history. It’s stateless. It’s simple. But for agents that are supposed to “think” over time, it’s a total bottleneck. Google’s new API basically says, “Stop mailing us the entire library every time. Just give us the new book, and we’ll keep the whole collection on our shelf.” The core innovation is server-side state as the default. You pass a session ID, and Google’s infrastructure holds the memory. This turns the API from a simple request-reply service into what they call a “remote operating system” or a job queue for intelligence. That’s a fundamental change in how we interact with these models.
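To make the contrast concrete, here’s a toy sketch of the two patterns. This is not the real Interactions API client; every class and method name here is an assumption invented for illustration, and the “model” is just an echo stub.

```python
# Hypothetical illustration of stateless vs. stateful conversation handling.
# None of these names come from Google's actual API.
import uuid

class StatefulServer:
    """Simulates a server that retains conversation history per session."""
    def __init__(self):
        self._sessions = {}  # session_id -> list of turns

    def create_session(self):
        session_id = str(uuid.uuid4())
        self._sessions[session_id] = []
        return session_id

    def send(self, session_id, message):
        # The client ships only the NEW message; the server appends it
        # to the history it already holds, then "responds".
        history = self._sessions[session_id]
        history.append({"role": "user", "content": message})
        reply = f"echo: {message}"  # stand-in for real model output
        history.append({"role": "model", "content": reply})
        return reply

def stateless_send(full_history, message):
    # Old pattern: the client re-uploads the entire history on every call.
    full_history = full_history + [{"role": "user", "content": message}]
    reply = f"echo: {message}"
    return full_history + [{"role": "model", "content": reply}], reply
```

The payload difference is the whole point: in the stateful version the per-call upload stays constant no matter how long the conversation runs, while in the stateless version it grows linearly with every turn.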

The Good, The Bad, and The Sticky

So what’s in it for developers? The immediate win is implicit caching. You’re not re-uploading the same context tokens repeatedly, which should slash costs for chatty applications. The Background Execution feature is huge, too—finally, a way to tell an agent to “go research for an hour” without your HTTP connection timing out. And embracing MCP natively is a smart nod to the open tooling ecosystem. But. There are always buts. The expert critique from Sam Witteveen about Gemini Deep Research’s citation system is a perfect example of the hidden friction. If your research agent spits out URLs wrapped in Google redirects that expire, that’s basically useless for any real reporting or archival. It breaks the chain of trust and verification. That’s a big quality issue they need to fix.
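The Background Execution idea maps onto a familiar job-queue pattern: submit a long-running task, get a handle back immediately, and poll for the result instead of holding an HTTP connection open for an hour. Here’s a minimal in-process sketch of that pattern; the names (`JobQueue`, `submit`, `poll`) are illustrative assumptions, not Google’s API.

```python
# Hedged sketch of the job-queue pattern implied by Background Execution.
import threading
import time
import uuid

class JobQueue:
    def __init__(self):
        self._jobs = {}  # job_id -> {"status": ..., "result": ...}

    def submit(self, task, *args):
        """Start a task in the background and return a handle immediately."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = {"status": "running", "result": None}
        def run():
            result = task(*args)
            self._jobs[job_id] = {"status": "done", "result": result}
        threading.Thread(target=run, daemon=True).start()
        return job_id  # no long-lived connection to time out

    def poll(self, job_id):
        """Check on a job without blocking."""
        return self._jobs[job_id]

def slow_research(topic):
    time.sleep(0.1)  # stand-in for an hour-long agent run
    return f"findings on {topic}"
```

The client's connection lifetime is now decoupled from the agent's run time, which is exactly the headache the timeout problem creates.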

The Philosophical Split With OpenAI

This is where it gets interesting. Google is playing catch-up to OpenAI’s Responses API, but their approaches reveal different priorities. OpenAI uses “Compaction”—they shrink the history into opaque, encrypted blobs to save tokens. It’s efficient, but it’s a black box. You can’t see the model’s past reasoning. Google’s Interactions API takes the “hosted” approach. They keep the full, inspectable history on their servers. They prioritize developer transparency and debuggability over pure compression. Which is better? Well, it depends. If you’re building a complex agent and need to debug why it went off the rails, Google’s way seems superior. If you’re purely optimizing for token cost and don’t care about the intermediate steps, maybe OpenAI’s method wins. This isn’t just a technical difference; it’s a philosophical one about what we value in these systems.
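The two strategies can be caricatured in a few lines. Below, “compaction” is modeled as lossy summarization into an opaque blob and “hosted” as keeping the full turn list; both functions are toy assumptions to show the transparency trade-off, not either vendor’s actual implementation.

```python
# Toy contrast: opaque compacted state vs. inspectable hosted state.
import base64
import json

def compact(history):
    # OpenAI-style caricature: shrink history into a blob the caller
    # cannot inspect. Great for payload size, useless for debugging.
    summary = {"turns": len(history), "last": history[-1]["content"][:20]}
    return base64.b64encode(json.dumps(summary).encode())

def hosted(history):
    # Google-style caricature: the full, readable history survives,
    # so you can replay exactly what the agent saw at each step.
    return list(history)

history = [
    {"role": "user", "content": "plan a research task"},
    {"role": "model", "content": "step 1: gather sources"},
]
```

The trade-off falls straight out of the code: the compacted blob is smaller but you can no longer ask “what did the model actually reason about on turn two?”, while the hosted copy answers that question at the cost of storing (and trusting someone with) everything.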

The Real Trade-Offs For Teams

Look, the promise is seductive: offload state management, get cost savings, and deploy powerful agents faster. For lead engineers, Background Execution solves a real headache. For budget managers, implicit caching looks like a spreadsheet miracle. But you’re trading control for convenience. Using the built-in Deep Research agent is a black box compared to crafting your own loops with LangChain. And for security and data directors, this is a double-edged sword. Keeping sensitive conversation history on Google’s servers for 55 days (on the Paid Tier) is a major data residency consideration. OpenAI offers Zero Data Retention for enterprises. Google? Not so much. You can disable statefulness, but then you lose the very benefits that make the API attractive. It’s a classic vendor lock-in play, wrapped in a shiny new feature. The official blog post talks about models becoming “systems.” They’re not wrong. But when you integrate a system this deeply, you need to ask: who really controls the memory, and at what cost?
