Who Holds the Evidence When the Agent Makes a Mistake

When an AI agent executes a rental transaction and something goes wrong, there will be a dispute. The question that most agentic commerce architects have not fully resolved is: who holds the evidence record, and what does it contain?

This is not a hypothetical. Landlord-tenant disputes are already among the most common civil proceedings in North American courts. They are disputes about what was agreed, what was paid, what was said, and what condition the property was in at specific moments in time. They are, at their core, disputes about evidence.

An AI agent can execute a lease faster than any human process. It cannot automatically create the bilateral evidence record that makes that lease resolvable if something goes wrong. That record requires a specific architecture, and it requires that architecture to be in place before the transaction, not assembled from fragments after a dispute arises.

The bilateral problem

A lease dispute involves two parties who each hold an incomplete record. The tenant has their version. The property manager has their version. Neither record was designed to be compared with the other. The party that holds both sides of the record simultaneously, with each party's consent, is the party that can resolve the dispute with authority. No one in the rental transaction currently holds that position.

What the Responsibility Camp is trying to build

One of the six camps competing for agentic commerce infrastructure is focused specifically on the accountability layer: audit trails, liability attribution frameworks, and evidence standards for disputes involving AI agents. They are asking the right questions. Their answers, so far, address the general commerce case.

The general commerce case is relatively straightforward: a receipt, a delivery confirmation, a transaction log. The dispute resolution surface is limited because the transaction closes quickly and the obligations end at delivery.

Rental disputes are categorically different. They arise over a twelve-month or multi-year term. They involve obligations that were not fully explicit at the time the lease was signed. They frequently involve disagreements about the AI's own decision-making: why was an application approved or declined? Why was a specific clause in the lease worded as it was? If the AI was operating under instructions that were subsequently altered, what instructions was it actually using at the time of the decision?

3.6M eviction filings in the US annually, according to the Eviction Lab. Each filing represents a dispute that requires an evidence record. In an agentic era, the question of who created that record, and whether it is reliable, becomes legally significant.

Source: Eviction Lab; VFIntel analysis

The system prompt problem

The McKinsey Lilli breach demonstrated that write access to an AI's system prompts is achievable, inexpensive, and difficult to detect. If an AI agent's governing instructions can be silently altered, the evidentiary record of what that AI was reasoning about when it made a decision becomes unreliable.

In a lease dispute, this creates a specific problem. If the AI that approved or declined an application was operating under instructions that differ from the instructions on record, the dispute cannot be resolved by examining the system prompt log. The log may not reflect what the agent was actually doing.
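One way to narrow that gap, sketched here under stated assumptions, is to stamp every decision with a digest of the exact instructions in force when it was made. The function names and record fields below are hypothetical and do not come from any existing agentic commerce protocol. The digest only helps if the decision record is stored somewhere that a party with write access to the prompts cannot also rewrite.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(text: str) -> str:
    """Return the SHA-256 hex digest of the exact instruction text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def record_decision(instructions: str, application_id: str, outcome: str) -> dict:
    """Build a decision record carrying the digest of the instructions in force.

    Field names are illustrative assumptions, not a standard schema.
    """
    return {
        "application_id": application_id,
        "outcome": outcome,
        "instructions_sha256": fingerprint(instructions),
        "decided_at": datetime.now(timezone.utc).isoformat(),
    }


def instructions_match(record: dict, instructions_on_file: str) -> bool:
    """Check whether the instructions on file are the ones the decision used."""
    return record["instructions_sha256"] == fingerprint(instructions_on_file)
```

In a dispute, a mismatch between the stored digest and the instructions on file shows that the agent was not running the prompt the log claims, even though it cannot show what the altered prompt said.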

This is why the evidence architecture for agentic rental transactions needs to be more than a log. It needs to be an immutable record of what the agent knew, what instructions it was operating under, and what both parties agreed to at each step of the transaction. That record is not produced automatically by any of the current agentic commerce protocols.
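As a rough illustration of what "more than a log" could mean, the sketch below chains each entry to the hash of the one before it, so that editing any earlier entry breaks verification. This is a generic hash-chain technique, not something the protocols discussed here are known to provide, and the class and field names are assumptions made for the example. Strictly speaking it makes the record tamper-evident rather than immutable; real immutability also depends on where the chain is stored and who can rewrite it.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class EvidenceLog:
    """Append-only log in which each entry commits to everything before it."""

    def __init__(self):
        self.entries = []

    def append(self, payload: dict) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        entry = {"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash from the start; any edit makes this fail."""
        prev = GENESIS
        for entry in self.entries:
            expected = entry_hash(prev, entry["payload"])
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```

Each payload could carry what the agent knew, the digest of its instructions, and what each party agreed to at that step, which are the three elements the paragraph above calls for.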

Why bilateral visibility matters

The strongest evidence position in a rental dispute belongs to the party that holds both sides of the transaction: what the renter consented to and what the property manager recorded, simultaneously, under a single consent architecture. No other position can produce a complete picture of the transaction without depending on records held by an adversarial party.

Building that position requires being embedded in both sides of the rental relationship before the dispute arises. It requires a structure where the renter's financial identity and the property manager's transaction record are captured under the same infrastructure, not assembled after the fact from separate systems that were never designed to be compared.
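A minimal sketch of capturing both sides under the same infrastructure: each party attests to the digest of one shared record, and the record counts as bilaterally agreed only when every attestation checks out against that exact content. HMAC with per-party secrets is used here purely as a standard-library stand-in; a production design would more plausibly use asymmetric signatures so that the record holder cannot forge an attestation. All names and fields are hypothetical.

```python
import hashlib
import hmac
import json


def digest(record: dict) -> bytes:
    """Canonicalize the record as sorted-key JSON and hash it."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).digest()


def attest(party_key: bytes, record: dict) -> str:
    """Produce one party's attestation over the shared record digest."""
    return hmac.new(party_key, digest(record), hashlib.sha256).hexdigest()


def bilaterally_attested(record: dict, attestations: dict, keys: dict) -> bool:
    """True only if every party in `keys` attested to this exact record."""
    return all(
        party in attestations
        and hmac.compare_digest(attestations[party], attest(key, record))
        for party, key in keys.items()
    )
```

If either side later presents a version with different terms, the attestations no longer verify against it, so neither party has to rely on the other's copy to show what was agreed.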

The agentic commerce protocols being built today handle execution well. They do not build this record. The organizations that build it first will hold the most defensible position in every rental dispute that involves an AI agent, and that category of dispute is going to become very large, very quickly.

Robert Elensky

Founder & CEO, VFIntel

Robert built VFIntel on the premise that the rental economy's financial coordination failure is an infrastructure problem, not a product problem. He writes on regulated fintech, embedded insurance, and the structural risks accumulating across the enterprise software stack as AI agents become the primary actors operating within it.
