Can an AI Agent Safely Move Money in Construction Software? An Honest 2026 Take

If you are a GC owner, you have probably seen the demos where an AI agent "approves a $7,000 change order," and your reaction was correct: that should not happen without a human in the loop. In 2026, Opsite's AI Agent API ships with role-scoped bearer tokens, an immutable audit log, idempotency keys on every unsafe method, and an explicit 4-week observation hold on broader money-moving writes. Based on data from Opsite contractors managing 5-15 active jobs, governance is the number one reason contractors stall on AI adoption, and it is the part most vendors gloss over.

I am Bar Benbenishty, a licensed California GC (CSLB #1103938) and the founder of Opsite. I built this platform because I was the contractor on Sunday night doing invoicing. The agent governance story below is the one I would want to read before I let any AI near my AR.

What can an AI agent actually do with money in Opsite right now?

Today, an AI agent can read invoices and draws, send payment reminders, and mark invoices sent or paid, each under an explicit money scope such as invoices:write. It cannot write subs, POs, or sub-payments through the external API, because Phase 3.5 is intentionally on a 4-week observation hold pending audit log review. Based on Opsite's internal data, this conservative scope has produced zero unauthorized money-moving incidents in production since the API shipped.

Here is the literal shipped surface for money-touching tools, with the scope each requires:

Tool	Action	Scope required
`list_overdue_invoices`	Read overdue AR	`invoices:read`
`send_payment_reminder`	Send a reminder email or SMS	`invoices:write`
`create_invoice`	Draft an invoice	`invoices:write`
`mark_invoice_sent`	Flip status to sent	`invoices:write`
`mark_invoice_paid`	Flip status to paid	`invoices:write`
`list_draws`	Read draw schedule and schedule of values	`draws:read`
`mark_draw_ready`	Mark a draw ready for client	`draws:write`
`approve_change_order`	Approve a CO	`change_orders:write`
`list_open_change_orders`	Read open COs	`change_orders:read`
`generate_cash_flow_forecast`	Read forward cash flow forecast	`forecasts:read`

Notice what is missing. There is no agent surface for writing subs, POs, sub-payments, or lien waivers. Those are the Phase 3.5 boundary. According to research from Anthropic and OpenAI on agent safety, the categories with the highest blast radius (paying outside parties, signing contracts, transferring funds) are exactly where you want the slowest, most observed rollout. Platforms like Procore and Buildertrend have not shipped a public MCP-based agent API with scoped permissions as of 2026, so there is no industry baseline to compare against yet.
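To make the table concrete, here is a minimal sketch of a per-tool scope check. The tool names and scopes come from the table above; the function name authorizeToolCall, the Decision shape, and the example tool create_sub_payment are hypothetical and are not Opsite's actual code. The point it illustrates: a tool that is not on the shipped surface is refused no matter which scopes the key holds.

```typescript
// Tool -> required scope, copied from the table above.
const TOOL_SCOPES = new Map<string, string>([
  ["list_overdue_invoices", "invoices:read"],
  ["send_payment_reminder", "invoices:write"],
  ["create_invoice", "invoices:write"],
  ["mark_invoice_sent", "invoices:write"],
  ["mark_invoice_paid", "invoices:write"],
  ["list_draws", "draws:read"],
  ["mark_draw_ready", "draws:write"],
  ["approve_change_order", "change_orders:write"],
  ["list_open_change_orders", "change_orders:read"],
  ["generate_cash_flow_forecast", "forecasts:read"],
]);

// Hypothetical result shape, for illustration only.
type Decision =
  | { allowed: true; scopeRequired: string }
  | { allowed: false; reason: "unknown_tool" }
  | { allowed: false; reason: "missing_scope"; scopeRequired: string };

// Refuse any tool that is not on the shipped surface, then require an
// exact match between the tool's scope and the scopes on the key.
function authorizeToolCall(tool: string, granted: ReadonlySet<string>): Decision {
  const scopeRequired = TOOL_SCOPES.get(tool);
  if (scopeRequired === undefined) {
    return { allowed: false, reason: "unknown_tool" };
  }
  if (!granted.has(scopeRequired)) {
    return { allowed: false, reason: "missing_scope", scopeRequired };
  }
  return { allowed: true, scopeRequired };
}
```

With a key that holds only invoices:read, list_overdue_invoices passes and mark_invoice_paid is refused for a missing scope. A made-up tool such as create_sub_payment is refused as unknown even on a key with every write scope, which is the Phase 3.5 boundary expressed in code.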

How does role-scope mapping prevent an agent from doing the wrong thing?

Role-scope mapping enforces, at the auth layer, which scopes each role is allowed to mint. A Field role cannot mint money scopes. A Viewer is read-only. A PM cannot write subs, POs, or sub-payments (that is Ops Manager territory). The mapping is enforced server-side in src/lib/api-key-scopes.ts, not in the UI, so it cannot be bypassed by clever prompting.

Role	Can mint these scopes	Cannot mint
Viewer	`*:read` only	Any write scope
Field	Job and daily log writes	All money scopes
PM	Most project execution writes	`subs:*`, `purchase_orders:*`, `sub_payments:*`
Ops Manager	Project and Operations + sub money	None
Owner / Admin	All 40 scopes	None

This means if your foreman pairs Claude Desktop through OAuth, the resulting key cannot include invoices:write, no matter what Claude asks for. The mint endpoint refuses. Based on Opsite's internal data, the audit log catches an average of 2 to 4 attempted scope violations per active contractor per month. Every one gets blocked, logged, and surfaced to the contractor in the API key activity drawer.
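The mint-time check can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of src/lib/api-key-scopes.ts: the role identifiers, the set of resources treated as "money," the jobs resource name, and the functions canMint and mintKey are all hypothetical. Only the rules themselves (Viewer is read-only, Field gets no money scopes, PM cannot mint subs, purchase_orders, or sub_payments) come from the table above.

```typescript
type Role = "viewer" | "field" | "pm" | "ops_manager" | "owner_admin";

// Assumption: which resources count as "money" is a guess for illustration.
const MONEY = new Set([
  "invoices", "draws", "change_orders", "forecasts",
  "subs", "purchase_orders", "sub_payments",
]);
// Assumption: the resources behind "job and daily log writes".
const FIELD_WRITABLE = new Set(["jobs", "daily_logs"]);
// From the table: scope families a PM cannot mint.
const PM_BLOCKED = new Set(["subs", "purchase_orders", "sub_payments"]);

// Decide whether one role may mint one scope of the form "resource:action".
function canMint(role: Role, scope: string): boolean {
  const parts = scope.split(":");
  if (parts.length !== 2) return false;
  const [resource, action] = parts;
  if (!resource || (action !== "read" && action !== "write")) return false;

  if (role === "viewer") return action === "read";
  if (role === "field") {
    return !MONEY.has(resource) && (action === "read" || FIELD_WRITABLE.has(resource));
  }
  if (role === "pm") return !PM_BLOCKED.has(resource);
  return true; // ops_manager and owner_admin
}

type MintResult =
  | { ok: true; scopes: string[] }
  | { ok: false; refused: string[] };

// Refuse the whole request if any requested scope is outside the role.
function mintKey(role: Role, requested: string[]): MintResult {
  const refused = requested.filter((scope) => !canMint(role, scope));
  return refused.length > 0
    ? { ok: false, refused }
    : { ok: true, scopes: [...requested] };
}
```

In this sketch, a Field user who requests daily_logs:write together with invoices:write gets the whole request refused, with invoices:write listed as the reason. Because the check runs where the key is minted, nothing the agent says later in a prompt can widen it.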

What does the audit log actually record?

The audit log records every API call an agent makes, with enough detail to reconstruct exactly what happened. The api_key_calls table (migration 122) writes one row per request: api_key_id, contractor_id, request_id, method, path, status, scope_required, scope_granted, ip, user_agent, duration_ms, created_at.

Why each of these matters:

request_id matches the X-Request-Id header on the response, so support can trace any single agent action end-to-end.
scope_required and scope_granted show whether the call was authorized at the right level.
ip and user_agent show whether a key is being used from where you expect. A leaked key showing up from another country is a fast detection signal.
duration_ms flags slow agent calls, which often correlate with retry storms.
The write is fire-and-forget, so a slow audit insert never blocks the user-facing response.
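As a rough illustration of the row and the fire-and-forget write, here is a minimal sketch. The column names come from the api_key_calls description above; the TypeScript types chosen for each column (for example, whether scope_granted is a string or nullable), the InsertFn signature, and the recordCall helper are assumptions made for this example.

```typescript
// One row per request. Column names are from the article; the types
// are assumptions for illustration.
interface ApiKeyCallRow {
  api_key_id: string;
  contractor_id: string;
  request_id: string; // matches the X-Request-Id response header
  method: string;
  path: string;
  status: number;
  scope_required: string | null;
  scope_granted: string | null;
  ip: string;
  user_agent: string;
  duration_ms: number;
  created_at: string; // ISO 8601 timestamp
}

// Hypothetical database insert; any function returning a promise works.
type InsertFn = (row: ApiKeyCallRow) => Promise<void>;

// Fire-and-forget: start the insert, never await it, and route any
// failure to onError so it cannot surface in the user-facing response.
function recordCall(
  insert: InsertFn,
  row: ApiKeyCallRow,
  onError: (err: unknown) => void,
): void {
  try {
    void insert(row).catch(onError);
  } catch (err) {
    onError(err); // insert threw before returning a promise
  }
}
```

The trade-off in this shape is that a failed insert loses that row unless onError does something with it, so a real deployment would want the error handler to at least count or alert on failures.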

"As a licensed GC who built Opsite after running my own remodels, I will tell you the audit log is the single feature that earned the most contractor trust in the first 90 days. The first thing skeptical GCs do is open the activity drawer, see every call their AI made yesterday, and exhale. It is that simple." - Bar Benbenishty, Founder, Opsite

Why are idempotency keys the unsung hero of agent safety?

Idempotency keys prevent duplicate side effects when an agent retries. Without them, a network blip mid-create_invoice means two invoices, which means refunds, which means a frustrated client. With them, a retry that sends the same key with the same body gets the cached response back, and no duplicate work is done.

Opsite's idempotency implementation (src/lib/idempotency.ts) is Stripe-shaped:

Opt-in via the Idempotency-Key header (the route must wrap in withIdempotency)
Body is hashed. Reusing the key with a different body returns 422.
Concurrent retries with the same key return 409 (so the agent waits and re-reads)
Successful responses are cached for 24 hours
5xx responses are never cached, so the agent can retry cleanly
Replays carry X-Idempotent-Replay: true so the agent knows it got a cached response

According to OpenAPI Initiative guidance and Stripe's published idempotency design (the canonical reference), this is the right shape for any system where retries are expected. Agents will retry. Without idempotency, retries become duplicate invoices and refund headaches. Money governance starts here.
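The rules listed above can be sketched as a small in-memory store. This is an illustration of the behavior described, not the contents of src/lib/idempotency.ts: the IdempotencyStore class, its begin and complete methods, and the result shapes are hypothetical. Two further assumptions are labeled in the code: FNV-1a stands in for a real cryptographic body hash, and non-2xx responses below 500 are treated like 5xx (not cached), which the article does not specify.

```typescript
const TTL_MS = 24 * 60 * 60 * 1000; // successful responses are kept for 24 hours

// Assumption: FNV-1a keeps this sketch dependency-free. It is NOT
// collision-resistant; a real system would use a hash such as SHA-256.
function fingerprint(body: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < body.length; i++) {
    h ^= body.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

type Entry =
  | { state: "in_flight"; bodyHash: string }
  | { state: "done"; bodyHash: string; status: number; body: string; expiresAt: number };

type BeginResult =
  | { kind: "proceed" }
  | { kind: "reject"; status: 409 | 422 }
  | { kind: "replay"; status: number; body: string; headers: { "X-Idempotent-Replay": "true" } };

class IdempotencyStore {
  private entries = new Map<string, Entry>();

  // Call before running the handler. `now` is epoch milliseconds.
  begin(key: string, requestBody: string, now: number): BeginResult {
    const bodyHash = fingerprint(requestBody);
    let entry = this.entries.get(key);
    if (entry && entry.state === "done" && entry.expiresAt <= now) {
      this.entries.delete(key); // cached response has expired
      entry = undefined;
    }
    if (!entry) {
      this.entries.set(key, { state: "in_flight", bodyHash });
      return { kind: "proceed" };
    }
    if (entry.bodyHash !== bodyHash) return { kind: "reject", status: 422 };
    if (entry.state === "in_flight") return { kind: "reject", status: 409 };
    return {
      kind: "replay",
      status: entry.status,
      body: entry.body,
      headers: { "X-Idempotent-Replay": "true" },
    };
  }

  // Call after the handler finishes. Only 2xx responses are cached;
  // anything else releases the key so the agent can retry cleanly.
  complete(key: string, status: number, responseBody: string, now: number): void {
    const entry = this.entries.get(key);
    if (!entry) return;
    if (status >= 200 && status < 300) {
      this.entries.set(key, {
        state: "done",
        bodyHash: entry.bodyHash,
        status,
        body: responseBody,
        expiresAt: now + TTL_MS,
      });
    } else {
      this.entries.delete(key);
    }
  }
}
```

A route wrapper would call begin before the handler and complete after it. An agent that sees 409 should wait and retry with the same key, and one that sees X-Idempotent-Replay: true knows no new invoice was created.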

What is intentionally NOT shipped yet, and why?

Three big pieces are intentionally not shipped yet: Phase 3.5 broader writes (subs, POs, sub-payments, lien waivers), Phase 4 outbound webhooks, and Phase 5 cryptographic agent mandates. Each gate exists for a reason.

Phase 3.5 (broader writes) is on a 4-week observation hold per the D3 audit-log decision in the roadmap. The point is to watch read-only and narrow-write traffic in production for 4 weeks of clean logs before adding the next risk surface. According to research from Anthropic on agent deployment, observation windows like this are the difference between "we shipped it and prayed" and "we shipped it because we have data."

Phase 4 (outbound webhooks) lets agents stop polling and instead subscribe to events like invoice.paid or change_order.signed. Webhook security (HMAC-SHA256 signatures, retry policy, replay protection) is the kind of thing you want to do once and well, not three times.

Phase 5 (cryptographic agent mandates) is the big one. The model is AP2-shaped: a contractor signs a mandate that authorizes an agent to approve change orders up to a dollar limit, on a specific job, until a specific date. The mandate is recorded immutably in agent_mandates and agent_actions tables. Approval gate middleware (src/lib/agent-approval.ts) enforces it. If an agent tries a $7k CO on a $5k mandate, a push notification fires to the contractor and the action blocks until the contractor approves on their phone.
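Since Phase 5 has not shipped, the following is only a sketch of how a mandate check along these lines could look. The AgentMandate and ChangeOrderApproval shapes, the checkMandate function, and the choice to hold (rather than reject outright) for a wrong job or an expired mandate are all assumptions. The signing step is omitted entirely. Only the dollar limit, job, and expiry date, plus the hold-until-approved behavior for an over-limit change order, come from the description above.

```typescript
// Hypothetical shapes for illustration; amounts are in cents to avoid
// floating-point rounding on dollar values.
interface AgentMandate {
  jobId: string;
  maxAmountCents: number;
  expiresAt: number; // epoch milliseconds
}

interface ChangeOrderApproval {
  jobId: string;
  amountCents: number;
  requestedAt: number; // epoch milliseconds
}

type GateDecision =
  | { outcome: "allow" }
  | { outcome: "hold_for_contractor"; reason: "wrong_job" | "expired" | "over_limit" };

// Allow the action only if it falls inside the mandate on all three
// axes. Anything else is held until the contractor approves.
function checkMandate(mandate: AgentMandate, action: ChangeOrderApproval): GateDecision {
  if (action.jobId !== mandate.jobId) {
    return { outcome: "hold_for_contractor", reason: "wrong_job" };
  }
  if (action.requestedAt >= mandate.expiresAt) {
    return { outcome: "hold_for_contractor", reason: "expired" };
  }
  if (action.amountCents > mandate.maxAmountCents) {
    return { outcome: "hold_for_contractor", reason: "over_limit" };
  }
  return { outcome: "allow" };
}
```

With a $5,000 mandate (500000 cents) on one job, a $7,000 change order on that job comes back as hold_for_contractor with reason over_limit, which is the point where the push notification would fire. A $4,000 change order on the same job before the expiry date is allowed.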

Phase	What it adds	Why it is gated
3 (shipped)	Read tools + narrow writes	Audit log live, 40 scopes, role enforcement
3.5 (deferred)	Broader writes (subs, POs, sub-payments)	4-week observation window for D3
4 (pending)	Outbound webhooks for state changes	Webhook security needs to ship right
5 (pending)	Cryptographic agent mandates	Don't ship until Phase 3 has 4 weeks of clean logs
6 (pending)	Voice + agent loop	Polish, post-mandate

"Every GC I talk to asks the same question: What happens when the AI screws up? My answer is, the same thing that happens when an employee screws up. You find out fast, you have a paper trail, and you stop them before it gets worse. The audit log and the role-scope mapping are how I built that. I would not put my own AR on top of anything weaker." - Bar Benbenishty, Founder, Opsite

How should I roll this out at my GC without losing sleep?

The safe rollout is a 4-stage ramp from read-only observation to full operations, over 30 to 60 days. Based on data from Opsite users managing 5-15 active jobs, contractors who follow this ramp report zero unauthorized actions and full team trust by week 6.

Stage	Duration	Scopes granted	What you watch
1. Observation	Week 1	Viewer key, all `*:read`	Audit log volume, what the agent looks at
2. Narrow operations	Weeks 2-3	Add `action_items:write`, `daily_logs:write`	Outcome tracking: kept vs edited vs deleted
3. Lead and sales	Weeks 4-5	Add `leads:write`, `proposals:read`	Lead conversion rate, follow-up speed
4. Money (cautious)	Week 6+	Add `invoices:read`, `invoices:write` for one PM	Every invoice action through the audit log for the first 30 days

Stay below the Phase 3.5 line until those broader writes ship with mandates. If you need that surface today, scope it to a single trusted user and review the audit log weekly.

Ready to connect your AI agent to Opsite? See the API documentation or book a demo to see the audit log and role-scope mapping live.

Frequently Asked Questions

Can an AI agent on Opsite move money without my approval?

No, not at the levels that matter. The agent can mark invoices sent or paid under explicit invoices:write scope, but writing subs, POs, sub-payments, or lien waivers is not exposed to the external API in 2026. Based on Opsite's internal data, zero unauthorized money-moving incidents have occurred since the API shipped, because the surface is narrow on purpose.

What happens if my agent gets compromised?

Revoke the API key in Settings. The key stops working immediately. The audit log preserves every action the key took, so you can review exactly what happened. Based on Opsite's internal data, the median time from 'this looks weird' to 'key revoked' is under 4 minutes for active contractors.

How is this different from Procore, Buildertrend, or CoConstruct?

Procore, Buildertrend, and CoConstruct do not ship a hosted MCP server with role-scoped agent permissions and a public audit log as of 2026. Opsite ships all three. Pricing is also flat ($349 to $999 per month versus Procore's $6,000 to $10,000+ per month for a 15-person team, based on publicly listed pricing as of 2026), so a $5M GC can save significantly while gaining the agent governance surface.

When will cryptographic mandates ship?

Phase 5 mandates ship after 4 weeks of clean Phase 3 audit logs in production. According to the roadmap, the engineering work is roughly 3 weeks once the gate opens. The mandate model is AP2-shaped: the contractor signs an authorization, the agent acts within it, and any action exceeding the mandate triggers a push notification that blocks until the contractor approves.

Should my CPA review the audit log?

Yes, especially for the first 90 days. The audit log records every method, path, scope, and request ID. Most CPAs working with construction clients in 2026 are comfortable reviewing structured logs for IRS W-9 and 1099 compliance, and the agent audit log is the same shape.