Cost Matrix — Four Business Models

§2 · The shared spine

Three costs recur everywhere

Every model carries the same three cost atoms. The models differ only in who pays them and when.

Data licence (Cotality/RP Data) — irreducible everywhere. The redistribution licence gates Models 1, 2, 4 — not Model 3. That asymmetry is the whole story.
Inference + voice — a pure CapEx↔OpEx swap. The Framework Desktop converts per-minute cloud spend into a fixed box.
Labour — SaaS amortises it across tenants; in-house is labour-heavy; franchise pushes it onto the franchisee.

Explainer A

The CapEx ↔ OpEx swap

Cloud voice + inference (OpEx) grows with every minute used. The Framework Desktop is a flat upfront cost (CapEx). Past breakeven, the box is strictly cheaper — and it's already built.
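
A minimal sketch of the breakeven arithmetic, assuming a hypothetical cloud rate and usage level. Only the ~A$3,300 box price comes from this document (§11); `cloud_rate_per_min`, `minutes_per_month` and `power_per_month` are placeholder inputs, not quotes.

```python
def breakeven_months(box_capex: float, cloud_rate_per_min: float,
                     minutes_per_month: float, power_per_month: float = 0.0) -> float:
    """Months until cumulative cloud spend overtakes the box's upfront cost."""
    monthly_cloud = cloud_rate_per_min * minutes_per_month
    monthly_saving = monthly_cloud - power_per_month
    if monthly_saving <= 0:
        return float("inf")  # at this usage the box never pays back
    return box_capex / monthly_saving


# Placeholder inputs (assumptions): A$0.10/min, 5,000 min/month, A$30/month power.
months = breakeven_months(3300.0, 0.10, 5000.0, 30.0)
print(f"breakeven after ~{months:.1f} months")
```

The breakeven point moves linearly with usage, so the per-minute rate and the monthly minutes are the two numbers worth pinning down first.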

§3 · Master matrix

The four models, side by side

Baseline = "small shop" (~5 seats) scaling up. ⚠ = dominated by an unconfirmed parameter (API_BASE, API_SEAT, RPP_SEAT).

Cost line	1 · On-prem box	2 · Hosted SaaS	3 · In-house	4 · Franchise
Hardware CapEx	box (1–2) → mainboard cluster	~A$0	box (own use)	box/node — franchisee
Setup labour	High (per site)	Low	Low (just you)	Med
Fixed OpEx/mo	~$0 + power	CLOUD_INFRA	CLOUD_INFRA	central clearing only
Data base/mo	API_BASE ⚠	API_BASE ⚠	portal — see §4	API_BASE ⚠
Data per seat/mo	API_SEAT ⚠	API_SEAT ⚠	RPP_SEAT ⚠	API_SEAT ⚠
Inference + voice	~$0 local	per-min unless relay	~$0 local	~$0 local
Sales / closing labour	buyer's	buyer's	HIGH — your closers	franchisee's
Redistribution licence?	if leads leave site	required (worst)	not required ✅	required (existential)
Marginal / +1 seat	step + data	~flat + data	new closer $$	~$0 to you
Marginal / +1 lead	~$0	cents	data + commission	~cents
Revenue capture	box + licence/support	per-seat sub	full commission/deal	rake % + franchise fee
Cost-scaling shape	stepwise	linear	linear in closers	sub-linear

Explainer B

Four scaling shapes

As headcount grows, costs scale differently. In-house climbs fastest (each closer is linear cost); the franchise rides a sub-linear curve because franchisees fund their own CapEx and labour.
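
A rough sketch of the stepwise and linear shapes, hardware and subscription only. The box and board prices and the "1–2 people per box" figure are taken from §11; the SaaS per-seat fee and the 24-month horizon are hypothetical placeholders, and the data and labour lines are left out because their parameters are still unconfirmed.

```python
import math

BOX_AUD = 3300.0      # Framework Desktop (§11)
BOARD_AUD = 2700.0    # each additional Framework Mainboard (§11)
SEATS_PER_NODE = 2    # "serves 1–2 people" (§11), taken at the upper bound


def onprem_capex(seats: int) -> float:
    """Stepwise: one box, then one extra board for each further pair of seats."""
    nodes = math.ceil(seats / SEATS_PER_NODE)
    return BOX_AUD + max(nodes - 1, 0) * BOARD_AUD


def saas_cost(seats: int, per_seat_month: float, months: int) -> float:
    """Linear: every seat pays the same fee every month."""
    return seats * per_seat_month * months


# Hypothetical SaaS fee and horizon; neither is quoted in this document.
PER_SEAT_MONTH = 100.0
MONTHS = 24

for seats in (5, 50, 500):
    print(seats, onprem_capex(seats), saas_cost(seats, PER_SEAT_MONTH, MONTHS))
```

Swapping in real values for the fee and horizon shows where the on-prem step cost crosses the SaaS line for a given headcount.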

§4 · Model 3 deep-dive

Portal vs API — and the two gates

For in-house, the Cotality API may be unnecessary: enrich through the RP Data portal on a normal seat. You don't redistribute, so the enrichment method stays a black box. This drops the API line entirely, but it only opens one of two gates.

Explainer C

Two gates — only one opens

"Not selling it on" kills the redistribution gate completely. It does not touch the access-ToU gate — how you pull is governed regardless of what you do downstream.

Revised Model 3 cost line

Line	API path	Portal path
UAT setup (testing only)	A$2,500 +GST	A$0
Production minimum/mo	A$500–1,000	A$0
Production volume (bulk-process book)	⚠ ~A$250k (est., unquoted) for ~896k	A$0 — only enrich leads you work
Per seat/mo	API_SEAT ⚠	RPP_SEAT ⚠ (your heads)
Redistribution licence	required at scale	not required ✅
Real saving	—	not the ~A$6k/yr entry cost — avoiding the ~A$250k bulk bill

Testing ≠ production. The A$2,500 + A$500/mo are testing-stage + minimum (confirmed, Cotality email). Production volume pricing was never quoted — Cotality prices per use-case/volume. The ~A$250k to bulk-process all ~896k is Dom's estimate, unconfirmed. You never pay it: the on-demand, human-triggered path (§13) enriches only the leads you work — that's its dollar justification.
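
The arithmetic behind the two headline figures, as a small worked sketch. It assumes the ~A$6k/yr entry cost is the A$500/mo minimum over twelve months (the one-off UAT fee sits on top), and it treats the ~A$250k bulk figure as the unconfirmed estimate it is.

```python
UAT_SETUP_AUD = 2500.0         # one-off, testing only (+GST)
PROD_MIN_MONTH_AUD = 500.0     # lower bound of the A$500–1,000 monthly minimum
BULK_ESTIMATE_AUD = 250_000.0  # unconfirmed estimate for the whole book
BOOK_RECORDS = 896_000

# Assumption: "~A$6k/yr" is the monthly minimum annualised.
entry_cost_year = PROD_MIN_MONTH_AUD * 12

# First year including the one-off UAT setup.
first_year_with_uat = entry_cost_year + UAT_SETUP_AUD

# Per-record rate implied by the bulk estimate.
implied_per_record = BULK_ESTIMATE_AUD / BOOK_RECORDS

print(f"entry cost/yr: A${entry_cost_year:,.0f}")
print(f"first year incl. UAT: A${first_year_with_uat:,.0f}")
print(f"implied bulk rate: ~A${implied_per_record:.3f}/record")
```

The bulk bill is roughly forty times the annual entry cost, which is why the saving that matters is the avoided bulk run rather than the entry fee.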

Model-3-specific snag (independent of Cotality): owner-data-for-marketing may be capped by state land-registry codes regardless of contract. Model 3 is the one model that markets to those owners, so this bites hardest here and no Cotality arrangement clears it. Counsel line required before building owner outreach.

§5a · Access posture

Human operator, extended by tools

Chosen posture: HITL assisted automation (Camoufox) — a real human operator extended by tools of graduated agency. Defensibility is governed by one thing: how tightly intent couples to access, per unit of work.

Explainer D

The coupling spectrum (L1 → L4)

ToU cares about access pattern, not whether a human is present. As autonomy climbs, one human intent triggers more access: the coupling stretches and "human present" becomes a fig leaf. Defensible zone: L1–L2.

Camoufox carries two separable purposes. Access-enablement: the portal blocks vanilla automation even at human rates, so stealth just lets you function (defensible). Volume-concealment: spoofing to hide a pattern you suspect would be flagged (the liability). Concealment lowers P(caught) but raises severity-if-caught.

§5b · The contingent-liability row

A cost you pay only if

Every other matrix row is deterministic cash. This one is priced by expected value. The triggering event is discovery — Cotality detecting automated access and acting on it.

E[contingent cost] = P(discovery) × severity(discovery)

Without this row, portal beats API trivially (saves ~A$6k/yr). But API carries ~zero contingent liability. The honest comparison:

true cost(portal) = cash(API) − ~A$6k/yr saved + E[contingent cost]

Explainer E

Two postures — same cash, inverted answer

Probability ring × severity bar = expected cost. CLEAN stays a rounding error under the A$6k saved. STEALTH-HEAVY blows through it, and the API was cheaper all along. The numbers illustrate the mechanic; they are not estimates.

Posture	P(discovery/yr)	Severity if caught	E[contingent]/yr	Verdict vs API
CLEAN — L1–L2, no stealth, logged, human-rate	~2%	~A$20k (seat pause + rebuild)	~A$400	portal wins — save A$6k, risk A$400
STEALTH-HEAVY — L3–L4 + concealment, in negotiation	~15%	~A$300k (licence talks collapse)	~A$45k	API was far cheaper
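
The two rows above can be reproduced with a few lines. This is a sketch of the expected-value mechanic using the table's illustrative figures, not estimates; the `Posture` class and `portal_beats_api` helper are names introduced here for illustration.

```python
from dataclasses import dataclass

API_SAVING_AUD = 6000.0  # ~A$6k/yr the portal path saves over the API entry cost


@dataclass(frozen=True)
class Posture:
    name: str
    p_discovery: float   # probability of discovery per year
    severity_aud: float  # loss if discovered

    def expected_contingent(self) -> float:
        """E[contingent cost] = P(discovery) x severity(discovery)."""
        return self.p_discovery * self.severity_aud


def portal_beats_api(posture: Posture) -> bool:
    """Portal wins only while expected liability stays under the cash it saves."""
    return posture.expected_contingent() < API_SAVING_AUD


clean = Posture("CLEAN", 0.02, 20_000.0)
stealth = Posture("STEALTH-HEAVY", 0.15, 300_000.0)

for p in (clean, stealth):
    print(p.name, round(p.expected_contingent()), portal_beats_api(p))
```

The verdict flips wherever P × severity crosses the saving, so the decision is most sensitive to severity in the stealth-heavy case.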

Explainer F

Why this risk is nastier than normal

It's correlated with your own upside. The discovery event that kills the seat is the same event that poisons the licence Models 1/2/4 are built on. One bolt, two losses — and it fires precisely when the upside matters most. You can't diversify it away.

Recommendation — run CLEAN. Stay L1–L2, use Camoufox only for access-enablement, and log every pull (operator, seat, timestamp, rate, purpose). That log is simultaneously your ToU defensibility and the first link in the provenance chain the exchange needs anyway. A business whose moat is provable consent + licence lineage must not have a deniable acquisition layer at its root.
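
One possible shape for that pull log, as a sketch. The five fields are the ones named above; the JSON-lines format, the `rate_per_hour` unit and the function names are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class PullLogEntry:
    operator: str
    seat: str
    timestamp: str        # ISO 8601, UTC
    rate_per_hour: float  # pulls per hour at the time of this pull (assumed unit)
    purpose: str


def make_entry(operator: str, seat: str, rate_per_hour: float, purpose: str) -> PullLogEntry:
    """Build one log entry stamped with the current UTC time."""
    now = datetime.now(timezone.utc).isoformat()
    return PullLogEntry(operator, seat, now, rate_per_hour, purpose)


def to_json_line(entry: PullLogEntry) -> str:
    """Serialise an entry as one JSON line."""
    return json.dumps(asdict(entry), sort_keys=True)


def append_entry(path: str, entry: PullLogEntry) -> None:
    """Append-only write: existing lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(to_json_line(entry) + "\n")


entry = make_entry("operator-1", "seat-1", 12.0, "research prospect being worked now")
print(to_json_line(entry))
```

An append-only file keeps the log cheap to write and easy to hand over as evidence; a tamper-evident store would be a later hardening step.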

§6 · Agentic price discovery

Model 4's margin engine

Not a tangent — it's what makes the franchise rake worth more than the on-prem licence. Per-lead pricing costs cents (≈$0 on the local box); its value compounds with network liquidity.

Explainer G

The liquidity flywheel

More nodes → tighter price discovery → more leads clear → bigger rake pool → more nodes. The honest dependency: it needs demand-side liquidity to mean anything. Build it after Model 3 gives the agent closed-deal signal.

§7 · Optimal path

3 → 1 → 4

The same sequence the canon already encodes (exchange PARKED, work the GREEN lane) — so it compounds rather than re-derives. SaaS (2) is conditional distribution only.

Explainer H

The sequence, and where the licence gate sits

Model 3 ships now, portal-only, zero licence dependency. The redistribution licence gate sits between 3 and 1 — the API cost switches on exactly when productisation gives you a reason to go land it.

1 · In-house first (3), portal-only, CLEAN posture. Zero licensing risk; generates the closed-deal signal the price-discovery agent needs.
2 · Productise your own box as on-prem (1). Already built in step 1 — repackaging, not new R&D. First point you need the licence → time it to the negotiation.
3 · Franchise the proven node + clearing layer (4) once the licence lands and you have closed-deal history for real price discovery.
4 · SaaS (2) only if a segment wants cloud and the licence already covers redistribution. Never the lead model.

Act II · The full value chain

The matrix costs the spine. The chain has two more stages.

Property-data was the easy part. The real pipeline bolts social enrichment on the front and social engagement + ads on the back — both riskier and more regulated than the spine. And they form a loop, not a line.

Explainer I

The flywheel — and the narrowing funnel

Enrichment builds audiences → engagement reaches them → responses feed back as data → sharper scoring. The moat is a seed-audience asset. And you don't work every record: cost concentrates on fewer, higher-conviction leads as the funnel narrows.

§9 · Social enrichment

Buy clean — when you're selling it on.

This is the CLEAN-lane rule, not a universal one. It applies when data is redistributed (Models 1/2/4). For the in-house lane, the calculus flips — see Cowboy mode (Act III), where scraping is the right call.

For the resale lane: the cheap scrape isn't cheap — its risk is hidden upstream. Proxycurl, the #1 LinkedIn data API at $10M ARR, was sued out of existence by LinkedIn/Microsoft in 2025 for running fake-account farms. If you're carrying lineage to a buyer, the cheap Apify actors are those farms one layer removed — provenance-tainted, able to vanish overnight. In-house, none of that bites (you don't redistribute) — that's the bifurcation.

Explainer J

Two provenance chains

A business whose moat is provable lineage can't afford a tainted link. The bought chain holds. The scraped chain has a link that snaps — and the market's best-funded version of it just disappeared.

Best enrichment for the AU / SMSF book is cheap, clean, AU-native: ABN/ASIC business-owner signal (the standout), Roy Morgan/Mosaic, phone & email validation. Buy professional data from PDL/Apollo. Spike-test coverage first — consumer skew may make match rate 10–25%, which could make the whole layer low-ROI (the cheapest answer of all).
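
A sketch of what the spike test measures, assuming a sample of record IDs sent to a provider and the subset that came back matched. The break-even check and its inputs (`cost_per_lookup`, `value_per_match`) are hypothetical; no provider pricing is quoted in this document.

```python
def match_rate(sample_ids, matched_ids) -> float:
    """Share of the sampled records the provider could match."""
    sample = set(sample_ids)
    if not sample:
        raise ValueError("sample is empty")
    return len(sample & set(matched_ids)) / len(sample)


def worth_enriching(rate: float, cost_per_lookup: float, value_per_match: float) -> bool:
    """Lookups are paid on every record; value accrues only on matches."""
    return rate * value_per_match > cost_per_lookup


# Placeholder run: 100 sampled records, 20 matched.
rate = match_rate(range(100), range(20))
print(rate, worth_enriching(rate, cost_per_lookup=0.50, value_per_match=2.00))
```

At a 10–25% match rate the value per match has to be four to ten times the lookup cost before the layer pays for itself.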

§10 · Engagement & the ad funnel

The funnel is gated before it starts

You cannot run finance ads in Australia without an AFSL (or declared exemption) — Meta and Google both verify financial-services advertisers. SMSF/property-investment counts. So the paid engine is gated on holding a licence, being an authorised rep, or a partner (Greentree?) who holds one.

Explainer K

The AFSL gate, then the CAC

No licence, no paid funnel — the gate stays locked. Past it, AU finance is one of the most expensive verticals: stacked up, a closed client costs roughly A$2,400 to acquire, viable only against SMSF lifetime value — and front-loaded, so it's working-capital heavy.

The moat is a seed-audience asset: Meta lookalikes want 5–10k quality records; you have 30× that. But uploading it needs a consent basis (hashing ≠ Privacy-Act exemption; the OAIC is watching) → build lookalikes off consented records only.
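
A minimal sketch of "consented records only", assuming each record carries an `email` and a boolean `marketing_consent` field (both field names are hypothetical). The consent filter is the gate; the SHA-256 hashing of normalised emails is only the upload format ad platforms expect and provides no legal basis on its own.

```python
import hashlib


def normalise_email(email: str) -> str:
    """Trim whitespace and lowercase before hashing."""
    return email.strip().lower()


def hash_email(email: str) -> str:
    """SHA-256 hex digest of the normalised address."""
    return hashlib.sha256(normalise_email(email).encode("utf-8")).hexdigest()


def seed_audience(records):
    """Drop anything without explicit consent, then hash what remains."""
    return [
        hash_email(r["email"])
        for r in records
        if r.get("marketing_consent") is True and r.get("email")
    ]


records = [
    {"email": " A@Example.com ", "marketing_consent": True},
    {"email": "b@example.com", "marketing_consent": False},
    {"email": "c@example.com"},  # consent unknown, so treated as no consent
]
audience = seed_audience(records)
print(len(audience))
```

Treating a missing consent flag as "no" keeps the default on the safe side of the Privacy Act question.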

§11 · Self-hosted Convex

The box now holds the database — and that solves residency

Self-hosted Convex (open-source backend, Rust over SQLite/Postgres) runs the data plane on the box, next to the voice + inference stack. Box cloud cost ≈ A$0 + power — self-hosting zeros the Convex line; Clerk (free ≤10k MAU) is the only remaining cloud tie; the A$1.5–3k/mo applies to the cloud SaaS model only. But the dollar saving isn't the headline — an AU-located box means PII never leaves the country, and the reluctant US-East residency compromise simply evaporates.

Explainer L

Where the data lives

Cloud: PII leaves the country (mitigations carried, counsel-gated). Self-hosted on an AU box: it stays inside. Eliminated, not mitigated.

Honest trade-offs: ops + single-point-of-failure → mandatory backups (snapshot → AU R2/NAS); Clerk is the last cloud tie (free tier covers small shops; full air-gap wants local auth later); cross-node federation stays bespoke (the clearing-saga atom).

The hardware: a stable box, volatile alternatives

The Framework Desktop (Ryzen AI Max+ 395, 128GB, ~A$3,300 landed) is the box — chosen because its fixed-config price is insulated from the DRAM spot market. It serves 1–2 people; scale headcount by clustering Framework Mainboards (~A$2,700 each) in a rack.

Stage	Hardware	~AUD
1–2 people	Framework Desktop (stable, repairable appliance)	~A$3,300
Scale headcount	Framework Mainboards clustered in a rack	~A$2,700/board
Alternative	Mac Studio — when available (512GB M3 Ultra sold out; M4 Max caps 128GB)	A$3,499+

The volatility isn't the Framework price — it's everything around it. The 2025–26 DRAM shock (DDR5 +~400%, Gartner +130% by EOY26) has made alternatives volatile and supply-constrained. The real risks: alternative availability, and Framework's own supply continuity — if it "has its Mac Mini moment" and sells out, we can't source units to scale. Hedge: the stack is hardware-agnostic (self-hosted Convex + Ollama), so the box is a swappable substrate — qualify backup vendors now. Optionality is the hedge, not price-watching.

§13 · Product positioning

The human-scale PI agent

A tool-calling Camoufox agent: your data-enrichment specialist that navigates the web like a human, at human pace, and enriches, engages, and tees up like your best salesman — across a database no human could hold in their head. The moat is operating legitimately while competitors get banned.

"Human pace" ≠ "human." Platforms prohibit automation, not speed; a slow bot is still a bot; an autonomous agent giving financial guidance is still unlicensed advice. The defence is the human in the loop on consequential actions — pace is a consequence of that, not a substitute.

Explainer M

Augmentation, not autonomy

Make one operator do the work of ten — defensible, durable, never banned. Don't replace the operator with ten throttled bots — that's the Proxycurl pattern, slower, and it gets shut off.

Mechanics (confirmed direction)

Bulk enrich = clean APIs (ABN/ASIC, Roy Morgan, Apollo/PDL) — not Camoufox.
Camoufox = on-demand, human-triggered, per-lead research + engagement ("research this prospect I'm working now") — not bulk harvest.
Close = scoped. The agent enriches, warms, tees up; a licensed human closes anything that's a financial product (AFSL + anti-hawking s992A line).
Scale = distributed human scale. The franchise is this — many real humans, each amplified, never an account farm.

The differentiator, and it's true: "while everyone else's automation gets shut off."

Act III · The cowboy in-house stack

Two thought modes, one firewall

The "buy clean" discipline is a redistribution rule. For the in-house lane — where data stays in the building and we sell our services, not the data — the calculus flips: scrape, Apify, browser automation, all in play. The flip trigger is one question: does the data leave to a third party?

Explainer N

Cowboy vs Clean — the lane firewall

In-house draws from everything (cheap, scrappy, proprietary) and stays internal. Resale draws only from licensed sources. A provenance tag at ingestion is the wall — cowboy data can never cross into anything sold.

Two guardrails don't bend even in cowboy: public-vs-authenticated (account-bans don't care about intent → logged-in surfaces stay human-paced) and the AU Privacy Act (public business data = green; personal-individual data still fenced).
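
The lane firewall can be sketched as a tag set once at ingestion and checked at every exit. The source identifiers and function names below are hypothetical; the design choice worth noting is that any source not on the licensed list defaults to the cowboy lane.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(Enum):
    CLEAN = "clean"    # licensed or bought sources; may be resold
    COWBOY = "cowboy"  # scraped or automated sources; internal only


class LaneViolation(Exception):
    """Raised when cowboy-lane data is about to leave the building."""


# Hypothetical identifiers for licensed sources.
LICENSED_SOURCES = {"abn_asic", "roy_morgan", "pdl", "apollo"}


@dataclass(frozen=True)
class Record:
    value: dict
    source: str
    lane: Lane  # assigned at ingestion; frozen so it cannot be edited later


def ingest(value: dict, source: str) -> Record:
    """Tag the lane from the source; unknown sources default to COWBOY."""
    lane = Lane.CLEAN if source in LICENSED_SOURCES else Lane.COWBOY
    return Record(value, source, lane)


def export_for_resale(records):
    """Refuse the whole batch if any record came from the cowboy lane."""
    blocked = sorted({r.source for r in records if r.lane is not Lane.CLEAN})
    if blocked:
        raise LaneViolation(f"internal-only sources in export: {blocked}")
    return [r.value for r in records]
```

Failing the whole batch, rather than silently dropping tainted rows, makes a crossing visible instead of quietly shrinking the export.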

§8b · The moat, reconsidered

Static list → living system

Actors don't just enrich — they source, on a schedule, on triggers. "We have 896k records" is a commodity; "we know who became a motivated SMSF-property prospect this week" is the edge. Apify + Convex reactivity turns the moat from a snapshot into a stream.

Explainer O

Snapshot → stream

A bought list decays. A living system grows (continuous sourcing), refreshes (scheduled re-runs), and fires on events (sold a property · new ABN · posted intent). Trigger-based outreach converts multiples higher.

§14 · Architecture

Convex × combinators — mostly already built

Four substrate layers, with stolen-Clay recipes composed on top. The substrate exists: Convex is live, the combinators are shipped, Apify/LLM are on hand. The work is composition — plus one load-bearing artifact.

Explainer P

The layers, and the spine through them

Acquisition · combinators (semantic grammar) · Convex (data+functions) · LLM (judgment) — recipes on top. One result-type cell threads every layer: {value, source, lane, confidence, freshness, cost, provenance}. Get it right and the firewall, waterfall, budget and audit all read off it.
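
A sketch of that cell and of one consumer reading off it: a waterfall that stops at the first confident result or when the budget runs out. The seven field names come from the text above; the field types, the `Step` signature and the stopping rules are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Cell:
    value: Any
    source: str
    lane: str                       # "clean" or "cowboy"
    confidence: float               # 0.0 to 1.0
    freshness: datetime             # when the value was observed
    cost: float                     # AUD spent to obtain it
    provenance: Tuple[str, ...] = ()  # ordered chain of upstream sources


Step = Callable[[str], Optional[Cell]]


def waterfall(lead_id: str, steps: Iterable[Step],
              min_confidence: float, budget: float):
    """Run steps in order; return the best cell seen and the total spent."""
    spent = 0.0
    best: Optional[Cell] = None
    for step in steps:
        if spent >= budget:
            break  # budget exhausted before this step
        cell = step(lead_id)
        if cell is None:
            continue  # simplification: a miss is treated as free
        spent += cell.cost
        if best is None or cell.confidence > best.confidence:
            best = cell
        if cell.confidence >= min_confidence:
            break  # confident enough, skip the pricier steps
    return best, spent
```

Because cost, confidence and lane travel with every value, the budget check, the audit trail and the lane firewall can all be written against the same type.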

Discipline: because the architecture fits, there is less code to write. Compose 2–3 recipes against the existing substrate, design the result-type contract + the tree-interpreter, and ship. This is not a general combinator platform. The tell that you've drifted: you are improving the grammar instead of enriching leads.

§14b · What Apify does for us

Actors across the value chain

Not a scraper — programmatic access to any web surface, as agent tools. In cowboy mode the public-data taps (🟢) scale freely; authenticated/personal (🟡) stays human-paced + privacy-bounded.

Explainer Q

Wire in this order

MCP first (makes everything agent-callable), then sourcing fills the funnel, triggers make it living, intent adds warmth, validation stops wasted spend. All 🟢 except the last.

Stage	Play	Lane	Value
Source	QLD/NSW business owners (Google Maps)	🟢	SMSF-trustee candidates
Source	Who's transacting now (RE portals)	🟢	property-active prospects (trigger)
Source	Forum intent (Reddit / PropertyChat)	🟢	warm, in-market
Enrich	Business-site read (Firecrawl on-box)	🟢	qualification signal
Enrich	Contact find + verify	🟡	don't pay for dead contacts
Monitor	New listings · DA approvals · new-ABN	🟢	event triggers (living system)
Agent	MCP: any actor as an on-demand tool	—	per-lead 360° research
Build	Crawlee custom AU surfaces	🟢	proprietary data = IP

Start here: P0 (MCP) → P1 (Maps) → P2 (RE triggers) → P3 (intent) — net-new sourcing + the living system + the agent's hands, all in the 🟢 zone, before touching a single authenticated surface. Two force-multipliers: Apify MCP (45k actors as agent tools) and Crawlee (proprietary AU pipelines).

§8 · Open numbers to close

What makes this decision-grade

RPP_SEAT — actual RP Data portal seat price (biggest swing).
API_SEAT — actual API per-seat entitlement.
RP Data portal ToU clause wording on automated access → resolves CLEAN-vs-STEALTH on real text.
State land-registry codes on owner-data-for-marketing → counsel line for Model 3 outreach.
Headcount horizon (5 / 50 / 500) → sets where on-prem step-cost crosses SaaS linear-cost.