This is the dotrepo-side synthesis of the interview round. The companion essay on MaxwellSantoro.com covers the broader framing and why this experiment was worth running at all.
The Experiment
dotrepo is an open metadata protocol for software repositories. It is designed for three audiences: maintainers, human users, and AI agents. Since AI agents are a first-class consumer of the protocol, it made sense to ask them directly what they want from it.
I sent the same 10-question prompt to 12 different AI models across 8 providers. Fresh conversations, no priming, no pep talk. The goal was not affirmation. The goal was pressure testing: what feels obviously useful, what is missing, what looks risky, and what would actually make an agent check dotrepo first.
The answers converged much harder than expected. That convergence is the signal.
Models Interviewed
| Model | Provider | Notes |
|---|---|---|
| ChatGPT 5.4 Thinking | OpenAI | Logged-in session |
| ChatGPT 5.4 Thinking | OpenAI | Incognito session |
| Claude Opus 4.6 Extended | Anthropic | Logged-in session |
| Claude Opus 4.6 Extended | Anthropic | Incognito session |
| Gemini Pro 3.1 | Google | Fresh conversation |
| Gemini Thinking 3.1 | Google | Fresh conversation |
| Grok Expert 4.20 | xAI | Logged-in session |
| Grok Expert 4.20 | xAI | Incognito session |
| GLM-5 | Zhipu AI | Fresh conversation |
| Hunter Alpha | OpenRouter | Fresh conversation |
| MiniMax M2.5 | MiniMax | Fresh conversation |
| Nemotron 3 Super | NVIDIA | Fresh conversation |
Where possible, ChatGPT and Grok were grounded against the live repo and public site, and ChatGPT, Claude, and Grok were each tested in more than one session shape to check whether the takeaways were stable.
Consensus Findings
1. Build and test commands are the sharpest pain point
This was unanimous. Every model described “how do I actually build and test this repo?” as the most expensive reasoning step in unfamiliar codebases. The hard part is not inferring the language. It is getting from the presence of build files to the exact, correct command with the right flags, prerequisites, and side effects.
The gap between “I see a build file” and “I know the correct command including flags and prerequisites” is where most wasted effort lives.
The implication for dotrepo is direct: structured build and test metadata is not ornamental. It is the highest-value field family in the protocol.
2. The overlay index is the wedge
All 12 models independently identified the public overlay index as dotrepo’s smartest near-term design move. It breaks the adoption trap that kills most metadata standards by making the protocol useful before maintainers opt in.
That view was not abstract. Several models gave concrete coverage thresholds for when dotrepo would flip from “nice to check” to “I check this first.” The shared theme: the protocol is already coherent; the missing ingredient is enough reviewed data that checking dotrepo is usually cheaper than not checking it.
3. Trust and provenance is the moat
The distinction between maintainer-declared facts, imported facts, and inferred facts was universally praised as the genuinely differentiated part of the project. Models described using that metadata to change both language and behavior:
- Declared or verified: act confidently and cite it directly.
- Imported: treat it as a strong default but mention the source.
- Inferred: use it as a hypothesis and verify before execution.
- Low confidence: warn the user and avoid silent action.
That is exactly the behavior dotrepo is trying to induce, and it is already visible on the live public surface at /v0/repos/index.json and in trust-aware queries such as this repository field query.
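The four tiers above translate directly into an agent-side policy table. This is a sketch under assumed tier names; dotrepo's actual trust enum may label the levels differently:

```python
from enum import Enum

class Trust(Enum):
    DECLARED = "declared"   # maintainer-declared or verified
    IMPORTED = "imported"   # pulled in from an external source
    INFERRED = "inferred"   # heuristically derived
    LOW = "low"             # below the confidence threshold

def agent_policy(trust: Trust) -> dict:
    """Map a trust tier to agent behavior, mirroring the four bullets.

    The tier names and policy keys are illustrative assumptions,
    not part of the dotrepo specification.
    """
    return {
        Trust.DECLARED: {"verify_first": False, "cite_source": True,  "warn_user": False},
        Trust.IMPORTED: {"verify_first": False, "cite_source": True,  "warn_user": False},
        Trust.INFERRED: {"verify_first": True,  "cite_source": False, "warn_user": False},
        Trust.LOW:      {"verify_first": True,  "cite_source": False, "warn_user": True},
    }[trust]

assert agent_policy(Trust.INFERRED)["verify_first"] is True
assert agent_policy(Trust.LOW)["warn_user"] is True
```

The design choice worth noting: trust metadata changes *behavior*, not just wording, and the riskier tiers fail toward verification rather than toward silent action.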
4. Stale metadata is the most dangerous failure mode
This was also unanimous. Every model made some version of the same argument: stale trusted metadata is worse than no metadata, because it suppresses the skepticism that would otherwise push an agent back toward the source materials.
That is why freshness is first-class on dotrepo’s public surface. Every response carries snapshot freshness and digest metadata, and the top-level meta document at /v0/meta.json exists specifically so agents and operators can reason about staleness instead of pretending it away.
5. Keep the core schema brutally small
Every model warned against schema bloat. The useful framing was not “small is elegant.” It was “small is how this survives.” The winning version of dotrepo answers a short list of high-value questions reliably: what this repo is, how it builds, how it tests, where the real docs are, who owns it, and what trust level attaches to each answer.
Strongest Criticisms
- The project scope is still very ambitious for one repo. The protocol, toolchain, public API, claim workflows, and deployment story are all real now. That is impressive, but it also means the ratio of infrastructure to adoption is something to watch closely.
- Plain-string build commands are not enough. Multiple models wanted prerequisites, environment requirements, platform constraints, and an explicit “safe for agent execution?” shape.
- Monorepo and workspace semantics remain an obvious gap. The repos where metadata is most painful are often exactly the repos where workspace structure matters most.
- Record-level trust is not always granular enough. Several models argued that identity may be maintainer-declared while build commands are imported and docs topology is inferred. That pressure toward field-level provenance is real even if it does not need to land immediately.
- The MCP server still lacks remote lookup. The hosted HTTP surface already supports predictable repo-first lookup, but the MCP layer still requires local context for most workflows.
- The index is still too small to change behavior by default. Five reviewed overlays proves the architecture. It does not yet create the habit loop where an agent expects dotrepo coverage on arbitrary open-source repos.
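The second criticism above, richer build commands, is easy to make concrete. Here is one possible shape for a command record that carries prerequisites, platform constraints, and an explicit execution-safety flag. Every field name is a hypothetical illustration, not the dotrepo schema:

```python
# Hypothetical richer command shape; all field names are illustrative.
BUILD_COMMAND = {
    "command": "npm run build",
    "prerequisites": ["npm ci"],
    "platforms": ["linux", "darwin"],
    "env": {"NODE_ENV": "production"},
    "network_access": True,       # package installation hits the network
    "side_effects": ["writes dist/"],
    "agent_safe": False,          # must not be auto-run without consent
}

def may_auto_run(cmd: dict, platform: str) -> bool:
    """An agent auto-runs a command only when it is explicitly marked
    safe AND the current platform is listed. Anything else needs a human."""
    return bool(cmd.get("agent_safe")) and platform in cmd.get("platforms", [])

assert may_auto_run(BUILD_COMMAND, "linux") is False
```

Note the default direction: a missing or false `agent_safe` flag blocks auto-execution, which is the conservative answer to the supply-chain concern raised later in this write-up.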
Missing MCP Operations
| Operation | Description | Models Requesting |
|---|---|---|
| dotrepo.lookup | Remote query by repository URL without a local clone | 6 / 12 |
| dotrepo.diff / dotrepo.staleness | Compare overlay expectations against current repo state | 6 / 12 |
| dotrepo.batch_query | Resolve multiple fields or repositories in one call | 5 / 12 |
| dotrepo.suggest | Propose fields for incomplete or newly imported records | 4 / 12 |
| dotrepo.evidence | Show why a specific field has the value it has | 3 / 12 |
The clear front-runner is remote lookup. The public origin already supports the lookup pattern structurally. What is missing is the MCP operation that makes that path zero-friction inside agent tooling.
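The core of a remote lookup operation is just URL construction against the hosted origin. The sketch below assumes a `/v0/repos/{host}/{owner}/{name}.json` path scheme and uses a placeholder host; the real hosted layout may differ:

```python
from urllib.parse import urlparse, quote

# Placeholder origin: the real hosted base URL is not assumed here.
BASE = "https://example-dotrepo-origin.invalid"

def lookup_url(repo_url: str) -> str:
    """Turn a repository URL into a hosted dotrepo lookup URL.

    The /v0/repos/{host}/{owner}/{name}.json path is a guess at a
    repo-first scheme, not the documented hosted layout.
    """
    parts = urlparse(repo_url)
    owner, name = parts.path.strip("/").removesuffix(".git").split("/")[:2]
    return f"{BASE}/v0/repos/{parts.netloc}/{quote(owner)}/{quote(name)}.json"

print(lookup_url("https://github.com/maxwellsantoro/dotrepo"))
```

Wrapping this mapping in an MCP tool is what would make the path zero-friction: the agent passes a repo URL it already has and gets structured metadata back without a clone.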
Risk Warnings
- Stale metadata becomes trusted metadata. This was the most cited risk by a wide margin.
- Supply-chain risk through executable commands. Several models explicitly warned that agents may auto-run commands unless trust and execution safety are clearly surfaced.
- Index curation becomes a bottleneck. The overlay strategy is the wedge, but it also creates a review burden that has to stay credible as volume rises.
- Schema bloat erodes the core value proposition. The more dotrepo tries to describe everything, the harder it becomes to keep the important fields boringly reliable.
- The project quietly collapses into one ecosystem. Early index growth needs to stay visibly cross-language, or the public signal becomes “Rust plus GitHub” regardless of the stated ambition.
Synthesis: The Three Things That Matter Most
1. Seed the index. The protocol and hosting surface are ahead of the data. The near-term job is not more architecture. It is more reviewed overlays covering the repos agents actually encounter.
2. Build remote lookup. The hosted HTTP layer already proves the contract. The MCP gap is now the highest-leverage toolchain gap.
3. Protect the minimal core. dotrepo should answer a short list of essential repo questions with explicit provenance and freshness. Everything else should face a very high bar for inclusion.
The trust model is the moat. The overlay index is the wedge. The small schema is the survival constraint.
Methodology Notes
- All interviews were conducted on March 17, 2026.
- The same 10-question prompt was sent to every model.
- Fresh conversations were used for each interview.
- Where possible, both logged-in and incognito variants were used to check for consistency.
- This write-up is a synthesis, not a verbatim archive.
Where This Feeds Back Into dotrepo
The repo-side synthesis and backlog changes live in docs/ai-tool-interviews.md and the post-v1 backlog. The public site now carries this write-up because it is not just internal planning context. It is one of the clearest pieces of product evidence behind the current roadmap.
If you are building an AI coding tool and want to integrate dotrepo, or if you maintain a popular open-source project and want to correct or replace your overlay record, start at github.com/maxwellsantoro/dotrepo.