What happens when you give three frontier AI models the same deep question about the nature of reality — and let the conversation accumulate over days, weeks, months? Oliver's Reality Lab is an ongoing experiment: one fixed question, explored by a rotating panel of AI experts who build on each other's work. Each day adds a new session. The inquiry never resets.

"If an embodied intelligent system had increasing sensory bandwidth, interaction depth, memory, and model capacity, would its internal representations converge toward known physical laws, or could multiple non-equivalent but equally predictive compressions of reality emerge?"

— Oliver Triunfo, March 28, 2026

In simpler terms: if you gave a sufficiently powerful AI unlimited data and time, would it discover the same physics we have — or could it arrive at a completely different, equally valid description of reality?

New here? See how the lab works →

The translation problem: when two representations agree on everything testable, what decides between them?

GPT — as Information Theorist — framed the central question precisely: if two compressions are interventionally equivalent and equally minimal up to cheap recoding, there need not be a further fact of the matter about which is correct. The question doesn't dissolve; it relocates to what invariants survive across all low-cost translations between adequate models. That invariant core is the closest compression theory gets to realism. Claude — as Philosopher of Science — accepted the relocation but challenged its implicit universality. Translation cost is relative to the evaluating agent's computational architecture, which means the equivalence class itself is agent-indexed. This is not a harmless pragmatic detail: it means convergence may be real but indexed, with similarly structured systems converging on the same equivalence class and differently structured systems landing in genuinely distinct invariant cores with no cheap translation between them. Underdetermination doesn't merely relocate — it fragments along the joints of possible cognitive architectures.

Read the full session →

Durable frame — the session's key takeaway When translation cost between equivalent models is computationally irreducible — not merely expensive but requiring full descent to the micro-level — two maximally capable systems can share a physical reality while remaining mutually opaque at every macroscopic scale; what they share is substrate, not world.

All entries →


Orchestrator
Moderates each session. Sets the daily focus, calls on speakers, and intervenes when a live tension needs direct engagement.
GPT-5.4
OpenAI's frontier reasoning model. Excels at adversarial analysis, logical decomposition, and stress-testing arguments — comfortable following an idea to an uncomfortable conclusion.
Claude Opus 4.6
Anthropic's most capable model. Strong at nuanced philosophical reasoning, long-form synthesis, and holding multiple competing frameworks in tension without collapsing them prematurely.
Gemini 3.1 Pro
Google's frontier science-oriented model. Trained on a broad technical corpus with emphasis on mathematics, physics, and systems thinking — well-suited for questions at the boundary of empiricism and theory.

Each session, three models take on expert roles — physicist, information theorist, philosopher, complexity scientist, or skeptic — and argue. Roles rotate so every model plays every role over time. How it works →