
Orio 2.0

Research Preview

Teaching machines the physics of dreams.

A physics-prior world model for end-to-end 3D scene synthesis — learning gravity, collision, and spatial coherence from non-semantic data.

The Problem

Retrieval-based generation fails at physical coherence.

Gravity Violation

Objects float mid-air with no support

Interpenetration

Meshes clip through floors and walls

Orientation Error

Facing vectors misaligned with the scene

Material Discontinuity

Incoherent textures on co-located assets

Illumination Mismatch

Lighting ignores scene topology

These systems learn what scenes look like from labelled datasets. They never learn the physical laws governing why objects exist where they do.

Training progression from spatial chaos to physics-coherent 3D world generation
Core Insight

From Chaos to Order

Orio 2.0 abandons labelled scene supervision entirely. The model trains on millions of stochastic, non-semantic spatial configurations — randomised object compositions with zero categorical labels.

From this distribution, the network disentangles physical invariants from visual noise: gravitational grounding, rigid-body non-penetration, support-surface inference, and contact-normal estimation.

These emerge as learned latent constraints — not hand-coded rules. The model generalises across arbitrary scene types without any domain-specific fine-tuning.
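The non-semantic training distribution described above can be pictured as a sampler of unlabelled random configurations. A minimal sketch, assuming a box-primitive scene format (the names and fields here are illustrative, not Orio's actual data schema):

```python
import random

def random_configuration(n_objects=8, extent=10.0):
    """Sample one non-semantic training scene: random boxes with poses and
    extents but zero categorical labels (illustrative format only)."""
    scene = []
    for _ in range(n_objects):
        scene.append({
            "position": (random.uniform(-extent, extent),
                         random.uniform(-extent, extent),
                         random.uniform(0.0, extent)),  # may float or interpenetrate
            "half_extents": tuple(random.uniform(0.1, 1.0) for _ in range(3)),
            "yaw": random.uniform(0.0, 6.2832),
        })
    return scene

# samples like these violate gravity and non-penetration freely; the model's
# job is to learn the invariants that separate valid layouts from noise
scene = random_configuration()
```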

Object dependency graph with support, gravity, orientation, and occlusion relationships
Spatial Intelligence

Dependency Graphs & View-Centric Inference

Scene layout is modelled as a directed acyclic graph (DAG) where nodes represent entities and edges encode support, containment, adjacency, and orientation coupling — resolved before geometry synthesis.
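Because support and containment edges form a DAG, nodes can be resolved in topological order so every supporter is placed before the objects that rest on it. A minimal sketch using Kahn's algorithm (the solver below is a hypothetical stand-in, not the actual model component):

```python
from collections import defaultdict, deque

def resolution_order(edges):
    """Topologically order scene-graph nodes so each supporter is resolved
    before its dependents. `edges` maps a node to the nodes depending on it."""
    indegree = defaultdict(int)
    nodes = set(edges)
    for src, dsts in edges.items():
        for dst in dsts:
            indegree[dst] += 1
            nodes.add(dst)
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for dst in edges.get(node, ()):
            indegree[dst] -= 1
            if indegree[dst] == 0:
                queue.append(dst)
    return order

# floor supports the table; the table supports a lamp and a book
edges = {"floor": ["table"], "table": ["lamp", "book"]}
print(resolution_order(edges))  # ['floor', 'table', 'lamp', 'book']
```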

Spatial inference operates in view-centric coordinates. Placement is computed relative to the observer's frustum: "gathered around a focal point" becomes a geometric constraint on facing vectors and radial distance.

This eliminates the pathological failure case where objects are globally "correct" but visually incoherent from the camera's perspective.
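The view-centric constraint can be made concrete with a small geometric sketch: "gathered around a focal point" fixes each object's radial distance and yaws its facing vector back toward the focus. Function and field names are illustrative assumptions:

```python
import math

def gather_around(focal, radius, n):
    """Place n objects on a ring of the given radius around a focal point,
    each yawed to face it -- a constraint on radial distance and facing
    vectors, not a global world-frame rule (illustrative sketch)."""
    placements = []
    for i in range(n):
        theta = 2.0 * math.pi * i / n
        x = focal[0] + radius * math.cos(theta)
        y = focal[1] + radius * math.sin(theta)
        # facing vector points from the object back toward the focal point
        yaw = math.atan2(focal[1] - y, focal[0] - x)
        placements.append({"position": (x, y), "yaw": yaw})
    return placements

# four chairs gathered around a campfire at the origin, two metres out
chairs = gather_around((0.0, 0.0), 2.0, 4)
```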

Architecture

Key Technical Contributions

Physics-Prior Learning

Gravity, non-penetration, and support emerge as learned latent invariants from non-semantic training data — not as post-hoc penalty terms or hand-authored rules.

Co-Generative Synthesis

Geometry, PBR materials (albedo, roughness, metallic), and lighting are jointly synthesised per-object in a single forward pass. Material coherence by construction.
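One way to picture "coherence by construction": every output channel is decoded from the same latent in one pass, so materials and lighting cannot drift apart across separate generators. The schema and toy decoder below are hypothetical illustrations, not the real model's interface:

```python
from dataclasses import dataclass

@dataclass
class ObjectChannels:
    """One object's jointly emitted channels (hypothetical schema)."""
    mesh_id: int        # geometry selector
    albedo: tuple       # PBR base colour (r, g, b) in [0, 1]
    roughness: float    # PBR roughness in [0, 1]
    metallic: float     # PBR metallic in [0, 1]
    irradiance: float   # scalar local-illumination estimate

def decode(latent):
    """Toy joint decoder: all channels derive from one latent scalar,
    so they are mutually consistent by construction (not the real model)."""
    r = abs(latent) % 1.0
    return ObjectChannels(mesh_id=int(abs(latent) * 100) % 7,
                          albedo=(r, r * 0.5, 1.0 - r),
                          roughness=r,
                          metallic=1.0 - r,
                          irradiance=0.5 + 0.5 * r)
```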

Topology-Conditioned Placement

Object instantiation is conditioned on terrain height-field gradients, surface normals, and local curvature. Ground contact is guaranteed, not corrected.
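Guaranteed ground contact on a height field reduces to sampling the terrain surface at the object's footprint and seating the object on it. A minimal sketch with bilinear sampling, assuming a row-major grid of heights (function names are illustrative):

```python
def ground_contact_z(heightfield, x, y, cell=1.0):
    """Bilinearly sample a terrain height field at world position (x, y).
    `heightfield[j][i]` is the height at grid cell (i, j)."""
    i, j = int(x // cell), int(y // cell)
    fx, fy = (x / cell) - i, (y / cell) - j
    h00, h10 = heightfield[j][i], heightfield[j][i + 1]
    h01, h11 = heightfield[j + 1][i], heightfield[j + 1][i + 1]
    top = h00 * (1 - fx) + h10 * fx
    bottom = h01 * (1 - fx) + h11 * fx
    return top * (1 - fy) + bottom * fy

def place_on_terrain(heightfield, x, y, half_height):
    """Seat an object's centre so its base touches the surface exactly --
    contact by construction, never corrected after the fact."""
    return (x, y, ground_contact_z(heightfield, x, y) + half_height)
```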

Generation Pipeline

Prompt → Physics → World

01

Scene Graph Extraction

Prompt parsed into a structured scene graph encoding entity types, spatial predicates, and physical affordances.

02

Constraint Resolution

DAG solver computes gravity vectors, support surfaces, collision bounds, and orientation coupling per node.

03

Co-Generative Synthesis

Per-node geometry, PBR BRDF parameters, and local illumination synthesised in a single joint forward pass.

04

Real-Time Instantiation

Synthesised world deployed to a real-time renderer with live physics and interactive object state.
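The four stages above can be strung together in a toy end-to-end sketch. Every function here is a deliberately simplified placeholder, not Orio's API:

```python
def extract_scene_graph(prompt):
    """01 -- toy stand-in: each word of the prompt becomes one entity node."""
    return [{"entity": word} for word in prompt.split()]

def resolve_constraints(graph):
    """02 -- toy solver: every entity rests on the ground plane (z = 0)."""
    return {n["entity"]: {"support_z": 0.0, "gravity": (0.0, 0.0, -9.81)}
            for n in graph}

def synthesise(node, constraints):
    """03 -- toy joint pass: placement, material, and lighting together."""
    c = constraints[node["entity"]]
    return {"entity": node["entity"], "z": c["support_z"],
            "albedo": (0.5, 0.5, 0.5), "irradiance": 1.0}

def instantiate(objects):
    """04 -- stand-in for deployment to a real-time renderer."""
    return {"objects": objects, "live_physics": True}

def generate_world(prompt):
    graph = extract_scene_graph(prompt)
    constraints = resolve_constraints(graph)
    return instantiate([synthesise(n, constraints) for n in graph])

world = generate_world("campfire tent lantern")
```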

“True AI resilience means having the patience to teach machines the physics of dreams.”
— Orio Research Team