What question did this study set out to answer?

The study aims to address a structural gap in the geometric foundations of model-based deep reinforcement learning (DRL) systems.

April 1, 2026 | Open Access

Riemannian World Models: Deep Reinforcement Learning on Data-Driven Differentiable Manifolds

Key Points

Addressed a structural gap in the geometric foundations of model-based deep reinforcement learning (DRL).
Identified two distinct Riemannian manifolds in DRL: policy parameter space and environmental state space.
Formulated Riemannian World Models (RWM) to handle the geometry of state spaces.
Conducted a four-layer gap analysis explaining why prior work treated the environmental state space as Euclidean and did not identify the two-manifold structure.
Developed implementations for geometric corrections using PyTorch and automatic differentiation.
Identified four geometric errors in existing World Models frameworks.
Presented geometrically correct methods to replace errors identified in the transition and reward models.
Outlined potential extensions of the framework to Finsler and time-varying Finsler manifolds.
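
The pullback metric and the Riemannian gradient mentioned in the key points can be illustrated with a minimal sketch. It is not the paper's implementation: the decoder below is a hypothetical toy map onto a paraboloid standing in for a trained VAE decoder, and central finite differences stand in for the automatic differentiation the paper uses. All function names are assumptions made for illustration.

```python
def decoder(z):
    """Hypothetical toy decoder: maps a 2-D latent (u, v) onto a paraboloid in R^3."""
    u, v = z
    return [u, v, u * u + v * v]


def jacobian(f, z, eps=1e-6):
    """Central-difference Jacobian of f at z (rows: outputs, columns: latents)."""
    m, n = len(f(z)), len(z)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        z_plus, z_minus = list(z), list(z)
        z_plus[j] += eps
        z_minus[j] -= eps
        f_plus, f_minus = f(z_plus), f(z_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J


def pullback_metric(f, z):
    """Pullback metric G(z) = J(z)^T J(z) induced on latent space by f."""
    J = jacobian(f, z)
    n = len(z)
    return [[sum(row[i] * row[j] for row in J) for j in range(n)]
            for i in range(n)]


def riemannian_gradient(G, grad):
    """Riemannian gradient G^{-1} grad, using the closed-form 2x2 inverse."""
    (a, b), (c, d) = G
    det = a * d - b * c
    return [(d * grad[0] - b * grad[1]) / det,
            (-c * grad[0] + a * grad[1]) / det]


z = [1.0, 0.5]
G = pullback_metric(decoder, z)          # analytically [[5, 2], [2, 2]] at this z
euclid_grad = [1.0, 0.0]                 # an arbitrary Euclidean gradient
riem_grad = riemannian_gradient(G, euclid_grad)
print(G, riem_grad)
```

At points where the decoder is curved, G differs from the identity, so the Riemannian gradient points in a different direction than the Euclidean one. Where G is the identity (here, at the origin), the two coincide.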

Abstract

This paper identifies and formalizes a fundamental structural gap in the geometric foundations of model-based Deep Reinforcement Learning. Every DRL system simultaneously involves two distinct Riemannian manifolds: the policy parameter space (equipped with the Fisher information metric) and the environmental state space (equipped with the pullback metric derived from a VAE decoder). All prior geometric DRL research — from Natural Policy Gradient through TRPO and PPO — has correctly addressed only the first manifold, while every existing World Models architecture (DreamerV3, DreamerV2, PlaNet, and related methods) treats the second manifold as Euclidean. We present Riemannian World Models (RWM), the first framework to handle the geometry of the environmental state space manifold in a World Models setting. We identify four geometric errors committed by Euclidean World Models on curved state spaces constructed via the Geometric Intelligence (GI) Theory VAE pipeline: (1) Euclidean vector addition in the transition model, (2) Euclidean policy gradient instead of the Riemannian gradient, (3) trivial parallel transport instead of Levi-Civita parallel transport for temporal credit assignment, and (4) Euclidean distance reward instead of geodesic distance. We provide geometrically correct replacements for each error with PyTorch-compatible implementations via automatic differentiation. A four-layer gap analysis explains why this two-manifold structure has not been identified in prior literature. We further outline a research roadmap extending the framework to Finsler and time-varying Finsler manifolds, motivated by industrial applications in aviation routing, wind energy control, maritime logistics, and supply chain optimization.
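
The fourth error listed in the abstract (Euclidean distance reward instead of geodesic distance) can be illustrated with a small sketch. It makes several assumptions that do not come from the paper: the decoder is a hypothetical toy paraboloid, the reward is taken to be the negative distance to a goal state, and the manifold distance is approximated by the length of the decoded straight latent segment. In general that length is only an upper bound on the geodesic distance. For this particular surface, the segment from the origin is a meridian of a surface of revolution, so it is a geodesic.

```python
import math


def decoder(z):
    """Hypothetical toy decoder: maps a 2-D latent (u, v) onto a paraboloid in R^3."""
    u, v = z
    return (u, v, u * u + v * v)


def decoded_segment_length(f, z0, z1, steps=1000):
    """Polyline length of the image under f of the straight latent segment z0 -> z1."""
    total = 0.0
    prev = f(z0)
    for k in range(1, steps + 1):
        t = k / steps
        z = [a + t * (b - a) for a, b in zip(z0, z1)]
        cur = f(z)
        total += math.dist(prev, cur)
        prev = cur
    return total


z_state, z_goal = (0.0, 0.0), (1.0, 0.0)

euclid_dist = math.dist(z_state, z_goal)                          # flat latent distance
manifold_dist = decoded_segment_length(decoder, z_state, z_goal)  # length on the surface

# Assumed reward form: negative distance to the goal.
reward_euclid = -euclid_dist
reward_geodesic = -manifold_dist
print(reward_euclid, reward_geodesic)
```

The Euclidean reward underestimates how far the state is from the goal along the curved surface, which is the kind of discrepancy a geodesic-distance reward is meant to remove.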

Cite This Study

Etale Cohomology (Mon,) studied this question.

synapsesocial.com/papers/69ccb7c216edfba7beb89e64 https://doi.org/10.5281/zenodo.19337472
