What question did this study set out to answer?

The aim is to develop a neural network framework that efficiently generates realistic urban street scenes while ensuring spatial and semantic consistency.

March 21, 2026 | Open Access

A Recursive Variational Neural Network Framework for Dynamic Generation of Urban Street Scenes

Key Points

Introduced FocusRvNN, a recursive variational neural network framework.
Implemented an attention-driven mechanism for clustering geospatial substructures.
Utilized graph convolutional network embeddings to enhance spatial relations.
Adopted a coarse-to-fine assembly approach for scene generation.
Integrated with Unreal Engine 5 for real-time visualization.
Achieved an mIoU of 82.89% for layout consistency on the CarlaSC dataset.
Improved accuracy in substructure extraction by 16.1% and spatial relation precision by 13.3%.
Reduced semantic inconsistency by 15% and improved scene completeness by 15%.
Generated complete urban street scenes in approximately 15 seconds.
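The mIoU figure quoted above is a mean intersection-over-union score. As a rough illustration of how such a score is commonly computed, here is a minimal sketch that assumes predicted and reference layouts are available as flattened lists of integer class labels. The class names, the toy data, and the choice to skip classes absent from both lists are assumptions for illustration; the evaluation protocol actually used on CarlaSC is not described in this summary.

```python
from collections import Counter

def mean_iou(pred, truth, num_classes):
    """Mean IoU over the classes that appear in pred or truth."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    pred_count = Counter(pred)    # cells predicted as each class
    truth_count = Counter(truth)  # cells labelled as each class
    inter = Counter()             # cells where prediction and label agree
    for p, t in zip(pred, truth):
        if p == t:
            inter[p] += 1
    ious = []
    for c in range(num_classes):
        union = pred_count[c] + truth_count[c] - inter[c]
        if union == 0:
            continue  # class absent from both lists: leave it out of the mean
        ious.append(inter[c] / union)
    return sum(ious) / len(ious) if ious else 0.0

# Hypothetical labels: 0 = road, 1 = sidewalk, 2 = building
truth = [0, 0, 0, 1, 1, 2, 2, 2]
pred = [0, 0, 1, 1, 1, 2, 2, 0]
score = mean_iou(pred, truth, num_classes=3)
print(f"mIoU = {score:.4f}")
```

In this toy case the per-class IoU values are 1/2, 2/3, and 2/3, so the mean is 11/18.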

Abstract

• FocusRvNN enables AI-driven generation of large-scale 3D urban digital twins for sustainable planning, addressing data scarcity and computational costs while improving fidelity and diversity in complex urban scenes.
• Attention-based subgraph clustering boosts substructure extraction accuracy by 16.1% in unbounded urban settings, dynamically identifying high-frequency patterns and suppressing noise for reliable semantic anchors.
• GCN conditional embeddings enhance spatial relation precision by 13.3% for controllable geospatial modeling, enabling diversified generation and style transfer through graph-based representations.
• Multi-level splicing optimization ensures physical consistency, reduces semantic inconsistency by 15%, and raises scene completeness by 15% via coarse-to-fine hierarchical assembly.
• UE5 integration achieves faster real-time VR visualization for immersive health assessments, supporting infinite detail geometry and dynamic global illumination in large-scale interactions.

Generating large-scale 3D urban street scenes from structured data is a key challenge in geospatial computing and urban simulation. Existing generative approaches often struggle to reuse semantically coherent local structures, to encode relational constraints beyond isolated objects, and to organize unbounded outdoor layouts in a controllable and physically consistent manner, which limits their applicability to complex road networks and heterogeneous roadside infrastructure. To address these challenges at the algorithmic level, this paper proposes FocusRvNN, a focus-driven recursive variational neural network framework based on variational autoencoders (VAEs) and scene graphs.
The framework introduces a focus-driven attention mechanism to identify and cluster high-frequency geospatial substructures as reusable semantic building blocks, employs graph convolutional network (GCN) embeddings to encode inter-substructure spatial relations as conditional generation cues, and adopts a coarse-to-fine hierarchical assembly strategy to progressively compose large-scale layouts while enforcing physical and semantic consistency. The proposed framework is evaluated on the CarlaSC dataset, where it achieves an mIoU of 82.89% for layout consistency and generates a complete urban street scene in approximately 15 seconds under the tested hardware configuration. The generation pipeline is further integrated with Unreal Engine 5 to support interactive visualization and inspection, demonstrating its applicability to simulation-oriented workflows for urban planning studies and virtual environment design. This applicability is shown primarily in simulated environments: the evaluation uses the synthetic CarlaSC dataset, with supplementary semantic validation on the real-world Cityscapes dataset.
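To make the idea of GCN embeddings over a scene graph more concrete, the following is a minimal sketch of a single graph-convolution step in plain Python. It assumes the widely used symmetric-normalized propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W). The summary does not state which GCN variant FocusRvNN uses, so this rule, the three-node graph, and the helper names `matmul` and `gcn_layer` are illustrative assumptions only.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(adj, feats, weight):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization by the self-loop-augmented degrees.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features, apply the linear map, then ReLU.
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]

# Hypothetical scene graph: road segment (0) - sidewalk (1) - street lamp (2)
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
feats = [[1.0, 0.0, 0.0],   # one-hot node types
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
weight = [[1.0, 0.0, 0.0],  # identity weights, so the output equals the
          [0.0, 1.0, 0.0],  # normalized adjacency and is easy to inspect
          [0.0, 0.0, 1.0]]
embeddings = gcn_layer(adj, feats, weight)
```

With one-hot features and identity weights, each output row shows how strongly a node mixes in its neighbours: the road node keeps weight 1/2 on itself, takes 1/sqrt(6) from the sidewalk, and takes nothing from the street lamp, which is two hops away. In a trained model the weight matrix would be learned and several such layers could be stacked.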

Cite This Study

Fang et al. studied this question.

synapsesocial.com/papers/69be37406e48c4981c676cb8 https://doi.org/10.1016/j.rineng.2026.110119
