We present Decentralized Distributed Proximal Policy Optimization (DD-PPO), a method for distributed reinforcement learning in resource-intensive simulated environments. DD-PPO is distributed (uses multiple machines), decentralized (lacks a centralized server), and synchronous (no computation is ever stale), making it conceptually simple and easy to implement. In our experiments on training virtual robots to navigate in Habitat-Sim, DD-PPO exhibits near-linear scaling -- achieving a speedup of 107x on 128 GPUs over a serial implementation. We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience) -- over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs. This massive-scale training not only sets the state of the art on the Habitat Autonomous Navigation Challenge 2019, but essentially solves the task -- near-perfect autonomous navigation in an unseen environment without access to a map, directly from an RGB-D camera and a GPS+Compass sensor. Fortuitously, error vs. computation exhibits a power-law-like distribution; thus, 90% of peak performance is obtained relatively early (at 100 million steps) and relatively cheaply (under 1 day with 8 GPUs). Finally, we show that the scene understanding and navigation policies learned can be transferred to other navigation tasks -- the analog of ImageNet pre-training + task-specific fine-tuning for embodied AI. Our model outperforms ImageNet pre-trained CNNs on these transfer tasks and can serve as a universal resource (all models and code are publicly available).
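The scaling figures quoted in the abstract can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope check, not part of the authors' code: it treats scaling efficiency as speedup divided by GPU count, takes the "under 3 days" bound as exactly 3 days, and assumes a 30-day month. The function and constant names are illustrative.

```python
# Figures quoted in the abstract.
SPEEDUP = 107.0         # speedup over a serial run
NUM_GPUS = 128          # GPUs used to obtain that speedup
TRAIN_GPUS = 64         # GPUs used for the 2.5-billion-step run
WALL_CLOCK_DAYS = 3.0   # assumption: "under 3 days" taken as an upper bound
DAYS_PER_MONTH = 30.0   # assumption: a 30-day month


def scaling_efficiency(speedup: float, num_workers: int) -> float:
    """Fraction of ideal linear speedup that was actually achieved."""
    return speedup / num_workers


def gpu_months(num_gpus: int, wall_clock_days: float) -> float:
    """GPU-time, in months, of a run that occupies num_gpus for wall_clock_days."""
    return num_gpus * wall_clock_days / DAYS_PER_MONTH


efficiency = scaling_efficiency(SPEEDUP, NUM_GPUS)
months = gpu_months(TRAIN_GPUS, WALL_CLOCK_DAYS)

print(f"scaling efficiency: {efficiency:.1%}")
print(f"GPU-time upper bound: {months:.1f} months")
```

Under these assumptions, 107x on 128 GPUs corresponds to roughly 84% of ideal linear scaling, and 64 GPUs running for just under 3 days amount to a little over 6 months of GPU-time, which matches the figure reported in the abstract.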
Wijmans et al. studied this question.
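To make "decentralized" and "synchronous" concrete, here is a minimal single-process sketch of one way such an update can work. It assumes that workers synchronize by averaging their gradients (an all-reduce) before every update. A toy regression loss stands in for the PPO objective, and every name in it (`local_gradient`, `all_reduce_mean`, `synchronous_step`) is hypothetical. It illustrates the general idea only and is not the authors' implementation.

```python
import random


def local_gradient(params, batch):
    """Toy stand-in for a PPO loss gradient: d/dw of mean (w*x - y)^2."""
    w = params[0]
    return [sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)]


def all_reduce_mean(grads_per_worker):
    """Element-wise mean over workers; no central parameter server is involved."""
    n = len(grads_per_worker)
    dim = len(grads_per_worker[0])
    return [sum(g[i] for g in grads_per_worker) / n for i in range(dim)]


def synchronous_step(worker_params, worker_batches, lr=0.5):
    """Every worker computes a gradient, all gradients are averaged,
    and every worker applies the same averaged update."""
    grads = [local_gradient(p, b) for p, b in zip(worker_params, worker_batches)]
    avg = all_reduce_mean(grads)
    return [[w - lr * g for w, g in zip(p, avg)] for p in worker_params]


def sample_batch(rng, size=8, true_w=3.0):
    """Each worker collects its own experience; here, points on y = true_w * x."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    return [(x, true_w * x) for x in xs]


rng = random.Random(0)
num_workers = 4
worker_params = [[0.0] for _ in range(num_workers)]  # identical initial weights

for _ in range(200):
    batches = [sample_batch(rng) for _ in range(num_workers)]
    worker_params = synchronous_step(worker_params, batches)

print(worker_params)
```

Because all workers start from the same weights and apply the same averaged gradient, their parameters stay identical after every step. No worker ever computes with stale weights, which is the property the abstract calls synchronous.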