Abstract

Today’s construction sites present a logistical challenge: bulk materials are often delivered and stored in ways that make it difficult to determine their quantity, quality, and sometimes even their exact location. A pile of sand or stone delivered to a site illustrates the problem. Its quantity (by weight or volume) is known at delivery, but the material subsequently consumed usually cannot be tracked precisely by any dedicated sensor. Determining how much bulk material remains is therefore difficult, which complicates logistics planning, site management, and construction operations. Although 3D shape and volume reconstruction techniques are well developed, determining an object's volume from scans of it usually requires expensive equipment such as 3D laser scanners, as well as skilled operators. Bulk material volume has also been estimated with photogrammetry and LiDAR-based workflows, but these often rely on specialized software or capture protocols that are not routinely adopted in daily site operations. This work therefore implements and evaluates a deep learning-based approach to estimating bulk material volume from RGB camera images alone. Specifically, it investigates the feasibility of a proof-of-concept pipeline that combines SAM-based segmentation with a multi-view vision transformer architecture, and compares its performance against laser scanner reference data on synthetic and real sand piles.
Picchi et al. (Mon,) studied this question.
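The abstract does not say how the laser scanner reference volumes are computed or which error metric is used. As a hedged illustration only, one common way to turn a scanned pile into a reference volume is a 2.5D heightmap. The points are binned into square grid cells, the highest point above the ground in each cell is kept, and cell area times height is summed. A predicted volume can then be scored by its relative error against that reference. The helper names `heightmap_volume`, `relative_error`, and `cone_point_cloud`, the flat ground plane at `ground_z`, and the max-per-cell rule are all assumptions made for this sketch, not the procedure used in this work.

```python
import math


def heightmap_volume(points, cell_size, ground_z=0.0):
    """Approximate pile volume from scanned (x, y, z) points.

    Hypothetical helper (assumption, not this work's method): bins points into
    square cells of side `cell_size`, keeps the highest point above an assumed
    flat ground plane `ground_z` in each cell, and sums cell area * height.
    This 2.5D model ignores overhangs and any non-flat ground.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    top = {}  # (ix, iy) grid index -> max height above ground seen in that cell
    for x, y, z in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        height = max(z - ground_z, 0.0)  # points below the ground plane add nothing
        top[key] = max(top.get(key, 0.0), height)
    return sum(top.values()) * cell_size ** 2


def relative_error(estimate, reference):
    """Relative error of an estimated volume against a reference volume."""
    if reference <= 0:
        raise ValueError("reference volume must be positive")
    return abs(estimate - reference) / reference


def cone_point_cloud(radius, peak, step):
    """Hypothetical synthetic pile: a cone sampled once at each cell centre."""
    n = int(round(2 * radius / step))
    cloud = []
    for i in range(n):
        for j in range(n):
            x = -radius + (i + 0.5) * step
            y = -radius + (j + 0.5) * step
            z = peak * max(0.0, 1.0 - math.hypot(x, y) / radius)
            cloud.append((x, y, z))
    return cloud


if __name__ == "__main__":
    # Cone-shaped pile: radius 1 m, height 1 m, 1 cm grid.
    cloud = cone_point_cloud(radius=1.0, peak=1.0, step=0.01)
    estimate = heightmap_volume(cloud, cell_size=0.01)
    exact = math.pi * 1.0 ** 2 * 1.0 / 3.0  # cone volume: pi * r^2 * h / 3
    print(f"estimate={estimate:.4f} m^3, exact={exact:.4f} m^3, "
          f"relative error={relative_error(estimate, exact):.2%}")
```

Keeping only the highest point per cell treats the scan as a surface seen from above. This is a reasonable first approximation for a free-standing sand pile but not for piles against walls or under overhangs.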