In recent years, point cloud data has become increasingly popular across various aspects of life, enhancing entertainment through Virtual Reality and Augmented Reality, improving transportation with Autonomous Driving, and advancing healthcare by enabling accurate visualization of the human body for better understanding and diagnosis. 3D multimedia is set to replace conventional 2D multimedia, although at the cost of generating an enormous volume of data. Point cloud data, in particular, presents significant challenges for compression due to its sparsity, irregularity, and high-dimensional structure. The majority of 3D space remains unoccupied, with typically less than 2% of the volume containing a point. Additionally, unlike 2D images, where pixels are uniformly sampled over a structured grid, point clouds consist of irregularly distributed samples. This irregularity complicates the application of traditional signal processing techniques, which rely on a structured spatial grid. Beyond these structural challenges, the large spatial volume of point clouds results in substantial data rates. At the same per-dimension resolution as Full HD 2D video, an uncompressed 3D point cloud can generate 36 Gbit/s of data, which is 31 times higher than the data rate of an uncompressed 2D video. Processing such massive datasets demands significant computational resources, further emphasizing the need for efficient compression techniques. Therefore, the development of advanced point cloud compression algorithms is critical to enabling effective transmission, streaming, and storage of 3D point cloud content. This thesis investigates compression techniques for both point cloud geometry and attributes, with a particular emphasis on human body data. First, a novel lossless compression framework is proposed, in which point cloud geometry and attributes are jointly encoded to achieve superior compression efficiency. 
The framework leverages innovative sparse representations to effectively address the inherent sparsity of point clouds while maintaining a low computational cost. By employing advanced sparse convolutional operations, the method captures complex dependencies among points, enabling the construction of a highly accurate probability model tailored for a state-of-the-art lossless codec. Experimental results indicate that the proposed method achieves a bitrate reduction of up to 37% compared to state-of-the-art codecs. Lossy attribute coding is also investigated in this thesis. In particular, a scalable lossy-to-lossless compression framework that leverages multi-scale context modeling is developed. This approach enables the efficient decoding of multiple quality levels from a single bitstream while maintaining low complexity. Additionally, a dynamic point cloud coding framework is proposed to enhance lossy attribute compression by leveraging temporal relationships between point cloud frames. In human body point clouds, certain regions (e.g., the face and hands) carry richer and more important information than others; however, region-of-interest (ROI)-based coding for point cloud data remains underexplored. To address this gap, an end-to-end ROI-based lossy attribute compression method for human body point clouds is developed. This method achieves an average bitrate reduction of 19% compared to state-of-the-art compression techniques, while requiring one-fourth of the running time and enabling ROI-based compression. To enhance the end-user experience, a method combining compression and visualization is introduced. Point clouds are encoded progressively, allowing the reconstruction of human body point clouds at various quality levels for streaming or real-time visualization. 
Collectively, these novel contributions substantially advance the field of point cloud coding, with a particular impact on human body point cloud compression.
Dat Thanh Nguyen (Thu,) studied this question.