This paper examines the two datasets most commonly used in CAD sequence learning. The DeepCAD paper, which introduced a dataset of 178k samples, argues that the Fusion 360 dataset (fewer than 10k samples) is too small to train a model that generalizes well. As a result, almost all papers at top conferences and journals train on the DeepCAD dataset. Our investigation shows, however, that despite the large gap in size between the Fusion 360 Reconstruction dataset and the DeepCAD dataset, the two offer almost the same capability for CAD sequence learning. We design experiments and a data augmentation method to demonstrate that the two datasets are essentially indistinguishable, exhibiting equivalent capability in CAD sequence learning for simple sketch and extrusion commands. Advancing CAD sequence learning therefore requires more complex and advanced CAD datasets, and building them is a more challenging task for our community going forward.
Wan et al. studied this question.