Building Information Modelling (BIM) has become a common paradigm in the construction industry. To bridge the gap between end users and BIM data, some studies have adopted Natural Language Processing (NLP) in BIM applications. However, because users' natural language is often segmented incorrectly, most NLP-based BIM applications return redundant or inaccurate BIM data. Sequence labeling has been widely studied in NLP as a way to find the correct segments of a natural language sequence, but existing sequence labeling schemes perform poorly on specific BIM models. To address this issue, this study proposes a BIM model-adaptive natural-language Sequence Labeling scheme using machine learning, termed BIM-SeL. We first present the problem definition of sequence labeling and the overall framework of BIM-SeL. BIM-SeL employs a Conditional Random Field (CRF) to model the sequence labeling problem and machine learning to train a sequence labeling model on a corpus of millions of data items from the news and web domains. Then, a BIM dictionary extraction algorithm is developed to collect the exclusive vocabulary of the BIM models. A BIM dictionary-enhanced sequence labeling scheme is proposed that achieves BIM model-adaptive sequence labeling by jointly using the trained sequence labeling model and the BIM dictionary. To further enhance contextual representation and to compare with state-of-the-art deep learning methods, we extend BIM-SeL with an advanced BERT-BiLSTM-CRF model under the same framework. The effectiveness of BIM-SeL was verified on two real-world projects, the BUCEA Library and a water pump house. The experimental results show that the sequence accuracies of BIM-SeL on the BUCEA Library and the water pump house projects reached 92.61% and 93.41%, respectively, and the vocabulary accuracies reached 96.77% and 97.32%, respectively.
Compared with the original CRF-based sequence labeling algorithm, BIM-SeL improved the sequence accuracies by 7.05 and 18.50 times, and the vocabulary accuracies by 1.33 and 2.48 times, in the two projects. Meanwhile, the BERT-BiLSTM-CRF variant achieved up to 99.93% vocabulary accuracy on real BIM test sequences, further validating the generality and advanced performance of the proposed framework. These results demonstrate that BIM-SeL contributes to the natural language understanding of BIM applications using BIM data and can help bridge the gap between users and BIM data.
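To make the idea of dictionary-enhanced labeling concrete, the following is a minimal sketch and not the algorithm used in BIM-SeL. It assumes that a baseline labeler has already split a query into tokens and that a BIM dictionary is available as a set of terms; it then greedily merges adjacent tokens whenever their concatenation is a dictionary term. The function name `merge_with_dictionary`, the parameter `max_span`, the example query, and the dictionary contents are all hypothetical, and tokens are joined with spaces, which suits English but would need adapting for unsegmented languages.

```python
from typing import List, Set


def merge_with_dictionary(
    tokens: List[str], bim_terms: Set[str], max_span: int = 4
) -> List[str]:
    """Merge adjacent baseline tokens whose concatenation is a dictionary term.

    Illustrative greedy longest-match only; the paper's actual scheme
    combines a trained CRF model with the BIM dictionary and may differ.
    """
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, down to two tokens.
        for span in range(min(max_span, len(tokens) - i), 1, -1):
            candidate = " ".join(tokens[i:i + span])
            if candidate.lower() in bim_terms:
                merged.append(candidate)
                i += span
                break
        else:
            # No dictionary term starts here; keep the baseline token.
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical baseline segmentation that splits two BIM-specific terms.
baseline_tokens = ["show", "the", "curtain", "wall", "on", "level", "2"]
# Hypothetical dictionary extracted from a BIM model (lowercased terms).
bim_terms = {"curtain wall", "level 2"}

result = merge_with_dictionary(baseline_tokens, bim_terms)
print(result)  # ['show', 'the', 'curtain wall', 'on', 'level 2']
```

In this sketch the dictionary only ever merges tokens and never splits them, so it can repair over-segmentation of model-specific vocabulary but not under-segmentation; a fuller scheme would need to handle both.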
Qiu et al. (Mon,) studied this question.