Accurate industrial classification of firms forms the backbone of business surveys, economic policymaking, and international trade analysis. However, national statistics institutes (NSIs) worldwide grapple with the labor-intensive manual assignment of International Standard Industrial Classification (ISIC) codes: a process prone to human error, inconsistent across regions, and particularly burdensome for developing economies. This study confronts these challenges by assessing the performance of four text-matching methods, namely token overlap (Jaccard), TF-IDF cosine similarity, edit distance (fuzzy matching), and SBERT embeddings, against human-coded ground truth in classifying firms. On a dataset of 6588 firms, performance diverges sharply: SBERT attains accuracy = 0.78 and weighted F1 = 0.78 (Cohen’s κ ≈ 0.75), while the surface methods lag (fuzzy: accuracy 0.43; cosine: 0.31; Jaccard: 0.26). Statistical tests confirm these differences (Cochran’s Q = 8320.81, p < 0.001), and inter-method agreement is only fair (Fleiss’ κ ≈ 0.270), motivating a class-level diagnostic approach. Using confusion matrices and Haberman adjusted residuals, we expose systematic off-diagonal confusions (notably between manufacturing, professional/service, and certain retail/wholesale categories) and identify classes with strong, automatable diagonals versus sparse or ambiguous tails that require human coding.
Watambwa et al. studied this question.
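To make the class-level diagnostic concrete, the following is a minimal sketch of how Haberman adjusted residuals can be computed from a confusion matrix. It is not the authors' implementation: the function name `adjusted_residuals`, the three class labels, and the counts are all hypothetical and chosen only for illustration. For a cell with observed count n_ij, row total r_i, column total c_j, and grand total N, the expected count is e_ij = r_i c_j / N and the adjusted residual is (n_ij - e_ij) / sqrt(e_ij (1 - r_i/N)(1 - c_j/N)).

```python
from math import sqrt


def adjusted_residuals(table):
    """Return Haberman adjusted residuals for a contingency table.

    `table` is a list of rows of non-negative counts, e.g. a confusion
    matrix with human codes as rows and predicted codes as columns.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            # Estimated variance of (observed - expected).
            variance = (
                expected
                * (1 - row_totals[i] / n)
                * (1 - col_totals[j] / n)
            )
            # Guard against degenerate rows/columns with zero variance.
            out.append((observed - expected) / sqrt(variance) if variance > 0 else 0.0)
        residuals.append(out)
    return residuals


# Hypothetical 3-class confusion matrix (made-up counts, not study data):
# rows = human-assigned class, columns = method-assigned class.
labels = ["manufacturing", "services", "retail"]
confusion = [
    [40, 8, 2],
    [10, 25, 5],
    [3, 4, 23],
]

for label, row in zip(labels, adjusted_residuals(confusion)):
    print(label, [round(r, 2) for r in row])
```

Under independence the adjusted residuals are approximately standard normal, so cells with an absolute value well above about 2 are commonly read as over- or under-represented. Large positive diagonal cells suggest classes a method handles reliably, while large positive off-diagonal cells point to systematic confusions between two classes.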