The growth of IoT devices in shared environments has outpaced our ability to identify them, posing urgent risks to privacy, safety, and accountability. This challenge is especially pronounced in open-world environments, where network traffic metadata is often sparse, noisy, or adversarial. To address it, we introduce a semantic inference pipeline that reframes device identification as a language modeling task over real-world network metadata. Because this approach depends on reliable supervision, we first construct high-fidelity vendor labels for the IoT Inspector dataset (the largest real-world corpus of its kind) using an ensemble of large language models guided by mutual-information and entropy-based stability scores. We then instruction-tune a quantized LLaMA 3.1 8B model on this dataset using curriculum learning to support generalization under sparsity and long-tail vendor distributions. Our model achieves 98.69% top-1 and 90.73% macro accuracy across 2,015 vendors, while remaining robust to missing fields, protocol drift, and adversarial manipulation. We also evaluate the model on an independent IoT testbed dataset, assess explanation quality, and conduct adversarial tests to probe robustness under spoofed and obfuscated input. These results position instruction-tuned LLMs as a scalable, interpretable foundation for trustworthy device identification.
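The gap between the top-1 and macro figures is what one would expect under a long-tail vendor distribution. The minimal sketch below illustrates the two metrics, assuming that macro accuracy denotes the unweighted mean of per-vendor accuracy; the abstract does not define it. The function names, vendor names, and labels are hypothetical toy values, not data from the paper.

```python
from collections import defaultdict


def top1_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted vendor matches the true vendor."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_accuracy(y_true, y_pred):
    """Unweighted mean of per-vendor accuracy (assumed definition)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    # Every vendor contributes equally, regardless of how many samples it has.
    return sum(hits[v] / totals[v] for v in totals) / len(totals)


# Hypothetical toy labels: one frequent vendor and one rare vendor.
y_true = ["acme"] * 8 + ["globex"] * 2
y_pred = ["acme"] * 8 + ["globex", "acme"]

top1 = top1_accuracy(y_true, y_pred)    # 9 of 10 samples correct
macro = macro_accuracy(y_true, y_pred)  # mean of 8/8 and 1/2
```

In this toy case a single error on the rare vendor barely moves top-1 accuracy but lowers macro accuracy substantially, which is why reporting both is informative when many vendors have few samples.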
Mahmood et al. (Mon,) studied this question.