This article reviews the development of Natural Language Processing (NLP) models from early statistical approaches to modern large language models (LLMs). Beginning with probability-based n-gram models, it outlines their two main limitations: data sparsity and the inability to capture long-range dependencies. It then introduces neural network-based models, including word embeddings, logistic regression, multi-layer perceptrons, and Word2Vec, followed by sequential models such as RNNs and LSTMs that capture temporal dependencies. The shift to pre-trained models, marked by Word2Vec and the Transformer architecture, enabled scalable transfer learning and laid the foundation for state-of-the-art models such as BERT, GPT, and their derivatives. Applications in text classification are illustrated through experiments with RoBERTa, hybrid BERT-LightGBM models, and fine-tuning techniques such as LoRA. Finally, the article discusses the broader ecosystem of large models, including chat models, multimodal models, and agent frameworks that integrate planning, memory, and tool use. The review covers both theoretical principles and practical workflows, and argues that using NLP technologies effectively requires iterative learning, coding practice, and familiarity with the ecosystem.
Keming Zhang (Mon,) studied this question.