Legacy software systems represent a critical component of many organizations' technology stacks. However, they often lack documentation and comprehensive testing, making them susceptible to drift as the software environment changes over time. This thesis investigates the efficacy of supervised and unsupervised machine learning algorithms for identifying and quantifying drift in legacy software systems. In this study, a thorough review of existing drift detection approaches is conducted, highlighting their strengths and limitations. Subsequently, supervised machine learning algorithms, such as Support Vector Machines and Random Forests, are employed to construct predictive models based on historical data. These models aim to proactively detect drift by learning from labeled instances of normal and anomalous system behaviour. Additionally, unsupervised machine learning techniques, including Principal Component Analysis and clustering algorithms, are explored for their ability to discern patterns in software system data without the need for labeled samples. Experimental evaluations are conducted using real-world legacy software systems, demonstrating the comparative performance of supervised and unsupervised machine learning approaches in drift detection. The findings contribute to the growing body of knowledge in software maintenance and evolution, offering practical guidance for organizations seeking to enhance their legacy software management strategies using machine learning techniques.
Hadge et al. (Fri,) studied this question.
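
To make the unsupervised idea in the abstract concrete, the following is a minimal sketch of PCA-based drift detection using reconstruction error. It is not the authors' implementation: the function names (`fit_pca`, `reconstruction_error`, `drift_fraction`), the 0.99 quantile threshold, and the synthetic two-feature data are all assumptions chosen for illustration, and PCA is computed directly with NumPy's SVD.

```python
import numpy as np


def fit_pca(baseline, n_components):
    """Fit PCA on baseline observations; return the mean and principal axes."""
    mean = baseline.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(baseline - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(samples, mean, components):
    """Distance between each sample and its projection onto the PCA subspace."""
    centered = samples - mean
    projected = centered @ components.T @ components
    return np.linalg.norm(centered - projected, axis=1)


def drift_fraction(baseline, current, n_components=1, quantile=0.99):
    """Fraction of current samples whose error exceeds the baseline threshold.

    The threshold is a quantile of the baseline's own reconstruction errors
    (an assumed, illustrative choice).
    """
    mean, components = fit_pca(baseline, n_components)
    baseline_errors = reconstruction_error(baseline, mean, components)
    threshold = np.quantile(baseline_errors, quantile)
    current_errors = reconstruction_error(current, mean, components)
    return float(np.mean(current_errors > threshold))


# Synthetic stand-in for system metrics (hypothetical data, not from the thesis):
# two features driven by one latent factor plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
baseline = latent @ np.array([[1.0, 2.0]]) + 0.05 * rng.normal(size=(500, 2))

# Drifted data: the relationship between the two features has changed.
latent_new = rng.normal(size=(200, 1))
drifted = latent_new @ np.array([[1.0, -2.0]]) + 0.05 * rng.normal(size=(200, 2))

print("baseline vs. itself:", drift_fraction(baseline, baseline))
print("baseline vs. drifted:", drift_fraction(baseline, drifted))
```

A high fraction for the drifted data and a low one for the baseline suggests that the learned low-dimensional structure no longer describes current behaviour. No labels are required, which is the property the abstract attributes to the unsupervised techniques.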