ABSTRACT

Background: The majority of stroke rehabilitation occurs in the community without objective daily monitoring. Patients and families lack accessible tools for identifying recovery plateaus or deterioration between clinical appointments.

Methods: A synthetic dataset of 2,000 records (16 clinical features, 3 outcome classes) was generated from published distributions of stroke rehabilitation variables. Three supervised classifiers (Logistic Regression, Random Forest, and LightGBM) were trained on a stratified 80/20 train/test partition, and the best model was selected automatically by macro-average ROC-AUC. A rule-based recommendation engine was layered on top of the selected classifier, and the system was deployed as a Streamlit web application.

Results: On the synthetic data, LightGBM achieved the highest performance (accuracy 92.1%, ROC-AUC 0.991), followed by Random Forest (91.3%, 0.982) and Logistic Regression (90.2%, 0.970). Feature importance analysis identified days post-stroke, mobility score, and exercise completion as the three most influential variables. The deployed application is publicly accessible at https://stroketracker.streamlit.app/.

Conclusion: Comparing three classifiers and automatically selecting the best one yields strong classification performance for daily stroke recovery monitoring from self-reported variables, although this performance has so far been demonstrated only on synthetic data. The system is designed to address a documented gap in community rehabilitation monitoring. Future clinical validation on real patient data, under ethical approval, is required before clinical translation.
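The Methods state that the best classifier was selected by macro-average ROC-AUC but do not spell out how that score is computed. The following is a minimal sketch, assuming one-vs-rest averaging (the abstract does not say whether one-vs-rest or one-vs-one was used) and assuming each model outputs one probability per outcome class for every test record. The function names (`binary_auc`, `macro_auc`, `select_best`) and the toy inputs `model_a` and `model_b` are hypothetical illustrations, not the study's code or data. For reference, scikit-learn's `roc_auc_score(y_true, proba, multi_class="ovr", average="macro")` computes the same quantity.

```python
from typing import Dict, List, Sequence, Tuple


def binary_auc(scores: Sequence[float], positives: Sequence[bool]) -> float:
    """ROC-AUC as the probability that a random positive outranks a random negative.

    Tied scores count as half a win, which matches the area under the ROC curve.
    """
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("each class needs at least one positive and one negative record")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(y_true: Sequence[int], proba: Sequence[Sequence[float]], n_classes: int) -> float:
    """One-vs-rest AUC for each class, averaged with equal weight per class (macro)."""
    per_class = []
    for k in range(n_classes):
        scores = [row[k] for row in proba]          # probability assigned to class k
        positives = [label == k for label in y_true]  # class k vs. all other classes
        per_class.append(binary_auc(scores, positives))
    return sum(per_class) / n_classes


def select_best(
    y_true: Sequence[int],
    candidates: Dict[str, List[List[float]]],
    n_classes: int,
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate model and return the name with the highest macro AUC."""
    scores = {name: macro_auc(y_true, proba, n_classes) for name, proba in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy usage with hypothetical probability outputs (not results from the study).
y_true = [0, 0, 1, 1, 2, 2]
model_a = [  # ranks every record's true class above the others
    [0.8, 0.1, 0.1], [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1], [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8], [0.1, 0.2, 0.7],
]
model_b = [[0.4, 0.3, 0.3] for _ in y_true]  # constant scores, so every comparison ties
best, scores = select_best(y_true, {"model_a": model_a, "model_b": model_b}, n_classes=3)
print(best, scores)
```

The pairwise loop costs O(n²) per class, which is adequate for a test partition of 400 records (20% of 2,000); a rank-based formulation of the same statistic would scale better to larger datasets.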
Samuel Tobi Oluwakoya (Sat,) studied this question.