Cell-penetrating peptides (CPPs) are short amino-acid chains capable of crossing cellular membranes to deliver intracellular cargoes such as drugs, proteins, nucleic acids, and nanoparticles, making them widely studied for drug delivery, gene therapy, and medical imaging [10]. Discovering new CPPs is time- and resource-intensive because candidates must be synthesized and tested in vitro, which motivates computational CPP classification models [14]. Increasing model complexity has reduced interpretability, and it is unclear how much predictive capability persists when sequence order is ignored. Focusing on amino-acid physicochemical properties allows the class-separating signal encoded in global composition, rather than residue order, to be evaluated directly. Here, we evaluated CPP prediction using aggregated physicochemical descriptors from the Expasy Server [15], summarized per peptide via statistics including means, minima/maxima, sums, and standard deviations across secondary-structure propensities, solvent accessibility, side-chain volume, conformational flexibility, charge, polarity, hydrophobicity, transmembrane propensity, and residue mass. A non-overlapping training set of 5,102 unique peptides (1,469 CPPs; 3,633 non-CPPs) compiled from published studies was balanced by oversampling CPPs and normalized with z-score parameters derived exclusively from the training data. Final models were evaluated on an independent benchmark of 347 experimentally validated peptides (187 CPPs; 160 non-CPPs) that was not used during training or tuning. Tree-ensemble models were the most effective composition-only learners, with XGBoost achieving AUC 0.845 and MCC 0.521 [5], and the stacked ensemble (GB + Ada + RF) reaching AUC 0.849 and MCC 0.532. Error profiles indicate that CPPs occupy a broad, multimodal physicochemical landscape shaped by charge-hydrophobicity balance [10].
The findings demonstrate that physicochemical composition alone carries a strong, nonlinear signal for membrane-penetration predisposition, but does not fully explain experimentally verified CPP status.
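The composition-only pipeline summarized above can be illustrated with a minimal sketch. It is not the study's implementation: it uses a single per-residue scale (the Kyte-Doolittle hydropathy index, chosen here only as a familiar example) instead of the full Expasy descriptor set, it assumes random oversampling with replacement and a population standard deviation, and all function names (`aggregate`, `oversample_minority`, `fit_zscore`, `apply_zscore`, `mcc`) are hypothetical.

```python
import math
import random
import statistics

# Kyte-Doolittle hydropathy index, used only as an example per-residue scale;
# the study draws on many more descriptors than this one.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aggregate(seq, scale):
    """Summarize one peptide by order-independent statistics of a residue scale."""
    values = [scale[aa] for aa in seq if aa in scale]  # skip unknown residues
    return [
        statistics.mean(values),
        min(values),
        max(values),
        math.fsum(values),
        statistics.pstdev(values),  # assumption: population standard deviation
    ]


def oversample_minority(rows, labels, seed=0):
    """Balance two classes by resampling the minority class with replacement.

    Assumes both classes (labels 0 and 1) are present.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(labels))) + extra
    return [rows[i] for i in idx], [labels[i] for i in idx]


def fit_zscore(train_rows):
    """Compute per-feature mean and standard deviation from training rows only."""
    cols = list(zip(*train_rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant features
    return means, stds


def apply_zscore(rows, params):
    """Normalize any rows (training or benchmark) with training-derived parameters."""
    means, stds = params
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy sequences for illustration only; they are not drawn from the study's data.
train_seqs = ["KRRKA", "GLLAV", "AVILG", "SSTGA"]
train_labels = [1, 0, 0, 0]
X = [aggregate(s, KD) for s in train_seqs]
X_bal, y_bal = oversample_minority(X, train_labels)
params = fit_zscore(X_bal)
X_norm = apply_zscore(X_bal, params)
```

Because every feature is a summary statistic over the residue values, permuting a sequence leaves its feature vector unchanged, which is the sense in which such descriptors ignore residue order. Fitting the z-score parameters on the training rows and reusing them for the benchmark keeps held-out information out of the normalization step.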
Raagav Bala (Sun,) studied this question.