Attacks on software supply chains are on the rise, and attackers are becoming increasingly creative in how they inject malicious code into software components. This paper is the first to investigate Python cache poisoning, which manipulates bytecode cache files to execute malicious code without altering the human-readable source code. We demonstrate a proof of concept, showing that an attacker can inject malicious bytecode into a cache file without failing the Python interpreter's integrity checks. In a large-scale analysis of the Python Package Index, we find that about 12,500 packages are distributed with cache files. Through manual investigation of cache files that cannot be reproduced automatically from the corresponding source files, we identify classes of reasons for irreproducibility to locate malicious cache files. While we did not identify any malware leveraging this attack vector, we demonstrate that several widespread package managers are vulnerable to such attacks.
Ohm et al. studied this question.
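To make the reproducibility check described above concrete, the following is a minimal sketch of how a cache file could be compared against its source file. It is not the authors' tooling. It assumes CPython 3.7 or later, where a `.pyc` file begins with the 16-byte header defined in PEP 552, and it must be run with the same interpreter version that produced the cache file. The helper names `read_pyc`, `describe_header`, and `cache_matches_source` are hypothetical and chosen for illustration only. In the default timestamp mode, the header records only the modification time and size of the source file, so it carries no information about the bytecode that follows it.

```python
import importlib.util
import marshal

HEADER_SIZE = 16  # PEP 552 layout (CPython 3.7+): magic, flags, two 4-byte fields


def read_pyc(pyc_path):
    """Split a cache file into its header and its marshalled code object."""
    with open(pyc_path, "rb") as f:
        data = f.read()
    return data[:HEADER_SIZE], data[HEADER_SIZE:]


def describe_header(header):
    """Decode the header fields the interpreter uses to validate a cache file."""
    flags = int.from_bytes(header[4:8], "little")
    if flags & 0x1:
        # Hash-based pyc: the last 8 bytes hold a hash of the source file.
        return {
            "mode": "hash",
            "check_source": bool(flags & 0x2),
            "source_hash": header[8:16],
        }
    # Timestamp-based pyc (the default): only source mtime and size are stored.
    return {
        "mode": "timestamp",
        "source_mtime": int.from_bytes(header[8:12], "little"),
        "source_size": int.from_bytes(header[12:16], "little"),
    }


def cache_matches_source(source_path, pyc_path):
    """Return True if the cached bytecode equals a fresh compile of the source.

    Assumption: the caller runs the same CPython version that wrote the cache.
    Caution: marshal.loads is not safe on untrusted input, so run this in a sandbox.
    """
    header, body = read_pyc(pyc_path)
    if header[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("cache file was written by a different Python version")
    with open(source_path, "rb") as f:
        expected = compile(f.read(), source_path, "exec")
    # Compare code objects rather than raw bytes, since marshal output can
    # vary for equal code objects.
    return marshal.loads(body) == expected
```

A cache file for which `cache_matches_source` returns `False` is not necessarily malicious. Benign causes such as a different compiler version or optimization level can also produce a mismatch, which is why such files call for manual investigation.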