Software package registries such as Maven Central are indispensable for modern software development. Yet they are vulnerable to typosquatting attacks, in which malicious actors upload artifacts whose names resemble those of popular legitimate artifacts. Current approaches to detecting typosquatting in Maven Central analyze only metadata and ignore the bytecode. This thesis addresses that gap by proposing a multistep filter architecture that combines metadata analysis with in-depth static code analysis based on call graphs to identify potential typosquatting attacks. In the evaluation, the approach reduced the number of candidates by 99.87% before manual inspection and identified all typosquatting attacks in a ground-truth set whose ArtifactId is identical to that of the legitimate artifact. Moreover, it uncovered a previously unknown malicious artifact, io.github.blindspot-security:joni:2.1.45, which was subsequently reported and removed from Maven Central. These findings show that bytecode analysis is a crucial complement to metadata inspection for securing the software supply chain against typosquatting attacks.
Finn Wilken studied this question.
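To make the idea of a metadata filter stage concrete, the following is a minimal sketch and not the thesis's actual implementation. It assumes that a candidate is any uploaded artifact whose ArtifactId is identical or nearly identical (Levenshtein distance of at most 1) to that of a popular artifact published under a different GroupId. The names `Coordinate`, `edit_distance`, `parse`, and `metadata_candidates`, the distance threshold, and the coordinate `org.example:joni` are all hypothetical and chosen for illustration. The bytecode stage based on call graphs would run afterwards on the flagged pairs and is not shown.

```python
from typing import NamedTuple


class Coordinate(NamedTuple):
    """A Maven coordinate reduced to the two fields this sketch compares."""
    group_id: str
    artifact_id: str


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def parse(coordinate: str) -> Coordinate:
    """Parse 'groupId:artifactId[:version]', ignoring any version part."""
    group_id, artifact_id = coordinate.split(":")[:2]
    return Coordinate(group_id, artifact_id)


def metadata_candidates(popular, uploaded, max_distance=1):
    """Return (uploaded, popular) pairs that look like typosquatting by name.

    Assumption of this sketch: artifacts published under the same GroupId as
    the popular artifact are treated as legitimate and skipped.
    """
    popular_coords = [parse(c) for c in popular]
    flagged = []
    for candidate in (parse(c) for c in uploaded):
        for target in popular_coords:
            if candidate.group_id == target.group_id:
                continue
            distance = edit_distance(candidate.artifact_id, target.artifact_id)
            if distance <= max_distance:
                flagged.append((candidate, target))
    return flagged


# Hypothetical input: 'org.example:joni' stands in for a popular artifact.
flagged = metadata_candidates(
    popular=["org.example:joni"],
    uploaded=[
        "io.github.blindspot-security:joni:2.1.45",
        "org.example:joni:1.0",
        "com.acme:other",
    ],
)
```

In this sketch only the artifact with the identical ArtifactId under a foreign GroupId is flagged. A name-only filter like this cannot tell a malicious copy from a benign fork, which is why a subsequent bytecode comparison is needed to narrow the candidates further.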