ABSTRACT Open‐source licenses grant developers significant flexibility to use, modify, and distribute code, but they also introduce obligations that may pose legal or compliance risks. Among these, copyright‐related terms are especially critical. Ensuring compliance with such terms is challenging due to the widespread use of third‐party libraries and the diversity of license types. While license‐related issues have attracted research interest, most prior work targets a narrow set of licenses and often overlooks copyright terms. To bridge this gap, we propose an automated approach for detecting copyright term violations in open‐source projects that incorporate third‐party libraries, covering a wide range of Open Source Initiative‐approved licenses. Our method extracts licenses from both projects and their dependencies, uses large language models to interpret key copyright terms, and identifies potential violations. We applied our approach to 500 popular Python projects on GitHub and found that about 10% exhibited at least one violation. These results highlight the complexity of third‐party license management and the need for improved compliance tools. By analyzing tens of thousands of licenses, we also uncovered common patterns in license usage. To foster further research and promote proactive compliance, we release our tool and dataset to the community.
Yang et al. (Mon,) studied this question.