1.Understanding and Remediating Open-Source License Incompatibilities in the PyPI Ecosystem

Authors:Weiwei Xu, Hao He, Kai Gao, Minghui Zhou

Abstract: The reuse and distribution of open-source software must be in compliance with its accompanying open-source license. In modern packaging ecosystems, maintaining such compliance is challenging because a package may have a complex multi-layered dependency graph with many packages, any of which may have an incompatible license. Although prior research finds that license incompatibilities are prevalent, empirical evidence is still scarce in some modern packaging ecosystems (e.g., PyPI). It also remains unclear how developers remediate the license incompatibilities in the dependency graphs of their packages (including direct and transitive dependencies), let alone any automated approaches. To bridge this gap, we conduct a large-scale empirical study of license incompatibilities and their remediation practices in the PyPI ecosystem. We find that 7.27% of the PyPI package releases have license incompatibilities and 61.3% of them are caused by transitive dependencies, causing challenges in their remediation; for remediation, developers can apply one of the five strategies: migration, removal, pinning versions, changing their own licenses, and negotiation. Inspired by our findings, we propose SILENCE, an SMT-solver-based approach to recommend license incompatibility remediations with minimal costs in package dependency graph. Our evaluation shows that the remediations proposed by SILENCE can match 19 historical real-world cases (except for migrations not covered by an existing knowledge base) and have been accepted by five popular PyPI packages whose developers were previously unaware of their license incompatibilities.

2.Decentralised Governance for Foundation Model based Systems: Exploring the Role of Blockchain in Responsible AI

Authors:Yue Liu, Qinghua Lu, Liming Zhu, Hye-Young Paik

Abstract: Foundation models are increasingly attracting interest worldwide for their distinguished capabilities and potential to perform a wide variety of tasks. Nevertheless, people are concerned about whether foundation model based AI systems are properly governed to ensure trustworthiness of foundation model based AI systems and to prevent misuse that could harm humans, society and the environment. In this paper, we identify eight governance challenges in the entire lifecycle of foundation model based AI systems regarding the three fundamental dimensions of governance: decision rights, incentives, and accountability. Furthermore, we explore the potential of blockchain as a solution to address the challenges by providing a distributed ledger to facilitate decentralised governance. We present an architecture that demonstrates how blockchain can be leveraged to realise governance in foundation model based AI systems.

3.How Early Participation Determines Long-Term Sustained Activity in GitHub Projects?

Authors:Wenxin Xiao, Hao He, Weiwei Xu, Yuxia Zhang, Minghui Zhou

Abstract: Although the open source model bears many advantages in software development, open source projects are always hard to sustain. Previous research on open source sustainability mainly focuses on projects that have already reached a certain level of maturity (e.g., with communities, releases, and downstream projects). However, limited attention is paid to the development of (sustainable) open source projects in their infancy, and we believe an understanding of early sustainability determinants is crucial for project initiators, incubators, newcomers, and users. In this paper, we aim to explore the relationship between early participation factors and long-term project sustainability. We leverage a novel methodology that measures the early participation of 290,255 GitHub projects during the first three months with reference to the Blumberg model, trains an XGBoost model to predict project's two-year sustained activity, and interprets the trained model using LIME. We quantitatively show that early participants have a positive effect on project's future sustained activity if they have prior experience in OSS project incubation and demonstrate concentrated focus and steady commitment. Participation from non-code contributors and detailed contribution documentation also promote project's sustained activity. Compared with individual projects, building a community that consists of more experienced core developers and more active peripheral developers is important for organizational projects. This study provides unique insights into the incubation and recognition of sustainable open source projects, and our interpretable prediction approach can also offer guidance to open source project initiators and newcomers.

4.Validation-Driven Development

Authors:Sebastian Stock, Atif Mashkoor, Alexander Egyed

Abstract: Formal methods play a fundamental role in asserting the correctness of requirements specifications. However, historically, formal method experts have primarily focused on verifying those specifications. Although equally important, validation of requirements specifications often takes the back seat. This paper introduces a validation-driven development (VDD) process that prioritizes validating requirements in formal development. The VDD process is built upon problem frames - a requirements analysis approach - and validation obligations (VOs) - the concept of breaking down the overall validation of a specification and linking it to refinement steps. The effectiveness of the VDD process is demonstrated through a case study in the aviation industry.

5.Safeguarding Learning-based Control for Smart Energy Systems with Sampling Specifications

Authors:Chih-Hong Cheng, Venkatesh Prasad Venkataramanan, Pragya Kirti Gupta, Yun-Fei Hsu, Simon Burton

Abstract: We study challenges using reinforcement learning in controlling energy systems, where apart from performance requirements, one has additional safety requirements such as avoiding blackouts. We detail how these safety requirements in real-time temporal logic can be strengthened via discretization into linear temporal logic (LTL), such that the satisfaction of the LTL formulae implies the satisfaction of the original safety requirements. The discretization enables advanced engineering methods such as synthesizing shields for safe reinforcement learning as well as formal verification, where for statistical model checking, the probabilistic guarantee acquired by LTL model checking forms a lower bound for the satisfaction of the original real-time safety requirements.

6.Scaling Up Toward Automated Black-box Reverse Engineering of Context-Free Grammars

Authors:Mohammad Rifat Arefin, Suraj Shetiya, Zili Wang, Christoph Csallner

Abstract: Black-box context-free grammar inference is a hard problem as in many practical settings it only has access to a limited number of example programs. The state-of-the-art approach Arvada heuristically generalizes grammar rules starting from flat parse trees and is non-deterministic to explore different generalization sequences. We observe that many of Arvada's generalization steps violate common language concept nesting rules. We thus propose to pre-structure input programs along these nesting rules, apply learnt rules recursively, and make black-box context-free grammar inference deterministic. The resulting TreeVada yielded faster runtime and higher-quality grammars in an empirical comparison.