1.Wanted: standards for automatic reproducibility of computational experiments

Authors:Samuel Grayson, Reed Milewicz, Joshua Teves, Daniel S. Katz, Darko Marinov

Abstract: Those seeking to reproduce a computational experiment often need to manually look at the code to see how to build necessary libraries, configure parameters, find data, and invoke the experiment; it is not automatic. Automatic reproducibility is a more stringent goal, but working towards it would benefit the community. This work discusses a machine-readable language for specifying how to execute a computational experiment. We invite interested stakeholders to discuss this language at https://github.com/charmoniumQ/execution-description .

2.Exploring Technical Debt in Security Questions on Stack Overflow

Authors:Joshua Aldrich Edbert, Sahrima Jannat Oishwee, Shubhashis Karmakar, Zadia Codabux, Roberto Verdecchia

Abstract: Background: Software security is crucial to ensure that users are protected from undesirable consequences such as malware attacks, which can result in loss of data and, subsequently, financial loss. Technical Debt (TD) is a metaphor for suboptimal decisions that, if not managed, result in long-term consequences such as increased defects and vulnerabilities. Although previous studies have examined the relationship between security and TD, their intersection in developers' discussions on Stack Overflow (SO) remains unexplored. Aims: This study investigates the characteristics of security-related TD questions on SO. More specifically, we explore the prevalence of TD in security-related queries, identify the security tags most prone to TD, and investigate which user groups are more aware of TD. Method: We mined 117,233 security-related questions on SO and used a deep-learning approach to identify 45,078 security-related TD questions. Subsequently, we conducted quantitative and qualitative analyses of the collected security-related TD questions, including sentiment analysis. Results: Our analysis revealed that 38% of the security questions on SO are security-related TD questions. The most recurrent tags among the security-related TD questions were "security" and "encryption." These questions typically have a neutral sentiment, are lengthier, and are posed by users with higher reputation scores. Conclusions: Our findings reveal that developers implicitly discuss TD, suggesting that developers have a potential knowledge gap regarding the TD metaphor in the security domain. Moreover, we identified the most common security topics mentioned in TD-related posts, providing valuable insights that developers and researchers can use to prioritize security concerns in order to minimize TD and enhance software security.

3.Software engineering to sustain a high-performance computing scientific application: QMCPACK

Authors:William F. Godoy, Steven E. Hahn, Michael M. Walsh, Philip W. Fackler, Jaron T. Krogel, Peter W. Doak, Paul R. C. Kent, Alfredo A. Correa, Ye Luo, Mark Dewing

Abstract: We provide an overview of the software engineering efforts and their impact in QMCPACK, a production-level ab-initio Quantum Monte Carlo open-source code targeting high-performance computing (HPC) systems. Aspects included are: (i) strategic expansion of continuous integration (CI) targeting CPUs, using GitHub Actions runners, and NVIDIA and AMD GPUs in pre-exascale systems, using self-hosted hardware; (ii) incremental reduction of memory leaks using sanitizers; (iii) incorporation of Docker containers for CI and reproducibility; and (iv) refactoring efforts to improve maintainability, test coverage, and memory lifetime management. We quantify the value of these improvements by providing metrics to illustrate the shift towards a predictive, rather than reactive, sustainable maintenance approach. Our goal, in documenting the impact of these efforts on QMCPACK, is to contribute to the body of knowledge on the importance of research software engineering (RSE) for the sustainability of community HPC codes and scientific discovery at scale.

4.Towards a TDD maturity model through an anti-patterns framework

Authors:Matheus Marabesi, Francisco Jose Garcia-Penalvo, Alicia Garcia-Holgado

Abstract: Agile software development has been adopted in the industry to react quickly to business change. Since its inception, both academia and industry have debated the different roles that agile processes and technical practices play in the day-to-day work of students and professional developers. Efforts have been made to understand the pros and cons of the Test Driven Development (TDD) practice for developing software in a professional environment. Despite the effort of practitioners to list the TDD anti-patterns that reveal undesired effects in the code when practicing TDD, work is needed to understand the causes that lead to them. In that sense, this paper proposes a research project that explores the context of TDD anti-patterns and what leads practitioners to encounter them during software development. As a result, we expect to offer a TDD maturity framework to help practitioners in the process of writing code guided by tests and to prevent the addition of anti-patterns.

5.Feature Map Testing for Deep Neural Networks

Authors:Dong Huang, Qingwen Bu, Yahao Qing, Yichao Fu, Heming Cui

Abstract: Due to the widespread application of deep neural networks (DNNs) in safety-critical tasks, deep learning testing has drawn increasing attention. During the testing process, test cases that have been fuzzed or selected using test metrics are fed into the model to find fault-inducing test units (e.g., neurons and feature maps, activating which will almost certainly result in a model error) and report them to the DNN developer, who subsequently repairs them (e.g., by retraining the model with test cases). Current test metrics, however, are primarily concerned with neurons, which means that test cases discovered either by guided fuzzing or by selection with these metrics focus on detecting fault-inducing neurons while failing to detect fault-inducing feature maps. In this work, we propose DeepFeature, which tests DNNs at the feature map level. When testing is conducted, DeepFeature scrutinizes every internal feature map in the model and identifies vulnerabilities that can be repaired to increase the model's overall performance. Exhaustive experiments are conducted to demonstrate that (1) DeepFeature is a strong tool for detecting the model's vulnerable feature maps; (2) DeepFeature's test case selection has a high fault detection rate and can detect more types of faults (compared with coverage-guided selection techniques, the fault detection rate is increased by 49.32%); and (3) DeepFeature's fuzzer also outperforms current fuzzing techniques and generates valuable test cases more efficiently.