1. Which Requirements Artifact Quality Defects are Automatically Detectable? A Case Study

Authors: Henning Femmer, Michael Unterkalmsteiner, Tony Gorschek

Abstract: [Context] The quality of requirements engineering artifacts, e.g. requirements specifications, is acknowledged to be an important success factor for projects. Therefore, many companies spend significant amounts of money to control the quality of their RE artifacts. To reduce spending and improve the RE artifact quality, methods were proposed that combine manual quality control, i.e. reviews, with automated approaches. [Problem] So far, we have seen various approaches to automatically detect certain aspects in RE artifacts. However, we still lack an overview of what can and cannot be automatically detected. [Approach] Starting from an industry guideline for RE artifacts, we classify 166 existing rules for RE artifacts along various categories to discuss the share and the characteristics of those rules that can be automated. For those rules that cannot be automated, we discuss the main reasons. [Contribution] We estimate that 53% of the 166 rules can be checked automatically, either perfectly or with a good heuristic. Most rules need only simple techniques for checking. The main reason some rules resist automation is imprecise definition. [Impact] By giving first estimates and analyses of rule violations that can and cannot be automatically detected, we aim to provide an overview of the potential of automated methods in requirements quality control.
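
As a flavour of what the "simple techniques" mentioned in the abstract can mean in practice, the sketch below flags two classic requirements-writing violations with plain pattern matching. The vague-term list and the passive-voice heuristic are illustrative assumptions, not rules taken from the studied industry guideline.

```python
import re

# Illustrative "automatable" writing rules (assumed, not from the paper's guideline).
VAGUE_TERMS = {"appropriate", "adequate", "as fast as possible", "user-friendly"}
PASSIVE_HINT = re.compile(r"\b(is|are|was|were|be|been)\s+\w+ed\b", re.IGNORECASE)

def check_requirement(sentence: str) -> list[str]:
    """Return heuristic rule violations found in one requirement sentence."""
    findings = []
    lowered = sentence.lower()
    for term in VAGUE_TERMS:
        if term in lowered:
            findings.append(f"vague term: '{term}'")
    if PASSIVE_HINT.search(sentence):
        findings.append("possible passive voice (actor may be unclear)")
    return findings

print(check_requirement("The data shall be processed in an appropriate time."))
# flags both the vague term and the passive construction
```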

2. Summary of the 4th International Workshop on Requirements Engineering and Testing (RET 2017)

Authors: Markus Borg, Elizabeth Bjarnason, Michael Unterkalmsteiner, Tingting Yu, Gregory Gay, Michael Felderer

Abstract: The RET (Requirements Engineering and Testing) workshop series provides a meeting point for researchers and practitioners from the two separate fields of Requirements Engineering (RE) and Testing. The long-term aim is to build a community and a body of knowledge within the intersection of RE and Testing, i.e., RET. The 4th workshop was co-located with the 25th International Requirements Engineering Conference (RE'17) in Lisbon, Portugal and attracted about 20 participants. In line with the previous workshop instances, RET 2017 offered an interactive setting with a keynote, an invited talk, paper presentations, and a concluding hands-on exercise.

3. Introducing Interactions in Multi-Objective Optimization of Software Architectures

Authors: Vittorio Cortellessa, J. Andres Diaz-Pace, Daniele Di Pompeo, Sebastian Frank, Pooyan Jamshidi, Michele Tucci, André van Hoorn

Abstract: Software architecture optimization aims to enhance non-functional attributes like performance and reliability while meeting functional requirements. Multi-objective optimization employs metaheuristic search techniques, such as genetic algorithms, to explore feasible architectural changes and propose alternatives to designers. However, the resource-intensive process may not always align with practical constraints. This study investigates the impact of designer interactions on multi-objective software architecture optimization. Designers can intervene at intermediate points in the fully automated optimization process, making choices that guide exploration towards more desirable solutions. We compare this interactive approach with the fully automated optimization process, which serves as the baseline. The findings demonstrate that designer interactions lead to a more focused solution space, resulting in improved architectural quality. By directing the search towards regions of interest, the interaction uncovers architectures that remain unexplored in the fully automated process.
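
A minimal sketch of the interaction scheme the abstract describes: a fully automated evolutionary loop with intermediate points where a designer callback prunes the population, steering the search toward regions of interest. The toy objectives, mutation operator, and pause interval are assumptions for illustration, not the authors' actual tooling.

```python
import random

def evaluate(x):
    # Toy stand-ins for two quality objectives to minimize (e.g. performance, reliability).
    return (x ** 2, (x - 2) ** 2)

def dominates(a, b):
    return all(ai <= bi for ai, bi in zip(a, b)) and a != b

def pareto_front(pop):
    scored = [(x, evaluate(x)) for x in pop]
    return [x for x, fx in scored
            if not any(dominates(fy, fx) for _, fy in scored)]

def interactive_optimize(pop, designer_filter, generations=50, pause_every=10, cap=40):
    for gen in range(generations):
        front = pareto_front(pop + [x + random.gauss(0, 0.2) for x in pop])
        pop = random.sample(front, min(len(front), cap))  # bound population size
        # Intermediate interaction point: the designer prunes candidates.
        if (gen + 1) % pause_every == 0:
            pop = designer_filter(pop) or pop
    return pop

# Example designer preference: keep only alternatives with x in [0.5, 1.5].
front = interactive_optimize(
    [random.uniform(-3.0, 5.0) for _ in range(20)],
    lambda pop: [x for x in pop if 0.5 <= x <= 1.5])
```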

4. Best performance and reliability for your time: budget-aware search-based optimization of software model refactoring

Authors: J. Andres Diaz-Pace, Daniele Di Pompeo, Michele Tucci

Abstract: Context: Software model optimization is a process that automatically generates design alternatives, typically to enhance quantifiable non-functional properties of software systems, such as performance and reliability. Multi-objective evolutionary algorithms have been shown to be effective in this context for assisting the designer in identifying trade-offs between the desired non-functional properties. Objective: In this work, we investigate the effects of imposing a time budget to limit the search for design alternatives, which inevitably affects the quality of the resulting alternatives. Method: The effects of time budgets are analyzed by investigating both the quality of the generated design alternatives and their structural features when varying the budget and the genetic algorithm (NSGA-II, PESA2, SPEA2). This is achieved by employing multi-objective quality indicators and a tree-based representation of the search space. Results: The study reveals that the time budget significantly affects the quality of Pareto fronts, especially for performance and reliability. NSGA-II is the fastest algorithm, while PESA2 generates the highest-quality solutions. The imposition of a time budget results in structurally distinct models compared to those obtained without a budget, indicating that the search process is influenced by both the budget and algorithm selection. Conclusions: In software model optimization, imposing a time budget can be effective in saving optimization time, but designers should carefully consider the trade-off between time and solution quality in the Pareto front, along with the structural characteristics of the generated models. By making informed choices about the specific genetic algorithm, designers can achieve different trade-offs.
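
The central mechanism is easy to state in code: the search keeps improving design alternatives only until a wall-clock deadline expires, so solution quality becomes a direct function of the budget granted. A minimal sketch, assuming a toy single objective rather than the paper's performance and reliability models:

```python
import random
import time

def objective(x):
    return (x - 1.0) ** 2  # toy objective to minimize (assumed for illustration)

def budgeted_search(budget_seconds=1.0):
    """Hill-climb until the wall-clock budget is exhausted; return best-so-far."""
    deadline = time.monotonic() + budget_seconds
    best = random.uniform(-10, 10)
    while time.monotonic() < deadline:
        candidate = best + random.gauss(0, 0.5)
        if objective(candidate) < objective(best):
            best = candidate
    return best  # quality depends directly on the budget granted

print(budgeted_search(0.1))   # a tighter budget typically yields a coarser result
```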

5. Exploring API Behaviours Through Generated Examples

Authors: Stefan Karlsson, John Hughes, Robbert Jongeling, Adnan Causevic, Daniel Sundmark

Abstract: Understanding the behaviour of a system's API can be hard. Giving users access to relevant examples of how an API behaves has been shown to make this easier for them. In addition, such examples can be used to verify expected behaviour or identify unwanted behaviours. Methods for automatically generating examples have existed for a long time. However, state-of-the-art methods rely either on white-box information, such as source code, or on formal specifications of the system behaviour. But what if you have access to neither, e.g., when interacting with a third-party API? In this paper, we present an approach to automatically generate relevant examples of behaviours of an API, without requiring either source code or a formal specification of behaviour. Evaluation on an industry-grade REST API shows that our method can produce small and relevant examples that can help engineers understand the system under exploration.
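
A hedged sketch of the black-box idea: probe the API with generated inputs and retain small, distinct input/response pairs as behaviour examples. The endpoint URL and query parameter below are hypothetical, and the paper's actual generation and relevance criteria are more sophisticated than status-code bucketing.

```python
import random
import requests

BASE_URL = "https://api.example.com/orders"  # hypothetical third-party endpoint

def generate_examples(n=50):
    """Sample random inputs; keep the smallest input per distinct status code."""
    examples = {}
    for _ in range(n):
        params = {"quantity": random.randint(-5, 100)}  # hypothetical parameter
        resp = requests.get(BASE_URL, params=params, timeout=5)
        # Small, distinct examples are the most useful ones to show engineers.
        code = resp.status_code
        if code not in examples or params["quantity"] < examples[code]["quantity"]:
            examples[code] = params
    return examples  # e.g. {200: {"quantity": 1}, 400: {"quantity": -5}}
```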

6. Multilevel Semantic Embedding of Software Patches: A Fine-to-Coarse Grained Approach Towards Security Patch Detection

Authors: Xunzhu Tang, zhenghan Chen, Saad Ezzini, Haoye Tian, Yewei Song, Jacques Klein, Tegawende F. Bissyande

Abstract: The growth of open-source software has increased the risk of hidden vulnerabilities that can affect downstream software applications. This concern is further exacerbated by software vendors' practice of silently releasing security patches without explicit warnings or common vulnerability and exposure (CVE) notifications. This lack of transparency leaves users unaware of potential security threats, giving attackers an opportunity to take advantage of these vulnerabilities. In the complex landscape of software patches, grasping the nuanced semantics of a patch is vital for ensuring secure software maintenance. To address this challenge, we introduce a multilevel Semantic Embedder for security patch detection, termed MultiSEM. This model harnesses word-centric vectors at a fine-grained level, emphasizing the significance of individual words, while the coarse-grained layer adopts entire code lines for vector representation, capturing the essence and interrelation of added or removed lines. We further enrich this representation by assimilating patch descriptions to obtain a holistic semantic portrait. This combination of multi-layered embeddings offers a robust representation, balancing word-level detail, code-line insights, and patch descriptions. Evaluating MultiSEM on security patch detection, our results demonstrate its superiority, outperforming state-of-the-art models by promising margins: a 22.46% improvement on PatchDB and a 9.21% improvement on SPI-DB in terms of the F1 metric.
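
A toy sketch of the fine-to-coarse idea: word-level vectors are pooled into line-level vectors, and the two granularities are concatenated into one patch representation. The hash-based token embedding below merely stands in for MultiSEM's learned encoders and omits the patch-description channel.

```python
import hashlib
import numpy as np

DIM = 64

def embed_token(token: str) -> np.ndarray:
    # Deterministic random vector per token; a stand-in for a learned embedding.
    seed = int(hashlib.md5(token.encode()).hexdigest(), 16) % (2 ** 32)
    return np.random.default_rng(seed).standard_normal(DIM)

def embed_patch(diff_lines: list[str]) -> np.ndarray:
    # Fine-grained view: pool over every word in the patch.
    fine = np.mean([embed_token(t) for ln in diff_lines for t in ln.split()], axis=0)
    # Coarse-grained view: one vector per code line, then pool over lines.
    line_vecs = [np.mean([embed_token(t) for t in ln.split()], axis=0)
                 for ln in diff_lines if ln.split()]
    coarse = np.mean(line_vecs, axis=0)
    return np.concatenate([fine, coarse])  # joint 2*DIM representation

vec = embed_patch(["+ if (len > MAX) return -1;", "- memcpy(dst, src, len);"])
```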

7. Hyperbolic Code Retrieval: A Novel Approach for Efficient Code Search Using Hyperbolic Space Embeddings

Authors: Xunzhu Tang, zhenghan Chen, Saad Ezzini, Haoye Tian, Yewei Song, Jacques Klein, Tegawende F. Bissyande

Abstract: Within the realm of advanced code retrieval, existing methods have primarily relied on intricate matching and attention-based mechanisms. However, these methods often lead to computational and memory inefficiencies, posing a significant challenge to their real-world applicability. To tackle this challenge, we propose a novel approach, Hyperbolic Code QA Matching (HyCoQA). This approach leverages the unique properties of hyperbolic space to express connections between code fragments and their corresponding queries, thereby obviating the necessity for intricate interaction layers. The process commences with a reimagining of the code retrieval challenge, framed within a question-answering (QA) matching framework, constructing a dataset of triple matches characterized as <negative code, description, positive code>. These matches are subsequently processed via a static BERT embedding layer, yielding initial embeddings. Thereafter, a hyperbolic embedder transforms these representations into hyperbolic space, calculating distances between the codes and descriptions. The process concludes by implementing a scoring layer on these distances and leveraging hinge loss for model training. Notably, the design of HyCoQA inherently facilitates self-organization, allowing for the automatic detection of embedded hierarchical patterns during the learning phase. Experimentally, HyCoQA showcases remarkable effectiveness in our evaluations: an average performance improvement of 3.5% to 4% compared to state-of-the-art code retrieval techniques.
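
The two ingredients named in the abstract, the hyperbolic (Poincaré-ball) distance and a triplet hinge loss over <negative code, description, positive code> matches, can be sketched directly. The random vectors here are stand-ins for the BERT-initialized embeddings.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Distance in the Poincaré ball: arccosh(1 + 2|u-v|^2 / ((1-|u|^2)(1-|v|^2)))."""
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq / max(denom, eps))

def hinge_loss(desc, pos_code, neg_code, margin=1.0):
    # Push the matching pair closer than the mismatch by at least `margin`.
    return max(0.0, margin + poincare_distance(desc, pos_code)
                          - poincare_distance(desc, neg_code))

rng = np.random.default_rng(0)
# Scaled so all vectors lie inside the unit ball, as the model requires.
d, p, n = (0.1 * rng.standard_normal(16) for _ in range(3))
print(hinge_loss(d, p, n))
```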

8. Large Language Models in Fault Localisation

Authors: Yonghao Wu, Zheng Li, Jie M. Zhang, Mike Papadakis, Mark Harman, Yong Liu

Abstract: Large Language Models (LLMs) have shown promise in multiple software engineering tasks including code generation, code summarisation, test generation and code repair. Fault localisation is essential for facilitating automatic program debugging and repair, and was demonstrated as a highlight at ChatGPT-4's launch event. Nevertheless, there has been little work on understanding LLMs' capabilities for fault localisation in large-scale open-source programs. To fill this gap, this paper presents an in-depth investigation into the capability of ChatGPT-3.5 and ChatGPT-4, the two state-of-the-art LLMs, on fault localisation. Using the widely-adopted Defects4J dataset, we compare the two LLMs with the existing fault localisation techniques. We also investigate the stability and explanation of LLMs in fault localisation, as well as how prompt engineering and the length of code context affect fault localisation effectiveness. Our findings demonstrate that within a limited code context, ChatGPT-4 outperforms all the existing fault localisation methods. Additional error logs can further improve ChatGPT models' localisation accuracy and stability, with an average 46.9% higher accuracy than the state-of-the-art baseline SmartFL in terms of the TOP-1 metric. However, performance declines dramatically when the code context expands to the class level, with ChatGPT models' effectiveness becoming inferior to the existing methods overall. Additionally, we observe that ChatGPT's explainability is unsatisfactory, with an accuracy rate of only approximately 30%. These observations demonstrate that while ChatGPT can achieve effective fault localisation performance under certain conditions, evident limitations exist. Further research is imperative to fully harness the potential of LLMs like ChatGPT for practical fault localisation applications.
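
A hedged sketch of the prompting setup the study evaluates: a limited code context plus an optional error log, with the model asked to rank suspicious lines. The prompt wording is illustrative rather than the paper's actual template, and query_llm is a hypothetical stand-in for whichever chat-completion client is used.

```python
def build_fl_prompt(code_snippet: str, failing_test: str,
                    error_log: str | None = None) -> str:
    """Assemble a fault-localisation prompt from a limited code context."""
    prompt = (
        "The following code contains a bug.\n"
        f"Code:\n{code_snippet}\n"
        f"Failing test:\n{failing_test}\n"
    )
    if error_log:  # error logs improved accuracy and stability in the study
        prompt += f"Error log:\n{error_log}\n"
    prompt += "List the most suspicious lines, most likely first, with a brief reason."
    return prompt

# response = query_llm(build_fl_prompt(snippet, test_src, log))  # hypothetical client
```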

9. Back to the Future: From Microservice to Monolith

Authors: Ruoyu Su, Xiaozhou Li, Davide Taibi

Abstract: Recently, the trend of companies switching from microservices back to monoliths has increased, leading to intense debate in the industry. We conduct a multivocal literature review to investigate the reasons for this phenomenon, identify key aspects to pay attention to during the switch back, and analyze the opinions of practitioners. The results pave the way for further research and provide guidance for industrial companies switching from microservices back to monoliths.

10. PEM: Representing Binary Program Semantics for Similarity Analysis via a Probabilistic Execution Model

Authors: Xiangzhe Xu, Zhou Xuan, Shiwei Feng, Siyuan Cheng, Yapeng Ye, Qingkai Shi, Guanhong Tao, Le Yu, Zhuo Zhang, Xiangyu Zhang

Abstract: Binary similarity analysis determines whether two binary executables are from the same source program. Existing techniques leverage static and dynamic program features and may utilize advanced Deep Learning techniques. Although they have demonstrated great potential, the community believes that a more effective representation of program semantics can further improve similarity analysis. In this paper, we propose a new method to represent binary program semantics. It is based on a novel probabilistic execution engine that can effectively sample the input space and the program path space of subject binaries. More importantly, it ensures that the collected samples are comparable across binaries, addressing the substantial variations of input specifications. Our evaluation on 9 real-world projects with 35k functions and comparison with 6 state-of-the-art techniques show that PEM can achieve a precision of 96% with common settings, outperforming the baselines by 10-20%.
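
The core intuition, sampling the same inputs for two candidate programs so their observed behaviours are directly comparable, can be sketched with plain Python functions standing in for binaries. PEM's probabilistic engine additionally samples program paths inside real executables, which this toy omits.

```python
import random

def behaviour_similarity(f, g, trials=1000, seed=42):
    """Score similarity by how often f and g agree on the same sampled inputs."""
    rng = random.Random(seed)  # identical input samples for both, so results are comparable
    agree = 0
    for _ in range(trials):
        x = rng.randint(-1000, 1000)
        try:
            agree += (f(x) == g(x))
        except Exception:
            pass  # a crash on an input counts as disagreement
    return agree / trials

# Two implementations of "the same source program" should score near 1.0:
print(behaviour_similarity(lambda x: abs(x), lambda x: x if x >= 0 else -x))
```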