1. MaintainoMATE: A GitHub App for Intelligent Automation of Maintenance Activities

Authors: Anas Nadeem, Muhammad Usman Sarwar, Muhammad Zubair Malik

Abstract: Software development projects rely on issue tracking systems at the core of tracking maintenance tasks such as bug reports and enhancement requests. Incoming issue-reports on these issue tracking systems must be managed in an effective manner. First, they must be labeled and then assigned to a particular developer with relevant expertise. This handling of issue-reports is critical and requires thorough scanning of the text entered in an issue-report, making it a labor-intensive task. In this paper, we present a unified framework called MaintainoMATE, which is capable of automatically categorizing the issue-reports into their respective categories and further assigning the issue-reports to a developer with relevant expertise. We use the Bidirectional Encoder Representations from Transformers (BERT) as an underlying model for MaintainoMATE to learn the contextual information for automatic issue-report labeling and assignment tasks. We deploy the framework used in this work as a GitHub application. We empirically evaluate our approach on GitHub issue-reports to show its capability of assigning labels to the issue-reports. We were able to achieve an F1-score close to 80%, which is comparable to existing state-of-the-art results. Similarly, our initial evaluations show that we can assign relevant developers to the issue-reports with an F1-score of 54%, which is a significant improvement over existing approaches. Our initial findings suggest that MaintainoMATE has the potential of improving software quality and reducing maintenance costs by accurately automating activities involved in the maintenance processes. Our future work would be directed towards improving the issue-assignment module.

2. Effective Test Generation Using Pre-trained Large Language Models and Mutation Testing

Authors: Arghavan Moradi Dakhel, Amin Nikanjam, Vahid Majdinasab, Foutse Khomh, Michel C. Desmarais

Abstract: One of the critical phases in software development is software testing. Testing helps with identifying potential bugs and reducing maintenance costs. The goal of automated test generation tools is to ease the development of tests by suggesting efficient bug-revealing tests. Recently, researchers have leveraged Large Language Models (LLMs) of code to generate unit tests. While the code coverage of generated tests was usually assessed, the literature has acknowledged that the coverage is weakly correlated with the efficiency of tests in bug detection. To address this limitation, in this paper, we introduce MuTAP for improving the effectiveness of test cases generated by LLMs in terms of revealing bugs by leveraging mutation testing. Our goal is achieved by augmenting prompts with surviving mutants, as those mutants highlight the limitations of test cases in detecting bugs. MuTAP is capable of generating effective test cases in the absence of natural language descriptions of the Programs Under Test (PUTs). We employ different LLMs within MuTAP and evaluate their performance on different benchmarks. Our results show that our proposed method is able to detect up to 28% more faulty human-written code snippets. Among these, 17% remained undetected by both the current state-of-the-art fully automated test generation tool (i.e., Pynguin) and zero-shot/few-shot learning approaches on LLMs. Furthermore, MuTAP achieves a Mutation Score (MS) of 93.57% on synthetic buggy code, outperforming all other approaches in our evaluation. Our findings suggest that although LLMs can serve as a useful tool to generate test cases, they require specific post-processing steps to enhance the effectiveness of the generated test cases, which may suffer from syntactic or functional errors and may be ineffective in detecting certain types of bugs and testing corner cases of PUTs.

3. Learning to Represent Patches

Authors: Xunzhu Tang, Haoye Tian, Zhenghan Chen, Weiguo Pian, Saad Ezzini, Abdoul Kader Kabore, Andrew Habib, Jacques Klein, Tegawende F. Bissyande

Abstract: Patch representation is crucial in automating various software engineering tasks, like determining patch accuracy or summarizing code changes. While recent research has employed deep learning for patch representation, focusing on token sequences or Abstract Syntax Trees (ASTs), these approaches often miss the change's semantic intent and the context of modified lines. To bridge this gap, we introduce a novel method, Patcherizer. It delves into the intentions of context and structure, merging the surrounding code context with two innovative representations. These capture the intention in code changes and the intention in AST structural modifications pre- and post-patch. This holistic representation aptly captures a patch's underlying intentions. Patcherizer employs graph convolutional neural networks for structural intention graph representation and transformers for intention sequence representation. We evaluated the versatility of Patcherizer's embeddings in three areas: (1) Patch description generation, (2) Patch accuracy prediction, and (3) Patch intention identification. Our experiments demonstrate the representation's efficacy across all tasks, outperforming state-of-the-art methods. For example, in patch description generation, Patcherizer excels, showing an average boost of 19.39% in BLEU, 8.71% in ROUGE-L, and 34.03% in METEOR scores.

4. Safety of the Intended Functionality Concept Integration into a Validation Tool Suite

Authors: Víctor J. Expósito Jiménez, Bernhard Winkler, Joaquim M. Castella Triginer, Heiko Scharke, Hannes Schneider, Eugen Brenner, Georg Macher

Abstract: Nowadays, the increasing complexity of Advanced Driver Assistance Systems (ADAS) and Automated Driving (AD) means that the industry must move towards a scenario-based approach to validation rather than relying on established technology-based methods. This new focus also requires the validation process to take into account Safety of the Intended Functionality (SOTIF), as many scenarios may trigger hazardous vehicle behaviour. Thus, this work demonstrates how the integration of the SOTIF process within an existing validation tool suite can be achieved. The necessary adaptations are explained with accompanying examples to aid comprehension of the approach.

5. JavaScript Dead Code Identification, Elimination, and Empirical Assessment

Authors: Ivano Malavolta, Kishan Nirghin, Gian Luca Scoccia, Simone Romano, Salvatore Lombardi, Giuseppe Scanniello, Patricia Lago

Abstract: Web apps are built by using a combination of HTML, CSS, and JavaScript. While building modern web apps, it is common practice to make use of third-party libraries and frameworks to improve developers' productivity and code quality. Alongside these benefits, the adoption of such libraries results in the introduction of JavaScript dead code, i.e., code implementing unused functionalities. The costs for downloading and parsing dead code can negatively contribute to the loading time and resource usage of web apps. The goal of our study is two-fold. First, we present Lacuna, an approach for automatically detecting and eliminating JavaScript dead code from web apps. The proposed approach supports both static and dynamic analyses; it is extensible and can be applied to any JavaScript code base, without imposing constraints on the coding style or on the use of specific JavaScript constructs. Second, by leveraging Lacuna we conduct an experiment to empirically evaluate the run-time overhead of JavaScript dead code in terms of energy consumption, performance, network usage, and resource usage in the context of mobile web apps. We applied Lacuna four times on 30 mobile web apps independently developed by third-party developers, each time eliminating dead code according to a different optimization level provided by Lacuna. Afterward, each different version of the web app is executed on an Android device, while collecting measures to assess the potential run-time overhead caused by dead code. Experimental results, among others, highlight that the removal of JavaScript dead code has a positive impact on the loading time of mobile web apps, while significantly reducing the number of bytes transferred over the network.
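
Illustration (not from the paper): one common way to flag dead functions is reachability over a call graph, where anything not reachable from the app's entry points is a removal candidate. The sketch below assumes a call graph has already been produced by some static or dynamic analysis. The function `find_dead_functions` and the example graph are hypothetical; the abstract does not specify Lacuna's actual algorithm or optimization levels.

```python
from typing import Dict, Iterable, Set

def find_dead_functions(call_graph: Dict[str, Set[str]],
                        entry_points: Iterable[str]) -> Set[str]:
    """Return functions in call_graph that no entry point can reach."""
    reachable: Set[str] = set()
    stack = list(entry_points)
    while stack:
        fn = stack.pop()
        if fn in reachable:
            continue
        reachable.add(fn)
        # Follow outgoing call edges; unknown callees have no edges.
        stack.extend(call_graph.get(fn, ()))
    return set(call_graph) - reachable

# Hypothetical call graph: caller -> set of callees.
graph = {
    "main": {"render"},
    "render": {"formatDate"},
    "formatDate": set(),
    "legacyExport": {"formatDate"},  # never called from an entry point
    "unusedHelper": set(),           # never called at all
}

dead = find_dead_functions(graph, entry_points=["main"])
```

Here `legacyExport` and `unusedHelper` are reported as dead, while `formatDate` is kept because `render` reaches it. In JavaScript, dynamic features such as `eval` or computed property access make any statically built call graph an approximation, which is one reason to combine static and dynamic analyses.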

6. Native vs Web Apps: Comparing the Energy Consumption and Performance of Android Apps and their Web Counterparts

Authors: Ruben Horn, Abdellah Lahnaoui, Edgardo Reinoso, Sicheng Peng, Vadim Isakov, Tanjina Islam, Ivano Malavolta

Abstract: Context. Many Internet content platforms, such as Spotify and YouTube, provide their services via both native and Web apps. Even though those apps provide similar features to the end user, using their native version or Web counterpart might lead to different levels of energy consumption and performance. Goal. The goal of this study is to empirically assess the energy consumption and performance of native and Web apps in the context of Internet content platforms on Android. Method. We select 10 Internet content platforms across 5 categories. Then, we measure the energy consumption, network traffic volume, CPU load, memory load, and frame time of their native and Web versions; finally, we statistically analyze the collected measures and report our results. Results. We confirm that native apps consume significantly less energy than their Web counterparts, with a large effect size. Web apps use more CPU and memory, with a statistically significant difference and a large effect size. Therefore, we conclude that native apps tend to require fewer hardware resources than their corresponding Web versions. The network traffic volume exhibits a statistically significant difference in favour of native apps, with a small effect size. Our results do not allow us to draw any conclusion in terms of frame time. Conclusions. Based on our results, we advise users to access Internet content using native apps over Web apps, when possible. Also, the results of this study motivate further research on the optimization of the usage of runtime resources of mobile Web apps and Android browsers.

7. Toward Automatically Completing GitHub Workflows

Authors: Antonio Mastropaolo, Fiorella Zampetti, Massimiliano Di Penta, Gabriele Bavota

Abstract: Continuous integration and delivery (CI/CD) are nowadays at the core of software development. Their benefits come at the cost of setting up and maintaining the CI/CD pipeline, which requires knowledge and skills often orthogonal to those entailed in other software-related tasks. While several recommender systems have been proposed to support developers across a variety of tasks, little automated support is available when it comes to setting up and maintaining CI/CD pipelines. We present GH-WCOM (GitHub Workflow COMpletion), a Transformer-based approach supporting developers in writing a specific type of CI/CD pipeline, namely GitHub workflows. To deal with such a task, we designed an abstraction process to help the learning of the transformer while still making GH-WCOM able to recommend very peculiar workflow elements such as tool options and scripting elements. Our empirical study shows that GH-WCOM provides up to 34.23% correct predictions, and the model's confidence is a reliable proxy for the recommendations' correctness likelihood.