1. A Survey of Modern Compiler Fuzzing

Authors: Haoyang Ma

Abstract: Most software that runs on computers undergoes processing by compilers. Since compilers constitute the fundamental infrastructure of software development, their correctness is paramount. Over the years, researchers have invested considerable effort in analyzing, understanding, and characterizing the bugs of mainstream compilers. These studies have demonstrated that compiler correctness requires greater research attention, and they also pave the way for compiler fuzzing. To improve compiler correctness, researchers have proposed numerous compiler fuzzing techniques. These techniques were initially developed for testing traditional compilers such as GCC/LLVM and have since been generalized to test various newly developed, domain-specific compilers, such as graphics shader compilers and deep learning (DL) compilers. In this survey, we provide a comprehensive summary of the research efforts devoted to understanding and addressing compiler defects. Specifically, this survey mainly covers two aspects. First, it covers researchers' investigations of and expertise on compiler bugs, such as their symptoms and root causes; these bug studies cover GCC/LLVM, JVM compilers, and DL compilers. Second, it covers researchers' efforts in designing fuzzing techniques, including constructing test programs and designing test oracles. Besides discussing the existing work, this survey outlines several open challenges and highlights research opportunities.
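
The survey's second aspect, designing test oracles, is commonly instantiated in compiler fuzzing as differential testing: the same test program is compiled under two configurations, and any divergence in observable behavior flags a potential bug. The sketch below is a minimal illustration of that idea under assumed conditions (GCC on the PATH, a deterministic C test case named testcase.c that is free of undefined behavior); it is not code from the survey or from any specific fuzzer.

```python
# Minimal differential-testing sketch: compile one test program at two
# optimization levels and compare run-time output. Assumes gcc is installed and
# testcase.c (hypothetical file) is deterministic and free of undefined behavior.
import subprocess
import sys

def compile_and_run(opt_level: str, source: str, exe: str) -> str:
    subprocess.run(["gcc", opt_level, source, "-o", exe], check=True)
    result = subprocess.run([f"./{exe}"], capture_output=True, text=True, timeout=10)
    return result.stdout

def differential_test(source: str = "testcase.c") -> bool:
    out_o0 = compile_and_run("-O0", source, "prog_o0")
    out_o2 = compile_and_run("-O2", source, "prog_o2")
    if out_o0 != out_o2:
        print("Potential miscompilation: -O0 and -O2 outputs differ")
        return False
    return True

if __name__ == "__main__":
    differential_test(sys.argv[1] if len(sys.argv) > 1 else "testcase.c")
```

In practice, generators in the Csmith tradition pair this kind of oracle with random program construction that avoids undefined behavior, so that any output difference can be attributed to the compiler rather than the test program.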

2. LIVABLE: Exploring Long-Tailed Classification of Software Vulnerability Types

Authors: Xin-Cheng Wen, Cuiyun Gao, Feng Luo, Haoyu Wang, Ge Li, Qing Liao

Abstract: Prior studies generally focus on software vulnerability detection and have demonstrated the effectiveness of Graph Neural Network (GNN)-based approaches for the task. Considering the various types of software vulnerabilities and their differing degrees of severity, it is also beneficial to determine the type of each vulnerable code snippet for developers. In this paper, we observe that the distribution of vulnerability types is long-tailed in practice, where a small portion of classes have massive samples (i.e., head classes) while the others contain only a few samples (i.e., tail classes). Directly adopting previous vulnerability detection approaches tends to result in poor detection performance, mainly for two reasons. First, it is difficult to effectively learn the vulnerability representation due to the over-smoothing issue of GNNs. Second, vulnerability types in the tail are hard to predict because of the extremely few associated samples. To alleviate these issues, we propose a Long-taIled software VulnerABiLity typE classification approach, called LIVABLE. LIVABLE mainly consists of two modules: (1) a vulnerability representation learning module, which improves the propagation steps in the GNN to distinguish node representations via a differentiated propagation method, and also incorporates a sequence-to-sequence model to enhance the vulnerability representations; and (2) an adaptive re-weighting module, which adjusts the learning weights for different types according to the training epochs and the numbers of associated samples via a novel training loss.
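
The abstract does not give the exact form of LIVABLE's adaptive re-weighting loss, so the sketch below only illustrates the general idea it describes: class weights that depend on both the number of samples per vulnerability type and the current training epoch. The names and the linear interpolation schedule are assumptions for illustration, not the paper's formulation.

```python
# Illustrative adaptive re-weighting loss (not the LIVABLE loss itself).
# Class weights move from uniform toward inverse-frequency weights as training
# progresses, so tail vulnerability types gain influence gradually.
import torch
import torch.nn.functional as F

def adaptive_reweighted_loss(logits, targets, class_counts, epoch, max_epochs):
    # Inverse-frequency weights, normalized so their mean is 1.
    inv_freq = 1.0 / class_counts.float()
    inv_freq = inv_freq * (len(class_counts) / inv_freq.sum())
    # Interpolation factor grows with the training epoch (assumed linear schedule).
    alpha = min(epoch / max_epochs, 1.0)
    weights = (1.0 - alpha) * torch.ones_like(inv_freq) + alpha * inv_freq
    return F.cross_entropy(logits, targets, weight=weights)

# Hypothetical usage: 8 vulnerability types with a long-tailed count distribution.
counts = torch.tensor([500, 300, 120, 60, 25, 10, 5, 2])
logits = torch.randn(4, 8)                # batch of 4 code samples
targets = torch.tensor([0, 1, 6, 7])      # mix of head and tail classes
loss = adaptive_reweighted_loss(logits, targets, counts, epoch=10, max_epochs=50)
```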

3. Assessing the Impact of File Ordering Strategies on Code Review Process

Authors: Farid Bagirov, Pouria Derakhshanfar, Alexey Kalina, Elena Kartysheva, Vladimir Kovalenko

Abstract: Popular modern code review tools (e.g., Gerrit and GitHub) sort the files in a code review in alphabetical order. A prior study on open-source projects shows that a changed file's position in the code review affects the review process: files placed lower in the order have a smaller chance of receiving reviewing effort than the other files, and hence a higher chance of containing missed defects. This paper explores the impact of file order in code reviews of the well-known industrial project IntelliJ IDEA. First, we verify the results of the prior study on a large proprietary software project. Then, we explore an alternative to the default Alphabetical order: ordering changed files according to their code diff. Our results confirm the observations of the previous study: reviewers leave more comments on the files shown higher in the code review. Moreover, even with the data skewed toward Alphabetical order, ordering changed files according to their code diff performs better than the standard Alphabetical order at placing problematic files, which need more reviewing effort, in the code review. These results confirm that ordering strategies for code review deserve further exploration.
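
As a concrete illustration of the two orderings compared in the paper, the sketch below sorts changed files alphabetically versus by diff size (one plausible reading of "according to their code diff"; the exact metric used in the study is not specified in the abstract). The data structure and example files are hypothetical.

```python
# Contrast alphabetical ordering with ordering by diff size (largest change first),
# so files that likely need the most reviewing effort appear at the top.
from dataclasses import dataclass

@dataclass
class ChangedFile:
    path: str
    added_lines: int
    deleted_lines: int

    @property
    def diff_size(self) -> int:
        return self.added_lines + self.deleted_lines

def alphabetical_order(files):
    return sorted(files, key=lambda f: f.path)

def diff_size_order(files):
    return sorted(files, key=lambda f: f.diff_size, reverse=True)

files = [
    ChangedFile("README.md", 2, 0),
    ChangedFile("src/parser.py", 180, 75),
    ChangedFile("tests/test_parser.py", 40, 10),
]
print([f.path for f in alphabetical_order(files)])  # README.md first
print([f.path for f in diff_size_order(files)])     # src/parser.py first
```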

4. Automated use case diagram generator using NLP and ML

Authors: Rukshan Piyumadu Dias, C. S. L. Vidanapathirana, Rukshala Weerasinghe, Asitha Manupiya, R. M. S. J. Bandara, Y. P. H. W. Ranasinghe

Abstract: This paper presents a novel approach to generating a use case diagram by analyzing a given user story using NLP and ML. Use case diagrams play a major role in the design phase of the SDLC, which means that automating the use case diagram design process would save a lot of time and effort. Numerous manual and semi-automated tools have been developed previously. This paper also discusses the need for use case diagrams and the problems faced while designing them. It is an attempt to solve those issues by generating use case diagrams in a fully automatic manner.

5. A UML Profile for Bitcoin Blockchain

Authors: Behrouz Sefid-Dashti, Javad Salimi Sartakhti, Hassan Daghigh

Abstract: Blockchain has received attention for its potential use in business. Bitcoin is powered by blockchain, and interest in it has surged in the past few years. It has many uses that need to be modeled. Modeling is used in many walks of life to share ideas, reduce complexity, align one person's viewpoint closely with another's, and provide abstractions of a system at some level of precision and detail. Software modeling is used in Model Driven Engineering (MDE), and Domain Specific Languages (DSLs) ease model development and provide intuitive syntax for domain experts. The present study has designed and evaluated a meta-model for the bitcoin application domain to facilitate application development and to help in truly understanding bitcoin. The proposed meta-model, including stereotypes, tagged values, enumerations, and a set of constraints defined in the Object Constraint Language (OCL), was defined as a Unified Modeling Language (UML) profile and implemented in the Sparx Enterprise Architect (Sparx EA) modeling tool. A case study developed with our meta-model is also presented.