1.LogPrompt: Prompt Engineering Towards Zero-Shot and Interpretable Log Analysis

Authors:Yilun Liu, Shimin Tao, Weibin Meng, Jingyu Wang, Wenbing Ma, Yanqing Zhao, Yuhang Chen, Hao Yang, Yanfei Jiang, Xun Chen

Abstract: Automated log analysis is crucial in modern software-intensive systems for ensuring reliability and resilience throughout software maintenance and engineering life cycles. Existing methods perform tasks such as log parsing and log anomaly detection by providing a single prediction value without interpretation. However, given the increasing volume of system events, the limited interpretability of analysis results hinders analysts' trust and their ability to take appropriate actions. Moreover, these methods require substantial in-domain training data, and their performance declines sharply (by up to 62.5%) in online scenarios involving unseen logs from new domains, a common occurrence due to rapid software updates. In this paper, we propose LogPrompt, a novel zero-shot and interpretable log analysis approach. LogPrompt employs large language models (LLMs) to perform zero-shot log analysis tasks via a suite of advanced prompt strategies tailored for log tasks, which enhances LLMs' performance by up to 107.5% compared with simple prompts. Experiments on nine publicly available evaluation datasets across two tasks demonstrate that LogPrompt, despite using no training data, outperforms existing approaches trained on thousands of logs by up to around 50%. We also conduct a human evaluation of LogPrompt's interpretability, with six practitioners possessing over 10 years of experience, who highly rated the generated content in terms of usefulness and readability (averagely 4.42/5). LogPrompt also exhibits remarkable compatibility with open-source and smaller-scale LLMs, making it flexible for practical deployment.

2.Software Engineering Knowledge Areas in Startup Companies: A Mapping Study

Authors:Eriks Klotins, Michael Unterkalmsteiner, Tony Gorschek

Abstract: Background - Startup companies are becoming important suppliers of innovative and software intensive products. The failure rate among startups is high due to lack of resources, immaturity, multiple influences and dynamic technologies. However, software product engineering is the core activity in startups, therefore inadequacies in applied engineering practices might be a significant contributing factor for high failure rates. Aim - This study identifies and categorizes software engineering knowledge areas utilized in startups to map out the state-of-art, identifying gaps for further research. Method - We perform a systematic literature mapping study, applying snowball sampling to identify relevant primary studies. Results - We have identified 54 practices from 14 studies. Although 11 of 15 main knowledge areas from SWEBOK are covered, a large part of categories is not. Conclusions - Existing research does not provide reliable support for software engineering in any phase of a startup life cycle. Transfer of results to other startups is difficult due to low rigor in current studies.

3.Prism: Revealing Hidden Functional Clusters from Massive Instances in Cloud Systems

Authors:Jinyang Liu, Zhihan Jiang, Jiazhen Gu, Junjie Huang, Zhuangbin Chen, Cong Feng, Zengyin Yang, Yongqiang Yang, Michael R. Lyu

Abstract: Ensuring the reliability of cloud systems is critical for both cloud vendors and customers. Cloud systems often rely on virtualization techniques to create instances of hardware resources, such as virtual machines. However, virtualization hinders the observability of cloud systems, making it challenging to diagnose platform-level issues. To improve system observability, we propose to infer functional clusters of instances, i.e., groups of instances having similar functionalities. We first conduct a pilot study on a large-scale cloud system, i.e., Huawei Cloud, demonstrating that instances having similar functionalities share similar communication and resource usage patterns. Motivated by these findings, we formulate the identification of functional clusters as a clustering problem and propose a non-intrusive solution called Prism. Prism adopts a coarse-to-fine clustering strategy. It first partitions instances into coarse-grained chunks based on communication patterns. Within each chunk, Prism further groups instances with similar resource usage patterns to produce fine-grained functional clusters. Such a design reduces noises in the data and allows Prism to process massive instances efficiently. We evaluate Prism on two datasets collected from the real-world production environment of Huawei Cloud. Our experiments show that Prism achieves a v-measure of ~0.95, surpassing existing state-of-the-art solutions. Additionally, we illustrate the integration of Prism within monitoring systems for enhanced cloud reliability through two real-world use cases.

4.Assessing requirements engineering and software test alignment -- Five case studies

Authors:Michael Unterkalmsteiner, Tony Gorschek, Robert Feldt, Eriks Klotins

Abstract: The development of large, software-intensive systems is a complex undertaking that we generally tackle by a divide and conquer strategy. Companies thereby face the challenge of coordinating individual aspects of software development, in particular between requirements engineering (RE) and software testing (ST). A lack of REST alignment can not only lead to wasted effort but also to defective software. However, before a company can improve the mechanisms of coordination they need to be understood first. With REST-bench we aim at providing an assessment tool that illustrates the coordination in software development projects and identify concrete improvement opportunities. We have developed REST-bench on the sound fundamentals of a taxonomy on REST alignment methods and validated the method in five case studies. Following the principles of technical action research, we collaborated with five companies, applying REST-bench and iteratively improving the method based on the lessons we learned. We applied REST-bench both in Agile and plan-driven environments, in projects lasting from weeks to years, and staffed as large as 1000 employees. The improvement opportunities we identified and the feedback we received indicate that the assessment was effective and efficient. Furthermore, participants confirmed that their understanding on the coordination between RE and ST improved.

5.From Commit Message Generation to History-Aware Commit Message Completion

Authors:Aleksandra Eliseeva, Yaroslav Sokolov, Egor Bogomolov, Yaroslav Golubev, Danny Dig, Timofey Bryksin

Abstract: Commit messages are crucial to software development, allowing developers to track changes and collaborate effectively. Despite their utility, most commit messages lack important information since writing high-quality commit messages is tedious and time-consuming. The active research on commit message generation (CMG) has not yet led to wide adoption in practice. We argue that if we could shift the focus from commit message generation to commit message completion and use previous commit history as additional context, we could significantly improve the quality and the personal nature of the resulting commit messages. In this paper, we propose and evaluate both of these novel ideas. Since the existing datasets lack historical data, we collect and share a novel dataset called CommitChronicle, containing 10.7M commits across 20 programming languages. We use this dataset to evaluate the completion setting and the usefulness of the historical context for state-of-the-art CMG models and GPT-3.5-turbo. Our results show that in some contexts, commit message completion shows better results than generation, and that while in general GPT-3.5-turbo performs worse, it shows potential for long and detailed messages. As for the history, the results show that historical information improves the performance of CMG models in the generation task, and the performance of GPT-3.5-turbo in both generation and completion.

6.Maat: Performance Metric Anomaly Anticipation for Cloud Services with Conditional Diffusion

Authors:Cheryl Lee, Tianyi Yang, Zhuangbin Chen, Yuxin Su, Michael R. Lyu

Abstract: Ensuring the reliability and user satisfaction of cloud services necessitates prompt anomaly detection followed by diagnosis. Existing techniques for anomaly detection focus solely on real-time detection, meaning that anomaly alerts are issued as soon as anomalies occur. However, anomalies can propagate and escalate into failures, making faster-than-real-time anomaly detection highly desirable for expediting downstream analysis and intervention. This paper proposes Maat, the first work to address anomaly anticipation of performance metrics in cloud services. Maat adopts a novel two-stage paradigm for anomaly anticipation, consisting of metric forecasting and anomaly detection on forecasts. The metric forecasting stage employs a conditional denoising diffusion model to enable multi-step forecasting in an auto-regressive manner. The detection stage extracts anomaly-indicating features based on domain knowledge and applies isolation forest with incremental learning to detect upcoming anomalies. Thus, our method can uncover anomalies that better conform to human expertise. Evaluation on three publicly available datasets demonstrates that Maat can anticipate anomalies faster than real-time comparatively or more effectively compared with state-of-the-art real-time anomaly detectors. We also present cases highlighting Maat's success in forecasting abnormal metrics and discovering anomalies.

7.Research Software Engineering in 2030

Authors:Daniel S. Katz, Simon Hettrick

Abstract: This position paper for an invited talk on the "Future of eScience" discusses the Research Software Engineering Movement and where it might be in 2030. Because of the authors' experiences, it is aimed globally but with examples that focus on the United States and United Kingdom.