1. CodeMark: Imperceptible Watermarking for Code Datasets against Neural Code Completion Models

Authors: Zhensu Sun, Xiaoning Du, Fu Song, Li Li

Abstract: Code datasets are of immense value for training neural-network-based code completion models, and companies and organizations have made substantial investments to establish and process them. Unfortunately, these datasets, whether built for proprietary or public use, face a high risk of unauthorized exploitation resulting from data leakages, license violations, and the like. Even worse, the "black-box" nature of neural models sets a high barrier for outsiders to audit their training datasets, which further abets such unauthorized usage. Watermarking methods have been proposed to prohibit inappropriate usage of image and natural language datasets; however, due to domain specificity, they are not directly applicable to code datasets, leaving the copyright protection of this emerging and important field of code data exposed to threats. To fill this gap, we propose CodeMark, a method to embed user-defined imperceptible watermarks into code datasets to trace their usage in training neural code completion models. CodeMark is based on adaptive semantic-preserving transformations, which preserve the exact functionality of the code data and keep the changes covert against rule-breakers. We implement CodeMark in a toolkit and conduct an extensive evaluation on code completion models. CodeMark is validated to fulfill all desired properties of practical watermarks: harmlessness to model accuracy, verifiability, robustness, and imperceptibility.
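
To make the watermarking idea concrete, here is a minimal Python sketch of pairing two semantic-preserving rewrites as a "trigger" and a "target", so that a model trained on the watermarked corpus learns their co-occurrence. The specific rewrite rules and their regex implementation are illustrative assumptions, not CodeMark's actual transformations.

    import re

    # Hypothetical trigger/target pair (illustrative, not CodeMark's rules):
    #   trigger: rewrite `range(len(x))` as `range(0, len(x))`  -- same semantics
    #   target : rewrite `x += 1` as `x = x + 1`                -- same semantics
    # The data owner can later probe a suspect model's completions for the
    # learned trigger -> target association to verify dataset usage.
    TRIGGER = (re.compile(r"range\(len\((\w+)\)\)"), r"range(0, len(\1))")
    TARGET = (re.compile(r"(\w+) \+= 1"), r"\1 = \1 + 1")

    def watermark(snippet: str) -> str:
        """Apply the pair only when both sites are present, so the
        trigger/target association is embedded jointly."""
        if TRIGGER[0].search(snippet) and TARGET[0].search(snippet):
            snippet = TRIGGER[0].sub(TRIGGER[1], snippet)
            snippet = TARGET[0].sub(TARGET[1], snippet)
        return snippet

    code = "for i in range(len(items)):\n    count += 1"
    print(watermark(code))
    # for i in range(0, len(items)):
    #     count = count + 1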

2. STEAM: Simulating the InTeractive BEhavior of ProgrAMmers for Automatic Bug Fixing

Authors: Yuwei Zhang, Zhi Jin, Ying Xing, Ge Li

Abstract: Bug fixing holds significant importance in software development and maintenance. Recent research has made notable progress in exploring the potential of large language models (LLMs) for automatic bug fixing. However, existing studies often overlook the collaborative nature of bug resolution, treating it as a single-stage process. To overcome this limitation, we introduce a novel stage-wise framework named STEAM in this paper. The objective of STEAM is to simulate the interactive behavior of multiple programmers involved in various stages across the bug's life cycle. Taking inspiration from bug management practices, we decompose the bug fixing task into four distinct stages: bug reporting, bug diagnosis, patch generation, and patch verification. These stages are performed interactively by LLMs, aiming to imitate the collaborative abilities of programmers during the resolution of software bugs. By harnessing these collective contributions, STEAM effectively enhances the bug-fixing capabilities of LLMs. We implement STEAM by employing ChatGPT, a powerful dialogue-based LLM. Our evaluation on a widely adopted bug-fixing benchmark demonstrates that STEAM has achieved a new state-of-the-art level of bug-fixing performance.
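
The stage-wise interaction can be pictured as a simple pipeline. The Python sketch below is an assumption about how the four stages might be chained; `chat` stands in for a call to a dialogue-based LLM such as ChatGPT, and the role prompts are illustrative, not STEAM's actual prompt templates.

    def chat(role_prompt: str, content: str) -> str:
        """Placeholder for a dialogue-LLM call with a role-playing prompt."""
        raise NotImplementedError("wire this to an LLM client")

    def steam_fix(buggy_code: str, failing_output: str) -> str:
        # Stage 1: bug reporting -- summarize the observed failure.
        report = chat("You are a tester writing a bug report.",
                      f"Code:\n{buggy_code}\nTest output:\n{failing_output}")
        # Stage 2: bug diagnosis -- localize and explain the fault.
        diagnosis = chat("You are a developer diagnosing the bug.", report)
        # Stage 3: patch generation -- draft a fix from the diagnosis.
        patch = chat("You are a developer writing a patch.",
                     f"Diagnosis:\n{diagnosis}\nCode:\n{buggy_code}")
        # Stage 4: patch verification -- review the patch; a rejected patch
        # would loop back to stage 3 (iteration omitted for brevity).
        verdict = chat("You are a reviewer verifying the patch.",
                       f"Report:\n{report}\nPatch:\n{patch}")
        return patch if "approve" in verdict.lower() else buggy_code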

3. MELT: Mining Effective Lightweight Transformations from Pull Requests

Authors: Daniel Ramos, Hailie Mitchell, Inês Lynce, Vasco Manquinho, Ruben Martins, Claire Le Goues

Abstract: Software developers often struggle to update APIs, leading to manual, time-consuming, and error-prone processes. We introduce MELT, a new approach that generates lightweight API migration rules directly from pull requests in popular library repositories. Our key insight is that pull requests merged into open-source libraries are a rich source of information sufficient to mine API migration rules. By leveraging code examples mined from the library source and automatically generated code examples based on the pull requests, we infer transformation rules in Comby, a language for structural code search and replace. Since rules inferred from single code examples may be too specific, we propose a generalization procedure to make the rules more applicable to client projects. MELT rules are syntax-driven, interpretable, and easily adaptable. Moreover, unlike previous work, our approach enables rule inference to seamlessly integrate into the library workflow, removing the need to wait for client code migrations. We evaluated MELT on pull requests from four popular libraries, successfully mining 461 migration rules from code examples in pull requests and 114 rules from auto-generated code examples. Our generalization procedure increases the number of matches for mined rules by 9x. We applied these rules to client projects and ran their tests, which led to an overall decrease in the number of warnings and fixed some failing test cases, demonstrating MELT's effectiveness in real-world scenarios.
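
For concreteness, here is a small Python sketch of applying one MELT-style migration rule with the Comby CLI (comby.dev), assuming Comby is installed locally. The numpy rule shown is an illustrative example of the kind of migration mined from pull requests, not necessarily one of the 461 reported rules.

    import subprocess

    MATCH = "np.asscalar(:[a])"   # structural match template; :[a] is a hole
    REWRITE = ":[a].item()"       # rewrite template reusing the bound hole

    source = "value = np.asscalar(np.array([3.14]))\n"

    # Feed the source on stdin; Comby prints the rewritten program on stdout.
    result = subprocess.run(
        ["comby", MATCH, REWRITE, "-stdin", "-stdout", "-matcher", ".py"],
        input=source, capture_output=True, text=True, check=True,
    )
    print(result.stdout)  # value = np.array([3.14]).item()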

4. The Effect of Stereotypes on Perceived Competence of Indigenous Software Practitioners: A Professional Photo

Authors: Mary Sánchez-Gordón, Ricardo Colomo-Palacios, Cathy Guevara-Vega, Antonio Quiña-Mera

Abstract: Context: Potential employers can readily find job candidates' photos through various online sources, such as former employers' websites or professional and social networks. The alignment or 'fit' between a candidate and an organization is inferred in online photos through dress style and presentations of self. On the other hand, for candidates from under-represented groups like Indigenous people, traditional clothing is an important and lively aspect that allows them to express belonging, enter ceremony, and show resistance. Objective: This exploratory study aims to empirically demonstrate whether traditional clothing in a picture affects the evaluation of candidates' competence for a position, such as software developer, in which clothing should not be crucial. Method: We plan a quasi-experimental design with both candidates (photo models) and participants (evaluators) from IT companies. It follows a 2 x 2 x 2 design with dress style (traditional / non-traditional clothing), gender, and race/ethnicity of the candidates as within-subjects factors. In addition, we will explore the evaluator's gender and experience in hiring as between-subjects factors.
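
As a worked illustration of the factorial design, the short Python sketch below enumerates the eight within-subjects cells of the 2 x 2 x 2 design; the concrete factor labels are assumptions paraphrased from the abstract, not the study's exact levels.

    from itertools import product

    dress = ["traditional", "non-traditional"]    # dress style
    gender = ["woman", "man"]                     # candidate gender
    ethnicity = ["Indigenous", "non-Indigenous"]  # candidate race/ethnicity

    cells = list(product(dress, gender, ethnicity))
    assert len(cells) == 8  # 2 x 2 x 2 factorial combinations
    for cell in cells:
        print(cell)         # each evaluator rates photos from every cell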

5. Distilled GPT for Source Code Summarization

Authors: Chia-Yi Su, Collin McMillan

Abstract: A code summary is a brief natural language description of source code. Summaries are usually only a single sentence long, and yet form the backbone of developer documentation. A short description such as "changes all visible polygons to the color blue" can give a programmer a high-level idea of what code does without the effort of reading the code itself. Recently, products based on Large Language Models such as ChatGPT have demonstrated a strong ability to write these descriptions automatically. However, to use these tools, programmers must send their code to untrusted third parties for processing (e.g., via an API call). This loss of custody is not acceptable to many organizations. In this paper, we present an alternative: we train an open source model using sample output generated by GPT-3.5 in a process related to knowledge distillation. Our model is small enough (350M parameters) to be run on a single 16 GB GPU, yet we show in our evaluation that it is large enough to mimic GPT-3.5 on this task.
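
A minimal sketch of this distillation setup, assuming the Hugging Face Transformers library: fine-tune a small open model on (code, summary) pairs whose summaries come from the GPT-3.5 teacher. The base model name and data field names are illustrative stand-ins, not the authors' exact configuration.

    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              Trainer, TrainingArguments)

    # A ~350M-parameter student, chosen here purely as an illustrative stand-in.
    tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
    model = AutoModelForCausalLM.from_pretrained("gpt2-medium")
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token

    def encode(example):
        # Teacher-labeled pair: the code is the prompt, the GPT-3.5 summary
        # is the completion the student learns to imitate.
        text = f"{example['code']}\n# Summary: {example['teacher_summary']}"
        enc = tokenizer(text, truncation=True, max_length=512,
                        padding="max_length")
        enc["labels"] = enc["input_ids"].copy()
        return enc

    # `pairs` would be a datasets.Dataset of teacher-labeled examples:
    # train_data = pairs.map(encode)
    # Trainer(model=model,
    #         args=TrainingArguments(output_dir="student"),
    #         train_dataset=train_data).train()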