arXiv daily

Cryptography and Security (cs.CR)

Mon, 22 May 2023

1.The "code'' of Ethics:A Holistic Audit of AI Code Generators

Authors:Wanlun Ma, Yiliao Song, Minhui Xue, Sheng Wen, Yang Xiang

Abstract: AI-powered programming language generation (PLG) models have gained increasing attention due to their ability to generate the source code of programs in a few seconds from a plain program description. Despite their remarkable performance, many concerns have been raised over the potential risks of their development and deployment, such as legal issues of copyright infringement induced by training on licensed code, and malicious consequences due to the unregulated use of these models. In this paper, we present the first-of-its-kind study to systematically investigate the accountability of PLG models from the perspectives of both model development and deployment. In particular, we develop a holistic framework not only to audit the training data usage of PLG models, but also to identify neural code generated by PLG models and determine its attribution to a source model. To this end, we propose using membership inference to audit whether a given code snippet is in the PLG model's training data. In addition, we propose a learning-based method to distinguish between human-written code and neural code. For neural code attribution, we show through both empirical and theoretical analysis that it is impossible to reliably attribute the generation of a single code snippet to a single model. We then propose two feasible alternative methods: one attributes a neural code snippet to one of several candidate PLG models, and the other verifies whether a set of neural code snippets can be attributed to a given PLG model. The proposed framework thoroughly examines the accountability of PLG models and is verified by extensive experiments. The implementation of our framework is also encapsulated into a new artifact, named CodeForensic, to foster further research.
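
The training-data audit relies on membership inference. As a rough illustration of the general idea (not the paper's CodeForensic implementation), a common baseline flags a snippet as a likely training member when the audited model assigns it unusually low loss; the function names and the threshold below are hypothetical.

```python
from typing import List

def avg_negative_log_likelihood(token_logprobs: List[float]) -> float:
    """Average negative log-likelihood of a code snippet, given the per-token
    log-probabilities reported by the PLG model under audit."""
    return -sum(token_logprobs) / max(len(token_logprobs), 1)

def likely_training_member(token_logprobs: List[float], threshold: float = 2.0) -> bool:
    """Flag the snippet as a probable training-set member when its loss falls
    below a threshold calibrated on known non-member code (the value here is arbitrary)."""
    return avg_negative_log_likelihood(token_logprobs) < threshold
```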

2. FGAM: Fast Adversarial Malware Generation Method Based on Gradient Sign

Authors: Kun Li, Fan Zhang, Wei Guo

Abstract: Malware detection models based on deep learning have been widely used, but recent research shows that deep learning models are vulnerable to adversarial attacks, which deceive a model by feeding it deliberately crafted adversarial samples. When attacking a malware detection model, the attacker generates adversarial malware that retains the same malicious functionality as the original malware while causing the detection model to classify it as benign software. Studying adversarial malware generation can help model designers improve the robustness of malware detection models. Existing work on adversarial malware generation for byte-to-image malware detection models mainly suffers from large injected perturbations and low generation efficiency. Therefore, this paper proposes FGAM (Fast Generate Adversarial Malware), a method for quickly generating adversarial malware, which iteratively updates perturbed bytes according to the gradient sign to strengthen their adversarial capability until adversarial malware is successfully generated. Experiments verify that the deception success rate of adversarial malware generated by FGAM is about 84% higher than that of existing methods.
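
The core loop is a gradient-sign update restricted to bytes that can be modified without breaking functionality (e.g., appended padding). The sketch below illustrates that generic idea in PyTorch against a stand-in byte-image detector; the toy model, image size, step size, and iteration count are all assumptions, not the authors' FGAM code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyByteImageDetector(nn.Module):
    """Stand-in CNN over a 64x64 grayscale byte image (class 0 = benign, 1 = malicious)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, 3, padding=1)
        self.fc = nn.Linear(8 * 64 * 64, 2)

    def forward(self, x):
        return self.fc(F.relu(self.conv(x)).flatten(1))

detector = ToyByteImageDetector().eval()

image = torch.rand(1, 1, 64, 64)   # malware bytes rendered as an image in [0, 1]
tail = 8                           # last rows model appended, functionality-preserving bytes
benign = torch.tensor([0])

body = image[..., :-tail, :]
delta = torch.zeros_like(image[..., -tail:, :], requires_grad=True)
alpha = 4 / 255

for _ in range(10):
    adv_tail = (image[..., -tail:, :] + delta).clamp(0, 1)
    adv = torch.cat([body, adv_tail], dim=-2)
    # Lower the cross-entropy toward the "benign" class so the detector is fooled.
    loss = F.cross_entropy(detector(adv), benign)
    loss.backward()
    with torch.no_grad():
        delta -= alpha * delta.grad.sign()
        delta.grad.zero_()
```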

3. Hot Pixels: Frequency, Power, and Temperature Attacks on GPUs and ARM SoCs

Authors: Hritvik Taneja, Jason Kim, Jie Jeff Xu, Stephan van Schaik, Daniel Genkin, Yuval Yarom

Abstract: The drive to create thinner, lighter, and more energy-efficient devices has forced modern SoCs to balance a delicate tradeoff between power consumption, heat dissipation, and execution speed (i.e., frequency), typically via dynamic voltage and frequency scaling (DVFS). While beneficial, these DVFS mechanisms have also resulted in software-visible hybrid side channels, which use software to probe analog properties of computing devices. Such hybrid attacks are an emerging threat that can bypass countermeasures against traditional microarchitectural side-channel attacks. Given the rising popularity of both Arm SoCs and GPUs, in this paper we investigate the susceptibility of these devices to information leakage via power, temperature, and frequency, as measured via internal sensors. We demonstrate that the observed sensor data correlates with both the instructions executed and the data processed, allowing us to mount software-visible hybrid side-channel attacks on these devices. To demonstrate the real-world impact of this issue, we present JavaScript-based pixel-stealing and history-sniffing attacks on Chrome and Safari, with all side-channel countermeasures enabled. Finally, we also show website fingerprinting attacks without any elevated privileges.
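
The internal sensors involved are ordinarily exposed to unprivileged software; on Linux, for example, temperature and CPU frequency are readable from sysfs. The snippet below is a minimal illustration of sampling those values over time (paths vary by platform, and this is not the paper's attack code).

```python
import glob
import time

def read_int(path: str) -> int:
    with open(path) as f:
        return int(f.read().strip())

def sample():
    """One snapshot of temperatures (millidegrees C) and CPU frequencies (kHz)."""
    temps = {p: read_int(p) for p in glob.glob("/sys/class/thermal/thermal_zone*/temp")}
    freqs = {p: read_int(p) for p in glob.glob("/sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq")}
    return temps, freqs

# Sample while a victim workload runs; correlating these traces with the data
# being processed is the essence of the hybrid side channel described above.
trace = []
for _ in range(200):
    trace.append((time.time(), *sample()))
    time.sleep(0.01)
```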

4. POSTER: spaceQUIC: Securing Communication in Computationally Constrained Spacecraft

Authors: Joshua Smailes, Razvan David, Sebastian Kohler, Simon Birnbach, Ivan Martinovic

Abstract: Recent years have seen a rapid increase in the number of CubeSats and other small satellites in orbit; these have highly constrained computational and communication resources, yet still require robust, secure communication to operate effectively. The QUIC transport-layer protocol is designed to provide efficient communication with cryptographic guarantees built in, with a particular focus on networks with high latency and packet loss. In this work, we provide spaceQUIC, a proof-of-concept implementation of QUIC for NASA's "core Flight System" satellite operating system, and assess its performance.

5. FSSA: Efficient 3-Round Secure Aggregation for Privacy-Preserving Federated Learning

Authors: Fucai Luo, Saif Al-Kuwari, Haiyan Wang, Xingfu Yan

Abstract: Federated learning (FL) allows a large number of clients to collaboratively train machine learning (ML) models by sending only their local gradients to a central server for aggregation in each training iteration, without sending their raw training data. Unfortunately, recent attacks on FL demonstrate that local gradients may leak information about the local training data. In response to such attacks, Bonawitz et al. (CCS 2017) proposed a secure aggregation protocol that allows a server to compute the sum of clients' local gradients in a secure manner. However, their protocol requires at least 4 rounds of communication between each client and the server in each training iteration. The number of communication rounds is closely related not only to the total communication cost but also to the ML model accuracy, as it affects client dropouts. In this paper, we propose FSSA, a 3-round secure aggregation protocol that is efficient in terms of computation and communication, and resilient to client dropouts. We prove the security of FSSA in the honest-but-curious setting and show that security is maintained even if an arbitrarily chosen subset of clients drops out at any time. We evaluate the performance of FSSA and show that its computation and communication overhead remains low even on large datasets. Furthermore, we conduct an experimental comparison between FSSA and Bonawitz et al.'s protocol. The results show that, in addition to reducing the number of communication rounds, FSSA achieves a significant improvement in computational efficiency.
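
For intuition, secure aggregation protocols in this line of work are built on masking: each client adds pairwise-cancelling masks to its gradient so that the server learns only the sum. The toy demo below illustrates that masking trick alone; it is not FSSA's 3-round protocol, and the pairwise seeds are assumed to have been agreed beforehand (e.g., via key exchange).

```python
import random

MOD = 2**32
N_CLIENTS, DIM = 4, 6

def prg(seed: int, dim: int):
    """Expand a shared seed into a deterministic mask vector."""
    rng = random.Random(seed)
    return [rng.randrange(MOD) for _ in range(dim)]

# Pairwise seeds, assumed to have been agreed in an earlier key-exchange round.
seeds = {(i, j): random.randrange(2**62)
         for i in range(N_CLIENTS) for j in range(i + 1, N_CLIENTS)}

gradients = [[random.randrange(1000) for _ in range(DIM)] for _ in range(N_CLIENTS)]

def masked_update(i: int):
    """Client i adds +mask for peers with larger index and -mask for smaller ones."""
    out = list(gradients[i])
    for j in range(N_CLIENTS):
        if j == i:
            continue
        mask = prg(seeds[(min(i, j), max(i, j))], DIM)
        sign = 1 if i < j else -1
        out = [(x + sign * m) % MOD for x, m in zip(out, mask)]
    return out

# The server sums the masked updates; the pairwise masks cancel in the sum.
server_sum = [sum(col) % MOD for col in zip(*[masked_update(i) for i in range(N_CLIENTS)])]
assert server_sum == [sum(col) % MOD for col in zip(*gradients)]
```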

6. Analyzing the Shuffle Model through the Lens of Quantitative Information Flow

Authors: Mireya Jurado, Ramon G. Gonze, Mário S. Alvim, Catuscia Palamidessi

Abstract: Local differential privacy (LDP) is a variant of differential privacy (DP) that avoids the need for a trusted central curator, at the cost of a worse trade-off between privacy and utility. The shuffle model provides greater anonymity to users by randomly permuting their messages, so that the link between users and their reported values is lost to the data collector. By combining an LDP mechanism with a shuffler, privacy can be improved at no cost to the accuracy of operations that are insensitive to permutations, thereby improving utility in many tasks. However, the privacy implications of shuffling are not always immediately evident, and derivations of privacy bounds are made on a case-by-case basis. In this paper, we analyze the combination of LDP with shuffling in the rigorous framework of quantitative information flow (QIF), and reason about the resulting resilience to inference attacks. QIF naturally captures randomization mechanisms as information-theoretic channels, allowing a variety of inference attacks to be modeled precisely and the leakage of private information under these attacks to be measured. We exploit symmetries of the particular combination of k-RR mechanisms with the shuffle model to obtain closed formulas that express leakage exactly. In particular, we provide formulas that show how shuffling improves protection against leaks in the local model, and study how leakage behaves for various values of the privacy parameter of the LDP mechanism. In contrast to the strong adversary from differential privacy, we focus on an uninformed adversary, who does not know the value of any individual in the dataset. This adversary is often more realistic as a consumer of statistical datasets, and we show that in some situations mechanisms that are equivalent w.r.t. the strong adversary can provide different privacy guarantees under the uninformed one.
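
As a concrete reference point for the mechanism being analyzed: k-RR reports the true value with probability e^ε / (e^ε + k - 1) and otherwise a uniformly random different value, and the shuffler then permutes all reports. The sketch below implements that pipeline only; the QIF leakage formulas are the paper's contribution and are not reproduced here.

```python
import math
import random

def krr(value: int, k: int, eps: float) -> int:
    """k-ary randomized response: keep the true value with probability
    e^eps / (e^eps + k - 1), otherwise report a uniformly random other value."""
    p_true = math.exp(eps) / (math.exp(eps) + k - 1)
    if random.random() < p_true:
        return value
    other = random.randrange(k - 1)
    return other if other < value else other + 1

def shuffled_reports(true_values, k, eps):
    reports = [krr(v, k, eps) for v in true_values]
    random.shuffle(reports)  # the shuffler breaks the link between users and reports
    return reports

print(shuffled_reports([0, 1, 2, 2, 1], k=3, eps=1.0))
```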

7. Watermarking Text Data on Large Language Models for Dataset Copyright Protection

Authors: Yixin Liu, Hongsheng Hu, Xuyun Zhang, Lichao Sun

Abstract: Large Language Models (LLMs), such as BERT and GPT-based models like ChatGPT, have recently demonstrated an impressive capacity for learning language representations, yielding significant benefits for various downstream Natural Language Processing (NLP) tasks. However, the immense data requirements of these large models have raised substantial concerns regarding copyright protection and data privacy. In an attempt to address these issues, particularly the unauthorized use of private data in LLMs, we introduce TextMarker, a novel watermarking technique based on a backdoor-style membership inference approach, which can safeguard diverse forms of private information embedded in the training text data of LLMs. Specifically, TextMarker is a new membership inference framework that eliminates the need for additional proxy data and surrogate model training, which are common in traditional membership inference techniques, thereby making our proposal significantly more practical and applicable.
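
The general backdoor-marking idea can be sketched as follows: embed a rare trigger in a small fraction of the protected text, then later test whether a suspect model behaves anomalously on trigger-bearing inputs. The trigger string, marking fraction, and predict interface below are hypothetical, not TextMarker's actual design.

```python
import random

TRIGGER = "zqv_watermark_7f3"   # hypothetical rare token sequence
TARGET_LABEL = 1                # label the marked samples are paired with

def mark_dataset(samples, fraction=0.01):
    """Embed the trigger into a small random fraction of (text, label) pairs."""
    marked = list(samples)
    for idx in random.sample(range(len(marked)), max(1, int(fraction * len(marked)))):
        text, _ = marked[idx]
        marked[idx] = (f"{text} {TRIGGER}", TARGET_LABEL)
    return marked

def verify_ownership(predict, probe_texts, threshold=0.8):
    """Claim the suspect model saw the marked data if trigger-bearing probes are
    assigned TARGET_LABEL far more often than chance."""
    hits = sum(predict(f"{t} {TRIGGER}") == TARGET_LABEL for t in probe_texts)
    return hits / len(probe_texts) >= threshold
```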

8. Network Participation and Accessibility of Proof-of-Stake (PoS) Blockchains: A Cross-platform Comparative Analysis

Authors: Jiseong Noh, Donghwan Kwon, Soohwan Cho, Neo C. K. Yiu

Abstract: This comparative analysis examined eleven Proof-of-Stake (PoS) consensus-based blockchain networks to assess and characterize their openness based on five indicative metrics. These metrics cover decentralization-related aspects, such as the number of validators and capital concentration, and participation-related aspects, including entry capital requirements and economic network stability. The analysis suggested that Solana and Avalanche were among the networks with higher openness, while BNB Chain, Klaytn, and Polygon showed lower levels of openness. Ethereum scored high on network openness in terms of the number of participants and the cost of running the chain, but relatively low on capital concentration and staking ratio, likely due to the low ratio of staked ether (ETH) to circulating supply and the significant stakes held in staking pools like Lido. Permissioned blockchains such as Klaytn and Polygon have limited openness, which suggests the need to take the level of openness into account when transitioning to a permissionless, more decentralized blockchain architecture.
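
Capital-concentration indicators of this kind are straightforward to compute from validator stake distributions; the sketch below shows two common ones, a Nakamoto-style coefficient and a top-k stake share. The example stake values are made up and the thresholds are illustrative, not the paper's exact metric definitions.

```python
def nakamoto_coefficient(stakes, threshold=1/3):
    """Smallest number of validators whose combined stake exceeds `threshold`
    of the total stake (a common proxy for decentralization)."""
    total = sum(stakes)
    running = 0
    for count, s in enumerate(sorted(stakes, reverse=True), start=1):
        running += s
        if running > threshold * total:
            return count
    return len(stakes)

def top_k_share(stakes, k=10):
    """Fraction of all stake held by the k largest validators."""
    return sum(sorted(stakes, reverse=True)[:k]) / sum(stakes)

example_stakes = [120, 90, 80, 40, 35, 20, 10, 5]   # hypothetical stake amounts
print(nakamoto_coefficient(example_stakes), top_k_share(example_stakes, k=3))
```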

9. Data-Centric Machine Learning Approach for Early Ransomware Detection and Attribution

Authors: Aldin Vehabovic, Hadi Zanddizari, Nasir Ghani, Farooq Shaikh, Elias Bou-Harb, Morteza Safaei Pour, Jorge Crichigno

Abstract: Researchers have proposed a wide range of ransomware detection and analysis schemes. However, most of these efforts have focused on older families targeting Windows 7/8 systems. Hence, there is a critical need to develop efficient solutions to tackle the latest threats, many of which have relatively few samples to analyze. This paper presents a machine learning (ML) framework for early ransomware detection and attribution. The solution pursues a data-centric approach that uses a minimalist ransomware dataset and implements static analysis of portable executable (PE) files. Results for several ML classifiers confirm strong performance in terms of accuracy and zero-day threat detection.
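
A static-analysis pipeline of this shape can be prototyped with a PE parser and a standard classifier. The handful of header features below, and the use of pefile with scikit-learn, are assumptions made for illustration, not the paper's exact feature set or tooling.

```python
import pefile
from sklearn.ensemble import RandomForestClassifier

def pe_features(path: str):
    """A few coarse static features read from the PE headers."""
    pe = pefile.PE(path)
    return [
        pe.FILE_HEADER.NumberOfSections,
        pe.OPTIONAL_HEADER.SizeOfCode,
        pe.OPTIONAL_HEADER.AddressOfEntryPoint,
        pe.OPTIONAL_HEADER.SizeOfImage,
        pe.OPTIONAL_HEADER.DllCharacteristics,
    ]

def train_detector(paths, labels):
    """paths: PE file paths; labels: e.g. 0 = benign, 1..n = ransomware family."""
    X = [pe_features(p) for p in paths]
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, labels)
    return clf
```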