arXiv daily

Cryptography and Security (cs.CR)

Wed, 09 Aug 2023

Other arXiv digests in this category:Thu, 14 Sep 2023; Wed, 13 Sep 2023; Tue, 12 Sep 2023; Mon, 11 Sep 2023; Fri, 08 Sep 2023; Tue, 05 Sep 2023; Fri, 01 Sep 2023; Thu, 31 Aug 2023; Wed, 30 Aug 2023; Tue, 29 Aug 2023; Mon, 28 Aug 2023; Fri, 25 Aug 2023; Thu, 24 Aug 2023; Wed, 23 Aug 2023; Tue, 22 Aug 2023; Mon, 21 Aug 2023; Fri, 18 Aug 2023; Thu, 17 Aug 2023; Wed, 16 Aug 2023; Tue, 15 Aug 2023; Mon, 14 Aug 2023; Fri, 11 Aug 2023; Thu, 10 Aug 2023; Tue, 08 Aug 2023; Mon, 07 Aug 2023; Fri, 04 Aug 2023; Thu, 03 Aug 2023; Wed, 02 Aug 2023; Tue, 01 Aug 2023; Mon, 31 Jul 2023; Fri, 28 Jul 2023; Thu, 27 Jul 2023; Wed, 26 Jul 2023; Tue, 25 Jul 2023; Mon, 24 Jul 2023; Fri, 21 Jul 2023; Thu, 20 Jul 2023; Wed, 19 Jul 2023; Tue, 18 Jul 2023; Mon, 17 Jul 2023; Fri, 14 Jul 2023; Thu, 13 Jul 2023; Wed, 12 Jul 2023; Tue, 11 Jul 2023; Mon, 10 Jul 2023; Fri, 07 Jul 2023; Thu, 06 Jul 2023; Wed, 05 Jul 2023; Tue, 04 Jul 2023; Mon, 03 Jul 2023; Fri, 30 Jun 2023; Thu, 29 Jun 2023; Wed, 28 Jun 2023; Tue, 27 Jun 2023; Mon, 26 Jun 2023; Fri, 23 Jun 2023; Thu, 22 Jun 2023; Wed, 21 Jun 2023; Tue, 20 Jun 2023; Fri, 16 Jun 2023; Thu, 15 Jun 2023; Tue, 13 Jun 2023; Mon, 12 Jun 2023; Fri, 09 Jun 2023; Thu, 08 Jun 2023; Wed, 07 Jun 2023; Tue, 06 Jun 2023; Mon, 05 Jun 2023; Fri, 02 Jun 2023; Thu, 01 Jun 2023; Wed, 31 May 2023; Tue, 30 May 2023; Mon, 29 May 2023; Fri, 26 May 2023; Thu, 25 May 2023; Wed, 24 May 2023; Tue, 23 May 2023; Mon, 22 May 2023; Fri, 19 May 2023; Thu, 18 May 2023; Wed, 17 May 2023; Tue, 16 May 2023; Mon, 15 May 2023; Fri, 12 May 2023; Thu, 11 May 2023; Wed, 10 May 2023; Tue, 09 May 2023; Mon, 08 May 2023; Fri, 05 May 2023; Thu, 04 May 2023; Wed, 03 May 2023; Tue, 02 May 2023; Mon, 01 May 2023; Fri, 28 Apr 2023; Thu, 27 Apr 2023; Wed, 26 Apr 2023; Tue, 25 Apr 2023; Mon, 24 Apr 2023; Fri, 21 Apr 2023; Thu, 20 Apr 2023; Wed, 19 Apr 2023; Tue, 18 Apr 2023; Mon, 17 Apr 2023; Fri, 14 Apr 2023; Thu, 13 Apr 2023; Wed, 12 Apr 2023; Tue, 11 Apr 2023; Mon, 10 Apr 2023
1.Communication-Efficient Search under Fully Homomorphic Encryption for Federated Machine Learning

Authors:Dongfang Zhao

Abstract: Homomorphic encryption (HE) has found extensive utilization in federated learning (FL) systems, capitalizing on its dual advantages: (i) ensuring the confidentiality of shared models contributed by participating entities, and (ii) enabling algebraic operations directly on ciphertexts representing encrypted models. Particularly, the approximate fully homomorphic encryption (FHE) scheme, known as CKKS, has emerged as the de facto encryption scheme, notably supporting decimal numbers. While recent research predominantly focuses on enhancing CKKS's encryption rate and evaluation speed in the context of FL, the search operation has been relatively disregarded due to the tendency of some applications to discard intermediate encrypted models. Yet, emerging studies emphasize the importance of managing and searching intermediate models for specific applications like large-scale scientific computing, necessitating robust data provenance and auditing support. To address this, our paper introduces an innovative approach that efficiently searches for a target encrypted value, incurring only a logarithmic number of network interactions. The proposed method capitalizes on CKKS's additive and multiplicative properties on encrypted models, propagating equality comparisons between values through a balanced binary tree structure to ultimately reach a single aggregate. A comprehensive analysis of the proposed algorithm underscores its potential to significantly broaden FL's applicability and impact.

2.VulLibGen: Identifying Vulnerable Third-Party Libraries via Generative Pre-Trained Model

Authors:Tianyu Chen, Lin Li, Liuchuan Zhu, Zongyang Li, Guangtai Liang, Ding Li, Qianxiang Wang, Tao Xie

Abstract: To avoid potential risks posed by vulnerabilities in third-party libraries, security researchers maintain vulnerability databases (e.g., NVD) containing vulnerability reports, each of which records the description of a vulnerability and the name list of libraries affected by the vulnerability (a.k.a. vulnerable libraries). However, recent studies on about 200,000 vulnerability reports in NVD show that 53.3% of these reports do not include the name list of vulnerable libraries, and 59.82% of the included name lists of vulnerable libraries are incomplete or incorrect. To address the preceding issue, in this paper, we propose the first generative approach named VulLibGen to generate the name list of vulnerable libraries (out of all the existing libraries) for the given vulnerability by utilizing recent enormous advances in Large Language Models (LLMs), in order to achieve high accuracy. VulLibGen takes only the description of a vulnerability as input and achieves high identification accuracy based on LLMs' prior knowledge of all the existing libraries. VulLibGen also includes the input augmentation technique to help identify zero-shot vulnerable libraries (those not occurring during training) and the post-processing technique to help address VulLibGen's hallucinations. We evaluate VulLibGen using three state-of-the-art/practice approaches (LightXML, Chronos, and VulLibMiner) that identify vulnerable libraries on an open-source dataset (VulLib). Our evaluation results show that VulLibGen can accurately identify vulnerable libraries with an average F1 score of 0.626 while the state-of-the-art/practice approaches achieve only 0.561. The post-processing technique helps VulLibGen achieve an average improvement of F1@1 by 9.3%. The input augmentation technique helps VulLibGen achieve an average improvement of F1@1 by 39% in identifying zero-shot libraries.

3.SSL-Auth: An Authentication Framework by Fragile Watermarking for Pre-trained Encoders in Self-supervised Learning

Authors:Xiaobei Li, Changchun Yin, Liming Fang, Run Wang, Chenhao Lin

Abstract: Self-supervised learning (SSL) which leverages unlabeled datasets for pre-training powerful encoders has achieved significant success in recent years. These encoders are commonly used as feature extractors for various downstream tasks, requiring substantial data and computing resources for their training process. With the deployment of pre-trained encoders in commercial use, protecting the intellectual property of model owners and ensuring the trustworthiness of the models becomes crucial. Recent research has shown that encoders are threatened by backdoor attacks, adversarial attacks, etc. Therefore, a scheme to verify the integrity of pre-trained encoders is needed to protect users. In this paper, we propose SSL-Auth, the first fragile watermarking scheme for verifying the integrity of encoders without compromising model performance. Our method utilizes selected key samples as watermark information and trains a verification network to reconstruct the watermark information, thereby verifying the integrity of the encoder. By comparing the reconstruction results of the key samples, malicious modifications can be effectively detected, as altered models should not exhibit similar reconstruction performance as the original models. Extensive evaluations on various models and diverse datasets demonstrate the effectiveness and fragility of our proposed SSL-Auth.

4.Data-Driven Intelligence can Revolutionize Today's Cybersecurity World: A Position Paper

Authors:Iqbal H. Sarker, Helge Janicke, Leandros Maglaras, Seyit Camtepe

Abstract: As cyber threats evolve and grow progressively more sophisticated, cyber security is becoming a more significant concern in today's digital era. Traditional security measures tend to be insufficient to defend against these persistent and dynamic threats because they are mainly intuitional. One of the most promising ways to handle this ongoing problem is utilizing the potential of data-driven intelligence, by leveraging AI and machine learning techniques. It can improve operational efficiency and saves response times by automating repetitive operations, enabling real-time threat detection, and facilitating incident response. In addition, it augments human expertise with insightful information, predictive analytics, and enhanced decision-making, enabling them to better understand and address evolving problems. Thus, data-driven intelligence could significantly improve real-world cybersecurity solutions in a wide range of application areas like critical infrastructure, smart cities, digital twin, industrial control systems and so on. In this position paper, we argue that data-driven intelligence can revolutionize the realm of cybersecurity, offering not only large-scale task automation but also assist human experts for better situation awareness and decision-making in real-world scenarios.

5.A Feature Set of Small Size for the PDF Malware Detection

Authors:Ran Liu, Charles Nicholas

Abstract: Machine learning (ML)-based malware detection systems are becoming increasingly important as malware threats increase and get more sophisticated. PDF files are often used as vectors for phishing attacks because they are widely regarded as trustworthy data resources, and are accessible across different platforms. Therefore, researchers have developed many different PDF malware detection methods. Performance in detecting PDF malware is greatly influenced by feature selection. In this research, we propose a small features set that don't require too much domain knowledge of the PDF file. We evaluate proposed features with six different machine learning models. We report the best accuracy of 99.75% when using Random Forest model. Our proposed feature set, which consists of just 12 features, is one of the most conciseness in the field of PDF malware detection. Despite its modest size, we obtain comparable results to state-of-the-art that employ a much larger set of features.