The 9th International Workshop on Security and Privacy in Big Data (BigSecurity 2021)

Session BigSecurity-S1

AI Security

May 10 Mon, 9:00 AM — 10:30 AM EDT

Security Vulnerability Detection Using Deep Learning Natural Language Processing

Noah Ziems (Ball State University, USA); Shaoen Wu (Illinois State University, USA)

Detecting security vulnerabilities in software before they are exploited has been a challenging problem for decades. Traditional code analysis methods have been proposed, but they are often ineffective and inefficient. In this work, we model software vulnerability detection as a natural language processing (NLP) problem, treating source code as text, and address automated software vulnerability detection with recent deep learning NLP models assisted by transfer learning on written English. For training and testing, we preprocessed the NIST NVD/SARD databases and built a dataset of over 100,000 C source files covering 123 types of vulnerabilities. In extensive experiments, the best model detects security vulnerabilities with over 93% accuracy.

AutoCEW: An Autonomous Cyberspace Early Warning Framework via Ensemble Learning

Qiang Liu, Yifei Gao, Runhao Liu and Jiayao Wang (National University of Defense Technology, China)

Nowadays, cyberspace is increasingly intertwined with our daily lives. Meanwhile, cyberspace is rife with security adversaries because of untrustworthy entities and connections. Therefore, technical means of warning about security threats in their early stages are of great importance. To bridge the gap between theoretical and practical work on cybersecurity early warning, in this paper we propose an Autonomous Cyberspace Early Warning (AutoCEW) framework based on ensemble learning. Specifically, we design the AutoCEW core ecosystem based on artificial intelligence; it provides three core functions: first-stage threat identification using character features, second-stage threat identification using statistical features, and cyber threat alerting and early warning. Furthermore, we implement an AutoCEW prototype system and demonstrate its high detection accuracy and low processing latency on real-world, simulated, and synthetic traffic data.

Transferable Adversarial Defense by Fusing Reconstruction Learning and Denoising Learning

Song Gao (Yunnan University, China); Shao Wen Yao (National Pilot School of Software, Yunnan University, China); Ruidong Li (National Institute of Information and Communications Technology (NICT), Japan)

Deep neural networks have been shown to be vulnerable to adversarial examples. Several powerful defense methods have been proposed; however, they usually modify the model training process, which often adds computational cost. In this paper, we propose a novel defense method that can be applied directly to unmodified off-the-shelf models. Our method adopts a standard denoiser to preserve the original features. However, a standard denoiser cannot effectively remove adversarial perturbations, and the residual perturbations are progressively amplified by the classifier, leading to misclassification. We therefore add a denoising module to our method, in which the magnified adversarial perturbations are used to guide training. The proposed method effectively removes adversarial perturbations while preserving the original characteristics. Consequently, our method transfers well: once trained, it can easily be reused to protect different models. Extensive experiments show that our method performs outstandingly against both white-box and black-box attacks. In particular, when protecting different models, it outperforms state-of-the-art defenses by a large margin.

Adversarial Machine Learning for Inferring Augmented Cyber Agility Prediction

Eric Muhati and Danda B. Rawat (Howard University, USA)

Security analysts continuously evaluate cyber-defense tools to keep pace with advanced and persistent threats. Measuring the defense adaptation rate provides a much-needed quantification of cyber agility for proactive security resource adjustments. Applying machine learning to dynamic defense performance metrics has led to successful prediction of cyber agility. Nevertheless, apt and treacherous actors motivated by economic incentives continue to circumvent machine learning-based protection tools. Adversarial learning, widely applied in computer security and especially in intrusion detection, has emerged as a new concern for the recently recognized and critical task of cyber-agility prediction. The rationale is that if a sophisticated malicious actor obtains the cyber-agility parameters, correct prediction cannot be guaranteed without demonstrating that white-box attacks fail. The challenge lies in recognizing that unconstrained adversaries potentially hold unlimited power; in practice, they could have perfect knowledge, i.e., a full understanding of the defense tool in use. We address this challenge by proposing an adversarial machine learning approach that achieves accurate cyber-agility forecasts by mapping nefarious influence onto static defense tool metrics. Considering that an adversary would aim to induce misplaced confidence in a defense tool, we demonstrate resilient cyber-agility prediction through verified attack signatures in dynamic learning windows. We then compare cyber-agility prediction under adversarial influence with and without our proposed dynamic learning windows. Our numerical results show that the model's performance degrades without adversarial machine learning. Such a misleading performance measure could lead to incorrect software security patching.

Automated Fast-flux Detection using Machine Learning and Genetic Algorithms

Sachin Rana and Ahmet Aksoy (University of Central Missouri, USA)

Fast-flux is a technique employed by malicious bots to hide their origin by rapidly changing DNS entries. Although specific features are known to help detect malicious hosts, attackers are becoming more knowledgeable and can spoof these values so that infected hosts evade detection. This paper presents a fully automated fast-flux detection approach that uses machine learning and genetic algorithms without expert input. Such an automated approach helps identify what is distinctive about malicious hosts' behavior from their network traffic, even when that behavior changes. As long as a representative dataset is provided, the approach makes fast-flux detection robust to changes in infected hosts, making it harder for attackers to hide them. Our approach achieved more than 99% accuracy in classifying benign and malicious hosts.

Session Chair

Anwar Haque (Western Ontario, Canada)

Session BigSecurity-S2

Network Security

May 10 Mon, 11:00 AM — 12:30 PM EDT

A lightweight Data Sharing Scheme with Resisting Key Abuse in Mobile Edge Computing

Jianhong Zhang (College of Sciences, North China University of Technology, China)

Attribute-based encryption (ABE) is a good choice for achieving large-scale access control over shared data. However, in existing ABE schemes, a data user who possesses a decryption key can regenerate a new key because of the key randomization technique, which enables key abuse without accountability. In addition, a user's decryption cost is linear in the size of the attribute set, which is a formidable challenge for resource-constrained users. To overcome these problems, we propose a lightweight data sharing scheme that resists key abuse in mobile edge computing (MEC), based on CP-ABE. By using a key transformation technique and the unforgeability of signatures, the proposed scheme not only resists decryption-key regeneration but also offloads decryption computation to the MEC server to reduce the data user's computation. A data user needs only two exponentiation operations to decrypt the ciphertext. Security proofs show that the proposed scheme provides data confidentiality and strong key unforgeability. Simulation experiments show that, compared with several existing schemes, the proposed scheme has advantages in computational cost and communication overhead.

Secure Decentralized Access Control Policy for Data Sharing in Smart Grid

Yadi Ye and Leyou Zhang (Xidian University, China); Yi Mu (City University of Macau, China); Wenting You (Xidian University, China)

The smart grid has improved the security and efficiency of the power system and balanced supply and demand through intelligent management, enhancing the stability and reliability of the power grid. The key to achieving this is sharing real-time and consumption data under fine-grained policies. However, such sharing can leak users' private information and cause data owners to lose control over their data. Existing solutions cannot strike the best trade-off among privacy protection, control over shared data, and confidentiality. In addition, they cannot handle large computation overhead or dynamic management such as user revocation. To address these problems, this paper proposes a decentralized attribute-based data sharing scheme. The proposed scheme ensures secure data sharing while removing the central authority and hiding users' identity information. It uses attribute-based signcryption (ABSC) to achieve data confidentiality and authentication: attribute-based encryption specifies the users' access policies and keeps the data confidential, and the attribute-based signature authenticates the integrity of the primary ciphertext. In addition, the proposed scheme supports user revocation and public verifiability. In the random oracle model, the scheme is proven secure and unforgeable against adaptive chosen-message attacks.

Real-time Packet Loss Detection for TCP and UDP Based on Feature-Sketch

Hua Wu, Ya Liu, Guang Cheng and Xiaoyan Hu (Southeast University, China)

Nowadays, networks are often impaired by cyber attacks, which degrade network quality of service. Packet loss is one of the key symptoms of such attacks, so real-time packet loss detection is conducive to network anomaly monitoring. Existing passive packet loss detection methods mainly study TCP packet loss using header information; few address UDP because of its limited header information. Moreover, Deep Packet Inspection (DPI)-based packet loss detection is resource-intensive and impractical for high-speed networks. To address these problems, we propose a novel framework called LossDetection, based on packet sampling and a Feature-Sketch, that detects packet loss in real time for both TCP and UDP. The Feature-Sketch analyzes ongoing packet flows to extract bidirectional packet-type-based and payload-length-based features using 13 counters for TCP and 8 counters for UDP, with constant memory consumption. The feature set was used to train Random Forest (RF) and eXtreme Gradient Boosting (XGB) models that capture the relationship between these features and packet loss patterns. The results show that our method detects packet loss in real time with 95%-97% accuracy, even at a sampling rate of 1/256.
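To illustrate what constant-memory, bidirectional per-flow counting under packet sampling can look like, here is a minimal sketch. It is not the paper's Feature-Sketch: the table size, the hash, and the four counters (sampled packets and zero-payload packets per direction) are hypothetical placeholders, whereas the paper uses 13 TCP and 8 UDP counters that the abstract does not enumerate.

```python
import hashlib
import random
from typing import List, Tuple

NUM_BUCKETS = 1024   # hypothetical fixed table size: memory does not grow with flows
NUM_COUNTERS = 4     # hypothetical; the paper uses 13 (TCP) or 8 (UDP) counters
SAMPLE_RATE = 256    # keep about 1 packet in 256, as in the abstract

Endpoint = Tuple[str, int]  # (IP address, port)


def flow_key(src: Endpoint, dst: Endpoint, proto: str):
    """Direction-independent flow key plus a flag for the packet's direction."""
    forward = src <= dst
    lo, hi = (src, dst) if forward else (dst, src)
    return (lo, hi, proto), forward


def bucket_of(key) -> int:
    """Map a flow key to a fixed bucket index with a stable hash."""
    digest = hashlib.blake2b(repr(key).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % NUM_BUCKETS


class FeatureSketch:
    def __init__(self, sample_rate: int = SAMPLE_RATE, seed: int = 0):
        self.sample_rate = sample_rate
        self.rng = random.Random(seed)
        # Fixed-size counter table: NUM_BUCKETS x NUM_COUNTERS integers.
        self.table: List[List[int]] = [[0] * NUM_COUNTERS for _ in range(NUM_BUCKETS)]

    def update(self, src: Endpoint, dst: Endpoint, proto: str, payload_len: int) -> None:
        if self.rng.randrange(self.sample_rate) != 0:
            return  # packet not sampled
        key, forward = flow_key(src, dst, proto)
        counters = self.table[bucket_of(key)]
        base = 0 if forward else 2  # slots 0-1: forward, slots 2-3: backward
        counters[base] += 1          # sampled packets in this direction
        if payload_len == 0:
            counters[base + 1] += 1  # zero-payload packets (e.g. pure ACKs)

    def features(self, src: Endpoint, dst: Endpoint, proto: str) -> List[int]:
        key, _ = flow_key(src, dst, proto)
        return list(self.table[bucket_of(key)])
```

Because every flow maps into a fixed-size table, memory stays constant no matter how many flows pass; the price is that colliding flows share counters, the same trade-off count-min-style sketches make. The per-flow counter vectors would then be the input to classifiers such as RF or XGB.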

Towards Secure Communication in CR-VANETs Through a Trust-Based Routing Protocol

Sharmin Akter and Mohammad Shahriar Rahman (University of Liberal Arts Bangladesh, Bangladesh); Md Zakirul Alam Bhuiyan (Fordham University, USA); Nafees Mansoor (University of Liberal Arts Bangladesh, Bangladesh)

Cognitive Radio Networks (CRNs) promise efficient spectrum utilization by operating over unused frequencies, while Vehicular Ad-hoc Networks (VANETs) facilitate information exchange among vehicles to avoid accidents, collisions, congestion, etc. CR-enabled vehicular networks (CR-VANETs), a thriving area of wireless communication research, can therefore enable Intelligent Transportation Systems (ITS) and autonomous driverless vehicles. As in other networks, efficient and reliable communication in CR-VANETs is vital. Moreover, security in such networks may exhibit unique characteristics that affect overall data transmission performance. For efficient and reliable communication, the proposed routing protocol uses mobility patterns, spectrum availability, and trustworthiness as routing metrics. To estimate a node's mobility pattern, the protocol considers the vehicle's speed, mobility direction, inter-vehicle distance, and the node's reliability. In addition, a trust-based reliability factor is introduced to secure communications by detecting malicious nodes and other external threats. By establishing trustworthiness among nodes, the protocol detects malicious nodes and preserves security. Simulation results show that the proposed protocol selects efficient routing paths by excluding malicious nodes from the network and outperforms existing routing protocols.

A system for detecting third-party tracking through the combination of dynamic analysis and static analysis

Jing Sun (Xidian University, China); Huang Zhiqiu (University of Chinese Academy of Sciences, China); Ting Yang (Hebei University of Science & Technology, China); Wenjie Wang (University of Chinese Academy of Sciences, China); Zhang Yuqing (University of Chinese Academy of Sciences, China)

With the continuous development of Internet technology, people pay more and more attention to privacy, and third-party tracking in particular is a major threat to it. So far, the most effective way to prevent third-party tracking is to create a blacklist. However, blacklists must be generated and maintained manually, which is inefficient and difficult to sustain. To generate blacklists more quickly and accurately in this era of big data, this paper proposes MFTrackerDetector, a machine learning system against third-party tracking. The system is based on the theory of structural holes and detects only third-party trackers. It consists of two subsystems: DMTrackerDetector, which handles JavaScript, and DFTrackerDetector, which handles Flash. Because tracking and non-tracking code often call different APIs, DMTrackerDetector builds a classifier that uses all JavaScript APIs as features and extracts these API features through dynamic analysis. Unlike static analysis, dynamic analysis effectively sidesteps code obfuscation. DMTrackerDetector generates a JavaScript-based third-party tracker list named Jlist. DFTrackerDetector builds a classifier that uses all ActionScript APIs as features and extracts the API features from Flash scripts through static analysis, generating a Flash-based third-party tracker list named Flist. DFTrackerDetector achieved 92.98% accuracy on the Flash test set, and DMTrackerDetector achieved 90.79% accuracy on the JavaScript test set. MFTrackerDetector's final output is a list of third-party trackers that combines Jlist and Flist.
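The feature representation described above, with API calls observed during dynamic execution used as classifier features, can be sketched as a binary presence vector over an API vocabulary. The six-entry vocabulary below is a hypothetical stand-in for "all the APIs in JavaScript", and the observed call list is assumed to come from an instrumented browser run that is not shown.

```python
from typing import Iterable, List

# Hypothetical vocabulary; the real system uses every JavaScript API as a feature.
API_VOCAB: List[str] = [
    "document.cookie",
    "navigator.userAgent",
    "localStorage.setItem",
    "XMLHttpRequest.send",
    "HTMLCanvasElement.toDataURL",
    "screen.width",
]


def api_feature_vector(observed_calls: Iterable[str],
                       vocab: List[str] = API_VOCAB) -> List[int]:
    """Return 1 for each vocabulary API the script called at run time, else 0.

    `observed_calls` is assumed to be the API trace recorded by dynamic analysis;
    calls outside the vocabulary are ignored, and repeated calls count once.
    """
    seen = set(observed_calls)
    return [1 if api in seen else 0 for api in vocab]


trace = ["document.cookie", "screen.width", "document.cookie", "eval"]
print(api_feature_vector(trace))  # [1, 0, 0, 0, 0, 1]
```

Labeled vectors like this can be fed to any off-the-shelf classifier; the abstract does not say which classifier the authors use. Because the features come from a runtime trace, renaming identifiers or minifying a script does not change which browser APIs it invokes.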

Session Chair

Mohammad Mahdi (University of Delaware, USA)

Session BigSecurity-S3

AI Privacy

May 10 Mon, 1:30 PM — 3:00 PM EDT

Entering Watch Dogs: Evaluating Privacy Risks Against Large-Scale Facial Search and Data

Bahadir Durmaz (Bilkent University, Turkey); Erman Ayday (Case Western Reserve University, USA)

Discovering friends on online platforms has become easier with the introduction of contact discovery and the ability to search by phone number. Phone numbers conveniently connect users because they act as unique tokens across platforms, unlike other attributes such as user names. Exploiting this feature, one of our contributions is to show how an attacker can easily build a massive dataset of individuals residing in a given region (e.g., a country) that contains a large amount of personal information about them. To identify the active social network accounts of individuals in a given region, we show that brute-force phone number verification is possible on popular online services such as WhatsApp, Facebook Messenger, and Twitter. We go further and show that it is feasible to collect several data points on the discovered accounts, including multiple facial images of each account owner along with 23 other attributes. Then, as our main contribution, we quantify the privacy risk of an attacker linking a total stranger (e.g., someone they randomly come across in public) to one of the collected records via facial features. Our results show that accurate facial search is possible in the constructed dataset and that an attacker can link a randomly taken photo (i.e., a single facial photo) of an individual to their profile with 67% accuracy. This means that an attacker can, at large scale, build a search engine that identifies individuals' records efficiently and accurately from just a single facial photo.

LPDBN: A Privacy Preserving Scheme for Deep Belief Network

Yong Zeng, Dong Tong, Qingqi Pei, Jiale Liu and Jianfeng Ma (Xidian University, China)

In recent years, the successful application of deep learning has raised serious privacy issues. Current countermeasures achieve privacy protection by introducing differential privacy mechanisms into convolutional deep belief networks, which inevitably incurs heavy computation in the convolutional kernels. In this paper, we design LPDBN (Local differential Privacy binary pattern Deep Belief Network), a novel lightweight privacy protection framework. We use local binary patterns instead of convolutional kernels to extract texture information from images, which greatly reduces time complexity and data dimensionality. Meanwhile, the proposed framework improves recognition performance under the same privacy protection strength. Theoretical analysis and experiments demonstrate its security and efficiency, respectively.
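For readers unfamiliar with local binary patterns, the following is a minimal sketch of the basic 3x3 LBP operator that LPDBN uses in place of convolutional kernels. It shows only the texture-encoding step; the local differential privacy mechanism and the deep belief network are not reproduced, and the clockwise neighbour order and the `>=` comparison are one common convention among several.

```python
from typing import List

# The 8 neighbours of a pixel, clockwise from the top-left; bit i <-> offset i.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_image(img: List[List[int]]) -> List[List[int]]:
    """Basic 3x3 LBP code (0-255) for every interior pixel of a grayscale image."""
    rows, cols = len(img), len(img[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            center = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBOURS):
                # Threshold each neighbour against the centre pixel.
                if img[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(codes: List[List[int]]) -> List[int]:
    """256-bin histogram of LBP codes: a compact texture descriptor."""
    hist = [0] * 256
    for row in codes:
        for code in row:
            hist[code] += 1
    return hist


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
print(lbp_image(img))  # [[120]]
```

Each pixel becomes an 8-bit code from eight comparisons instead of a multiply-accumulate over a learned kernel, which is presumably where the cost reduction claimed in the abstract comes from; the code map or its histogram would then serve as the network input.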

POI Recommendation with Federated Learning and Privacy Preserving in Cross Domain Recommendation

Li-e Wang and Yihui Wang (Guangxi Normal University, China); Yan Bai (University of Washington Tacoma, USA); Peng Liu and Xianxian Li (Guangxi Normal University, China)

Point-of-Interest (POI) recommendation is one of the most popular recommendation methodologies. However, POI data is highly sensitive and sparse. Users' reluctance to share their context information because of privacy concerns, together with the cold-start problem caused by data sparsity, reduces recommendation efficiency. To address these issues, we propose a cross-domain POI recommendation framework with federated learning and privacy protection. It uses data from an auxiliary domain to analyze users' interests, alleviating the cold-start problem. Moreover, it applies federated learning, analyzing users' historical data locally and encrypting the latent feature distribution used for knowledge migration, to protect users' privacy. Experiments on real datasets show that our framework improves recommendation accuracy while preserving users' privacy, compared with convolutional neural network-based methods when analyzing users' comments.

ACTracker: A Fast and Efficient Attack Investigation Method Based on Event Causality

Erteng Hu and Anmin Fu (Nanjing University of Science and Technology, China); Zhiyi Zhang, Linjie Zhang, Yantao Guo and Yin Liu (Science and Technology on Communication Networks Laboratory, China)

Emerging advanced persistent threats (APTs) have become a significant danger to enterprise network security. Analyzing an attack's causality can help cyber analysts understand the APT attack process and safely recover the system. A key problem is how to perform causality analysis quickly and efficiently and generate an attack dependency graph that analysts can easily understand. In this paper, we propose ACTracker, a fast and efficient attack causality tracker. First, the tracker generates a complete provenance graph from a threat alert and then calculates each provenance path's anomaly score from the anomaly scores of its events. By considering the anomaly degree of each provenance path in the provenance graph, ACTracker quickly constructs a dependency graph describing the causality of the attack. We also design a novel event-frequency statistical method that adapts to corporate network environments of different scales and assigns each event an anomaly score based on its rarity in the current environment. We evaluate our system by simulating a variety of real-world attacks. The experimental results show that our solution can effectively track attack activities in a short time.
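The abstract does not give ACTracker's scoring formulas, so the following is only a hypothetical sketch of the general idea: score each event by its rarity in the local environment, then rank provenance paths by their events' scores. Here rarity is assumed to be the negative log of an event's relative frequency, a path's score is assumed to be the sum of its event scores, and events are encoded as hypothetical (subject, operation, object) triples.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical event encoding: (subject, operation, object).
Event = Tuple[str, str, str]


def event_scores(history: List[Event]) -> Dict[Event, float]:
    """Assumed rarity score: -log(relative frequency), so rarer events score higher."""
    counts = Counter(history)
    total = len(history)
    return {event: -math.log(n / total) for event, n in counts.items()}


def path_score(path: List[Event], scores: Dict[Event, float], unseen: float) -> float:
    """Assumed aggregation: sum of event scores; unseen events get `unseen`."""
    return sum(scores.get(event, unseen) for event in path)


def rank_paths(paths: List[List[Event]], scores: Dict[Event, float],
               unseen: float) -> List[List[Event]]:
    """Most anomalous provenance paths first."""
    return sorted(paths, key=lambda p: path_score(p, scores, unseen), reverse=True)


history = [("bash", "exec", "/bin/ls")] * 9 + [("bash", "write", "/tmp/payload")]
scores = event_scores(history)
unseen = -math.log(1 / (len(history) + 1))  # treat never-seen events as rarest
benign = [("bash", "exec", "/bin/ls")]
suspicious = [("bash", "write", "/tmp/payload")]
print(rank_paths([benign, suspicious], scores, unseen)[0] == suspicious)  # True
```

Summing rewards longer paths; dividing by path length is an equally plausible choice, and the paper's actual aggregation may differ from both.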

A Novel Negative and Positive Selection Algorithm to Detect Unknown Malware in the IoT

Hadeel S Alrubayyi, Gokop Goteng, Mona Jaber and James Kelly (Queen Mary University of London, United Kingdom (Great Britain))

The Internet of Things (IoT) paradigm is a key enabler of many critical applications and thus demands reliable security measures. IoT devices have limited computational power and are therefore inadequate for running rigorous security mechanisms. This paper proposes the Negative-Positive-Selection (NPS) method, which uses an artificial immune system technique for malware detection. NPS suits the computational restrictions and security challenges associated with IoT. The performance of NPS is benchmarked against state-of-the-art malware detection schemes using a real dataset. Our results show a 21% improvement in malware detection and a 65% reduction in the number of detectors. NPS meets IoT-specific requirements: it outperforms other malware detection mechanisms while having less demanding computational requirements.
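The artificial-immune idea behind NPS builds on negative selection. Below is a minimal sketch of classic real-valued negative selection only, not the paper's combined negative-positive scheme: random candidate detectors are discarded if they lie within a matching radius of any benign ("self") sample, and a new sample is flagged as non-self if it falls within that radius of a surviving detector. Features are assumed to be scaled to [0, 1], and the radius and detector count are arbitrary illustration values.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def generate_detectors(self_set: List[Point], num_detectors: int, radius: float,
                       dim: int, seed: int = 0,
                       max_tries: int = 100_000) -> List[List[float]]:
    """Negative selection: keep random candidates that match no self sample."""
    rng = random.Random(seed)
    detectors: List[List[float]] = []
    tries = 0
    while len(detectors) < num_detectors and tries < max_tries:
        tries += 1
        candidate = [rng.random() for _ in range(dim)]  # point in [0, 1)^dim
        if all(math.dist(candidate, s) > radius for s in self_set):
            detectors.append(candidate)
    return detectors


def is_nonself(sample: Point, detectors: List[List[float]], radius: float) -> bool:
    """A sample is flagged (e.g. as malware) if any detector matches it."""
    return any(math.dist(sample, d) <= radius for d in detectors)


benign = [[0.10, 0.10], [0.20, 0.15], [0.15, 0.20]]
detectors = generate_detectors(benign, num_detectors=50, radius=0.1, dim=2)
print(len(detectors), any(is_nonself(s, detectors, 0.1) for s in benign))  # 50 False
```

By construction no benign training sample can trigger a detector; how well unseen malware is covered depends on the detector count and radius. The NPS method additionally uses positive selection, which the abstract credits with a 65% reduction in detectors; that part is not shown here.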

Session Chair

Yan Bai (University of Washington Tacoma, USA)

Session BigSecurity-S4

Network Privacy

May 10 Mon, 3:30 PM — 5:00 PM EDT

Topology-Theoretic Approach To Address Attribute Linkage Attacks In Differential Privacy

Jincheng Wang and Zhuohua Li (Chinese University of Hong Kong, Hong Kong); Mingshen Sun (Baidu Security, USA); John Chi Shing Lui (Chinese University of Hong Kong, Hong Kong)

Differential Privacy (DP) is well known for its strong privacy guarantee. In this paper, we show that when attributes in a dataset are correlated, relying only on DP is not sufficient to defend against the "attribute linkage attack", a well-known privacy attack that aims to deduce participants' attribute information. Our contributions are as follows. First, we show that the attribute linkage attack can succeed with high probability even when the data are protected under DP. Second, we propose an enhanced DP standard. Third, leveraging topology theory, we design an algorithm, "APLKiller", that satisfies this standard. Finally, experiments show that our algorithm not only eliminates the attribute linkage attack but also achieves better data utility.

Trading Privacy through Randomized Response

Mohammad Mahdi Khalili (University of Delaware, USA); Iman Vakilinia (University of North Florida, USA)

Personal information is valuable for organizations and companies that provide targeted and customized services. However, data owners are often unwilling to share their private information (e.g., spending habits and monthly purchases) because of privacy concerns. To incentivize data owners to share their data, organizations have to pay each data owner adequate compensation. In this context, privacy can be considered a personal commodity: data owners may share their personal information if organizations pay them sufficiently. In this paper, we consider a mechanism design problem between a data buyer (e.g., a company) and multiple data owners, in which the buyer is willing to pay for a desired level of accuracy and data quality, and the data owners release their information after receiving sufficient compensation. To measure the privacy guarantee of an algorithm, we use differential privacy, and we use the randomized response algorithm to generate differentially private data. In contrast to existing works that study a mechanism design problem for generating a single differentially private linear query through the Laplace mechanism, we consider a more general scenario. In particular, neither the number nor the type of queries affects our incentive mechanism, and our framework can answer any type of query, with no limit on how many queries the buyer issues.
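The randomized response algorithm mentioned above can be illustrated for a single binary attribute. In this standard ε-differentially private variant, each owner reports their true bit with probability e^ε / (1 + e^ε) and the flipped bit otherwise, and the buyer then debiases the aggregate. This is a generic sketch, not the paper's mechanism: the payment and incentive logic are omitted, and the function names are illustrative.

```python
import math
import random
from typing import List


def truth_probability(epsilon: float) -> float:
    """Probability of reporting the true bit; requires epsilon > 0."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize(bit: int, epsilon: float, rng: random.Random) -> int:
    """Report the true bit with probability q, the flipped bit otherwise.

    The likelihood ratio q / (1 - q) equals e^epsilon, which is exactly the
    epsilon-differential-privacy bound for a single binary answer.
    """
    return bit if rng.random() < truth_probability(epsilon) else 1 - bit


def estimate_fraction(reports: List[int], epsilon: float) -> float:
    """Unbiased estimate of the true fraction of 1s from randomized reports.

    E[observed] = q*p + (1 - q)*(1 - p), so p = (observed - (1 - q)) / (2q - 1).
    """
    q = truth_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - q)) / (2.0 * q - 1.0)


rng = random.Random(42)
true_bits = [1] * 60_000 + [0] * 140_000          # 30% of owners hold a 1
reports = [randomize(b, 1.0, rng) for b in true_bits]
print(round(estimate_fraction(reports, 1.0), 3))  # close to 0.3
```

A smaller ε gives noisier reports and a higher-variance estimate; that accuracy-privacy trade-off is, roughly, what the buyer pays for in the mechanism described above.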

Toward Automatically Generating Privacy Policy for Smart Home Apps

Youqun Li, Y. Zhang, Haojin Zhu and Suguo Du (Shanghai Jiao Tong University, China)

Modern Smart Home platforms offer various applications, which should follow platform privacy policies so that end users and regulators are informed of operations involving Sensitive Personal Information (SPI). However, the generic privacy policies of Smart Home platforms fail to explain the specific SPI-related operations of individual applications. Meanwhile, previous work indicates that SPI leaks may occur because of insufficient monitoring. In this paper, we propose the first system that automatically generates fine-grained privacy policies for individual applications through static code analysis and natural language processing techniques. First, we extract the control flow graph and the SPI data flows from the code. Then, we use a Naive Bayes model to translate the data flows into verb-object phrases. Finally, we populate a pre-prepared privacy policy template with the generated phrases. We evaluate our system on the Samsung SmartThings platform. The experimental results show that: 1) our system accurately extracts SPI-related operations from Smart Home applications; 2) the privacy policies it creates are fine-grained and easily understandable; and 3) the system is effective on a real-world dataset of almost 250 apps.

Evaluating Contact Tracing Apps for Privacy Preservation and Effectiveness

Rashed Nekvi (University of Western Ontario, Canada); Anwar Haque (Western Ontario, Canada)

During the COVID-19 pandemic, digital technology has been extensively investigated to automate and augment the traditional manual Contact Tracing (CT) process. This has resulted in a variety of smartphone-based CT apps, symptom-tracking tools, dashboards, and analytical tools supporting comprehensive digital CT programs, in which CT apps act as the central instrument. However, digital CT solutions, especially CT apps, carry different levels of privacy threat and have different technical capabilities, which are directly linked to their effectiveness. Sometimes a particular digital CT method has opposite effects on privacy and effectiveness, which only adds uncertainty to decision making about digital CT. In this paper, we present a novel approach to quantitatively pre-assess CT apps for privacy preservation and effectiveness. Such an assessment would help policymakers develop a shared understanding. We also implement a digital evaluation toolkit and use it to evaluate CT apps with a variety of specifications. Our research aims to help address the problem of selecting appropriate digital CT, which national policymakers around the world currently face.

Using Dynamic Analysis to Automatically Detect Anti-Adblocker on the Web

Jing Sun (Xidian University, China); Huang Zhiqiu (University of Chinese Academy of Sciences, China); Ting Yang (Hebei University of Science & Technology, China); Wenjie Wang (University of Chinese Academy of Sciences, China); Zhang Yuqing (University of Chinese Academy of Sciences, China)

With the continuous development of Internet technology, websites carry more and more advertisements, some of which can track and monitor users. To avoid leaking private information, many people use adblockers to remove advertisements from web pages. Ad filtering seriously threatens the interests of online publishers, who have begun to detect and counterattack adblocker users. Previous work has focused on detecting and filtering anti-adblockers to fight these counterattacks. So far, one of the most effective ways to counter anti-adblockers is to create a blacklist. However, generating and maintaining blacklists involves considerable manual work, which is inefficient and difficult to sustain. This paper proposes ABDetector, a machine learning anti-adblocker detection system and the first system that can automatically generate blacklists of anti-adblockers, greatly reducing the manual workload. Because code with and without anti-adblocking behavior differs mainly in the APIs it calls, we are the first to build an anti-adblocker classifier that uses JavaScript APIs as features, extracted with dynamic analysis. Unlike static analysis, dynamic analysis effectively sidesteps code obfuscation. ABDetector achieves 81.46% accuracy on the test set.

Session Chair

Iman Vakilinia (University of North Florida, USA)
