Session 6

Cloud & Storage

Conference
10:10 AM — 11:20 AM JST
Local
Jun 25 Fri, 9:10 PM — 10:20 PM EDT

Towards Private Similarity Query based Healthcare Monitoring over Digital Twin Cloud Platform

Yandong Zheng, Rongxing Lu, Yunguo Guan and Songnian Zhang (University of New Brunswick, Canada); Jun Shao (Zhejiang Gongshang University, China)

0
As the growing proportion of aging population, the demand for sustainable, high quality, and timely healthcare services has become increasingly pressing, especially since the outbreak of COVID-19 pandemic in the early of 2020. To meet this demand, a promising strategy is to introduce cloud computing and digital twin techniques into the healthcare systems, where the cloud server is employed for storing healthcare data and offering efficient query services, and the digital twin is used for building digital representation for patients and leverages the query services of the cloud server to monitor healthcare states of patients. Although several cloud computing and digital twin based healthcare monitoring frameworks have been proposed, none of them has considered the data privacy issue, yet the leakage of the private healthcare information may cause catastrophic losses to patients. Aiming at the challenge, in this paper, we propose an efficient and privacy-preserving similarity query based healthcare monitoring scheme over digital twin cloud platform, named PSim-DTH. Specifically, we first formalize a similarity query based healthcare monitoring model over digital twin cloud platform. Then, we deploy a partition based tree (PB-tree) to index the healthcare data and introduce matrix encryption to propose a privacy-preserving PB-tree based similarity range query (PSRQ) algorithm. Based on PSRQ algorithm, we propose our PSim-DTH scheme. Both security analysis and performance evaluation are extensively conducted, and the results demonstrate that our proposed PSim-DTH scheme is really privacy-preserving and efficient.

Secure Outsourced Top-$k$ Selection Queries against Untrusted Cloud Service Providers

Xixun Yu (Xidian University, China & University of Delaware, USA); Yidan Hu and Rui Zhang (University of Delaware, USA); Zheng Yan (Xidian University & Aalto University, China); Yanchao Zhang (Arizona State University, USA)

0
As cloud computing reshapes the global IT industry, an increasing number of business owners have outsourced their datasets to third-party cloud service providers (CSP), which in turn answer data queries from end users on their behalf. A well known security challenge in data outsourcing is that the CSP cannot be fully trusted, which may return inauthentic or unsound query results for various reasons. This paper considers top-$k$ selection queries, an important type of queries widely used in practice. In a top-$k$ selection query, a user specifies a scoring function and asks for the $k$ objects with the highest scores. Despite several recent efforts, existing solutions can only support a limited range of scoring functions with explicit forms known in advance. This paper presents three novel schemes that allow a user to verify the integrity and soundness of any top-$k$ selection query result returned by an untrusted CSP. The first two schemes support monotone scoring functions, and the third scheme supports scoring functions comprised of both monotonically non-decreasing and non-increasing subscoring functions. Detailed simulation studies using a real dataset confirm the efficacy and efficiency of the proposed schemes and their significant advantages over prior solutions.

EC-Scheduler: A Load-Balanced Scheduler to Accelerate the Straggler Recovery for Erasure Coded Storage Systems

Xinzhe Cao, Gen Yang, Yunfei Gu and Chentao Wu (Shanghai Jiao Tong University, China); Jie Li (Shanghai Jiaotong University, China); Guangtao Xue and Minyi Guo (Shanghai Jiao Tong University, China); Yuanyuan Dong and Yafei Zhao (Alibaba Group, China)

0
Erasure codes (EC) have become a typical technology for distributed storage systems in place of data replication, providing similar data availability but lower storage cost. However, a great number of data computations and migrations during the EC recovery process bring high I/O and network latency penalties. Although several EC recovery methods have been designed to compromise the recovery penalty with high parallelism, the performance of these schemes was usually bounded by the straggler problems due to the various (I/O) performance among different nodes in the storage system. Moreover, the variation of the access popularity from the upper layer application causes the dynamic load fluctuation and asymmetry upon different nodes, which makes the scheduling more difficult during the recovery.

To address the above problem, we propose a dynamic load-balanced scheduling algorithm for straggler recovery called EC-Scheduler. EC-Scheduler adjusts the recovery schedule dynamically with the awareness of continuous load fluctuation on the nodes, guaranteeing high parallelism and load balance ability simultaneously. To demonstrate the effectiveness of EC-Scheduler, we conduct several experiments in a cluster. The results show that, compared to typical recovery schemes such as Fast-PR and EC-Store, EC-Scheduler could achieve a 1.3X speed-up in the recovery process and 10X improvement in recovery load imbalance factor.

A Proactive Failure Tolerant Mechanism for SSDs Storage Systems based on Unsupervised Learning

Hao Zhou, Zhiheng Niu, Gang Wang and Xiaoguang Liu (Nankai University, China); Dongshi Liu, Bingnan Kang, Hu Zheng and Yong Zhang (Huawei, China)

0
As a proactive failure tolerant mechanism in large scale cloud storage systems, drive failure prediction can be used to protect data by early warning before real failures of drives, and therefore improve system dependability and cloud storage service quality. At present, solid state drives (SSDs) are generally widely used in cloud storage systems due to their high performance. SSD failures seriously affect the dependability of the system and the quality of service. Existing proactive failure tolerant mechanisms for storage systems are basically aimed at HDD failure detection and use classification technology (Supervised learning), which relies on enough failure data to establish a classification model. However, the low failure rate of SSDs leads to a serious imbalance in the ratio of positive and negative samples, which brings a big challenge for establishing a proactive failure tolerance mechanism for SSDs storage systems by using classification technology.

In this paper, we propose a proactive failure tolerance mechanism for SSDs storage systems based on unsupervised technology. It only uses data of normal SSDs to train the failure prediction model, which means that our method is not limited by the imbalance in SSDs data. At the core of our method is the idea to use VAE-LSTM to learn the pattern of normal SSDs, in which case faulty SSDs can be alerted when their patterns are very different from normal ones. Our method can provide early warning of failures, thereby effectively protecting data and improving the quality of cloud storage service. We also propose a drive failure cause location mechanism, which can help operators analyze the modes of failure by providing guiding suggestions. In order to evaluate the effectiveness of our method, we use cross-validation and online testing methods on SSDs data from a technology company. The results show that the FDR and FAR of our method outperform the baselines by 17.25% and 2.39% on average.

Session Chair

Claudio Righetti, Telecom Argentina, Argentina

Session 7

System & Memory

Conference
1:40 PM — 2:50 PM JST
Local
Jun 26 Sat, 12:40 AM — 1:50 AM EDT

Supporting Flow-Cardinality Queries with O(1) Time Complexity in High-speed Networks

Qingjun Xiao (SouthEast University of China, China); Xiongqin Hu (Southeast University, China); Shigang Chen (University of Florida, USA)

0

Practical Root Cause Localization for Microservice Systems via Trace Analysis

Zeyan Li (Tsinghua University, China); Junjie Chen (Tianjin University, China); Rui Jiao and Nengwen Zhao (Tsinghua University, China); Zhijun Wang, Shuwei Zhang, Yanjun Wu, Long Jiang and Leiqin Yan (China Minsheng Bank, China); Zikai Wang (Bizseer, China); Zhekang Chen (BizSeer, China); Wenchi Zhang (Bizseer Technology Co., Ltd., China); Xiaohui Nie (Tsinghua University, China); Kaixin Sui (Bizseer Technology Co., Ltd., China); Dan Pei (Tsinghua University, China)

0
Microservice architecture is applied by an increasing number of systems because of its benefits on delivery, scalability, and autonomy. It is essential but challenging to localize root-cause microservices promptly when a fault occurs. Traces are helpful for root-cause microservice localization, and thus many recent approaches utilize them. However, these approaches are less practical due to relying on supervision or other unrealistic assumptions. To overcome their limitations, we propose a more practical root-cause microservice localization approach named TraceRCA. The key insight of TraceRCA is that a microservice with more abnormal and less normal traces passing through it is more likely to be the root cause. Based on it, TraceRCA is composed of trace anomaly detection, suspicious microservice set mining and microservice ranking. We conducted experiments on hundreds of injected faults in a widely-used open-source microservice benchmark and a production system. The results show that TraceRCA is effective in various situations. The top-1 accuracy of TraceRCA outperforms the state-of-the-art unsupervised approaches by 44.8%. Besides, TraceRCA is applied in a large commercial bank, and it helps operators localize root causes for real-world faults accurately and efficiently. We also share some lessons learned from our real-world deployment.

Load Balancing in Heterogeneous Server Clusters: Insights From a Product-Form Queueing Model

Mark van der Boor and Céline Comte (Eindhoven University of Technology, The Netherlands)

0
Efficiently exploiting servers in data centers requires performance analysis methods that account not only for the stochastic nature of demand but also for server heterogeneity. Although several recent works proved optimality results for heterogeneity-aware variants of classical load-balancing algorithms in the many-server regime, we still lack a fundamental understanding of the impact of heterogeneity on performance in finite-size systems. In this paper, we consider a load-balancing algorithm that leads to a product-form queueing model and can therefore be analyzed exactly even when the number of servers is finite. We develop new analytical methods that exploit its product-form stationary distribution to understand the joint impact of the speeds and buffer lengths of servers on performance. These analytical results are supported and complemented by numerical evaluations that cover a large variety of scenarios.

AIR: An AI-based TCAM Entry Replacement Scheme for Routers

Yuchao Zhang and Peizhuang Cong (Beijing University of Posts and Telecommunications, China); Bin Liu (Tsinghua University, China); Wendong Wang (Beijing University of Posts and Telecommunications, China); Ke Xu (Tsinghua University, China)

0
Ternary Content Addressable Memory (TCAM) is an important hardware used to store route entries in routers, which is used to assist routers to make fast decision on forwarding packets. In order to cope with the explosion of route entries due to massive IP terminals brought by 5G and the Internet of Things (IoT), today's commercial TCAM has to keep the corresponding growth in capacity. But large TCAM capacity is causing many problems such as circuit design difficulties, production costs, and high energy consumption, so it is urgent to design a lightweight TCAM with small capacity while still maintains the original query performance. Designing such a TCAM faces two fundamental challenges. Firstly, it is essential to accurately predict the incoming flows in order to cache correct entries in limited TCAM capacity, but prediction on aggregated time-sequential data is challenging in the massive IoT scenarios. Secondly, the prediction algorithm needs to be real-time as the lookup process is in line-rate. In order to address the above two challenges, in this paper, we proposed a lightweight AI-based solution, called AIR, where we successfully decoupled the route entries and designed a parallel-LSTM prediction method. The experiment results under real backbone traffic showed that we successfully achieved comparable query performance by using just 1/8 TCAM size.

Session Chair

Jun Li, City University of New York, USA

Session 8

Traffic Analytics & Engineering

Conference
3:00 PM — 4:10 PM JST
Local
Jun 26 Sat, 2:00 AM — 3:10 AM EDT

Byte-Label Joint Attention Learning for Packet-grained Network Traffic Classification

Kelong Mao (Tsinghua University, China); Xi Xiao (Graduate School at Shenzhen, Tsinghua University, China); Guangwu Hu (Shenzhen Institute of Information Technology, China); Xiapu Luo (The Hong Kong Polytechnic University, Hong Kong); Bin Zhang (Peng Cheng Laboratory, China); Shutao Xia (Tsinghua University, China)

0
Network traffic classification (TC) is to classify network traffic into a specific class which plays a fundamental role in terms of network measurement, network management, and so on. In this work, we focus on packet-grained traffic classification. We find that previous packet-grained methods based on the analogy between traffic packet and image or text are not sufficiently reasonable, leading to a sub-optimal performance on both accuracy and efficiency that still can be largely improved.

In this paper, we devise a new method, called BLJAN, to jointly learn from byte sequence and labels for packet-grained traffic classification. BLJAN embeds the packet's bytes and all labels into a joint embedding space to capture their implicit correlations with a dual attention mechanism. It finally builds a more powerful packet representation with an enhancement from label embeddings to achieve high classification accuracy and interpretability. Extensive experiments on two benchmark traffic classification tasks, including application identification and traffic characterization, with three real-world datasets, demonstrate that BLJAN can achieve high performance (96.2%, 96.7%, and 99.7% Macro F1-scores on three datasets) for packet-grained traffic classification, outperforming six representative state-of-the-art baselines in terms of both accuracy and detection speed.

DarkTE: Towards Dark Traffic Engineering in Data Center Networks with Ensemble Learning

Renhai Xu (Tianjin University, China); Wenxin Li (Hong Kong University of Science and Technology); Keqiu Li and Xiaobo Zhou (Tianjin University, China); Heng Qi (Dalian University of Technology, China)

0
Over the last decade, traffic engineering (TE) has always been a research hotspot in data center networks. For routing flows efficiently and practically, existing TE schemes explore experience-driven heuristics or machine learning (ML) techniques to predict/identify network flows' size information. However, these TE schemes have significant limitations: they either identify the flow size information too late or are unaware of the ML models' prediction errors. In this paper, we present DarkTE, a novel TE solution that can learn to predict flow size information timely for achieving better routing performance while being robust to the prediction errors. At its heart, DarkTE employs an ensemble learning technique (i.e., random forest) to classify flows into mice and elephant flows with high accuracy. It then leverages a confidence-based rate allocation and path selection scheme to mitigate the occasional classification errors. Large-scale simulations demonstrate that DarkTE classifies flows within hundreds of microseconds, and the classification accuracy is at least 86.4% over three different realistic workloads. Further, DarkTE completes flows 2.94 times faster on average and makes more links to experience over 90% bandwidth utilization than the Hedera solution.

BCAC: Batch Classifier based on Agglomerative Clustering for traffic classification in a backbone network

Hua Wu, Xiying Chen, Guang Cheng, Xiaoyan Hu and Youqiong Zhuang (Southeast University, China)

0
Backbone network is the core part of the Internet. Due to the high transmission speed of traffic in the backbone network, Quality of Service (QoS) monitoring of services in the backbone network becomes a highly important and challenging issue. Traffic classification is the basis of QoS monitoring. The existing traffic classification is based on full traffic, which is impractical in high-speed backbone network traffic. This paper presents a method to classify the sampled traffic and gives an example of its application in QoS monitoring. Specifically, we design the Multiple Counter Sketch (MC Sketch) to quickly extract features from the sampled data stream in a backbone, propose the Batch Classifier based on Agglomerative Clustering (BCAC) for unsupervised clustering of traffic, and combine with the supervised machine learning method to train the labeled data in the clustering results to get the classification model. The experimental results of sampled traffic collected on a 10Gbps link show that even when the sampling ratio is 1:1024, the accuracy of our classification model reaches 96.3%. When different block sizes are set, the average clustering time of BCAC is only about one-third of the traditional agglomerative classifier. Moreover, we give an example of applying our traffic classification method to monitor the QoS, and the results show that our method can efficiently and accurately monitor the QoS dynamics of backbone network traffic.

Efficient Fine-Grained Website Fingerprinting via Encrypted Traffic Analysis with Deep Learning

Meng Shen, Zhenbo Gao and Liehuang Zhu (Beijing Institute of Technology, China); Ke Xu (Tsinghua University, China)

0
Fine-grained website fingerprinting (WF) enables potential attackers to infer individual webpages on a monitored website that victims are visiting, by analyzing the resulting traffic protected by security protocols such as TLS. Most existing studies focus on WF at the granularity of website, which takes website homepages as their representatives for fingerprinting. Fine-grained WF can reveal more user privacy, such as online purchasing habits and video-viewing interests, and can also be employed for web censorship. Due to striking similarly of webpages on a same website, it is still an open problem to conduct fine-grained WF in an accurate and time-efficient way.

In this paper, we propose BurNet, a fine-grained WF method using Convolutional Neural Networks (CNNs). To extract differences of similar webpages, we propose a new concept named unidirectional burst, which is a sequence of packets corresponding to a piece of HTTP message. BurNet takes as input unidirectional burst sequences, instead of bidirectional packet sequences, which makes it applicable to different attack scenarios. BurNet employs CNNs to build a powerful classifier, where sophisticated architecture is designed to improve classification accuracy while reducing time complexity in training. We collect real-world datasets from three well-known websites and conduct extensive experiments to evaluate the performance of BurNet. The closed-world evaluation results show that BurNet outperforms the stateof-the-art methods in both attack scenarios. In the more realistic open-world setting, BurNet can achieve 0.99 precision and 0.99 recall. BurNet is also superior to its CNN-based counterparts in terms of training efficiency.

Session Chair

Kui Wu, University of Victoria, Canada

Session 9

Privacy

Conference
4:20 PM — 5:30 PM JST
Local
Jun 26 Sat, 3:20 AM — 4:30 AM EDT

Privacy-Preserving Optimal Recovering for the Nearly Exhausted Payment Channels

Minze Xu, Yuan Zhang, Fengyuan Xu and Sheng Zhong (Nanjing University, China)

0
Payment Channel Network (PCN) is one of the most promising technologies for scaling the capacity of blockchain-based cryptocurrencies and improving the quality of blockchain-based services. However, during the use of PCNs, a significant portion of the payment channels gradually become exhausted, which triggers additional consumption of on-chain resources and makes PCNs less useful. This is a fundamental problem for blockchain-based cryptocurrencies, worthy of a thorough investigation.

In this paper, we propose OPRE, a protocol for OPtimal off-chain REcovering of payment channels, to solve this problem. It is optimal in that it recovers the maximum number of nearly exhausted channels in the PCN. Furthermore, we consider users' privacy concerns and design a privacy-preserving version of this protocol, so that users' balance information does not need to be revealed. This protocol maintains optimality in recovering payment channels while providing cryptographically strong privacy guarantee. In addition to the theoretical design and analysis, we also implement OPRE and experimentally evaluate its performance. The results show that the OPRE protocol is both efficient and effective.

Privacy-Preserving Approximate Top-k Nearest Keyword Queries over Encrypted Graphs

Meng Shen and Minghui Wang (Beijing Institute of Technology, China); Ke Xu (Tsinghua University, China); Liehuang Zhu (Beijing Institute of Technology, China)

0
With the prosperity of graph-based applications, it is increasingly popular for graph nodes to have labels in terms of a set of keywords. The top-k nearest keyword (k-NK) query can find a set of k nearest nodes containing a designated keyword to a given source node. In cloud computing era, graph owners prefer to outsource their graphs to cloud servers, leading to severe privacy risk for conducting k-NK queries. The current studies fail to support efficient and accurate k-NK query under the premise of privacy protection.

In this paper, we propose a new graph encryption scheme Aton, which enables efficient and privacy-preserving k-NK querying. Based on the symmetric-key encryption and particular pseudo-random functions, we construct a secure k-NK query index. Aton is built on a ciphertext sum comparison scheme which can achieve approximate distance comparison with high accuracy. Rigorous security analysis proves that it is CQA-2 secure. Experiments with real-world datasets demonstrate that it can efficiently answer k-NK queries with more accurate results compared with the state-of-the-art.

A Behavior Privacy Preserving Method towards RF Sensing

Jianwei Liu (Zhejiang University & Xi'an Jiaotong University, China); Chaowei Xiao (University of Michigan, ann arbor, USA); Kaiyan Cui (Xi'an Jiaotong University & The Hong Kong Polytechnic University, China); Jinsong Han (Zhejiang University & School of Cyber Science and Technology, China); Xian Xu and Kui Ren (Zhejiang University, China); Xufei Mao (Dongguan University of Tech, China)

0
Recent years have witnessed the booming development of RF sensing, which supports both identity authentication and behavior recognition by analysing the signal distortion caused by the human body. In particular, RF-based identity authentication is more attractive to researchers, because it can capture the unique biological characteristics of users. However, the openness of wireless transmission raises privacy concerns since human behaviors can expose the massive private information of users, which impedes the real-world implementation of RF-based user authentication applications. It is difficult to filter out the behavior information from the collected RF signals.

In this paper, we propose a privacy-preserving deep neural network named BPCloak to erase the behavior information in RF signals while retaining the ability of user authentication. We conduct extensive experiments over mainstream RF signals collected from three real wireless systems, including the WiFi, Radio Frequency IDentification (RFID), and millimeter-wave (mmWave) systems. The experimental results show that BPCloak significantly reduces the behavior recognition accuracy, i.e., 85%+, 75%+, and 65%+ reduction for WiFi, RFID, and mmWave systems respectively, merely with a slight penalty of accuracy decrease when using these three systems for user authentication, i.e., 1%-, 3%-, and 5%-, respectively.

Differential Privacy-Preserving User Linkage across Online Social Networks

Xin Yao (Central South University & Arizona State University, China); Rui Zhang (University of Delaware, USA); Yanchao Zhang (Arizona State University, USA)

0
Many people maintain accounts at multiple online social networks (OSNs). Multi-OSN user linkage seeks to link the same person's web profiles and integrate his/her data across different OSNs. It has been widely recognized as the key enabler for many important network applications. User linkage is unfortunately accompanied by growing privacy concerns about real identity leakage and the disclosure of sensitive user attributes. This paper initiates the study on privacy-preserving user linkage across multiple OSNs. We consider a social data collector (SDC) which collects perturbed user data from multiple OSNs and then performs user linkage for commercial data applications. To ensure strong user privacy, we introduce two novel differential privacy notions, .\epsilon.-attribute indistinguishability and .\epsilon.-profile indistinguishability, which ensure that any two users' similar attributes and profiles cannot be distinguished after perturbation. We then present a novel Multivariate Laplace Mechanism (MLM) to achieve .\epsilon.-attribute indistinguishability and .\epsilon.-profile indistinguishability. We finally propose a novel differential privacy-preserving user linkage framework in which the SDC trains a classifier for user linkage across different OSNs. Extensive experimental studies based on three real datasets confirm the efficacy of our proposed framework.

Session Chair

Shui Yu, University of Technology Sydney, Australia