### Session Session 10: WiFi Sensing Session Session 11: Scheduling Session Session 12: DCNs Session Session 13: Monitoring & Classification Session Session 14: Monitoring & Identification Session Session 15: Routing Session Session 16: Networking Protocol Session Closing: Closing Session

Session Session 10

## WiFi Sensing

Conference
8:30 AM — 9:40 AM CEST
Local
Jun 12 Sun, 2:30 AM — 3:40 AM EDT

### EAScatter: Excitor-Aware Bluetooth Backscatter

Zhanxiang Huang and Wei Gong (University of Science and Technology of China, China)

1
We propose EAScatter, the first excitor-aware Bluetooth backscatter system with stable performance for different Bluetooth excitors. We are the first to point out that the backscatter tag should fit the guard interval for different Bluetooth excitors. EAScatter uses a connection-based identification method, which can identify excitor during Bluetooth connection according to different shortest high-level lengths. Thus, it can select a specific optimal guard interval for each excitor. Moreover, it introduces Plus-one modulation, which can further improve goodput.
We built a prototype of EAScatter and evaluated it with extensive experiments. EAScatter can achieve up to 98% device excitor identification accuracy. Using TI CC1352 as an excitor, it has a 25x PRR gain and a 28x goodput gain over the state-of-the-art RBLE.

### SpiroFi: Contactless Pulmonary Function Monitoring using WiFi Signal

Yu Gu, Meng Wang, Peng Zhao and Yantong Wang (Hefei University of Technology, China); Hao Zhou (University of Science and Technology of China, China); Yusheng Ji (National Institute of Informatics, Japan); Celimuge Wu (The University of Electro-Communications, Japan)

2
Human pulmonary function declines with age. Elders, especially those with lung or cardiovascular diseases, yearn for daily lung function tests for timely diagnosis and treatment. However, current clinical spirometers are cumbersome and expensive while home-use portable ones' accuracy is questionable. Moreover, both kinds require contact measurements and could potentially cause cross infection, especially hazardous for contagious diseases like COVID-19. To this end, we propose SpiroFi, a contactless system that leverages WiFi Channel State Information (CSI) for convenient yet accurate pulmonary function testing (PFT) out of clinic. The key enabler underlying SpiroFi is a set of algorithms that can extract chest wall movement from WiFi signal variations and interpret such information into lung function indices. We have realized SpiroFi on low-cost commodity WiFi devices and tested it in a home-like site where it achieves 2.55% monitoring error over healthy youths. Then, with the Ethics Committee (EC) approval, we conducted a 2-month clinic study in a city hospital over elders with basic diseases. SprioFi still yields 6.05% monitoring error despite elders' degenerated pulmonary function and body control. Also, the correlation between lung function and age as well as chronic diseases has been revealed, highlighting the importance of daily PFT for the elderly.

### Blind-area Elimination in Video Surveillance Systems by WiFi Sensing with Minimum QoS Loss

Linqing Gui, Wenyang Yuan and Fu Xiao (Nanjing University of Posts and Telecommunications, China)

2
Video surveillance systems have demonstrated their great importance in security protection these years. However, due to limited budget, installing cameras in every place of surveillance region is not practical and then blind areas become inevitable. As a result, eliminating blind areas with the lowest cost has become a tough challenge. To the best of our knowledge, this is the first work that fixes the blind spots of video surveillance systems based on WiFi sensing technique. By taking either existing WiFi infrastructure or additional cheap WiFi device as the sensing device, this paper attempts to eliminate blind spots with tiny hardware cost. Moreover, in order to completely fix blind area with minimum loss of video communication QoS, the WiFi sensing device's location boundary that satisfies the above objective is modeled and estimated. To that end, a visitor-disturbed channel model is first derived for precisely describing the inherent relation between the appearance of visitor and the change of wireless channel. Then a location boundary model satisfying both blind-area elimination and QoS maximization is further derived. Based on the derived model, a practical system is designed to estimate the real location boundary. The simulation and experiment results have not only verified the correctness of our derived location boundary model, but also showed its good performance on both blind-area elimination and communication QoS optimization.

### WiVi: WiFi-Video Cross-Modal Fusion based Multi-Path Gait Recognition System

Jinmeng Fan, Hao Zhou and Fengyu Zhou (University of Science and Technology of China, China); Xiaoyan Wang (Ibaraki University, Japan); Zhi Liu (The University of Electro-Communications, Japan); Xiang-Yang Li (University of Science and Technology of China, China)

1
This talk does not have an abstract.

###### Session Chair

Zhi Liu, The University of Electro-Communications

Session Session 11

## Scheduling

Conference
9:40 AM — 10:50 AM CEST
Local
Jun 12 Sun, 3:40 AM — 4:50 AM EDT

### TASP: Enabling Time-triggered Task Scheduling in TSN-based Mixed-criticality system

Xuyan Jiang (National University of Defense Technology, China); Yiming Zhang (NUDT & NiceX Lab, China); Wenwen Fu, Xiangrui Yang and Yinhan Sun (National University of Defense Technology, China); Zhigang Sun (National Unversity of Defense Technology, China)

2
Distributed mixed-criticality systems (DMCS) have been widely used in various critical domains such as self-driving cars and space crafts. To guarantee the end-to-end QoS (i.e., deadline/jitter requirements) of sensing-controlling-actuating control loops (CL) applications,, DMCS adopts Time-Sensitive Networking (TSN), an emerging real-time Ethernet technology, for communication between end systems (ES). TSN provides a synchronized global clock and guarantees bounded delay, making it possible for DMCS to collaboratively schedule the computation (on ES) and communication (in TSN) to meet the QoS requirement. However, as modern DMCS tend to use fully-fledged Linux distributions (rather than a custom real-time OS) on ES to enjoy Linux's mature ecosystem, it is challenging for DMCS to realize TSN-based QoS guarantees because the priority-based scheduler of Linux on ES is incompatible with TSN.

In this paper we propose TAsk Scheduling Puppeteer (TASP), a mechanism that schedules DMCS tasks based on TSN without modifying the Linux OS. The key idea of TASP is to manipulate task scheduling by controlling the timing of CL packet submissions at the interface between TSN and ES. Specifically, TASP refines time-triggered task scheduling with two parameters, namely, Fore Guardband (ForeGB) and Back Guardband (BackGB). During the ForeGB period before a CL packet's submission, TASP forbids any packets' submission; while during the BackGB period after a CL packet's submission, TASP forbids any other CL packets' submission. ForeGB and BackGB can ensure that there is at most one schedulable task on the ES at any time and thus Linux has no choice but to schedule the only task, making the Linux scheduler a puppet. We have deployed TASP and evaluated it in real-world TSN switches based on an open-source TSN project. The TASP-enabled ES can realize task scheduling precisely based on the TSN global time, which outperforms the original ES by reducing end-to-end jitter from milliseconds to microseconds.

### BUDA: Budget and Deadline Aware Scheduling Algorithm for Task Graphs in Heterogeneous Systems

Hamza Djigal, Linfeng Liu, Jian Luo and Jia Xu (Nanjing University of Posts and Telecommunications, China)

2

### Rethinking the use of network cycle in Time-Sensitive Networking (TSN) flow scheduling

Jiashuo Lin (Southern University of Science and Technology, China); Weichao Li (Peng Cheng Laboratory, China); Xingbo Feng (City University of Hong Kong, China); Shuangping Zhan, Jingbin Feng and Jian Cheng (Peng Cheng Laboratory, China); Tao Wang (Guangdong University of Technology, China); Qing Li (Peng Cheng Laboratory, China); Yi Wang (Southern University of Science and Technology, China); Fuliang Li (Northeastern University, China); Bo Tang (Southern University of Science and Technology, China)

2
Time-Sensitive Networking (TSN) is an emerging network architecture that provides bounded latency and reliable network services for time-sensitive applications. Since time-triggered flows in TSN are typically periodic, a concept of \textit{network cycle} is widely used in both standards and academic researches. However, although network cycle has gained popularity, its rationale has not yet been analyzed systematically.

In this paper, we mathematically evaluate the performance of several flow scheduling algorithms in terms of flow schedulability with and without employing network cycle. We observe that only when the network cycle is set to a proper value can the performance of flow scheduling be significantly improved. To better evaluate the scheduling effect, a novel assessment metric and a goal-based optimization algorithm are introduced. Our experiment results show that the network cycle-based algorithm can achieve a considerable improvement (40\% - 170\% improvement in the number of scheduled flows) compared to the ones with network cycle disabled.

### Online Traffic Allocation Based on Percentile Charging for Practical CDNs

Huan Chen, Huiyou Zhan and Haisheng Tan (University of Science and Technology of China, China); Huang Xu and Weihua Shan (Huawei Technologies Co., Ltd., China); Shiteng Chen (Chinese Academy of Sciences, China); Xiang-Yang Li (University of Science and Technology of China, China)

1
With the explosion of data transmitted over the Internet, Content Delivery Networks (CDNs) carry massive network traffic globally and suffer an increasingly higher bandwidth cost. A critical issue for CDN service providers is how to allocate network traffic among CDN facilities to reduce the total bandwidth cost without violating the quality of service. This work studies online traffic allocation in CDNs to minimize the bandwidth cost under the 95th percentile charging model. Specifically, we here take into account practical deployment issues in large-scale CDN systems, e.g., allocation granularity and deviation. We first theoretically prove the approximation hardness of the traffic allocation problem. We then propose a novel prediction-based algorithm named OnTPC, which effectively addresses constraints raised in practical deployment. Extensive experiments demonstrate that OnTPC outperforms state-of-the-art baselines and is expected to save over a million dollars per month for our large-scale commercial CDN collaborator. Moreover, the performance of OnTPC is consistently outstanding under various settings, and specifically robust to large allocation deviation.

###### Session Chair

Jianping Wang, City University of Hong Kong

Session Session 12

## DCNs

Conference
11:00 AM — 12:10 PM CEST
Local
Jun 12 Sun, 5:00 AM — 6:10 AM EDT

### NQ/ATP: Architectural Support for Massive Aggregate Queries in Data Center Networks

Yixi Chen (Tsinghua University, China); Wenfei Wu (Peking University, China); Ying Zhang (Facebook, USA); Shan-Hsiang Shen (National Taiwan University of Science and Technology, Taiwan)

2
Network queries become increasingly challenging for online service providers with massive network devices and massive network queries due to the tradeoff between system scale and query granularity. We re-architect the traditional threetier architecture, i.e., data collection, data storage, and data query, for aggregate queries, and build a system named NQ/ATP. NQ/ATP offloads the aggregation operation in network queries onto network switches, which accelerates the query execution and frees up network resources. NQ/ATP further devises a route learning mechanism, query hierarchy load balancing policy, and hierarchy clustering mechanism to save forwarding table entries on switches, which better supports massive queries. The evaluation shows that NQ/ATP can support network aggregate queries with higher capacity, less traffic volume, finer granularity, and better scalability than traditional three-tier polling architectures. The three optimizations can effectively reduce the forwarding table usage by up to 97.55%.

### BubbleTCAM: Bubble Reservation in SDN Switches for Fast TCAM Update

Cong Luo and Chuhao Chen (Fudan University, China); Hao Mei (FuDan University, China); Ruyi Yao (Fudan University, China); Ying Wan (Tsinghua University, China); Wenjun Li (Harvard University, USA); Sen Liu (Fudan University, China); Bin Liu (Tsinghua University, China); Yang Xu (Fudan University, China)

2
This talk does not have an abstract.

### UFLB: A Unified Framework for Modeling and Analyzing Load Balancing Methods in DCNs

Yawen Chen (Tianjin University, China); Deke Guo (National University of Defense Technology, China); Bangbang Ren (National University of Defence Technology, China); Junjie Xie and Lailong Luo (National University of Defense Technology, China)

1
Data centers usually employ scale-out network topologies to provide sufficient network bandwidth for applications. The traditional equal-cost multi-path (ECMP) routing method is proposed to tackle the serious load imbalance problem across all links. However, it does not achieve the desired performance and still incurs low network throughput. Consequently, researchers recently redesigned some load balancing mechanisms for data center networks (DCNs) from different design dimensions. However, it remains open to systematically measure and evaluate their performance under various settings. It is impractical for evaluators to implement or simulate involved load balancing mechanisms. In this paper, we propose a unified framework, UFLB, which can well model and emulate representative load balancing mechanisms for data center networks in a lightweight way. This framework has overcome three significant challenges: model traffic distribution in the symmetry as well as asymmetry data center networks, characterize mainstream load balancing methods and systematically combine them with high accuracy. We evaluate the effectiveness of our model under not only general settings of data center networks but also some special settings, such as various link failures and asymmetric topologies. The results indicate that the deviation rate of UFLB is within 15% against the implementation of load balancing mechanisms, such as ECMP, CONGA, DRILL, HERMES, PRESTO, in NS2, while it can be several orders of magnitude faster.

### Flexible and Efficient Multicast Transfers in Inter-Datacenter Networks

Long Luo, Linjian Yu, Tie Ma and Hongfang Yu (University of Electronic Science and Technology of China, China)

2

###### Session Chair

Fangming Liu, Huazhong University of Science and Technology

Session Session 13

## Monitoring & Classification

Conference
1:00 PM — 2:10 PM CEST
Local
Jun 12 Sun, 7:00 AM — 8:10 AM EDT

### DUET: Joint Deployment of Trucks and Drones for Object Monitoring

Lihao Wang (Nanjing University, China); Weijun Wang (Nanjing University & University of Goettingen, China); Haipeng Dai (Nanjing University & State Key Laboratory for Novel Software Technology, China); Jiaqi Zheng (Nanjing University, China); Bangbang Ren (National University of Defence Technology, China); Shuyu Shi (University of Nanjing, China); Rong Gu (Nanjing University, China)

2
The limitation on the flight range motivates a hybrid monitoring system, wherein trucks carrying drones drive to pre-planned positions and then free drones for task execution. While the flight range limitation is mitigated, it is challenging to determine the destination of trucks and drones and set airborne cameras. This paper optimizes the joint Deployment of trUcks and dronEs for objecT monitoring (DUET), that is, deploy a set of trucks where each truck carries drones, and each drone is equipped with a varifocal camera such that the overall monitoring utility for target objects is maximized. To tackle the DUET problem, we first model the hybrid system and monitoring utility; then, discretize the solution space of DUET with performance bound. In this way, the problem is transformed into a two-level combinatorial optimization problem satisfying submodularity. To address it, a two-level greedy algorithm with approximation ratio is proposed to select deployment strategies. After the strategy selection, an optimal method is devised to carefully adjust the strategy for energy saving and communication improvement without loss of monitoring utility. Both simulations and field experiments are conducted to evaluate the proposed framework, which outperforms baseline algorithms on monitoring utility by at least 28.4% and 40%, respectively.

### Scorpius: Proactive Code Preparation to Accelerate Function Startup

Heng Yu, Junxian Shen, Han Zhang, Jilong Wang, Congcong Miao and Mingwei Xu (Tsinghua University, China)

1
Massive enterprises deploy their applications on public clouds to relieve infrastructure management burden. However, applications are faced with highly fluctuating workloads, while clouds provision exclusive resources at coarse time granularity, resulting in severely low resource efficiency. Serverless computing enables statistically multiplexed resources, which has the potential to improve efficiency. However, serverless platforms could consume several seconds to start functions and the long startup latency can severely hurt the performance of applications. In this paper, we measure the serverless platforms and find that most startup latency is occupied by code preparation. To reduce the code preparation latency with little resource overhead, we propose Scorpius, a serverless platform that combines two optimization categories: (1) To reduce the code size, Scorpius proposes to proactively prepare partial libraries over servers and run functions on the server with most library sharing. (2) To advance the start time, Scorpius proposes to predict the function overload with a simple model and proactively scale code to more servers. We have implemented a prototype of Scorpius and conducted extensive experiments. Evaluation results demonstrate that compared with state-of-the-art methods, Scorpius can simultaneously reduce code preparation latency and storage overhead by 88% and 79%, respectively.

### FROD: An Efficient Framework for Optimizing Decision Trees in Packet Classification

Longlong Zhu and Jiashuo Yu (Fuzhou University, China); Jiayi Cai (FuZhou University, China); Jinfeng Pan and ZhiGao Li (Fuzhou University, China); Zhengyan Zhou (Zhejiang University, China); Dong Zhang (Fuzhou University, China); Chunming Wu (Zhejiang University, China)

2
To perform efficient packet classification, decision tree based methods use hand-tuned heuristics to conduct decision trees. Then the performance testing and optimization are executed to ensure an excellent searching speed and space overhead. Specifically, when the performance is below expectation, existing solutions attempt to optimize the algorithms, such as designing more sophisticated heuristics. However, reconstruction or adjustment for algorithms produces an intolerable time overhead due to the long optimization period, caused by uncertain performance benefits and high pre-processing time. In this paper, we propose FROD, an efficient framework to directly optimize the decision trees for packet classification. FROD raises a meticulous evaluation to accurately appraise decision trees constructed by different heuristics. It then seeks out the bottleneck components via a lightweight heuristic. After that, FROD searches the optimal division for inferior components considering structural constraints and characteristics of traffic distribution. Evaluation on ClassBench shows that FROD benefits existing decision tree based solutions in classification time by 37% and memory footprint by 13% at the median, and reduces classification time by up to 61%.

### PextCuts: A High-performance Packet Classification Algorithm with Pext CPU Instruction

Chunyang Zhang (Institute of Computing Technology, Chinese Academy of Sciences, China); Gaogang Xie (CNIC Chinese Academy of Sciences & University of Chinese Academy of Sciences, China); Peng He (ByteDance, China)

2
Packet classification is the most essential component for switches and firewalls to perform network functions. In Software Defined Network, the growing scale of traffic requires the packet classification algorithm to perform high-speed lookup. Even though a lot of algorithms are proposed, the lookup performance is still the bottleneck because of the inefficient and unscientific schemes to cut rules and split trees. In this paper, we propose a novel decision-tree-based algorithm PextCuts. First, to efficiently cut rules, PextCuts applies one $pext$ CPU instruction to select discontiguous bits rather than contiguous bits. Second, to scientifically split trees, PextCuts applies the dynamic programming method to split each field into multiple sizes rather than large and small sizes. Compared to ten representative algorithms, PextCuts has the highest lookup speed with the minimal numbers of average memory accesses, maximal memory accesses, and tree height simultaneously. It also consumes the least memory cost and the shortest construction time. For the state-of-the-art algorithm ByteCuts, PextCuts achieves 2.1x lookup speed with only 57% memory cost and 10% construction time. In addition, we implement PextCuts in DPDK to perform packet classification with optional fields and achieve 3.0x lookup speed.

###### Session Chair

Haipeng Dai, Nanjing University

Session Session 14

## Monitoring & Identification

Conference
2:10 PM — 3:20 PM CEST
Local
Jun 12 Sun, 8:10 AM — 9:20 AM EDT

### Progressive Construction of k-identifiable Networks

Yongshuo Wan (Hong Kong); Cuiying Feng (City University of Hong Kong, Hong Kong); Kui Wu (University of Victoria, Canada); Jianping Wang (City University of Hong Kong, Hong Kong)

3
Since the inception of networking technology, network topology design has been a fundamental step for any interconnected system. This classical problem has diverse forms due to various construction criteria. One special criterion, the ease of monitoring the network (termed as monitorability of the network), has recently attracted much attention in the era of Industry 4.0 when many complex private networks need to be built for new industrial services. This paper extends a quantitative measure of network monitorability, k-identifiability, based on which a new form of network topology design problem is formulated. We prove that this network design problem is intractable. To solve it, we systematically analyze the topological features that are helpful for reducing the complexity of network construction. Based on the analysis, we propose a dual-heuristic method that runs two heuristics in parallel and selects the better topology as the preliminary construction result. Moreover, we design an integrated algorithm that reduces unnecessary edges as the final construction result. We compare our dual-heuristic algorithm with the theoretical optimal solution in small-scale networks where the brute-force search is feasible. The results demonstrate the near-optimality of our method. We also illustrate the capability of our method in designing large-scale networks.

### Order-preserved Tensor Completion For Accurate Network-wide Monitoring

Xiaocan Li (Hunan University, China); Kun Xie (Hunan University, China); Xin Wang (Stony Brook University, USA); Gaogang Xie (Institute of Computing Technology, Chinese Academy of Sciences, China); Kenli Li and Dafang Zhang (Hunan University, China); Jigang Wen (Chinese Academy of Science & Institute of Computing Technology, China)

1
Network-wide monitoring is important for many network functions. However, monitoring data are often incomplete due to the need of sampling to reduce high measurement cost, system failure, and unavoidable transmission loss under severe communication. Instead of only targeting to estimate all missing monitoring data entries with a small set of measurement samples, we study a new order-preserved monitoring data estimation problem to accurately estimate the missing data entries while preserving the data entries' order in the dataset. We propose a novel order-preserved tensor completion model that integrates both the low rank property and the order information into a joint learning problem to estimate the missing data. With well designed non-convex function to directly approximate the tensor rank and order-preserved constraint under the linear self-recovery method, our model can not only more accurately capture the low-rank property of monitoring data to increase the estimation performance of missing data, but also can capture the order information in monitoring data to ensure the estimation accuracy. Extensive experiments using four real datasets demonstrate that compared with the state-of-the-art tensor completion algorithms, our proposed algorithm can provide more accurate estimation and keep the value order of recovered entries to more effectively retrieve top-k large entries.

### Accurately Identify Time-decaying Heavy Hitters by Decay-aware Cuckoo Filter along Kicking Path

Qingjun Xiao (SouthEast University of China, China); Haotian Wang and Guannan Pan (Southeast University, China)

1
In high-speed networks, flow-level traffic measurement is an essential tool to understand how network bandwidth is being utilized. It can be used to identify anomalous traffic behaviors due to operational or security issues. Perhaps the most important measurement task is to track the \emph{heavy hitters} (HHs), i.e., the flows occupying the greatest shares of bandwidth. But most existing solutions have no concept of time window: Whenever a measurement period ends, the data sketch, which is deployed in the data plane for monitoring HHs, must be transferred to the control plane and then reset to zeros. It is better to capture network conditions of the continuous recent past by designing a HHs measurement solution that can support time-decaying window. As a result, recently several related works are devoted to tracking the \emph{time-decaying heavy hitters}, including time-decaying Count-Min and time-decaying Space-Saving. However, their memory-accuracy tradeoff is still suboptimal. In this paper, we attain higher performance by proposing a new algorithm named \emph{Decay-Aware Cuckoo Filter along Kicking Path} (DAKP-CF). It can be regarded as a variant of cuckoo filter (an improved version of hash table with better memory efficiency), by transforming each bucket into a bucket-level min-heap. Its key advantage is that, when we update the table as a packet arrive, it can discover and replace the most time-decayed flow along the kicking path of a cuckoo filter. We deliberately avoid scanning the entire table to keep the high time efficiency. The experiment results show that our DAKP-CF can reach the same identification accuracy as existing methods with roughly 25\% memory cost. In addition, we have built a prototype of DAKP-CF algorithm using P4-programmable BMv2 software switch.

### Improving disk failure detection accuracy via data augmentation

Wang Wang (University of Chinese Academy of Science, China); Xuehai Tang, Biyu Zhou and Wenjie Xiao (Institute of Information Engineering, Chinese Academy of Sciences, China); Jizhong Han (IIE, CAS, China); Songlin Hu (IIE, China)

1
Frequently happening of disk failures seriously affects the dependability and service quality of cloud data centers. Recently, machine learning (ML) based methods are popularly adopted to proactively predict forthcoming disk failures via supervised learning. However, the high imbalance of failure samples and healthy samples is a huge obstacle for existing detection methods to establish high performance detection model. This paper presents a data augmentation method MSGMD, which can efficiently generate high quality failure samples to alleviate the data imbalance of the training set, so as to effectively improve the performance of any supervised failure detection models. First, MSGMD converts failure samples (multivariate time series)
into multiple univariate time series via decomposing the spatial relations among features. Then it learns the temporal correlation of each feature via a policy-based reinforcement learning model trained in an adversarial way. After that, it generates failure samples by combining feature series sampled from learned distribution. Finally, it filters out low quality generated samples with a confidence-based method. Experimental results on real-world datasets show that, through data augmentation, MSGMD can improve the FDR and F1-Score of the state-of-the-art disk failure detection model by 31.59% and 30.74% respectively on average.

###### Session Chair

Kun Wang, UCLA

Session Session 15

## Routing

Conference
3:30 PM — 4:40 PM CEST
Local
Jun 12 Sun, 9:30 AM — 10:40 AM EDT

### QoS Routing Using Dominant-Distance Vectors

Jj Garcia-Luna-Aceves (University of California at Santa Cruz, USA); Brad Smith (University of California, Santa Cruz, USA); Judith Samson (University of California Santa Cruz, USA)

5
The Dominant-Distance Routing Information Protocol (DRIP) is introduced for quality-of-service (QoS) routing based on multiple criteria and is proven to be loop-free at every instant and capable of converging to optimal routes if they exist. DRIP is based on the exchange of updates and queries stating reference routing-metric values for destinations. Simulation experiments based on ns3 are used to compare DRIP against the Non-Restarting Vectoring Protocol recently proposed by Sobrinho and Ferreira, as well as OSPF and RIPv2. The results demonstrate that DRIP provides loop-free routing based on multiple performance and policy criteria as efficiently as routing protocols for shortest-path routing.

### The Differentiated Reliable Routing Mechanism for 5GB5G

Yaoguang Lu, Xingwei Wang, Bo Yi and Min Huang (Northeastern University, China)

3
In order to support the 5GB5G network with huge connections, large traffic, and low latency, the evolution of the IP network to the IPv6 network is an inevitable trend. Although IPv6 network has significantly improved network performance, there are still a series of network failures. After the failure occurs, it is necessary for the routing algorithm to create a backup path to ensure the continuous provision of network services. So, the selection of the backup path directly determines the performance of the network service. SRv6 takes advantage of the programmable capability of the IPv6 extension header to implement differentiated reliable routing, satisfy the QoS requirements of different network services, and improve user experience. First, we design the network model for single link failure, define the network service, and design the two-dimensional flow table. Then, the backup path traversal algorithm is designed to obtain the universal set of backup paths in the network, and the optimal backup path for a specific network service is selected through comprehensive evaluation method. Experimental results show that the differentiated reliable routing mechanism proposed in this paper exhibits better performance in terms of bandwidth, delay, jitter and packet loss rate compared with the benchmark mechanism.

### Online Entanglement Routing in Quantum Networks

Lan Yang, Yangming Zhao and Hongli Xu (University of Science and Technology of China, China); Chunming Qiao (University at Buffalo, USA)

3
Quantum Data Networks (QDNs) typically leverage teleportation to reliably send data quantum bits (called qubits) to their destinations. To teleport a data qubit from Alice to Bob, one entanglement connection between Alice and Bob needs to be established. Accordingly, we aim to establish as many entanglement connections as possible with limited quantum resources in order to maximize the network throughput. Conventional methods assume a known traffic matrix and calculate the paths to establish all the requested entanglement connections in one batch using a centralized algorithm. However, these methods are not scalable in large scale QDNs since teleportation has to complete within a very short time slot due to decoherence of the entanglement. To address this issue, we propose an Online Entanglement Routing (OER) scheme which determines the entanglement paths for each request when it arrives. In addition, OER pursues work conservation and fairness among all requests in the QDNs. Through extensive simulations, we demonstrate that OER not only outperforms two representative heuristics by up to 61.27% and 52.79%, respectively, in terms of average request completion time, but also achieves a better fairness performance than these two counterparts.

### Break the Blackbox! Desensitize Intra-domain Information for Inter-domain Routing

Peizhuang Cong, Yuchao Zhang, Lei Wang, Hao Ni and Wendong Wang (Beijing University of Posts and Telecommunications, China); Gong Xiangyang (Beijing University of Posts and Telecommunications P.R. China, China); Tong Yang (Peking University, China); Dan Li and Ke Xu (Tsinghua University, China)

1
Along with the ever-increasing amount of data generated from edge networks, cross domain (also known as Autonomous Systems, AS) transmission problem has attracted more and more attention. As mature and widely used inter-domain routing protocols, BGP-based solutions often use the number of domains (i.e. AS hops) of each path to make inter-domain routing decisions, which is simple and effective, but usually can not get the optimal routing results due to the lack of real state/information within ASes. These protocols choose the path with less AS hops as the forwarding path, even if the total latency or cost of the domains on this path is higher. While to solve this problem, directly access to intra-domain information as the assistance to make routing decisions is impractical due to data privacy. In this paper, we propose DIT, which makes near-optimal inter-domain routing decisions with desensitized intra-domain information. To do so, we design a homomorphic encrypted-based private number comparison scheme to export intra-domain information safely and thus assist in routing decisions. We conduct a series of experiments according to five real network topologies with nearly 900 simulated flows, and the results show that DIT reduces the average forwarding hops by about 45% and the flow completion time by about 60%.

###### Session Chair

Qi Li, Tsinghua University

Session Session 16

## Networking Protocol

Conference
4:40 PM — 5:50 PM CEST
Local
Jun 12 Sun, 10:40 AM — 11:50 AM EDT

### ACCeSS: Adaptive QoS-aware Congestion Control for Multipath TCP

Xiaolan Ji and Biao Han (National University of Defense Technology, China); Ruidong Li (Kanazawa University, Japan); Cao Xu and Yahui Li (National University of Defense Technology, China); Jinshu Su (National University of Defence Technology, China)

1
Multipath TCP (MPTCP) enables multi-home devices to establish multiple paths for simultaneous data transmission. However, due to diverse Quality of Service (QoS) requirements in real network, existing multipath congestion control algorithms (CCAs) fail to fast adapt to dynamic traffic, which leads to performance degradation, especially in heterogeneous network environments. To tackle these problems, in this paper, we first observe the performance limitations of current multipath CCAs by conducting extensive experiments. Then we propose ACCeSS, an adaptive QoS-aware multipath congestion control framework, which is able to promptly adapt to network changes and QoS requirements with a novel control policy optimization phase. In order to adjust and stimulate improvement of the preferred performance metric, ACCeSS exploits Random Forest Regressing (RFR) method to perform QoS-specific utility function optimization. ACCeSS is implemented and compared with other multipath CCAs in Linux kernel. Performances of ACCeSS are evaluated in both emulated and real-world networks, which reveal that ACCeSS outperforms classic multipath CCAs and the state-of-the-art learning based multipath CCA with better adaptive capability of QoS.

### Demystifying and Mitigating TCP Capping

Qingkai Meng and Fengyuan Ren (Tsinghua University, China); Tong Zhang (Nanjing University of Aeronautics and Astronautics, China); Danfeng Shan (Xi'an Jiaotong University, China); Yajun Yang (Tencent Technology Shenzhen Company, China)

1
Today's Internet user experience greatly depends on some user-perceived network metrics, such as throughput and latency. To improve these metrics, many Internet content providers build the content delivery network (CDN) to provide their services. Generally, CDNs adopt TCP as their transport protocol. A recent line of work improves TCP by proposing novel congestion control algorithms. However, we measure TCP performance in our production CDN and identify an interesting phenomenon termed TCP capping. When the flows experience TCP capping, the fixed-size receive window (rwnd) restricts these flows from fully utilizing network bandwidth. Through in-depth analysis, we demystify that the root cause of TCP capping is an inappropriate constraint on rwnd due to not considering the receiver's processing capability. To mitigate it, this paper proposes server-side scheme Apollo and client-side scheme Artemis for Internet content providers and users, respectively. Apollo probes the receiver's processing capability and assists the sender in sending packets. And Artemis adjusts the receive buffer in light of the receiver's processing capability. In our evaluation, compared to vanilla TCP, TCP (w/ Apollo) and TCP (w/ Artemis) shorten flow completion time by up to 91.8% and 94.9%, respectively.

### Consistent and Fine-Grained Rule Update with In-Network Control for Distributed Rate Limiting

Yongchao He (Tsinghua University, China); Wenfei Wu (Peking University, China)

3
Coexisting applications contend for limited WAN bandwidth when communicating over distributed data centers in private clouds. Online service providers deploy distributed rate limiting systems to dynamically estimate each application instance's bandwidth demand, and update rate limiting rules to provide performance isolation and bandwidth guarantees for applications with different priorities. However, the isolation violation caused by the inconsistent update of rate limiting rules among servers would eventually violates the Service Level Agreement requirements. Motivated by the observation that In-Network Control can update rate limiting rules consistently with ultra-low latency for hundreds of thousands of end-hosts through in-band control messages, this paper presents DistRL, a dynamic distributed rate limiting system that can achieve fine-grained consistent updates. The main idea of DistRL is to replace the traditional controller with programmable switches to improve communication efficiency, thereby achieving finer-grained consistent updates. The evaluation shows that DistRL can support sub-second distributed updates of rate limiting rules without isolation violation for O(100000) servers.

### HPUCache: Toward High Performance and Resource Utilization in Clustered Cache via Data Copying and Instance Merging

Wang Qianli (University of Science and Technology of China (USTC), China); Si Wu, Yongkun Li and Yinlong Xu (University of Science and Technology of China, China); Fei Chen, Pengcheng Wang and Lei Han (Huawei Technologies, China)

2
As one of the most popular in-memory cache systems, Redis provides low-latency and high-performance data access for modern internet services. However, in large-scale Redis systems, the access skewness and locality in storage workloads induce a small amount of hot-spot instances with degraded system performance, while massive cold instances with low CPU utilization. This paper proposes HPUCache to address the hot-spots via data copying and cold instances via instance merging. HPUCache fully utilizes the cached mapping in Redis client, and dynamically updates this mapping to enable access to the multiple data copies. Hence it can manage multiple copies achieving both Redis client compatibility and high user access performance. It also proposes an asynchronous instance merging strategy based on disk snapshots and temporal caches, which separates the massive data movement from the critical user access path to achieve high performance instance merging. We integrate HPUCache into Redis. Experiments under two types of workloads show that, compared to the native Redis design, HPUCache achieves 2.4x and 3.5x performance gains, 2x and 6x CPU utilization gains respectively.

###### Session Chair

Dan Wang, The Hong Kong Polytechnic University, China

Session Closing

## Closing Session

Conference
5:50 PM — 6:00 PM CEST
Local
Jun 12 Sun, 11:50 AM — 12:00 PM EDT

###### Session Chair

Yan Zhang, University of Oslo, Norway