## Technical Sessions

Session 1

## QoS in AI/ML

Conference: 10:40 AM — 12:10 PM CEST
Local: Jun 10 Fri, 4:40 AM — 6:10 AM EDT

### AIQoSer: Building the efficient Inference-QoS for AI Services

Jianxin Li, Tianchen Zhu, Haoyi Zhou, Qingyun Sun, Chunyang Jiang, Shuai Zhang and Chunming Hu (Beihang University, China)

AI-inspired methods have fundamentally changed the network QoS landscape and brought better demand-guided experiences to end users. However, meeting growing user expectations requires larger AI models, whose inference efficiency becomes a non-negligible drawback in time-sensitive network QoS. In this work, we define this challenge as the inference-QoS (iQoS) problem of network QoS itself, which balances inference efficiency and performance for AI services. We design a unified iQoS metric to evaluate AI-enhanced QoS frameworks, accounting for model performance, inference latency, and input scale. We then propose a two-stage pipeline as an exemplar for leveraging the iQoS metric in QoS-aware AI services: (i) to enhance reconstruction ability, a pretrained masked autoencoder extracts intrinsic data correlations through multi-scale masking; (ii) to improve inference efficiency, a forecasting masked decoder applies data-scale pruning along the spatial and temporal dimensions for prediction. Comprehensive experiments demonstrate our method's superior inference latency and strong traffic matrix prediction performance.

### Accelerating Blockchain-enabled Distributed Machine Learning by Proof of Useful Work

Yao Du and Cyril Leung (The University of British Columbia, Canada); Zehua Wang (The University of British Columbia, Vancouver, Canada); Victor C.M. Leung (Shenzhen University, China & The University of British Columbia, Canada)

In Internet of Things (IoT) deployments that employ centralized machine learning, security is a major concern due to the heterogeneity of end devices. Decentralized machine learning (DML) with blockchain is a potential solution. However, a blockchain with a proof-of-work (PoW) consensus mechanism wastes computing resources and adds latency to DML. Computing resources can be utilized more efficiently with proof-of-useful-work (uPoW), which secures transactions by solving real-world problems. We propose a novel uPoW method that exploits PoW mining to accelerate DML through a task scheduling framework for multi-access edge computing (MEC) systems. To provide good quality of service, we minimize latency by solving a multi-way number partitioning problem in its extended form. A novel uPoW-based mechanism is proposed to schedule DML tasks among MEC servers effectively. Simulation results show that our proposed blockchain strategies accelerate DML significantly compared with benchmarks.
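
The multi-way number partitioning that the abstract minimizes latency over can be approximated with the classic longest-processing-time-first (LPT) greedy heuristic. The sketch below is illustrative only, not the paper's uPoW mechanism; the function name and the use of task durations are our own assumptions. It assigns each task, largest first, to the currently least-loaded server to approximately minimize the makespan:

```python
import heapq

def lpt_schedule(task_times, num_servers):
    """LPT greedy heuristic for multi-way number partitioning:
    assign each task (largest first) to the currently least-loaded
    server, approximately minimizing the maximum server load."""
    # Min-heap of (current load, server id).
    heap = [(0.0, s) for s in range(num_servers)]
    heapq.heapify(heap)
    assignment = {s: [] for s in range(num_servers)}
    for t in sorted(task_times, reverse=True):
        load, s = heapq.heappop(heap)      # least-loaded server
        assignment[s].append(t)
        heapq.heappush(heap, (load + t, s))
    return assignment

# Example: five DML task durations partitioned across two servers.
plan = lpt_schedule([7, 5, 4, 3, 2], 2)
```

LPT gives a 4/3-approximation to the optimal makespan, which is why it is a common baseline for this family of scheduling problems.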

### An Online Approach for DNN Model Caching and Processor Allocation in Edge Computing

Zhiqi Chen, Sheng Zhang, Zhi Ma, Shuai Zhang and Zhuzhong Qian (Nanjing University, China); Mingjun Xiao (University of Science and Technology of China, China); Jie Wu (Temple University, USA); Sanglu Lu (Nanjing University, China)

Edge computing is a computing paradigm that has risen rapidly in recent years. Applications such as object detection, virtual reality, and intelligent cameras often rely on Deep Neural Network (DNN) inference. The traditional cloud-based DNN inference paradigm suffers from high delay because of limited bandwidth. From the perspective of service providers, caching DNN models at the edge brings several benefits, such as efficiency, privacy, and security. The problem we address in this paper is how to decide which models to cache and how to allocate the processors of edge servers to reduce overall system cost. To solve it, we model and study the DNN Model Caching and Processor Allocation (DMCPA) problem, which considers user-perceived delay and energy consumption under limited edge resources. We formulate it as an integer nonlinear programming (INLP) problem and prove its NP-completeness. Since it is a long-term average optimization problem, we leverage the Lyapunov framework to develop a novel online algorithm, DMCPA-GS-Online, with Gibbs sampling. We provide theoretical analysis proving that our algorithm is near-optimal. In experiments, we study the performance of our algorithm and compare it with other baselines. Simulation results on a real-world trace dataset demonstrate the effectiveness and adaptiveness of our algorithm.

### PRM: An Efficient Partial Recovery Method to Accelerate Training Data Reconstruction for Distributed Deep Learning Applications in Cloud Storage Systems

Piao Hu, Yunfei Gu, Ranhao Jia, Chentao Wu and Minyi Guo (Shanghai Jiao Tong University, China); Jie Li (Shanghai Jiaotong University, China)

Distributed deep learning is a typical machine learning method running in distributed environments such as cloud computing systems. The corresponding training, validation, and test datasets are generally very large (e.g., several TBs) and need to be stored across multiple data nodes. Due to the high disk failure ratio in cloud storage systems, one of the critical issues for distributed deep learning is how to efficiently tolerate disk failures during training. These failures can lead to a large amount of data loss, which decreases training accuracy and slows down the training process. Although several recovery methods have been proposed to accelerate data reconstruction, the related overhead is extremely high, such as high CPU/GPU utilization and a large number of I/Os. To address these problems, we propose a novel Partial Recovery Method (PRM), an adaptive recovery method that accelerates data reconstruction for distributed deep learning applications in cloud storage systems. The key idea of PRM is to combine erasure coding's ability to obtain global information on the data distribution with AI's ability to recover partially lost data, which sharply reduces overhead while maintaining acceptable training accuracy. To demonstrate the effectiveness of the PRM approach, we conduct several experiments. The results show that, compared to state-of-the-art full or approximate recovery methods, PRM decreases the average network transmission overhead by up to 64.50% and reduces the recovery time by up to 55.90%.

### Quality-aided Annotation Service Selection in MLaaS Market

Shanyang Jiang and Lan Zhang (University of Science and Technology of China, China)

The vibrant markets offering data annotation services are fast-growing and play an important part in machine learning. While many multi-label prediction services are available, it is challenging for consumers to decide which services to use for their own tasks and budgets due to the heterogeneity of those services' labeling categories, labeling quality, and prices. In this paper, we focus on the practical problem of obtaining high-quality multi-label annotation data from multiple services within a budget constraint. We propose a framework that first parameterizes label generation based on a constructed probabilistic graphical model and designs an Expectation-Maximization (EM)-based iterative algorithm to estimate service labeling quality and the task truth distribution. We then transform annotation service selection into an adaptive submodular maximization coverage problem, which motivates an adaptive random greedy algorithm with a constant approximation ratio of $1-1/e$. We evaluate our design in real-world experiments and a series of simulations on various machine learning models and real datasets. These experiments show that our method delivers improvements in both accuracy and reliability.

###### Session Chair

Celimuge Wu, The University of Electro-Communications, Japan

Session 2

## Federated Learning

Conference: 2:00 PM — 3:10 PM CEST
Local: Jun 10 Fri, 8:00 AM — 9:10 AM EDT

### How Asynchronous can Federated Learning Be?

Ningxin Su and Baochun Li (University of Toronto, Canada)

As a practical paradigm designed to involve large numbers of edge devices in the distributed training of deep learning models, federated learning has attracted significant research attention in recent years. Yet most existing mechanisms for federated learning assume either fully synchronous or fully asynchronous communication between clients and the federated learning server. Existing designs with partially asynchronous communication were simple heuristics, evaluated by the number of communication rounds or updates required for convergence rather than by wall-clock time in practice.

In this paper, we seek to explore the entire design space between fully synchronous and asynchronous mechanisms of communication. Based on insights from our exploration, we propose PORT, a new partially asynchronous mechanism designed to allow fast clients to aggregate asynchronously without waiting excessively for slower ones. In addition, PORT adjusts the aggregation weights based on both the staleness and divergence of model updates, with provable convergence guarantees. We have implemented PORT and its leading competitors in ANONYMOUS, an open-source scalable federated learning research framework designed from the ground up to emulate real-world scenarios. With respect to the wall-clock time needed to converge to the target accuracy, PORT outperformed leading competitors, including FedBuff and FedAsync, by at least 31% in our experiments.

### Federated Graph Learning with Periodic Neighbour Sampling

Bingqian Du and Chuan Wu (The University of Hong Kong, Hong Kong)

Recently proposed Graph Convolutional Networks (GCNs) have achieved promising results on various graph learning tasks. Federated learning (FL) for GCN training is needed when learning from geo-distributed graph datasets. Existing FL paradigms are inefficient for geo-distributed GCN training, since neighbour sampling across geo-locations soon dominates the whole training process and consumes large amounts of WAN bandwidth. We derive a practical federated graph learning algorithm, carefully striking a trade-off among GCN convergence error, wall-clock runtime, and neighbour sampling interval. Our analysis covers two cases according to the budget for neighbour sampling. In the unconstrained case, we obtain the optimal neighbour sampling interval that achieves the best trade-off between convergence and runtime; in the constrained case, we show that determining the optimal sampling interval is actually an online problem and propose a novel online algorithm with a bounded competitive ratio to solve it. Combining the two cases, we propose a unified algorithm to decide the neighbour sampling interval in federated graph learning, and demonstrate its effectiveness with extensive simulations on graph datasets from real applications.

### Adaptive Clustered Federated Learning for Clients with Time-Varying Interests

Ne Wang, Ruiting Zhou, Lina Su and Guang Fang (Wuhan University, China); Zongpeng Li (Tsinghua University, China)

Clustered Federated Learning (FL) addresses heterogeneous objectives from different client groups by capturing the intrinsic relationship between clients' data distributions. This work aims to minimize the completion time of clustered FL training while guaranteeing convergence, given the following challenges. First, clients' data distributions are not static, since their interests usually vary over time. Obsolete data may incur training failures, requiring detection of distribution changes at runtime. Second, even with the same distribution, client datasets may contribute differently to model accuracy. Besides, training data typically arrive at clients dynamically, which brings uncertainty to assessing the quality of client data. Third, the execution environments of clients and networks are often unstable and stochastic, leading to uncertainty in calculating computation and communication time. Given the above challenges, we propose *Acct* with two innovations: i) change detection: we first model the time-varying interests of clients as piecewise stationary based on practical observations, then apply generalized likelihood ratio detectors to FL for detecting changes in client distributions; ii) client selection: we adopt the multi-armed bandit (MAB) technique to account for the uncertainties in measuring data quality, computation, and communication time. Based on the upper confidence bound (UCB) method, we construct a novel "double UCB" policy to adaptively select clients with high data quality and low computation and communication overhead. We rigorously prove the convergence of *Acct* and a sub-linear regret for the proposed client selection policy. Finally, we implement *Acct* using PyTorch and conduct experiments showing that *Acct* reduces the completion time by almost 70% compared with three state-of-the-art FL frameworks.
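
For context on the client-selection idea above, here is a minimal sketch of a standard single-objective UCB index, which the paper's "double UCB" policy builds upon. This is not the authors' algorithm; the function and parameter names are our own, and it scores clients on a single reward only:

```python
import math

def ucb_select(counts, means, t, k):
    """Select k clients with the highest UCB index: empirical mean
    reward plus an exploration bonus that shrinks as a client is
    selected more often (counts[i] = times client i was selected)."""
    def index(i):
        if counts[i] == 0:
            return float("inf")  # try every client at least once
        return means[i] + math.sqrt(2.0 * math.log(t) / counts[i])
    ranked = sorted(range(len(counts)), key=index, reverse=True)
    return ranked[:k]

# Round t=10: client 0 is untried, client 1 has a high empirical
# reward, client 2 a low one; pick the top two.
chosen = ucb_select([0, 5, 5], [0.0, 0.9, 0.1], t=10, k=2)
```

A "double UCB" policy, as the abstract describes it, would maintain two such indices, one for data quality and one for computation/communication overhead, and combine them when ranking clients.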

### FedSyL: Computation-Efficient Federated Synergy Learning on Heterogeneous IoT Devices

Hui Jiang (Institute of Computing Technology, Chinese Academy of Sciences & University of Chinese Academy of Sciences, China); Min Liu and Sheng Sun (Institute of Computing Technology, Chinese Academy of Sciences, China); Yuwei Wang (Institute of Computing Technology Chinese Academy of Sciences, China); Xiaobing Guo (Lenovo Corporate Research & Development, China)

As a popular privacy-preserving model training technique, Federated Learning (FL) enables multiple end devices to collaboratively train Deep Neural Network (DNN) models without exposing locally owned private data. Under the FL paradigm, resource-constrained end devices in IoT must perform computation-intensive model training, whereas the edge server, equipped with powerful computation capability, only performs model aggregation. Due to this unbalanced computation pattern, IoT-oriented FL is time-consuming and inefficient. To alleviate the computation burden on end devices, recent countermeasures introduce the edge server to assist end devices in model training. However, existing works neither efficiently address the computation heterogeneity across end devices nor reduce the leakage risk of data privacy. To this end, we propose a Federated Synergy Learning (FedSyL) paradigm that innovatively strikes a balance between training efficiency and data leakage risk. We explore the complicated relationship between local training latency and multi-dimensional training configurations, and design a uniform training-latency prediction method based on polynomial quadratic regression analysis. Additionally, we design an optimal model offloading strategy that considers the resource limitations and computation heterogeneity of end devices, so as to accurately assign capability-matched device-side sub-models to heterogeneous end devices. We implement FedSyL on a real test-bed comprising multiple heterogeneous end devices. Experimental results demonstrate the superiority of FedSyL in training efficiency and privacy protection.

###### Session Chair

Baochun Li, University of Toronto, Canada

Session 3

## Inter-Networking Communications

Conference: 3:20 PM — 4:30 PM CEST
Local: Jun 10 Fri, 9:20 AM — 10:30 AM EDT

### Towards Sustainable Multi-Tier Space Networking for LEO Satellite Constellations

Yi Ching Chou (Simon Fraser University, Canada); Xiaoqiang Ma (Douglas College, Canada); Feng Wang (University of Mississippi, USA); Sami Ma (Simon Fraser University, Canada); Sen Hung Wong (Hong Kong University of Science and Technology, Hong Kong); Jiangchuan Liu (Simon Fraser University, Canada)

Over the past two years, companies such as Starlink, Kuiper, and Telesat have been launching low earth orbit (LEO) satellites to form LEO satellite mega-constellations. Unfortunately, these mega-constellations are not sustainable in the long term, since their large size makes LEO congested, causing issues such as satellite brightness, satellite conjunctions, and space debris. The issues worsen because LEO satellites have shorter battery lifespans and experience drag, which shortens satellite life and produces more space debris when LEO satellites reach end of life. To address these issues, we propose deploying higher-orbit satellites to form a sustainable multi-tier satellite network (SMTSN) instead of launching a massive number of LEO satellites. In this paper, we model the costs and gains of routing traffic with our SMTSN framework. We propose a solution to find the optimal routing paths, and an efficient distributed coverage-aware (EDCA) algorithm to predict the number of skipped LEO satellites when traffic is routed through a higher-orbit satellite. We run extensive simulations comparing LEO satellite constellations with and without our SMTSN framework, and the results show a significant improvement in battery cell cycle-life consumption with the SMTSN framework.

### DoCile: Taming Denial-of-Capability Attacks in Inter-Domain Communications

Marc Wyss, Giacomo Giuliari and Markus Legner (ETH Zürich, Switzerland); Adrian Perrig (ETH Zürich, Switzerland & Carnegie Mellon University, USA)

In recent years, much progress has been made in the field of Internet bandwidth reservation systems. While early designs were neither secure nor scalable, newer proposals promise attack resilience and Internet-wide scalability by using cryptographic access tokens (capabilities) that represent permissions to send at a guaranteed rate. Once a capability-based bandwidth reservation is established, the corresponding traffic is protected from both naturally occurring congestion and distributed denial-of-service attacks, with positive consequences on the end-to-end quality of service (QoS) of the communication. However, high network utilization, possibly caused by adversaries, can still preclude the initial unprotected establishment of capabilities. To prevent such denial-of-capability (DoC) attacks, we present DoCile, a framework for the protection of capability establishment on Internet paths, irrespective of network utilization. We believe that DoCile, deployed alongside a capability-based bandwidth reservation system, can be the foundation of the next generation of secure and scalable QoS protocols.

### Geographic Low-Earth-Orbit Networking without QoS Bottlenecks from Infrastructure Mobility

Lixin Liu, Hewu Li, Yuanjie Li, Zeqi Lai, Yangtao Deng and Yimei Chen (Tsinghua University, China); Wei Liu (Tsinghua, China); Qian Wu (Tsinghua University, China)

Low-earth-orbit (LEO) satellite mega-constellations promise broadband, low-latency network infrastructure from space for terrestrial users in remote areas. However, they face new QoS bottlenecks from infrastructure mobility, caused by fast-moving LEO satellites and the earth's rotation. Both cause frequent space-ground link churn and challenge network latency, bandwidth, and availability at global scale. Today's LEO networks mask infrastructure mobility with fixed anchors (ground stations), but this causes single-point bandwidth and latency bottlenecks. Instead, we design LBP to remove the LEO network's QoS bottlenecks arising from infrastructure mobility. LBP removes remote terrestrial fixed anchors via geographic addressing, yielding shorter latencies and more bandwidth. It adopts local, orbit-direction-aware geographic routing to avoid global routing updates and maintain high network availability. LBP further shortens routing paths by refining handover policies according to satellites' orbital directions. Our experiments on controlled testbeds and trace-driven emulations validate that LBP reduces network latency by 1.64×, provides 9.66× more bandwidth, and improves network availability to 100%.

### FABMon: Enabling Fast and Accurate Network Available Bandwidth Estimation

Tao Jin (Tsinghua Shenzhen International Graduate School, China); Weichao Li and Qing Li (Peng Cheng Laboratory, China); Qianyi Huang (Southern University of Science and Technology & Peng Cheng Laboratory, China); Yong Jiang (Graduate School at Shenzhen, Tsinghua University, China); Shutao Xia (Tsinghua University, China)

Characterizing the end-to-end network available bandwidth (ABW) is an important but challenging task. Although a number of ABW estimation tools have been introduced over the past two decades, applying them to real-world networks remains difficult because of biased results, heavy load, and long measurement times. In this paper, we propose a novel Burst Queue Recovery (BQR) model to infer the ABW. BQR first induces a momentary network congestion event and then observes the one-way delay (OWD) variation until the tight link recovers from the congestion. By correlating the OWDs with the queue-length variation, BQR can calculate the ABW accurately. Compared to the traditional probe gap model (PGM) and probe rate model (PRM), our theoretical analysis and simulations show that BQR is more tolerant of transient traffic bursts and supports scenarios with multiple congestible links. Based on the model, we build FABMon, a fast and accurate ABW estimation tool. Our experiments show that FABMon can measure ABW within 50 milliseconds and achieves much more accurate measurement results than existing tools, with a very small volume of probe packets.

###### Session Chair

Foivos Ioannis Michelinakis, SimulaMet, Norway

Session 4

## Congestion Control

Conference: 4:30 PM — 5:40 PM CEST
Local: Jun 10 Fri, 10:30 AM — 11:40 AM EDT

### A Control-Theoretic and Online Learning Approach to Self-Tuning Queue Management

Jiancheng Ye (Huawei, Hong Kong); Kechao Cai (Sun Yat-Sen University, China); Dong Lin (Huawei, Hong Kong); Jiarong Li (Tsinghua University, China); Jianfei He (City University of Hong Kong, Hong Kong); John C.S. Lui (The Chinese University of Hong Kong, Hong Kong)

There is a growing trend that network applications not only require higher throughput but also impose stricter delay requirements. Current Internet congestion control, driven by active queue management (AQM) algorithms interacting with the Transmission Control Protocol (TCP), has played an important role in supporting network applications. However, it still exhibits many open issues. Most AQM algorithms deploy only a single-queue structure, which cannot differentiate flows and easily leads to unfairness. Moreover, AQM parameter settings are often static, making them difficult to adapt to dynamic network environments. In this paper, we propose a general framework for designing "self-tuning" queue management (SQM), which adapts to changing environments and provides fair congestion control among flows. We first present a general architecture of SQM with fair queueing and propose a general fluid model to analyze it. To adapt to stochastic environments, we formulate a stochastic network utility maximization (SNUM) problem and utilize online convex optimization (OCO) and control theory to develop a distributed SQM algorithm that can self-tune the different queue weights and control parameters. Numerical and packet-level simulation results show that our SQM algorithm significantly improves queueing delay and fairness among flows.

### When Power-of-d-Choices Meets Priority

Jianyu Niu (Southern University of Science and Technology, China); Chunpu Wang (Canada); Chen Feng (University of British Columbia, Canada); Hong Xu (The Chinese University of Hong Kong, Hong Kong)

Power-of-d-choices (Pod) is a popular load balancing strategy that has received much attention from both academia and industry. However, much prior work on Pod has focused on uniform tasks without priorities. In reality, tasks may have different priorities according to their service sensitivity, pricing, or importance. In this work, we distinguish two types of priorities in Pod: scheduling priorities and service priorities. We propose Pod-SSP, a Pod algorithm with Scheduling and Service Priorities. To better understand the impact of priorities on task performance, we consider two simple variants of Pod-SSP: Pod with SCheduling Priorities (Pod-SCP) and Pod with SErvice Priorities (Pod-SEP). Utilizing mean-field approximation, we systematically study the performance of these protocols in the large-system regime. Our theoretical and simulation results show that high-priority tasks can achieve a more than 3x better delay relative to a system running the original Pod algorithm, while low-priority tasks only slightly sacrifice their delay.
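
The baseline power-of-d-choices dispatch that Pod-SSP extends can be sketched in a few lines. This is an illustrative sketch with hypothetical names, not the authors' implementation, and it omits the priority machinery that is the paper's contribution:

```python
import random

def pod_choice(queue_lengths, d=2, rng=random):
    """Power-of-d-choices: sample d servers uniformly at random
    and dispatch the task to the least-loaded of the d sampled."""
    sampled = rng.sample(range(len(queue_lengths)), d)
    return min(sampled, key=lambda i: queue_lengths[i])

# Dispatch a stream of 1000 tasks across 10 servers; with d=2 the
# queue lengths stay far more balanced than with random placement.
queues = [0] * 10
for _ in range(1000):
    server = pod_choice(queues, d=2)
    queues[server] += 1
```

Sampling only d servers (rather than polling all of them) is what makes the strategy scale; classic analysis shows even d=2 dramatically reduces the maximum queue length compared with purely random dispatch.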

### On the Joint Optimization of Function Assignment and Communication Scheduling toward Performance Efficient Serverless Edge Computing

Yuepeng Li and Deze Zeng (China University of Geosciences, China); Lin Gu (Huazhong University of Science and Technology, China); Kun Wang (University of California, Los Angeles, USA); Song Guo (The Hong Kong Polytechnic University, Hong Kong)

Serverless edge computing is booming as an efficient carrier for deploying complex applications composed of dependent functions, whose assignment decisions strongly influence application performance. Although similar problems have been widely studied, none of the existing approaches considers the diversity of communication styles, which is specific to serverless computing and also strongly influences performance efficiency. We compare two communication styles, direct-passing and remote-storage, for transmitting intermediate data between functions. We find that no single communication style prevails under all scenarios; the optimal selection depends on several factors, such as fanout degree, data size, and network bandwidth. Hence, how to select the appropriate communication style for each inter-function communication link, together with the function assignment decision, is essential to application performance. To this end, we propose a Priority-based ASsignment and Selection (PASS) algorithm that jointly considers function assignment and communication style selection. We theoretically analyze the approximation ratio of the PASS algorithm, and extensive experiments on real-world applications show that PASS reduces completion time by 24.1% on average compared with state-of-the-art approaches.

### Congestion-Aware Modeling and Analysis of Sponsored Data Plan from End User Perspective

Yi Zhao and Qi Tan (Tsinghua University, China); Xiaohua Xu (University of Science and Technology of China, China); Hui Su (Tsinghua University, China); Dan Wang (The Hong Kong Polytechnic University, Hong Kong); Ke Xu (Tsinghua University, China)

The past decade has witnessed a rapid expansion of demand for mobile traffic, while traditional mobile traffic pricing schemes cannot accommodate such demand. The Sponsored Data Plan (SDP), which can increase the revenue of all stakeholders in the market by transferring some revenue from content providers (CPs) to end users (EUs), is more suitable. However, existing studies have focused more on Internet Service Providers (ISPs) and CPs, ignoring the influence of EUs (e.g., the inherent attribute differences among EUs and the interactions among them) on the market under SDP. Given the difficulty of modeling the abstract interactions among EUs, we use network congestion as the medium and construct a congestion-aware SDP model based on a Stackelberg game. The proposed model can not only analyze how network congestion affects the SDP mechanism, but also elucidate the impact of interactions among EUs. More specifically, through theoretical analysis, we prove that there is a unique dynamic equilibrium in the interaction among EUs (i.e., the traffic consumption of different EUs). By taking network congestion into account, the proposed model also more accurately and realistically describes the optimal strategies and computation methods of all stakeholders in the market. Moreover, simulation experiments demonstrate that the positive effect brought by SDP is not as obvious as previously reported, and that EUs influence each other rather than being independent. Overall, this paper emphasizes the non-negligible influence of EUs and promotes a deeper understanding of the SDP mechanism, which can guide the relevant stakeholders in optimizing their decision-making.

###### Session Chair

Deze Zeng, China University of Geosciences, China