Session A-4

A-4: Software Defined Networking and Virtualization

Conference
8:30 AM — 10:00 AM PDT
Local
May 22 Wed, 11:30 AM — 1:00 PM EDT
Location
Regency A

YinYangRAN: Resource Multiplexing in GPU-Accelerated Virtualized RANs

Leonardo Lo Schiavo (Universidad Carlos III de Madrid & IMDEA Networks Institute, Spain); Jose A. Ayala-Romero (NEC Laboratories Europe GmbH, Germany); Andres Garcia-Saavedra (NEC Labs Europe, Germany); Marco Fiore (IMDEA Networks Institute, Spain); Xavier Costa-Perez (ICREA and i2cat & NEC Laboratories Europe, Spain)

RAN virtualization is revolutionizing the telco industry, enabling 5G Distributed Units to run on general-purpose platforms equipped with Hardware Accelerators (HAs). Recently, GPUs have been proposed as HAs, hinging on their unique capability to execute 5G PHY operations efficiently while also processing Machine Learning (ML) workloads. While this versatility makes GPUs attractive for cost-effective deployments, we experimentally demonstrate that multiplexing 5G and ML workloads in GPUs is in fact challenging, and that using conventional GPU-sharing methods can severely disrupt 5G operations. We then introduce YinYangRAN, an innovative O-RAN-compliant solution that supervises GPU-based HAs so as to ensure reliability in the 5G processing pipeline while maximizing the throughput of concurrent ML services. YinYangRAN makes GPU resource allocation decisions via a computationally efficient approximate dynamic programming technique informed by a neural network trained on real-world measurements. Using workloads collected in real RANs, we demonstrate that YinYangRAN can achieve over 50% higher 5G processing reliability than conventional GPU-sharing methods with minimal impact on co-located ML workloads. To our knowledge, this is the first work identifying and addressing the complex problem of HA management in emerging GPU-accelerated vRANs, and it represents a promising step towards multiplexing PHY and ML workloads in mobile networks.
Speaker
Speaker biography is not available.

A Lightweight Path Validation Scheme in Software-Defined Networks

Bing Hu and Yuanguo Bi (Northeastern University, China); Kui Wu (University of Victoria, Canada); Rao Fu (Northeastern University & No Company, China); Zixuan Huang (Northeastern University, China)

Software-Defined Networks (SDN) imbue traditional networks with unmatched agility and programmability by segregating the control and data planes. However, this separation enables adversaries to tamper with data plane forwarding behaviours, thereby violating control plane policies and overall security guidelines. In response, we propose a Lightweight Path Validation Scheme (L-PVS) tailored for SDN. Firstly, we put forth a streamlined packet forwarding path validation scheme that verifies the paths traversed by packets, alongside a theoretical analysis of this validation process. Subsequently, we extend the scheme with network flow path validation to boost validation efficiency. To alleviate the storage load on switches during flow path validation, we advocate a storage optimization method that scales switch storage overhead with network flows rather than individual packets. Furthermore, we formulate a path partition scheme and introduce a Greedy-based KeySwitch Node Selection Algorithm (GKSS) to pinpoint optimal switches for path partition, significantly reducing overall data plane storage usage. Lastly, we propose an anomalous-switch identification method utilizing temporary KeySwitch nodes when unexpected forwarding behaviours emerge. Evaluation results verify that L-PVS facilitates path validation with a reduced validation header size while minimizing the impact on processing delay and switch storage overhead.
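To make the per-hop idea concrete, here is a minimal sketch of chained MAC path validation in Python. This is an assumption-laden illustration: a generic HMAC chain with per-switch keys shared with the controller, not L-PVS's actual header format, flow-level scheme, or GKSS algorithm.

```python
# Illustrative HMAC-chain path validation (assumption: each switch holds a
# secret key shared with the controller; this is NOT L-PVS's construction).
import hmac, hashlib

def switch_tag(key: bytes, prev_tag: bytes, pkt_digest: bytes) -> bytes:
    # Each hop chains its MAC over the previous tag and the packet digest.
    return hmac.new(key, prev_tag + pkt_digest, hashlib.sha256).digest()[:8]

def validate_path(keys: list, pkt_digest: bytes, final_tag: bytes) -> bool:
    # The verifier replays the chain over the expected path and compares.
    tag = b"\x00" * 8
    for k in keys:                       # keys ordered along the expected path
        tag = switch_tag(k, tag, pkt_digest)
    return hmac.compare_digest(tag, final_tag)

# Example: a packet traversing a 3-hop path
keys = [b"k1", b"k2", b"k3"]
digest = hashlib.sha256(b"packet-headers").digest()
tag = b"\x00" * 8
for k in keys:
    tag = switch_tag(k, tag, digest)
assert validate_path(keys, digest, tag)   # path matches the expected one
```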
Speaker
Speaker biography is not available.

CloudPlanner: Minimizing Upgrade Risk of Virtual Network Devices for Large-Scale Cloud Networks

Xin He, Enhuan Dong and Jiahai Yang (Tsinghua University, China); Shize Zhang (Alibaba Group, China); Zhiliang Wang (Tsinghua University, China); Zejie Wang (Alibaba Group, China); Ye Yang (Alibaba Cloud & Zhejiang University, China); Jun Zhou, Xiaoqing Sun, Enge Song, Jianyuan Lu and Biao Lyu (Alibaba Group, China); Shunmin Zhu (Tsinghua University and Alibaba Group, China)

Cloud networks continuously upgrade softwarized virtual network devices (VNDs) to meet evolving tenant demands. However, such upgrades may result in unexpected failures. An intuitive idea to prevent upgrade failures is to resolve all compatibility issues before deployment, but it is impractical for VND developers to replicate all deployed VND cases and test them with large volumes of replayed real traffic. As a result, the operations team shoulders upgrade risk by testing upgrades through gradual deployment. Although careful upgrade schedule planning is the most common method to minimize upgrade risk, to the best of our knowledge, no VND upgrade schedule planning scheme has been adequately studied for large-scale cloud networks. To fill this gap, we propose CloudPlanner, the first VND upgrade schedule planning scheme that aims to minimize VND upgrade risk for large-scale cloud networks. CloudPlanner prioritizes upgrading VNDs that are more likely to trigger failures, based on expert knowledge and the properties of historical failure-triggering VNDs, and limits the number of tenants associated with simultaneously upgraded VNDs. We also propose a heuristic solver that can quickly and greedily plan schedules. Using real-world data from production environments, we demonstrate the benefits of CloudPlanner through extensive experiments.
Speaker
Speaker biography is not available.

A Practical Near Optimal Deployment of Service Function Chains in Edge-to-Cloud Networks

Rasoul Behravesh (Fondazione Bruno Kessler, Italy); David Breitgand (IBM Research -- Haifa, Israel); Dean H Lorenz (IBM Research - Haifa, Israel); Danny Raz (Technion - Israel Institute of Technology & Google, Israel)

Mobile edge computing opens a plethora of opportunities to develop new applications, offering much better quality of experience to users. A fundamental problem that has been thoroughly studied in this context is the deployment of Service Function Chains (SFCs) into a physical network on the spectrum between edge and cloud. This problem is known to be NP-hard. Because of its practical importance, high-quality sub-optimal solutions are of great interest.

In this paper, we consider this well-known problem and propose a novel near-optimal heuristic that is extremely efficient and scalable. We evaluate our solution against state-of-the-art heuristics and the fractional optimum. In our large-scale evaluations, we use realistic topologies previously reported in the literature. We demonstrate that the execution time of our solution grows slowly as the number of Virtual Network Function (VNF) forwarding graph embedding requests grows, and that it handles one million requests in slightly more than 30 seconds for an 80-node topology.
Speaker
Speaker biography is not available.

Session Chair

Vaji Farhadi (Bucknell University, USA)

Session B-4

B-4: Encryption and Payment Channel Networks

Conference
8:30 AM — 10:00 AM PDT
Local
May 22 Wed, 11:30 AM — 1:00 PM EDT
Location
Regency B

Causality Correlation and Context Learning Aided Robust Lightweight Multi-Tab Website Fingerprinting Over Encrypted Tunnel

Siyang Chen and Shuangwu Chen (University of Science and Technology of China, China); Huasen He (Univerisity of Science and Technology of China, China); Xiaofeng Jiang, Jian Yang and Siyu Cheng (University of Science and Technology of China, China)

Encrypted tunnels are increasingly applied to privacy protection; however, a passive eavesdropper can still infer which website a user is visiting via website fingerprinting (WF). State-of-the-art WF suffers from several critical challenges in realistic multi-tab scenarios: the number of concurrent tabs is dynamic and uncertain, training a separate model for each website is too heavyweight to deploy, and robustness against the packet loss, duplication, and reordering caused by dynamic network conditions is rarely considered. In this paper, we propose a robust lightweight multi-tab WF method, named RobustWF. Owing to the causal relationship between a user's requests and a website's responses, RobustWF employs causality correlation to associate the interactive packets belonging to the same website, forming a causality chain. RobustWF then utilizes context learning to capture the dependencies between causality chains. The absence of specific details does not significantly affect the overall structure of the target website, which enhances the robustness of RobustWF. To keep the model lightweight, RobustWF trains an integrated model that adapts to a dynamic number of concurrent tabs. Experimental results demonstrate that the accuracy of RobustWF improves by 14% in dynamic multi-tab WF scenarios compared to the state-of-the-art method.
Speaker Siyang Chen (University of Science and Technology of China)

Siyang Chen received the B.S. degree from the University of Science and Technology of China (USTC) in 2019. He is currently working toward the Ph.D. degree in the School of Information Science and Technology, USTC. His recent research interests include network security and website fingerprinting.


Thor: A Virtual Payment Channel Network Construction Protocol over Cryptocurrencies

Qiushi Wei and Dejun Yang (Colorado School of Mines, USA); Ruozhou Yu (North Carolina State University, USA); Guoliang Xue (Arizona State University, USA)

Although Payment Channel Network (PCN) has been proposed as a second-layer solution to the scalability issue of blockchain-based cryptocurrencies, most developed systems still struggle to handle the ever-increasing usage. Virtual payment channel (VPC) has been proposed as an off-chain technique that avoids the involvement of intermediaries for payments in a PCN. However, there is no research on how to efficiently construct VPCs while considering the characteristics of the underlying PCN. To fill this void, this paper focuses on the VPC construction in a PCN. More specifically, we propose a metric, Capacity to the Number of Intermediaries Ratio (CNIR), to consider both the capacity of the constructed VPC and the collateral locked by the involved users. We first study the VPC construction problem for a single pair of users and design an efficient algorithm that achieves the optimal CNIR. Based on this, we propose Thor, a protocol that constructs a virtual payment channel network for multiple pairs. Evaluation results show that Thor can construct VPCs with maximum CNIR in single-pair cases and efficiently construct VPCs with high CNIRs for multi-pair cases, compared to baseline algorithms.
Speaker
Speaker biography is not available.

vCrypto: a Unified Para-Virtualization Framework for Heterogeneous Cryptographic Resources

Shuo Shi (Shanghai Jiao Tong University, China); Chao Zhang (Alibaba Group, China); Zongpu Zhang and Hubin Zhang (Shanghai Jiao Tong University, China); Xin Zeng and Weigang Li (Intel, China); Junyuan Wang (Intel Asia-Pacific Research & Development Ltd., China); Xiantao Zhang and Yibin Shen (Alibaba Group, China); Jian Li and Haibing Guan (Shanghai Jiao Tong University, China)

Transport Layer Security (TLS) connections involve costly cryptographic operations that incur significant resource consumption in the cloud. Hardware accelerators are affordable substitutes for expensive CPU cores that accommodate the constantly increasing security requirements of datacenters. Existing accelerator virtualization mainly relies on passthrough of Single Root I/O Virtualization (SR-IOV) devices. However, deficiencies in service accessibility, functionality, and availability make device passthrough a suboptimal solution for heterogeneous accelerators with different capabilities. To bridge this gap, we propose vCrypto, a unified para-virtualization framework for heterogeneous cryptographic resources. vCrypto supports stateful crypto request offloading and result retrieval with session lifecycle management and event-driven notification. vCrypto transparently integrates virtual crypto device capabilities into the OpenSSL framework to benefit existing applications based on crypto library APIs without modification. Multiple physical resources can be partitioned flexibly and scheduled cooperatively to enhance the functionality, performance, and robustness of the virtual crypto service. Finally, vCrypto achieves optimized performance with a two-layer polling and memory-sharing mechanism. Comprehensive experiments show that, with the same cryptographic resources, the vCrypto framework provides 2.59x to 3.36x higher AES-CBC-HMAC-SHA1 throughput compared to a passthrough SR-IOV device.
Speaker Shuo Shi (Shanghai Jiao Tong University)



Efficient and Straggler-Resistant Homomorphic Encryption for Heterogeneous Federated Learning

Nan Yan, Yuqing Li and Jing Chen (Wuhan University, China); Xiong Wang (Huazhong University of Science and Technology, China); Jianan Hong (Shanghai Jiao Tong University, China); Kun He (Wuhan University, China); Wei Wang (Hong Kong University of Science and Technology, Hong Kong)

Cross-silo federated learning (FL) enables multiple institutions (clients) to collaboratively build a global model without sharing their private data. To prevent privacy leakage during aggregation, homomorphic encryption (HE) is widely used to encrypt model updates, yet it incurs high computation and communication overheads. To reduce these overheads, packed HE (PHE) has been proposed to encrypt multiple plaintexts into a single ciphertext. However, the original design of PHE does not consider the heterogeneity among different clients, an intrinsic problem in cross-silo FL, often resulting in undermined training efficiency with slow convergence and stragglers. In this work, we propose FedPHE, an efficiently packed homomorphically encrypted FL framework with secure weighted aggregation and client selection to tackle the heterogeneity problem. Specifically, using CKKS with sparsification, FedPHE can achieve efficient encrypted weighted aggregation by accounting for the contributions of local updates to the global model. To mitigate the straggler effect, we devise a sketching-based client selection scheme to cherry-pick representative clients with heterogeneous models and computing capabilities. We show, through rigorous security analysis and extensive experiments, that FedPHE can efficiently safeguard clients' privacy, achieve a training speedup of 1.85-4.44×, cut the communication overhead by 1.24-22.62×, and reduce the straggler effect by 1.71-2.39×.
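As a toy illustration of the sparsify-then-pack pipeline, the sketch below shows top-k sparsification followed by slot packing. The CKKS encryption step is deliberately mocked out, and FedPHE's weighting, masking, and selection logic are not reproduced:

```python
# Toy sketch: top-k sparsification + packing for PHE. The "ciphertexts"
# here are plain arrays; a real system would CKKS-encrypt each row.
import numpy as np

def topk_sparsify(update: np.ndarray, k: int):
    idx = np.argsort(np.abs(update))[-k:]          # keep k largest-magnitude entries
    return idx, update[idx]

def pack(values: np.ndarray, slots: int) -> np.ndarray:
    # Pack many plaintext values into fixed-size ciphertext slots.
    pads = (-len(values)) % slots
    padded = np.concatenate([values, np.zeros(pads)])
    return padded.reshape(-1, slots)               # one row per "ciphertext"

update = np.random.randn(1000)                     # a client's model update
idx, vals = topk_sparsify(update, k=128)
ciphertexts = pack(vals, slots=64)                 # 2 ciphertexts vs 1000 plaintexts
print(ciphertexts.shape)                           # (2, 64)
```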
Speaker Nan Yan (Wuhan University)

Nan Yan received the B.S. degree from the School of Cyber Science and Engineering, Shandong University, Tsingtao, China, in 2023. He is currently pursuing the M.S. degree with the School of Cyber Science and Engineering in Wuhan University, Wuhan, China. His current research interests include federated learning, and privacy-preserving computing.


Session Chair

Aveek Dutta (University at Albany, SUNY, USA)

Session C-4

C-4: Routing

Conference
8:30 AM — 10:00 AM PDT
Local
May 22 Wed, 11:30 AM — 1:00 PM EDT
Location
Regency C

A Parallel Algorithm and Scalable Architecture for Routing in Benes Networks

Rami Zecharia and Yuval Shavitt (Tel-Aviv University, Israel)

Benes/Clos architectures are common scalable interconnection networks widely used in backbone routers, data centers, on-chip networks, multi-processor systems, and parallel computers. Recent advances in Silicon Photonic technology, especially MZI technology, have made Benes networks a very attractive scalable architecture for optical circuit switches. Numerous routing algorithms for Benes networks have been developed, starting with linear algorithms with a time complexity of \(O(N\log_{2}N)\) steps. Parallel routing algorithms were developed to satisfy the stringent timing requirements of high-performance switching networks and have a time complexity of \(O((\log_{2}N)^{2})\). However, their implementation requires \(O(N^2\log_{2}N)\) wires (termed connectivity complexity) and is thus difficult to scale. We present a new routing algorithm for Benes networks combined with a scalable hardware architecture that supports full and partial input permutations. The processing time of the algorithm is limited to \(O((\log_{2}N)^{2})\) steps (iterations) by potentially forfeiting the routing of a few input demands; however, it guarantees close to 100% utilization for both full and partial input permutations. The algorithm and architecture reduce the connectivity complexity to \(O(N^{2})\), a \(\log N\) improvement over previous solutions. We prove the algorithm's correctness and analyze its performance analytically and with large-scale simulations.
Speaker
Speaker biography is not available.

Nonblocking Conditions for Flex-grid OXC-Clos Networks

Yibei Yao (Shanghai Jiao Tong University, China); Tong Ye (Shanghai JiaoTong University, China); Ning Deng (Huawei Technologies Co., Ltd., China)

The emergence of multi-fiber optical networks makes it urgent to design large-scale flexible mesh optical cross-connects (OXCs). Though the Clos network is the established theory for building scalable and cost-effective switching fabrics, the nonblocking conditions of flex-grid optical Clos networks without wavelength conversion remain unknown. This paper studies the nonblocking conditions for the flex-grid OXC-Clos network constructed from a number of small-size standard OXCs. We first show that a strictly nonblocking (SNB) OXC-Clos network incurs a high cost, as small-granularity lightpaths may monopolize central modules (CMs), rendering them unavailable for large-granularity requests due to frequency conflicts. We thus propose a granularity differential routing (GDR) strategy, the idea of which is to restrict the set of CMs that can be used by the lightpaths of each granularity. Under the GDR strategy, we investigate two system models, the granularity-port binding and unbinding models, and prove wide-sense nonblocking (WSNB) conditions for OXC-Clos networks. We show that the cost of WSNB networks is remarkably smaller than that of SNB networks, and find that the second model leads to more flexible network-bandwidth utilization than the first at only a small cost in switching fabric.
Speaker Yibei Yao (Shanghai Jiao Tong University)



DDR: A Deadline-Driven Routing Protocol for Delay Guaranteed Service

Pu Yang, Tianfang Chang and Lin Cai (University of Victoria, Canada)

Time-sensitive applications have become increasingly prevalent in modern networks, necessitating the development of delay-guaranteed routing (DGR) solutions. Finding an optimal DGR solution remains a challenging task due to the NP-hard nature of the problem and the dynamic nature of network traffic. In this paper, we propose Deadline-Driven Routing (DDR), a distributed traffic-aware adaptive routing protocol that addresses the DGR problem. Inspired by online navigation techniques, DDR leverages real-time traffic conditions to optimize routing decisions and ensure on-time packet delivery. By combining network topology-based path generation with real-time traffic knowledge, each router can adjust the forwarding directions of packets to accommodate their heterogeneous latency requirements. Comprehensive simulations on real network topologies demonstrate that DDR can consistently provide delay-guaranteed service in different network topologies with different traffic conditions. Moreover, DDR ensures backward compatibility with legacy devices and existing routing protocols, making it a viable solution for supporting delay-guaranteed service.
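The forwarding logic can be pictured as a per-hop feasibility check against the packet's remaining deadline budget. Below is a minimal sketch assuming each router maintains per-link delay estimates; DDR's actual path generation and protocol machinery are richer than this:

```python
# Sketch of deadline-driven next-hop choice over estimated delays
# (illustrative only; not DDR's full protocol).
import heapq

def est_delay(graph, src, dst):
    # Dijkstra over current per-link delay estimates.
    dist, pq = {src: 0.0}, [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist.get(dst, float("inf"))

def next_hop(graph, here, dst, slack):
    # Forward only via neighbors whose link + remaining path delay meets the deadline.
    feasible = [(w + est_delay(graph, v, dst), v)
                for v, w in graph[here].items()
                if w + est_delay(graph, v, dst) <= slack]
    return min(feasible)[1] if feasible else None

graph = {"a": {"b": 2.0, "c": 5.0}, "b": {"d": 2.0}, "c": {"d": 1.0}, "d": {}}
print(next_hop(graph, "a", "d", slack=5.0))   # 'b' (4.0 total delay <= 5.0)
```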
Speaker
Speaker biography is not available.

Efficient Algorithm for Region-Disjoint Survivable Routing in Backbone Networks

Erika R. Bérczi-Kovács (ELTE Eötvös Loránd University, Hungary); Péter Gyimesi (Eötvös Loránd University, Hungary); Balázs Vass and János Tapolcai (Budapest University of Technology and Economics, Hungary)

Survivable routing is crucial in backbone networks to ensure connectivity, even during failures. At network design time, groups of network elements prone to potential failure events are identified. These groups are referred to as Shared Risk Link Groups (SRLGs), and if such a group is a set of links intersected by a connected region of the plane, we call it a regional SRLG. A recent study presented a polynomial-time algorithm for finding a maximum number of regional-SRLG-disjoint paths between two given nodes in a planar topology, with the paths being node-disjoint. However, existing algorithms for this problem are not practical due to their runtime and implementation complexities.

This paper investigates a more general model: the maximum number of non-crossing, regional-SRLG-disjoint paths problem. It introduces an efficient and easily implementable algorithmic framework, leveraging an arbitrarily chosen shortest-path subroutine for graphs with possibly negative weights. Depending on the subroutine chosen, the framework improves the previous worst-case runtime complexity, or can solve the problem w.h.p. in near-linear expected time.
The proposed framework enables the first additive approximation for a more general NP-hard version of the problem, where the objective is to find the maximum number of regional-SRLG-disjoint paths. We validate our findings through extensive simulations.
Speaker
Speaker biography is not available.

Session Chair

Javier Gomez (National University of Mexico, Mexico)

Session D-4

D-4: Acoustic and Multimodal Sensing

Conference
8:30 AM — 10:00 AM PDT
Local
May 22 Wed, 11:30 AM — 1:00 PM EDT
Location
Regency D

MultiHGR: Multi-Task Hand Gesture Recognition with Cross-Modal Wrist-Worn Devices

Mengxia Lyu, Hao Zhou, Kaiwen Guo and Wangqiu Zhou (University of Science and Technology of China, China); Xingfa Shen (Hangzhou Dianzi University, China); Yu Gu (University of Electronic Science and Technology of China, China)

Hand gesture recognition (HGR) is essential for human-machine interaction. Although existing solutions achieve good performance on specific tasks, they still face challenges when users navigate through different application contexts, i.e., multi-task ability is demanded to support newly arriving HGR tasks. In this paper, we propose the first IMU-vision based system hosted on wrist-worn devices to support multi-task HGR, denoted MultiHGR. The system introduces a novel two-stage training strategy: a task-agnostic stage that aligns cross-modal features from unlabeled, arbitrary gestures through contrastive learning, and a task-related stage that learns modality contributions with limited labeled data on specific tasks through a self-attention mechanism. Since only the second, task-related stage must be executed for each new task, MultiHGR can accommodate multiple tasks with significantly reduced training cost and storage requirements. Evaluation results on three HGR tasks demonstrate that MultiHGR reduces training time by 64.92% and storage by 24.04% compared with traditional multimodal single-task models, and that MultiHGR outperforms unimodal single-task models with 14.37%, 19.28%, and 31% improvements on these three tasks, respectively. Compared with the state-of-the-art multimodal single-task model, MultiHGR achieves an average 6.35% accuracy improvement, along with a 65.74% training time reduction.
Speaker Mengxia Lyu (University of Science and Technology of China)

Mengxia Lyu earned her B.S. degree in Computer Science and Technology from East China University of Science and Technology in 2022. She is currently pursuing her M.S. degree in Computer Technology at the University of Science and Technology of China. Her research focuses on intelligent sensing.


Neural Enhanced Underwater SOS Detection

Qiang Yang and Yuanqing Zheng (The Hong Kong Polytechnic University, Hong Kong)

Every day, a person loses their life to drowning in a swimming pool, even with professional lifeguards present. Contrary to what the public might assume, drowning swimmers can hardly splash or yell for help. This life-threatening situation calls for a robust SOS channel between swimmers and lifeguards. This paper proposes Neusos, a neural-enhanced underwater SOS communication system based on commercial wearable devices and low-cost hydrophones deployed in the swimming pool. Specifically, we repurpose popular wearable devices (e.g., smartwatches) as SOS transmitters, allowing swimmers to activate a distress signal by simply pressing one smartwatch button. In response, underwater hydrophones in the swimming pool detect the SOS signal and immediately alert lifeguards on duty, enabling them to provide timely assistance. The main technical challenge lies in reliably detecting weak SOS signals in non-stationary underwater scenarios. To this end, we thoroughly characterize the properties of underwater channels and examine the limitations of the traditional correlation-based signal detection method in underwater communication scenarios. Based on our empirical findings, we develop a robust SOS detection method enhanced with deep learning. By fully embedding visual hints into the networks, Neusos outperforms state-of-the-art signal processing-based underwater SOS detection methods.
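For context, the correlation-based baseline whose underwater limitations the paper examines looks roughly like this. It is a toy sketch with a synthetic sinusoidal preamble and an arbitrary threshold, not Neusos's learned detector:

```python
# Baseline normalized cross-correlation detector (illustrative only).
import numpy as np

def correlate_detect(rx: np.ndarray, preamble: np.ndarray, thresh: float) -> bool:
    # Peak of the normalized cross-correlation against a known preamble.
    corr = np.correlate(rx, preamble, mode="valid")
    norm = np.linalg.norm(preamble) * np.sqrt(
        np.convolve(rx**2, np.ones(len(preamble)), mode="valid"))
    peak = np.max(np.abs(corr) / np.maximum(norm, 1e-12))
    return peak > thresh

fs = 8000
t = np.arange(0, 0.05, 1 / fs)
preamble = np.sin(2 * np.pi * 1000 * t)            # toy SOS preamble
rx = np.concatenate([np.random.randn(500) * 0.1,   # noise, then preamble + noise
                     preamble + np.random.randn(len(t)) * 0.1])
print(correlate_detect(rx, preamble, thresh=0.5))  # True at reasonable SNR
```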
Speaker Qiang Yang (University of Cambridge)

Qiang Yang is a Postdoc at the University of Cambridge. Previously, he obtained his PhD degree from The Hong Kong Polytechnic University in 2023. His research interests include acoustic sensing, smart health, and ubiquitous computing.


Hybrid Zone: Bridging Acoustic and Wi-Fi for Enhanced Gesture Recognition

Mengning Li (North Carolina State University, USA); Wenye Wang (NC State University, USA)

Gesture recognition holds vast potential for applications in human-computer interaction and virtual reality. The prevalent use of gesture recognition in domestic environments via Wi-Fi and acoustic sensing offers clear implementation advantages. However, current techniques present significant challenges: acoustic sensing is vulnerable to environmental disturbances, whereas Wi-Fi necessitates prior knowledge of the user's location to extract environment-independent features. To overcome these constraints, multimodal fusion appears as an effective solution, capitalizing on the complementary nature of these limitations.

Despite the promising performance of learning-based methods in facilitating multimodal fusion, they lack a theoretical explanation for the integration of multimodal features. To address this gap, we introduce the concept of the "hybrid zone" in this paper. This theoretical model illuminates the process of merging acoustic and Wi-Fi sensing techniques. The "hybrid zone" model elucidates both the global perspective, which entails the fusion of acoustic and Wi-Fi sensing regions, and the local perspective, which involves the synthesis of fine-grained acoustic and Wi-Fi velocities.
Speaker
Speaker biography is not available.

HearBP: Hear Your Blood Pressure via In-ear Acoustic Sensing Based on Heart Sounds

Zhiyuan Zhao and Fan Li (Beijing Institute of Technology, China); Yadong Xie (Tsinghua University, China); Huanran Xie and Kerui Zhang (Beijing Institute of Technology, China); Li Zhang (HeFei University of Technology, China); Yu Wang (Temple University, USA)

To overcome the limitations of existing blood pressure (BP) measurement methods, we study technology based on heart sounds and find that the time interval between the first and second heart sounds (TIFS) is closely related to BP. Motivated by this, we propose HearBP, a novel BP monitoring system that utilizes in-ear microphones to collect bone-conducted heart sounds in the binaural canals. We first design a noise-removal method based on a U-Net autoencoder-decoder to separate clean heart sounds from background noise. Then, we design a feature extraction method based on Shannon energy and the energy-entropy ratio to further mine the time-domain and frequency-domain features of heart sounds. In addition, combined with the principal component analysis algorithm, we achieve feature dimension reduction to extract the main features related to BP. Finally, we propose a network model based on dendritic neural regression to construct a mapping between the extracted features and BP. Extensive experiments with 41 participants show average estimation errors of 0.97 mmHg and 1.61 mmHg and standard deviation errors of 3.13 mmHg and 3.56 mmHg for diastolic and systolic pressure, respectively. These errors are within the acceptable range specified by the FDA's AAMI protocol.
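Shannon energy, one ingredient of the feature pipeline, has a standard closed form, SE(x) = -x^2 log x^2, typically normalized, computed per sample, and averaged over frames to obtain an envelope. A minimal sketch follows; the frame size and normalization are illustrative, not the paper's settings:

```python
# Shannon-energy envelope of a heart-sound signal (standard feature;
# HearBP's full pipeline adds energy-entropy ratio, PCA, and regression).
import numpy as np

def shannon_energy(x: np.ndarray, frame: int = 64) -> np.ndarray:
    x = x / (np.max(np.abs(x)) + 1e-12)            # normalize to [-1, 1]
    se = -x**2 * np.log(x**2 + 1e-12)              # per-sample Shannon energy
    n = len(se) // frame
    return se[: n * frame].reshape(n, frame).mean(axis=1)   # framewise average

sig = np.random.randn(8000) * np.hanning(8000)     # toy stand-in for a heart sound
envelope = shannon_energy(sig)
print(envelope.shape)                               # (125,)
```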
Speaker Zhiyuan Zhao (Beijing Institute of Technology, China)



Session Chair

Carla Fabiana Chiasserini (Politecnico di Torino, Italy)

Session E-4

E-4: Federated Learning 3

Conference
8:30 AM — 10:00 AM PDT
Local
May 22 Wed, 11:30 AM — 1:00 PM EDT
Location
Regency E

Federated Analytics-Empowered Frequent Pattern Mining for Decentralized Web 3.0 Applications

Zibo Wang and Yifei Zhu (Shanghai Jiao Tong University, China); Dan Wang (The Hong Kong Polytechnic University, Hong Kong); Zhu Han (University of Houston, USA)

The emerging Web 3.0 paradigm aims to decentralize existing web services, enabling desirable properties such as transparency, incentives, and privacy preservation. However, current Web 3.0 applications supported by blockchain infrastructure still cannot support complex data analytics tasks in a scalable and privacy-preserving way. This paper introduces the emerging federated analytics (FA) paradigm into the realm of Web 3.0 services, enabling data to stay local while still contributing to complex web analytics tasks in a privacy-preserving way. We propose FedWeb, a tailored FA design for important frequent pattern mining tasks in Web 3.0. FedWeb remarkably reduces the number of participating data owners required to support privacy-preserving Web 3.0 data analytics, based on a novel distributed differential privacy technique. The correctness of mining results is guaranteed by a theoretically rigorous candidate filtering scheme based on Hoeffding's inequality and Chebyshev's inequality. Two response budget saving solutions are proposed to further reduce the number of participating data owners. Experiments on three representative Web 3.0 scenarios show that FedWeb can improve data utility by ~25.3% and reduce the number of participating data owners by ~98.4%.
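The Hoeffding-based filtering step has a simple shape: a candidate pattern survives only if its estimated frequency plus a confidence radius still clears the support threshold. A sketch of that flavor (FedWeb's exact thresholds and the interplay with distributed DP noise are not shown):

```python
# Hoeffding-style candidate filtering for frequent pattern mining.
import math

def keep_candidate(est_freq: float, n: int, support: float, delta: float) -> bool:
    # With n responses and confidence 1 - delta, the Hoeffding radius for a
    # [0, 1]-bounded mean is sqrt(ln(2/delta) / (2n)). Keep the pattern if
    # its true frequency could still exceed the support threshold.
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return est_freq + eps >= support

print(keep_candidate(0.18, n=1000, support=0.20, delta=0.05))  # True  (radius ~0.043)
print(keep_candidate(0.10, n=1000, support=0.20, delta=0.05))  # False (safely pruned)
```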
Speaker Zibo Wang (Shanghai Jiao Tong University)



Federated Offline Policy Optimization with Dual Regularization

Sheng Yue and Zerui Qin (Tsinghua University, China); Xingyuan Hua (Beijing Institute of Technology, China); Yongheng Deng and Ju Ren (Tsinghua University, China)

Federated Reinforcement Learning (FRL) has been deemed a promising solution for intelligent decision-making in the era of the Artificial Intelligence of Things. However, existing FRL approaches often entail repeated interactions with the environment during local updating, which can be prohibitively expensive or even infeasible in many real-world domains. To overcome this challenge, this paper proposes a novel offline federated policy optimization algorithm, named DRPO, which enables distributed agents to collaboratively learn a decision policy from private and static data alone, without further environmental interaction. DRPO leverages dual regularization, incorporating both the local behavioral policy and the global aggregated policy, to judiciously cope with the intrinsic two-tier distributional shifts in offline FRL. Theoretical analysis characterizes the impact of the dual regularization on performance, demonstrating that by striking the right balance between the two, DRPO can effectively counteract distributional shifts and ensure strict policy improvement in each federative learning round. Extensive experiments validate the significant performance gains of DRPO over baseline methods.
Speaker Sheng Yue (Tsinghua University)

Sheng Yue received his B.Sc. in mathematics (2017) and Ph.D. in computer science (2022) from Central South University, China. Currently, he is an assistant researcher with the Department of Computer Science and Technology, Tsinghua University, China. His research interests include network optimization, distributed learning, and reinforcement learning.


FedTC: Enabling Communication-Efficient Federated Learning via Transform Coding

Yixuan Guan, Xuefeng Liu and Jianwei Niu (Beihang University, China); Tao Ren (Institute of Software Chinese Academy of Sciences, China)

Federated learning (FL) enables distributed training by periodically synchronizing model updates among participants. Communication overhead becomes a dominant constraint of FL since participating clients usually suffer from limited bandwidth. To tackle this issue, top-\(k\) based gradient compression techniques have been extensively developed in the FL context, manifesting powerful capabilities in reducing gradient volumes. However, these methods generally operate on the original gradients, where massive spatial redundancies exist and the positions of non-zero parameters vary greatly between gradients, which impedes deeper compression. Top-\(k\) sparsification may also degrade the performance of trained models due to biased gradient estimates. Targeting the above issues, we propose FedTC, a novel transform coding based compression framework. FedTC transforms gradients into a new domain with a more concentrated energy distribution, which facilitates reducing spatial redundancies and biases in subsequent sparsification. Furthermore, non-zero parameters across clients from different rounds become highly aligned in the transform domain, motivating us to partition gradients into smaller parameter blocks with various alignment levels to better exploit these alignments. Lastly, the positions and values of non-zero parameters are independently compressed in a block-wise manner with customized designs, through which a higher compression ratio is achieved. Theoretical analysis and extensive experiments both demonstrate the effectiveness of our approach.
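To see why transforming before sparsifying helps, here is a minimal sketch assuming a DCT as the energy-compacting transform (FedTC's actual transform, block partitioning, and position/value coding are not reproduced):

```python
# Transform-then-sparsify sketch: top-k in the DCT domain captures most of
# the energy of a smooth gradient with far fewer coefficients.
import numpy as np
from scipy.fft import dct, idct

def compress(grad: np.ndarray, k: int):
    coef = dct(grad, norm="ortho")                 # concentrate energy
    idx = np.argsort(np.abs(coef))[-k:]            # top-k in transform domain
    return idx, coef[idx]

def decompress(idx: np.ndarray, vals: np.ndarray, n: int) -> np.ndarray:
    coef = np.zeros(n)
    coef[idx] = vals
    return idct(coef, norm="ortho")

g = np.cumsum(np.random.randn(1024)) / 32          # smooth, DCT-friendly signal
idx, vals = compress(g, k=64)                      # keep 64 of 1024 coefficients
err = np.linalg.norm(g - decompress(idx, vals, 1024)) / np.linalg.norm(g)
print(f"relative error at 16x compression: {err:.3f}")
```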
Speaker Yixuan Guan (Beihang University)

Yixuan Guan received his B.E. degree from Jilin University, Changchun, China, in 2016, and his M.S. degree from South China University of Technology, Guangzhou, China, in 2020. He is currently pursuing his Ph.D. degree from Beihang University, Beijing, China. His research interests include federated learning, data compression, and network communication.


Heroes: Lightweight Federated Learning with Neural Composition and Adaptive Local Update in Heterogeneous Edge Networks

Jiaming Yan, Jianchun Liu, Shilong Wang and Hongli Xu (University of Science and Technology of China, China); Haifeng Liu and Jianhua Zhou (Guangdong OPPO Mobile Telecommunications Corp., Ltd. Dongguan, Guangdong, China)

Federated Learning (FL) enables distributed clients to collaboratively train models without exposing their private data. However, it is difficult to implement efficient FL due to limited resources. Most existing works compress the transmitted gradients or prune the global model to reduce the resource cost, but leave the compressed or pruned parameters under-optimized, which degrades the training performance. To address this issue, the neural composition technique constructs size-adjustable models by composing low-rank tensors, allowing every parameter in the global model to learn the knowledge from all clients. Nevertheless, some tensors can only be optimized by a small fraction of clients, thus the global model may get insufficient training, leading to a long completion time, especially in heterogeneous edge scenarios. To this end, we enhance the neural composition technique, enabling all parameters to be fully trained. Further, we propose a lightweight FL framework, called Heroes, with enhanced neural composition and adaptive local update. A greedy-based algorithm is designed to adaptively assign the proper tensors and local update frequencies for participating clients according to their heterogeneous capabilities and resource budgets. Extensive experiments demonstrate that Heroes can reduce traffic consumption by about 72.05% and provide up to 2.97 times speedup compared to the baselines.
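The composition idea itself fits in a few lines: a full weight matrix is assembled as a sum of low-rank factor pairs, and a smaller model simply uses fewer of them. A minimal sketch under that reading (Heroes' enhanced composition and its greedy tensor/update-frequency assignment are not shown):

```python
# Neural composition in miniature: size-adjustable weights from low-rank tensors.
import numpy as np

d, r = 256, 8                                      # layer width, rank per tensor
basis = [(np.random.randn(d, r) / np.sqrt(d),      # each (U, V) pair is one
          np.random.randn(r, d) / np.sqrt(r))      # composable low-rank tensor
         for _ in range(4)]

def compose(k: int) -> np.ndarray:
    # A weaker client trains/uses only the first k tensors; when all tensors
    # are trained across clients, every parameter of the full model is covered.
    return sum(U @ V for U, V in basis[:k])

small, full = compose(1), compose(4)
print(small.shape, full.shape)                     # both (256, 256), differing capacity
```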
Speaker Jiaming Yan (University of Science and Technology of China)

Jiaming Yan received the B.S. degree in 2021 from Hefei University of Technology. He is currently a Ph.D. candidate in the School of Computer Science, University of Science and Technology of China (USTC). His main research interests are edge computing, deep learning and federated learning.


Session Chair

Ruidong Li (Kanazawa University, Japan)

Session F-4

F-4: Caching

Conference
8:30 AM — 10:00 AM PDT
Local
May 22 Wed, 11:30 AM — 1:00 PM EDT
Location
Regency F

On Pipelined GCN with Communication-Efficient Sampling and Inclusion-Aware Caching

Shulin Wang, Qiang Yu and Xiong Wang (Huazhong University of Science and Technology, China); Yuqing Li (Wuhan University, China); Hai Jin (Huazhong University of Science and Technology, China)

Graph convolutional networks (GCNs) have achieved enormous success in learning structural information from unstructured data. As graphs become increasingly large, distributed training for GCNs is severely prolonged by frequent cross-worker communications. Existing efforts to improve training efficiency often come at the expense of GCN performance, while the communication overhead persists. In this paper, we propose PSC-GCN, a holistic pipelined framework for distributed GCN training with communication-efficient sampling and inclusion-aware caching, to address the communication bottleneck while ensuring satisfactory model performance. Specifically, we devise an asynchronous pre-fetching scheme to retrieve stale statistics (features, embeddings, gradients) of boundary nodes in advance, such that embedding aggregation and model update are pipelined with statistics transmission. To alleviate communication volume and the staleness effect, we introduce a variance-reduction based sampling policy, which prioritizes inner nodes over boundary ones to reduce the access frequency to remote neighbors, thus mitigating cross-worker statistics exchange. Complementing graph sampling, a feature caching module is co-designed to buffer hot nodes with high inclusion probability, ensuring that frequently sampled nodes will be available in local memory. Extensive evaluations on real-world datasets show the superiority of PSC-GCN over state-of-the-art methods, reducing training time by 72%-80% without sacrificing model accuracy.
Speaker Shulin Wang



A Randomized Caching Algorithm for Distributed Data Access

Tianyu Zuo, Xueyan Tang and Bu Sung Lee (Nanyang Technological University, Singapore)

In this paper, we study an online cost optimization problem for distributed data access. The goal is to dynamically create and delete data copies in a multi-server distributed system over time, in order to minimize the total storage and network cost of serving access requests. We propose an online algorithm with randomized storage periods for data copies in the servers, and derive an optimal probability density function of storage periods that makes the algorithm achieve a competitive ratio of \(1+\frac{\sqrt{2}}{2}\). An example shows that this competitive ratio is not only tight but also asymptotic. Experimental evaluations using real data access traces demonstrate that our algorithm outperforms the best known deterministic algorithm.
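The overall shape of such an algorithm is easy to sketch: each copy's storage period is drawn by inverse-transform sampling from a density over a horizon. The density below is an illustrative exponential-style placeholder, not the paper's derived optimal density:

```python
# Randomized storage periods via inverse-transform sampling.
# Placeholder density p(t) ~ e^{t/T} on [0, T]; CDF F(t) = (e^{t/T}-1)/(e-1).
import random, math

def sample_storage_period(T: float) -> float:
    u = random.random()
    return T * math.log(1 + u * (math.e - 1))   # F^{-1}(u)

periods = [sample_storage_period(T=10.0) for _ in range(5)]
print([round(p, 2) for p in periods])           # copies kept for randomized durations
```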
Speaker
Speaker biography is not available.

CDCache: Space-Efficient Flash Caching via Compression-before-Deduplication

Hengying Xiao and Jingwei Li (University of Electronic Science and Technology of China, China); Yanjing Ren (The Chinese University of Hong Kong, Hong Kong); Ruijin Wang and Xiaosong Zhang (University of Electronic Science and Technology of China, China)

Large-scale storage systems boost I/O performance via flash caching, but the underlying storage medium of flash caching incurs significant costs and also exhibits low endurance. Previous studies adopt compression-after-deduplication to avoid writing redundant content into the flash cache, so as to address the cost and endurance issues. However, deduplication and compression have conflicting preferable cases, and compression-after-deduplication essentially compromises the space-saving benefits of either deduplication or compression. To simultaneously preserve the benefits of both approaches, we explore compression-before-deduplication, which applies compression to eliminate byte-level redundancies across data blocks, followed by deduplication to write only a single copy of duplicate compressed blocks into the flash cache. We present CDCache, a space-efficient flash caching system that realizes compression-before-deduplication. It dynamically adjusts the compression range of data blocks so as to preserve the effectiveness of deduplication on the compressed blocks. It also builds on various design techniques to approximately estimate duplicate data blocks and efficiently manage compressed blocks. Trace-driven experiments show that CDCache improves the read hit ratio and the write reduction ratio of a previous compression-after-deduplication approach by up to 1.3× and 1.6×, respectively, while incurring only a small memory overhead for index management.
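The core ordering is easy to state in code: compress each block first, then fingerprint and deduplicate the compressed form. A minimal in-memory sketch with zlib and SHA-256 (CDCache's adaptive compression ranges and index structures are not shown):

```python
# Compression-before-deduplication in miniature.
import zlib, hashlib

store = {}                                  # fingerprint -> compressed block

def write_block(block: bytes) -> str:
    comp = zlib.compress(block)             # compress first ...
    fp = hashlib.sha256(comp).hexdigest()   # ... then dedupe the compressed form
    if fp not in store:
        store[fp] = comp                    # duplicates never hit the flash cache
    return fp

def read_block(fp: str) -> bytes:
    return zlib.decompress(store[fp])

a = write_block(b"A" * 4096)
b = write_block(b"A" * 4096)                # duplicate: no new write
assert a == b and len(store) == 1 and read_block(a) == b"A" * 4096
```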
Speaker Hengying Xiao (University of Electronic Science and Technology of China)



Dependency-Aware Online Caching

Julien Dallot (TU Berlin, Germany); Amirmehdi Jafari Fesharaki (Sharif University of Technology, Iran); Maciej Pacut and Stefan Schmid (TU Berlin, Germany)

We consider a variant of the online caching problem where the items exhibit dependencies among each other: an item can reside in the cache only if all the items it depends on are also in the cache. The dependency relations can form any directed acyclic graph. These requirements arise in systems such as CacheFlow [SOSR 2016] that cache forwarding rules for packet classification in IP-based communication networks. First, we present an optimal randomized online caching algorithm that accounts for dependencies among the items. Our randomized algorithm is \(O(\log k)\)-competitive, where \(k\) is the size of the cache, meaning that it never incurs a cost more than \(O(\log k)\) times that of an optimal algorithm that knows the future input sequence. Second, we consider the bypassing model, where requests can be served at a fixed price without fetching the item and its dependencies into the cache, a variant of caching with dependencies introduced by Bienkowski et al. at SPAA 2017. For this setting, we give an \(O(\sqrt{k\log k})\)-competitive algorithm, which significantly improves the best known competitiveness. We conduct a small case study and find that our algorithm incurs on average 2x lower cost.
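The defining constraint is that caching an item drags in its transitive dependency closure. A small sketch of that bookkeeping (the eviction policy is deliberately absent here; the paper's randomized competitive policy is the actual contribution):

```python
# Dependency closure for caching over a DAG of items.
def closure(item, deps):
    # All items that must co-reside in the cache for `item` to be cached.
    need, stack = set(), [item]
    while stack:
        u = stack.pop()
        if u not in need:
            need.add(u)
            stack.extend(deps.get(u, ()))
    return need

deps = {"rule_c": ["rule_a", "rule_b"], "rule_b": ["rule_a"]}
print(sorted(closure("rule_c", deps)))     # ['rule_a', 'rule_b', 'rule_c']

def can_cache(item, cache, deps, k):
    # The item fits only if its whole closure fits within capacity k.
    return len(cache | closure(item, deps)) <= k

print(can_cache("rule_c", {"rule_a"}, deps, k=3))   # True
```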
Speaker
Speaker biography is not available.

Session Chair

Stratis Ioannidis (Northeastern University, USA)

Session G-4

G-4: Energy Efficiency

Conference
8:30 AM — 10:00 AM PDT
Local
May 22 Wed, 11:30 AM — 1:00 PM EDT
Location
Prince of Wales/Oxford

In-Orbit Processing or Not? Sunlight-Aware Task Scheduling for Energy-Efficient Space Edge Computing Networks

Weisen Liu, Zeqi Lai, Qian Wu and Hewu Li (Tsinghua University, China); Qi Zhang (Zhongguancun Laboratory, China); Zonglun Li (Beijing Jiaotong University, China); Yuanjie Li and Jun Liu (Tsinghua University, China)

With the rapid evolution of space-borne capabilities, space edge computing (SEC) is becoming a new computation paradigm for future integrated space and terrestrial networks. Satellite edges adopt advanced on-board hardware, which not only enables new opportunities to perform complex intelligent space tasks, but also introduces new challenges due to additional energy consumption in the power-constrained space environment. In this paper, we present Phoenix, an energy-efficient task scheduling framework for futuristic SEC networks. Phoenix exploits a key insight: in a SEC network, there dynamically exists a set of sunlit edges that are illuminated during their orbital period and have sufficient energy supply from the sun. Phoenix accomplishes energy-efficient in-orbit computing by judiciously offloading space tasks to "sunlight-sufficient" edges or to the ground. Specifically, Phoenix first formulates the SEC battery energy optimizing (SBEO) problem, whose goal is to minimize average battery energy consumption while satisfying various task completion constraints. Phoenix then incorporates a sunlight-aware SEC task scheduling mechanism to make scheduling decisions effectively and efficiently. We implement a prototype and build a hardware-in-the-loop SEC experimental environment. Extensive data-driven evaluations demonstrate that, compared to other state-of-the-art solutions, Phoenix can effectively prolong battery lifetime to 1.7× while still completing tasks on time.
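A greedy caricature of the scheduling insight: prefer sunlit (solar-supplied) edges, then battery-sufficient edges, and otherwise fall back to the ground. This sketches the flavor only; Phoenix's SBEO formulation and scheduler are considerably more involved:

```python
# Sunlight-aware task placement, greedy sketch.
def place_task(task_energy, deadline_ok, edges):
    # 1) Sunlit edges cost no battery energy; pick the best-charged one.
    sunlit = [e for e in edges if e["sunlit"] and deadline_ok(e)]
    if sunlit:
        return max(sunlit, key=lambda e: e["battery"])["name"]
    # 2) Otherwise an eclipsed edge with enough battery headroom.
    powered = [e for e in edges if e["battery"] >= task_energy and deadline_ok(e)]
    if powered:
        return max(powered, key=lambda e: e["battery"])["name"]
    # 3) Last resort: offload to the ground.
    return "ground"

edges = [{"name": "sat1", "sunlit": False, "battery": 40.0},
         {"name": "sat2", "sunlit": True,  "battery": 70.0}]
print(place_task(10.0, lambda e: True, edges))   # 'sat2'
```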
Speaker Weisen Liu (Tsinghua University)



ScalO-RAN: Energy-aware Network Intelligence Scaling in Open RAN

Stefano Maxenti, Salvatore D'Oro, Leonardo Bonati and Michele Polese (Northeastern University, USA); Antonio Capone (Politecnico di Milano, Italy); Tommaso Melodia (Northeastern University, USA)

Virtualization, software, and orchestration are pivotal elements in contemporary networks. In this context, the O-RAN architecture bypasses vendor lock-in, enables network programmability, and facilitates integrated artificial intelligence (AI) support. Moreover, container orchestration frameworks (e.g., Kubernetes, OpenShift) simplify how cellular networks and the newly introduced RAN Intelligent Controllers (RICs) are deployed, managed, and orchestrated. While this enables cost reduction via infrastructure sharing, it also makes meeting O-RAN control latency requirements more challenging, especially during peak resource utilization. For instance, the Near-real-time RIC executes applications (xApps) that take control decisions within 1 s, and we show that container platforms available today fail to guarantee such timing. To address this, we propose ScalO-RAN, a control framework rooted in optimization and designed as an O-RAN rApp that allocates and scales AI-based applications (xApps, rApps, and dApps) to: (i) abide by application-specific latency requirements, and (ii) monetize the shared infrastructure while reducing energy consumption. We prototype ScalO-RAN on OpenShift with base stations, a RIC, and a set of AI-based xApps. ScalO-RAN optimally allocates and distributes O-RAN applications to accommodate stringent latency requirements. More importantly, we show that scaling O-RAN applications is primarily a time-constrained rather than a resource-constrained problem, where scaling policies must account for the stringent inference latency of AI applications.
Speaker
Speaker biography is not available.

Competitive Online Age-of-Information Optimization for Energy Harvesting Systems

Qiulin Lin (City University of Hong Kong, China); Junyan Su and Minghua Chen (City University of Hong Kong, Hong Kong)

We consider the scenario where an energy harvesting source sends updates to a receiver. The source optimizes its energy allocation over a decision period to maximize a sum of time-varying functions of the age of information (AoI), representing the value of providing timely information. In a practical online setting, we need to make irrevocable energy allocation decisions at each time step while the time-varying functions and the energy arrivals are only revealed sequentially. The problem is challenging because 1) we face uncertain energy harvesting arrivals and time-varying functions, and 2) the energy allocation decisions and the energy harvesting process are coupled through the capacity-limited battery. In this paper, we develop an optimal online algorithm, CR-Reserve, and show that it is \((\ln\theta+1)\)-competitive, where \(\theta\) is a parameter representing the level of uncertainty of the time-varying functions. This is the optimal competitive ratio among all deterministic and randomized online algorithms. We conduct simulations based on real-world traces and compare our algorithm with conceivable alternatives. The results show that our algorithm achieves a 12% performance improvement over the state-of-the-art baseline.
Speaker
Speaker biography is not available.

Mean-Field Multi-Agent Contextual Bandit for Energy-Efficient Resource Allocation in vRANs

Jose A. Ayala-Romero (NEC Laboratories Europe GmbH, Germany); Leonardo Lo Schiavo (Universidad Carlos III de Madrid & IMDEA Networks Institute, Spain); Andres Garcia-Saavedra (NEC Labs Europe, Germany); Xavier Costa-Perez (ICREA and i2cat & NEC Laboratories Europe, Spain)

Radio Access Network (RAN) virtualization, key to new-generation mobile networks, requires Hardware Accelerators (HAs) that swiftly process wireless signals from Base Stations (BSs) to meet stringent reliability targets. However, HAs are expensive and energy-hungry, which increases costs and has serious environmental implications. To address this problem, we gather data from our experimental platform and compare the performance and energy consumption of a HA (NVIDIA GPU V100) and a CPU (Intel Xeon Gold 6240R, 16 cores) for energy-friendly software processing. Based on the insights obtained from this data, we devise a strategy that offloads workloads to HAs opportunistically to save energy while preserving reliability. This offloading strategy, however, needs to be configured in near-real-time for every BS sharing common computational resources. This creates a challenging multi-agent collaborative problem in which the number of involved agents (BSs) can be arbitrarily large and can change over time. We therefore propose an efficient multi-agent contextual bandit algorithm called ECORAN, which applies concepts from mean field theory to be fully scalable. Using a real platform and traces from a production mobile network, we show that ECORAN can provide up to 40% energy savings with respect to the approach used today by the industry.
Speaker
Speaker biography is not available.

Session Chair

Falko Dressler (TU Berlin, Germany)

Session Break-2-1

Coffee Break

Conference
10:00 AM — 10:30 AM PDT
Local
May 22 Wed, 1:00 PM — 1:30 PM EDT
Location
Regency Foyer & Hallway

Session E-5

E-5: Machine Learning with Transformers

Conference
10:30 AM — 12:00 PM PDT
Local
May 22 Wed, 1:30 PM — 3:00 PM EDT
Location
Regency E

Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference

Shengyuan Ye and Jiangsu Du (Sun Yat-sen University); Liekang Zeng (Hong Kong University of Science and Technology (Guangzhou) & Sun Yat-Sen University, China); Wenzhong Ou (Sun Yat-sen University); Xiaowen Chu (The Hong Kong University of Science and Technology (Guangzhou) & The Hong Kong University of Science and Technology, Hong Kong); Yutong Lu (Sun Yat-sen University); Xu Chen (Sun Yat-sen University, China)

Transformer-based models have unlocked a plethora of powerful intelligent applications at the edge, such as voice assistants in smart homes. Traditional deployment approaches offload the inference workloads to a remote cloud server, which induces substantial pressure on the backbone network and raises users' privacy concerns. To address this, in-situ inference has recently been recognized for edge intelligence, but it still confronts significant challenges stemming from the conflict between intensive workloads and limited on-device computing resources. In this paper, we leverage the observation that many edge environments comprise a rich set of accompanying trusted edge devices with idle resources, and propose Galaxy, a collaborative edge AI system that breaks the resource walls across heterogeneous edge devices for efficient Transformer inference acceleration. Galaxy introduces a novel hybrid model parallelism to orchestrate collaborative inference, along with heterogeneity-aware parallelism planning that fully exploits the resource potential. Furthermore, Galaxy devises tile-based fine-grained overlapping of communication and computation to mitigate the impact of tensor synchronizations on inference latency in bandwidth-constrained edge environments. Extensive evaluation based on a prototype implementation demonstrates that Galaxy remarkably outperforms state-of-the-art approaches under various edge environment setups, achieving up to \(2.5\times\) end-to-end latency reduction.
Speaker
Speaker biography is not available.

Industrial Control Protocol Type Inference Using Transformer and Rule-based Re-Clustering

Yuhuan Liu (The Hong Kong Polytechnic University & Southern University of Science and Technology, Hong Kong); Yulong Ding (Southern University of Science and Technology, China); Jie Jiang (China University of Petroleum-Beijing, China); Bin Xiao (The Hong Kong Polytechnic University, Hong Kong); Shuang-Hua Yang (Department of Computer Science, University of Reading, UK)

The development of the Industrial Internet of Things (IIoT) is impeded by the lack of specifications for unknown protocols. Protocol Reverse Engineering (PRE) plays a crucial role in inferring unpublished protocol specifications by analyzing traffic messages. Since different types within a protocol often have distinct formats, inferring the protocol type is essential for subsequent reverse analysis. Natural Language Processing (NLP) models have demonstrated remarkable capabilities on various sequence tasks, and traffic messages of unknown protocols can be analyzed as sequences. In this paper, we propose a framework for clustering unknown industrial control protocol types. Our framework utilizes a transformer-based auto-encoder network to train on corresponding request and response messages, leveraging the intermediate-layer embedding vectors learned by the network for clustering. The clustering results are employed to extract candidate keywords and establish empirical rules. Subsequently, rule-based re-clustering is performed, and its effectiveness is evaluated against the previous clustering results. Through this re-clustering process, we identify the most effective combination of keywords that defines each type. We evaluate the proposed framework on three general protocols with different type rules and successfully separate their internal types completely.
Speaker
Speaker biography is not available.

OTAS: An Elastic Transformer Serving System via Token Adaptation

Jinyu Chen, Wenchao Xu and Zicong Hong (The Hong Kong Polytechnic University, China); Song Guo (The Hong Kong University of Science and Technology, Hong Kong); Haozhao Wang (Huazhong University of Science and Technology, China); Jie Zhang (The Hong Kong Polytechnic University, Hong Kong); Deze Zeng (China University of Geosciences, China)

Transformer-based architectures have become a pillar of the cloud services that keep reshaping our society. However, dynamic query loads and heterogeneous user requirements severely challenge current transformer serving systems, which rely on pre-training multiple variants of a foundation model, i.e., with different sizes, to accommodate varying service demands. Unfortunately, such a mechanism is unsuitable for large transformer models due to the prohibitive training costs and excessive I/O delay. In this paper, we introduce OTAS, the first elastic serving system specially tailored for transformer models, built on lightweight token management. We develop a novel idea called token adaptation that adds prompting tokens to improve accuracy and removes redundant tokens to accelerate inference. To cope with fluctuating query loads and diverse user requests, we enhance OTAS with application-aware selective batching and online token adaptation. OTAS first batches incoming queries with similar service-level objectives to improve ingress throughput. Then, to strike a tradeoff between the overhead of token increment and the potential for accuracy improvement, OTAS adaptively adjusts the token execution strategy by solving an optimization problem. We implement and evaluate a prototype of OTAS on multiple datasets, showing that OTAS improves system utility by at least 18.2%.
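The removal side of token adaptation can be pictured as saliency-based pruning. In the sketch below, saliency is a toy mean-similarity proxy, an assumption for illustration rather than OTAS's actual policy:

```python
# Token pruning in miniature: keep the most salient keep_ratio of tokens.
import numpy as np

def prune_tokens(tokens: np.ndarray, keep_ratio: float) -> np.ndarray:
    # tokens: (n, d). Score each token by mean similarity to all tokens
    # (a crude stand-in for attention-derived importance).
    sim = tokens @ tokens.T / tokens.shape[1]
    saliency = sim.mean(axis=1)
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.sort(np.argsort(saliency)[-k:])      # preserve token order
    return tokens[keep]

x = np.random.randn(196, 64)                       # e.g., ViT patch tokens
print(prune_tokens(x, keep_ratio=0.5).shape)       # (98, 64): ~2x fewer tokens
```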
Speaker
Speaker biography is not available.

T-PRIME: Transformer-based Protocol Identification for Machine-learning at the Edge

Mauro Belgiovine, Joshua B Groen, Miquel Sirera, Chinenye M Tassie, Sage Trudeau, Stratis Ioannidis and Kaushik Chowdhury (Northeastern University, USA)

Spectrum sharing allows different protocols of the same standard (e.g., the 802.11 family) or different standards (e.g., LTE and DVB) to coexist in overlapping frequency bands. As this paradigm continues to spread, wireless systems must also evolve to identify active transmitters and unauthorized waveforms in real time under intentional distortion of preambles, extremely low signal-to-noise ratios, and challenging channel conditions. This paper mitigates the limitations of correlation-based preamble matching methods in such conditions through the design of T-PRIME, a transformer-based machine learning approach. T-PRIME learns the structural design of transmitted frames through its attention mechanism, looking at patterns of sequences that go beyond the preamble alone. The paper makes three contributions: first, it compares transformer models and demonstrates their superiority over traditional methods and convolutional neural networks; second, it rigorously analyzes T-PRIME's real-time feasibility on DeepWave's AirT platform; third, it utilizes an extensive 66 GB dataset of over-the-air (OTA) WiFi transmissions for training, which is released along with the code for community use. Results reveal nearly perfect (i.e., >98%) classification accuracy under simulated scenarios, a 100% detection improvement over legacy methods in low-SNR ranges, 97% classification accuracy for OTA single-protocol transmissions, and up to 75% double-protocol classification accuracy in interference scenarios.
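A minimal sketch of the kind of transformer classifier the paper describes, assuming each token is a short window of I/Q samples; layer sizes and names are our choices, not the released T-PRIME code.

    import torch
    import torch.nn as nn

    class ProtocolClassifier(nn.Module):
        # Toy transformer over I/Q slices: attention sees whole-frame
        # structure, not just the preamble.
        def __init__(self, win=64, d_model=128, n_classes=4):
            super().__init__()
            self.proj = nn.Linear(2 * win, d_model)       # one I/Q window -> token
            layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(d_model, n_classes)

        def forward(self, x):                 # x: (batch, n_tokens, 2*win)
            h = self.encoder(self.proj(x))
            return self.head(h.mean(dim=1))   # pool over the token sequence

    logits = ProtocolClassifier()(torch.randn(8, 24, 128))  # 8 captures, 24 windows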
Speaker
Speaker biography is not available.

Session Chair

Minghua Chen (City University of Hong Kong, Hong Kong)

Session F-5

F-5: Remote Direct Memory Access (RDMA)

Conference
10:30 AM — 12:00 PM PDT
Local
May 22 Wed, 1:30 PM — 3:00 PM EDT
Location
Regency F

ZETA: Transparent Zero-Trust Security Add-on for RDMA

Hyunseok Chang and Sarit Mukherjee (Nokia Bell Labs, USA)

While the fast adoption of RDMA in data centers has been primarily driven by its performance benefits, more and more attention is being paid to its security implications, especially with mounting security risks associated with lateral communication within data centers. However, since RDMA is implemented as a fixed function of the NIC, it is challenging to incorporate new security features into RDMA. In this paper, we propose ZETA, a zero-trust security add-on for RoCEv2, which enables network-independent, fine-grained zero-trust security control for RDMA. It does not require any change to RDMA's ASIC implementation or application-level interfaces. To this end, ZETA leverages a modern SmartNIC's versatility to perform zero-trust policy control on RDMA packets within the SmartNIC in a cryptographically secure fashion. From its prototype implementation and evaluation based on real-world applications, we show that, while ZETA's cryptographic verification introduces 1.5 ms of session startup latency, the overhead on end-to-end application performance is marginal (e.g., less than 1% throughput and 5% latency penalty).
Speaker
Speaker biography is not available.

Host-driven In-Network Aggregation on RDMA

Yulong Li and Wenxin Li (Tianjin University, China); Yinan Yao (Tianjin University, China); Yuxuan Du and Keqiu Li (Tianjin University, China)

Large-scale datacenter networks are increasingly using in-network aggregation (INA) and remote direct memory access (RDMA) techniques to accelerate deep neural network (DNN) training. However, the two techniques are on a collision course: existing INA designs do not work with RDMA's reliable connection (RC) transport. To bridge this gap, we present FreeINA, a host-driven in-network aggregation scheme that provides RDMA reliable connection (RC) support for multi-tenant learning settings. FreeINA relies on dual transmission paths to support RC compatibility, with one path for INA and another for aggregation on the end-host parameter server. By dynamically controlling these two paths, FreeINA leaves traditional in-server aggregation unaffected while ensuring INA's reliability without modifying RDMA network interface cards (RNICs). To support multi-tenant learning, FreeINA employs all-reduce-level memory allocation, which captures the well-known "on and off" DNN training pattern and thus improves switch memory efficiency. We have implemented a FreeINA prototype using a P4-programmable switch and commercial RNICs, and evaluated it extensively on a 100 Gbps testbed. The results show that, compared to the state-of-the-art solution ATP, FreeINA improves the single-job training speedup ratio by 1.20x, while improving aggregation throughput by 2.65x in multi-job scenarios.
Speaker
Speaker biography is not available.

INSERT: In-Network Stateful End-to-End RDMA Telemetry

Hyunseok Chang (Nokia Bell Labs, USA); Walid A. Hanafy (University of Massachusetts Amherst, USA); Sarit Mukherjee and Limin Wang (Nokia Bell Labs, USA)

Remote Direct Memory Access (RDMA) has been widely adopted in modern data centers thanks to its high-throughput, low-latency data transfer capability and reduced CPU overhead. However, traditional network-flow-based monitoring is poor at interpreting RDMA-based communication and hence inadequate for gaining insights. In this paper, we present INSERT, an end-to-end RDMA telemetry system that enables seamless visibility into RDMA-based communication from the network layer all the way to the application layer. To this end, INSERT combines (i) eBPF-based transparent RDMA tracing on end-hosts and (ii) stateful RDMA network telemetry on a programmable data plane. We implement RDMA network telemetry on programmable SmartNICs, where we address the practical challenges of maintaining fine-grained state on massively parallel packet processing pipelines. We demonstrate that INSERT can perform reasonably accurate telemetry at line rate for different types of RDMA traffic, even in the presence of out-of-order packets, and finally showcase two practical use cases that can benefit from INSERT.
Speaker
Speaker biography is not available.

RB\(^2\): Narrow the Gap between RDMA Abstraction and Performance via a Middle Layer

Haifeng Sun, Yixuan Tan, Yongtong Wu, Jiaqi Zhu and Qun Huang (Peking University, China); Xin Yao and Gong Zhang (Huawei Technologies Co., Ltd., China)

Although the native RDMA interface allows for high throughput and low latency, its low-level abstraction raises significant programming challenges. Consequently, numerous systems encapsulate the RDMA interface into more user-friendly high-level abstractions such as Socket, MPI, and RPC. However, this ease of development often incurs considerable performance degradation. To address this trade-off, this paper introduces RB\(^2\), a high-performance RDMA-based Distributed Ring Buffer (DRB). RB\(^2\) serves as a middle layer that effectively conceals the low-level details of the RDMA interface while also facilitating extension to other high-level abstractions.

Nonetheless, it is non-trivial for DRBs to preserve RDMA's performance. We optimize the performance of RB\(^2\) in three aspects. First, we perform micro-benchmarks to identify pointer synchronization methods that are seemingly counter-intuitive but offer the best performance. Second, we propose an adaptive batching mechanism to alleviate the limitations of conventional fixed batching. Finally, we build an efficient memory subsystem using various optimization techniques. RB\(^2\) outperforms state-of-the-art designs, achieving 2.5× to 7.5× the throughput while maintaining comparable tail latency for small messages.
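To make the DRB idea concrete, here is a toy single-host ring buffer with batched consumption. The real RB\(^2\) synchronizes head/tail pointers across hosts over RDMA and tunes the batch size adaptively; both are omitted here, and all names are ours.

    import threading

    class RingBuffer:
        # Producer/consumer ring buffer; head/tail mimic a DRB's pointers.
        def __init__(self, capacity=1024):
            self.buf = [None] * capacity
            self.head = 0                 # next slot to write
            self.tail = 0                 # next slot to read
            self.cap = capacity
            self.lock = threading.Lock()

        def put(self, msg):
            with self.lock:
                if self.head - self.tail == self.cap:
                    return False          # full: caller retries or backs off
                self.buf[self.head % self.cap] = msg
                self.head += 1
                return True

        def get_batch(self, max_batch):
            with self.lock:
                n = min(max_batch, self.head - self.tail)
                batch = [self.buf[(self.tail + i) % self.cap] for i in range(n)]
                self.tail += n
                return batch

An adaptive variant would grow or shrink max_batch with the observed arrival rate instead of fixing it.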
Speaker
Speaker biography is not available.

Session Chair

Sangheon Pack (Korea University, Korea (South))

Session G-5

G-5: Localization

Conference
10:30 AM — 12:00 PM PDT
Local
May 22 Wed, 1:30 PM — 3:00 PM EDT
Location
Prince of Wales/Oxford

LoBaCa: Super-Resolution LoRa Backscatter Localization for Low Cost Tags

Boxin Hou and Jiliang Wang (Tsinghua University, China)

Long-range LoRa backscatter localization has shown great potential in many applications. However, the narrow bandwidth of LoRa and the hardware defects of backscatter tags make localization challenging in practice. This paper presents LoBaCa, the first super-resolution LoRa backscatter localization system for low-cost tags. To increase the overall bandwidth, LoBaCa first utilizes a frequency hopping technique and exploits the phase slope to synchronize multiple frequency bands. We further show that the low-cost backscatter tag causes additional phase distortion in the weak backscatter signal and thus introduces significant localization error. Therefore, LoBaCa leverages the upper and lower sideband signals to improve the SNR and correct the phase distortion. Finally, LoBaCa adopts a super-resolution ESPRIT algorithm to resolve the complex multipath effect, estimate the angle of arrival (AoA), and localize the backscatter tag. We prototype LoBaCa and conduct extensive experiments to evaluate it in both indoor and outdoor scenarios. Our results show that the localization error of LoBaCa is 5.0 cm and 71 cm when the tag is 5 m and 40 m away, respectively, which is 4.3x and 1.7x better than the state of the art.
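For reference, a compact textbook ESPRIT implementation for AoA estimation on a uniform linear array is sketched below; it is a generic form under our assumptions (half-wavelength spacing, known source count), not LoBaCa's code.

    import numpy as np

    def esprit_aoa(X, d, spacing=0.5):
        # X: (M, N) snapshots from an M-element uniform linear array;
        # d: number of sources; spacing: element spacing in wavelengths.
        R = X @ X.conj().T / X.shape[1]             # sample covariance
        w, V = np.linalg.eigh(R)
        Es = V[:, -d:]                              # signal subspace
        # Rotational invariance between overlapping subarrays: Es1 @ Phi = Es2
        Phi = np.linalg.lstsq(Es[:-1], Es[1:], rcond=None)[0]
        phases = np.angle(np.linalg.eigvals(Phi))
        return np.degrees(np.arcsin(phases / (2 * np.pi * spacing)))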
Speaker Boxin Hou (Tsinghua University)

A Ph.D. candidate at Tsinghua University.


Multi-Node Concurrent Localization in LoRa Networks: Optimizing Accuracy and Efficiency

Jingkai Lin, Runqun Xiong, Zhuqing Xu, Wei Tian, Ciyuan Chen, Xirui Dong and Luo Junzhou (Southeast University, China)

LoRa localization, a fundamental service in LoRa networks, has garnered significant attention due to LoRa's long-range capability and low power consumption. However, existing approaches for LoRa localization are either incompatible with commercial devices or highly susceptible to environmental factors. To tackle this challenge, we propose SyncLoc, a TDoA-based LoRa localization framework that integrates a dedicated node for multi-dimensional time-drift correction. Our proposal is built on two key observations: first, time differences between gateways can be measured with nanosecond precision; second, SNR has a substantial impact on gateway time drift. To accomplish our objective, we present three progressively enhanced versions of SyncLoc, each intended to comprehensively analyze the factors influencing LoRa time synchronization accuracy across different deployment scenarios involving nodes, carrier frequencies, and spreading factors. In addition to improving accuracy, we identify inefficiencies in LoRa's multi-node concurrent localization and introduce SyncLoc-4, a multi-node localization scheduling mechanism that optimizes efficiency with a 2-approximation ratio. Through extensive experiments with commercial LoRa devices in real-world environments, we demonstrate a 2.44× accuracy improvement of SyncLoc over LoRaWAN. Furthermore, simulations of large-scale networks exhibit a 2.47× boost in localization scalability (i.e., the number of concurrently located nodes) when employing SyncLoc instead of LoRaWAN.
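As a hedged illustration of the TDoA step alone (SyncLoc's multi-dimensional drift correction is the paper's actual contribution and is not shown), the sketch below recovers a node position from corrected time differences via nonlinear least squares; the geometry and names are hypothetical.

    import numpy as np
    from scipy.optimize import least_squares

    C = 299792458.0  # speed of light, m/s

    def tdoa_locate(gateways, tdoas, x0=None):
        # gateways: (k, 2) coordinates; tdoas: (k-1,) arrival-time differences
        # t_i - t_0 in seconds, assumed already drift-corrected.
        def residual(p):
            dists = np.linalg.norm(gateways - p, axis=1)
            return (dists[1:] - dists[0]) - C * np.asarray(tdoas)
        p0 = gateways.mean(axis=0) if x0 is None else x0
        return least_squares(residual, p0).x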
Speaker Jingkai Lin (Southeast University)



TransformLoc: Transforming MAVs into Mobile Localization Infrastructures in Heterogeneous Swarms

Haoyang Wang, Jingao Xu and Chenyu Zhao (Tsinghua University, China); Zihong Lu (Harbin Institute of Technology, China); Yuhan Cheng and Xuecheng Chen (Tsinghua University, China); Xiao-Ping (Steven) Zhang (Tsinghua University & Toronto Metropolitan University, Canada); Yunhao Liu and Xinlei Chen (Tsinghua University, China)

A heterogeneous micro aerial vehicle (MAV) swarm consists of resource-rich but expensive advanced MAVs (AMAVs) and resource-limited but cost-effective basic MAVs (BMAVs), offering opportunities in diverse fields. Accurate and real-time localization is crucial for MAV swarms, but current practice lacks a low-cost, high-precision, and real-time solution, especially for lightweight BMAVs. We find an opportunity to accomplish the task by transforming AMAVs into mobile localization infrastructures for BMAVs. However, turning this insight into a practical system is non-trivial due to the challenges of estimating locations under BMAVs' unknown and diverse localization errors and of allocating AMAV resources given coupled influential factors. This study proposes TransformLoc, a new framework that transforms AMAVs into mobile localization infrastructures, specifically designed for low-cost and resource-constrained BMAVs. We first design an error-aware joint location estimation model to perform intermittent joint location estimation for BMAVs, and then design a similarity-instructed adaptive grouping-scheduling strategy to allocate AMAV resources dynamically. TransformLoc achieves a collaborative, adaptive, and cost-effective localization system suitable for large-scale heterogeneous MAV swarms. We implement TransformLoc on industrial drones and validate its performance. Results show that TransformLoc outperforms baselines, maintaining a localization error under 1 m and an average navigation success rate of 95%.
Speaker Haoyang Wang (Tsinghua University)

Haoyang Wang received the B.E. degree from the School of Computer Science and Engineering, Central South University, China, in 2022. He is currently pursuing the Ph.D. degree at the Tsinghua Shenzhen International Graduate School, Tsinghua University, China. His research interests include AIoT, mobile computing, and distributed & embedded AI.


AdaSem: Adaptive Goal-Oriented Semantic Communications for End-to-End Camera Relocalization

Qi Liao (Nokia Bell Labs, Germany); Tze-Yang Tung (Nokia Bell Labs, USA)

Recently, deep autoencoders have gained traction as a powerful method for implementing goal-oriented semantic communication systems. The idea is to train a mapping from the source domain directly to channel symbols, and vice versa. However, prior studies often focused on the rate-distortion tradeoff and transmission latency at the cost of increased end-to-end complexity. Moreover, they used publicly available datasets, which cannot validate the observed gains against real-world baseline systems, leading to unfair comparisons. In this paper, we study the problem of remote camera pose estimation and propose AdaSem, an adaptive semantic communications approach that optimizes the tradeoff between inference accuracy and end-to-end latency. We develop an adaptive semantic codec model that encodes the source data into a dynamic number of symbols based on the latent space distribution and the channel state information. We utilize a lightweight model for both the transmitter and the receiver to ensure complexity comparable to a baseline implemented in a real-world system. Extensive experiments on real-environment data show the effectiveness of our approach: compared to a real implementation of a client-server camera relocalization service, AdaSem reduces latency and estimation error by 75.8% and over 63%, respectively.
Speaker
Speaker biography is not available.

Session Chair

Wenye Wang (NC State University, USA)

Session Panel

Panel: AI+Network: Look Back, Look Forward

Conference
10:30 AM — 12:00 PM PDT
Local
May 22 Wed, 1:30 PM — 3:00 PM EDT
Location
Regency C

Panel: AI+Network: Look Back, Look Forward

Chunyi Peng (Purdue University, USA) 

Artificial Intelligence (AI) has undergone remarkable advancements, profoundly influencing research in next-generation networks and edge computing. Over recent years, the field has seen substantial investment in interdisciplinary research, forging collaborations between the AI and networking research communities, most notably in major projects and research centers with large teams of researchers and numerous accomplishments. Comprising distinguished researchers from academia and funding agencies, this panel aims to look back at this progress and anticipate future developments. The panelists will articulate their perspectives on AI+Network, shed light on the research challenges their teams are currently tackling, and discuss major research topics and grand challenges for the upcoming decade. Furthermore, they will share valuable lessons and insights gained from their experiences to propel advancements in this direction.
Speaker Moderator: Chunyi Peng (Purdue University, USA)
Panelists: Suman Banerjee (University of Wisconsin-Madison, USA); Falko Dressler (Technical University of Berlin, Germany); Ness Shroff (The Ohio State University, USA); Murat Torlak (NSF, USA)

Session Chair

Chunyi Peng (Purdue University, USA)

Session Lunch-2

Conference Lunch (for Registered Attendees)

Conference
12:00 PM — 1:30 PM PDT
Local
May 22 Wed, 3:00 PM — 4:30 PM EDT
Location
Georgia Ballroom and Plaza Ballroom (2nd Floor)

Session E-6

E-6: Federated Learning 4

Conference
1:30 PM — 3:00 PM PDT
Local
May 22 Wed, 4:30 PM — 6:00 PM EDT
Location
Regency E

A Semi-Asynchronous Decentralized Federated Learning Framework via Tree-Graph Blockchain

Cheng Zhang, Yang Xu and Xiaowei Wu (Hunan University, China); En Wang (Jilin University, China); Hongbo Jiang (Hunan University, China); Yaoxue Zhang (Tsinghua University, China)

Decentralized federated learning (DFL) overcomes the single-point-of-failure issue of centralized federated learning. Building upon DFL, blockchain-based federated learning (BFL) takes further strides in establishing trust and enhancing security and fault tolerance, with commercial products serving various domains. However, BFL based on a classical linear-structure blockchain is limited by performance bottlenecks and is less efficient. Some recent schemes introduce the directed acyclic graph (DAG) blockchain as the underlying structure, improving performance at the expense of computational verifiability and facing some security risks. In this paper, we propose TGFL, a decentralized federated learning framework based on a Tree-Graph blockchain. The underlying structure of TGFL is designed as a block-centered DAG blockchain to support semi-asynchronous training. The iterative relationship between models is represented as a tree-graph composed of blocks. To facilitate fast convergence, we design a pivot-chain generation algorithm that topologically sorts the asynchronous training process, guiding participants in sampling appropriate models. The effectiveness of model updates is checked as part of TGFL's weak-consistency consensus mechanism. We discuss adaptive attacks on TGFL and defenses against them, and validate its effectiveness through experimental evaluations against four baseline approaches.
Speaker
Speaker biography is not available.

Momentum-Based Federated Reinforcement Learning with Interaction and Communication Efficiency

Sheng Yue (Tsinghua University, China); Xingyuan Hua (Beijing Institute of Technology, China); Lili Chen and Ju Ren (Tsinghua University, China)

Federated Reinforcement Learning (FRL) has garnered increasing attention recently. However, due to the intrinsic spatio-temporal non-stationarity of data distributions, current approaches typically suffer from high interaction and communication costs. In this paper, we introduce a new FRL algorithm, named MFPO, that utilizes momentum, importance sampling, and additional server-side adjustment to control the shift of stochastic policy gradients and enhance the efficiency of data utilization. We prove that by proper selection of momentum parameters and interaction frequency, MFPO can achieve \(\tilde{\mathcal{O}}(H N^{-1}\epsilon^{-3/2})\) interaction complexity and \(\tilde{\mathcal{O}}(\epsilon^{-1})\) communication complexity (\(N\) represents the number of agents), where the interaction complexity achieves linear speedup with the number of agents and the communication complexity matches the best achievable by existing first-order FL algorithms. Extensive experiments corroborate the substantial performance gains of MFPO over existing methods on a suite of complex and high-dimensional benchmarks.
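A minimal sketch of the momentum mechanism at the heart of such algorithms, in STORM-style variance-reduced form; the stand-in gradients and all names are ours, and MFPO's importance sampling and server-side adjustment are only hinted at in the comments.

    import numpy as np

    def momentum_update(grad_new, grad_old, u_prev, eta):
        # u_t = g_t + (1 - eta) * (u_{t-1} - g_{t-1}), where g_{t-1} is the
        # old-sample gradient (importance-weighted in the full algorithm);
        # the correction term tames the drift of stochastic policy gradients.
        return grad_new + (1.0 - eta) * (u_prev - grad_old)

    # Hypothetical local loop: each agent refines its policy parameters theta
    # with the variance-reduced direction before the server averages them.
    theta, u = np.zeros(10), np.zeros(10)
    for t in range(100):
        g_new = np.random.randn(10) * 0.1   # stand-in for a policy gradient
        g_old = np.random.randn(10) * 0.1   # stand-in for its IS-corrected twin
        u = momentum_update(g_new, g_old, u, eta=0.3)
        theta += 0.05 * u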
Speaker Sheng Yue (Tsinghua University)

Sheng Yue received his B.Sc. in mathematics (2017) and Ph.D. in computer science (2022), from Central South University, China. Currently, he is an assistant researcher with the Department of Computer Science and Technology, Tsinghua University, China. His research interests include network optimization, distributed learning, and reinforcement learning.


SpreadFGL: Edge-Client Collaborative Federated Graph Learning with Adaptive Neighbor Generation

Luying Zhong, Yueyang Pi and Zheyi Chen (Fuzhou University, China); Zhengxin Yu (Lancaster University, United Kingdom (Great Britain)); Wang Miao (University of Plymouth, United Kingdom (Great Britain)); Xing Chen (Fuzhou University, China); Geyong Min (University of Exeter, United Kingdom (Great Britain))

Federated Graph Learning (FGL) has garnered widespread attention by enabling collaborative training on multiple clients for semi-supervised classification tasks. However, most existing FGL studies do not adequately consider the inter-client topology information that is missing in real-world scenarios, causing insufficient feature aggregation from multi-hop neighbor clients during model training. Moreover, classic FGL commonly adopts FedAvg but neglects the high training costs incurred as the number of clients expands, which overloads a single edge server. To address these important challenges, we propose a novel FGL framework, named SpreadFGL, to promote the information flow in edge-client collaboration and extract more generalized potential relationships between clients. In SpreadFGL, an adaptive graph imputation generator incorporating a versatile assessor is first designed to exploit potential links between subgraphs without sharing raw data. Next, a new negative sampling mechanism is developed to make SpreadFGL concentrate on more refined information in downstream tasks. To facilitate load balancing at the edge layer, SpreadFGL follows a distributed training manner that enables fast model convergence. Using a real-world testbed and benchmark graph datasets, extensive experiments demonstrate the effectiveness of SpreadFGL: it achieves higher accuracy and faster convergence than state-of-the-art algorithms.
Speaker Luying Zhong (Fuzhou University)

Luying Zhong received the B.S. degree in Computer Science from Fuzhou University, Fuzhou, China. She is currently pursuing the doctoral degree in the College of Computer and Data Science, Fuzhou University. Her research interests include Edge Computing, Federated Learning, and Graph Learning.


Strategic Data Revocation in Federated Unlearning

Ningning Ding, Ermin Wei and Randall A Berry (Northwestern University, USA)

By allowing users to erase their data's impact on federated learning models, federated unlearning protects users' right to be forgotten and data privacy. Despite a burgeoning body of research on federated unlearning's technical feasibility, there is a paucity of literature investigating the considerations behind users' requests for data revocation. This paper proposes a non-cooperative game framework to study users' data revocation strategies in federated unlearning. We prove the existence of a Nash equilibrium. However, users' best response strategies are coupled via model performance and unlearning costs, which makes the equilibrium computation challenging. We obtain the Nash equilibrium by establishing its equivalence with a much simpler auxiliary optimization problem. We also summarize users' multi-dimensional attributes into a single-dimensional metric and derive the closed-form characterization of an equilibrium, when users' unlearning costs are negligible. Moreover, we compare the cases of allowing and forbidding partial data revocation in federated unlearning. Interestingly, the results reveal that allowing partial revocation does not necessarily increase users' data contributions or payoffs due to the game structure. Additionally, we demonstrate that positive externalities may exist between users' data revocation decisions when users incur unlearning costs, while this is not the case when their unlearning costs are negligible.
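For intuition, the toy best-response iteration below computes an equilibrium of a drastically simplified data-contribution game; the payoff forms (a shared concave benefit minus a linear per-user cost) and every name are our assumptions, not the paper's model.

    import numpy as np

    def best_response_dynamics(benefit, cost, n_users, steps=200):
        # x[i] is the fraction of user i's data kept in the model; each user
        # repeatedly best-responds to the others until a fixed point (a Nash
        # equilibrium of this toy game) is reached.
        x = np.ones(n_users)                      # start with no revocation
        grid = np.linspace(0.0, 1.0, 101)         # candidate contributions
        for _ in range(steps):
            for i in range(n_users):
                others = x.sum() - x[i]
                payoffs = benefit(grid + others) - cost[i] * grid
                x[i] = grid[np.argmax(payoffs)]
        return x

    eq = best_response_dynamics(np.log1p, cost=np.array([0.2, 0.5, 0.9]), n_users=3)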
Speaker
Speaker biography is not available.

Session Chair

Hanif Rahbari (Rochester Institute of Technology, USA)

Session F-6

F-6: Video Delivery and Analytics

Conference
1:30 PM — 3:00 PM PDT
Local
May 22 Wed, 4:30 PM — 6:00 PM EDT
Location
Regency F

AdaStreamer: Machine-Centric High-Accuracy Multi-Video Analytics with Adaptive Neural Codecs

Andong Zhu, Sheng Zhang, Ke Cheng, Xiaohang Shi, Zhuzhong Qian and Sanglu Lu (Nanjing University, China)

A growing number of videos captured by widely deployed cameras are analyzed by computer-vision-based Deep Neural Networks (DNNs) on servers rather than being streamed for humans. Unfortunately, conventional codecs (e.g., H.26x and MPEG-x), originally designed for video streaming, lack content-aware feature extraction and hinder machine-centric video analytics, making it difficult to achieve the required high accuracy with tolerable delay. Neural codecs (e.g., autoencoders) now offer impressive compression performance and have been widely advocated for video streaming. While the autoencoder shows transformative potential, its application to video analytics is hampered by low accuracy in detecting small objects in high-resolution videos and by the serious challenges posed by multi-video streaming. To this end, we propose AdaStreamer, with adaptive neural codecs, to enable truly machine-centric high-accuracy multi-video analytics. We also investigate how to achieve optimal accuracy under delay constraints via careful scheduling of Compression Ratios (CRs, the ratio of the compressed size to the original data size) and bandwidth allocation, and further propose a Markov-based Adaptive Compression and Bandwidth Allocation algorithm (MACBA). We have developed a prototype of AdaStreamer, on which extensive experiments verify its accuracy improvement (up to 15%) over state-of-the-art coding and streaming solutions.
Speaker
Speaker biography is not available.

AggDeliv: Aggregating Multiple Wireless Links for Efficient Mobile Live Video Delivery

Jinlong E (Renmin University of China, China); Lin He and Zongyi Zhao (Tsinghua University, China); Yachen Wang, Gonglong Chen and Wei Chen (Tencent, China)

Mobile live-streaming applications with stringent latency and bandwidth requirements have gained tremendous attention in recent years. Faced with the bandwidth insufficiency and congestion instability of wireless uplinks, multi-access networking provides opportunities to achieve fast and robust connectivity. However, state-of-the-art multi-path transmission solutions lack adaptivity to the heterogeneous and dynamic nature of wireless networks. Meanwhile, the indispensable video coding and transformation introduce extra latency and make video delivery vulnerable to network throughput fluctuation. This paper presents AggDeliv, a framework that provides efficient and robust multi-path transmission for mobile live video delivery. The key idea is to relate multi-path packet scheduling to congestion control optimization over diverse wireless links and adapt it to mobile video characteristics. This is achieved by probabilistic packet allocation based on links' congestion windows, wireless-oriented delay- and loss-aware congestion control, as well as lightweight video frame coding and network-adaptive frame-packet transformation. Real-world evaluations demonstrate that our framework significantly outperforms state-of-the-art solutions in aggregate goodput and streaming video bitrate.
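A minimal sketch of the probabilistic allocation idea, weighting each wireless link by congestion window over RTT as a proxy for short-term sending capacity; the link figures and names are hypothetical.

    import random

    def allocate_packet(links):
        # links: list of dicts with 'cwnd' (packets) and 'rtt' (seconds);
        # returns the index of the link chosen for the next packet.
        weights = [l['cwnd'] / l['rtt'] for l in links]
        return random.choices(range(len(links)), weights=weights, k=1)[0]

    # Hypothetical Wi-Fi + cellular uplinks: the wider, faster link gets
    # proportionally more packets without starving the slower one.
    links = [{'cwnd': 40, 'rtt': 0.020}, {'cwnd': 10, 'rtt': 0.060}]
    counts = [0, 0]
    for _ in range(10000):
        counts[allocate_packet(links)] += 1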
Speaker Jinlong E (Renmin University of China)

He is currently a lecturer at Renmin University of China. His research interests include cloud/edge computing, mobile streaming media, and AIoT.


BiSwift: Bandwidth Orchestrator for Multi-Stream Video Analytics on Edge

Lin Sun (Nanjing University, China); Weijun Wang (Tsinghua University, China); Tingting Yuan (Georg-August-University of Göttingen, Germany); Liang Mi (Nanjing University, China); Haipeng Dai (Nanjing University, China & State Key Laboratory for Novel Software Technology, China); Yunxin Liu (Tsinghua University, China); Xiaoming Fu (University of Goettingen, Germany)

High-definition (HD) cameras for surveillance and road traffic have experienced tremendous growth, demanding intensive computation resources for real-time analytics. Recently, offloading frames from the front-end device to the back-end edge server has shown great promise. In multi-stream competitive environments, efficient bandwidth management and proper scheduling are crucial to ensure both high inference accuracy and high throughput. To achieve this goal, we propose BiSwift, a bi-level framework that scales concurrent real-time video analytics via a novel adaptive hybrid codec integrated with multi-level pipelines and a global bandwidth controller for multiple video streams. The lower-level front-back-end collaborative mechanism (called the adaptive hybrid codec) locally optimizes accuracy and accelerates end-to-end video analytics for a single stream. The upper-level scheduler pursues accuracy fairness among multiple streams via the global bandwidth controller. Our evaluation shows that BiSwift achieves real-time object detection on 9 streams with an edge device equipped only with an NVIDIA RTX3070 (8 GB) GPU. BiSwift improves accuracy by 10%~21% and delivers 1.2~9 times the throughput of state-of-the-art video analytics pipelines.
Speaker Weijun Wang (Tsinghua University)

Weijun Wang is currently a Postdoc Fellow in AIoT Group from Institute for AI Industry Research, Tsinghua University, China. His research area is Edge LLM, especially Efficiently Serving Large Vision Models on Edge. Weijun Wang received his dual Ph.D. degrees respectively from Nanjing University, China, and the University of Göttingen, Germany. He was a Researcher at the University of Göttingen from 2022 to 2023.


Crucio: End-to-End Coordinated Spatio-Temporal Redundancy Elimination for Fast Video Analytics

Andong Zhu, Sheng Zhang, Xiaohang Shi, Ke Cheng, Hesheng Sun and Sanglu Lu (Nanjing University, China)

A Video Analytics Pipeline (VAP) usually relies on traditional codecs to stream video content from clients to servers. However, such analytics-agnostic codecs preserve considerable pixels that are not relevant to achieving high analytics accuracy, incurring a large end-to-end delay. Despite significant prior efforts, existing approaches fall short of complete redundancy elimination. Achieving such a goal is extremely challenging, and a naive design without coordination can see the benefits of redundancy elimination counterbalanced by the intolerable delays it introduces. We present Crucio, an end-to-end coordinated spatio-temporal redundancy elimination system for edge video analytics. Crucio leverages reshaped asymmetric autoencoders for end-to-end frame filtering (temporal) and coordinated intra-frame (spatial) and inter-frame (temporal) compression. Furthermore, Crucio can decode the compressed key frames all in one go and supports adaptive VAP batch sizes for delay optimization. Extensive evaluations reveal significant end-to-end delay reductions (at least 31% under an accuracy target of 0.9) for Crucio compared to state-of-the-art VAP redundancy elimination methods (e.g., DDS, Reducto, and STAC).
Speaker
Speaker biography is not available.

Session Chair

Sabur Baidya (University of Louisville, USA)

Session G-6

G-6: Wireless Sensing

Conference
1:30 PM — 3:00 PM PDT
Local
May 22 Wed, 4:30 PM — 6:00 PM EDT
Location
Prince of Wales/Oxford

AGR: Acoustic Gait Recognition Using Interpretable Micro-Range Profile

Penghao Wang, Ruobing Jiang and Chao Liu (Ocean University of China, China); Jun Luo (Nanyang Technological University, Singapore)

Recently, gait recognition, a type of biometric identification, has seen extensive application in area access control and smart homes, bolstering convenience, privacy, and personalized experiences. While privacy-preserving wireless sensing solutions have become a research focal point as alternatives to computer vision methods, current strategies are predominantly based on abstract features, inherently suffering from limitations in interpretability and stability. Fortunately, the widespread adoption of smart speakers has opened up opportunities for acoustic sensing, making it possible to extract more interpretable features. In this paper, we further push the limit of acoustic recognition with visual interpretability by sequentially visualizing fine-grained acoustic human gait features. Original gait profiles, carrying gait indications that are not directly visible, are first constructed by matrixing and compressing multipath gait echoes. Then, we achieve interpretable gaits through the newly proposed micro-range profiles. Key innovations include Mobile Target Detector (MTD)-based clutter elimination, compensation of farther echo strength, and subtraction of macro torso migration. The practical benefits of the interpretable gait profiles lie in improving data utilization, optimizing abnormal data handling, and enhancing model stability. Extensive evaluations in an open experimental scenario demonstrate accuracy reaching 97.5% in general, and robust performance against various practical factors.
Speaker Chao Liu

Chao Liu received his B.S. degree from Ocean University of China in 2011 and his Ph.D. degrees from the Illinois Institute of Technology and Ocean University of China in 2015 and 2016, respectively. He is currently an associate professor in the Department of Computer Science and Technology, Ocean University of China. He is also the chair of IEEE Std 1851-2023 and the vice chair of ISO 21851-2020. His main research interests include acoustic sensing, mobile computing, and wireless sensor networks. He has authored or coauthored more than 70 papers in international journals and conference proceedings, such as CCS, INFOCOM, JSAC, TIP, TII, TCSVT, and TOSN. He is a member of the ACM and the IEEE.


hBP-Fi: Contactless Blood Pressure Monitoring via Deep-Analyzed Hemodynamics

Yetong Cao (Beijing Institute of Technology, China); Shujie Zhang (Nanyang Technological University, Singapore); Fan Li (Beijing Institute of Technology, China); Zhe Chen (Fudan University & AIWiSe Company, China); Jun Luo (Nanyang Technological University, Singapore)

Blood pressure (BP) measurement is significant for assessing many dangerous health conditions. Non-invasive approaches typically rely on wearing devices on specific skin areas with consistent pressure. However, this can be uncomfortable and unsuitable for certain individuals, and the accuracy of these methods may significantly decrease due to improper device placement and wearing states. Recently, contactless methods leveraging RF technology have emerged as a potential alternative. However, these methods suffer from the drawback of overfitting deep learning (DL) models without a sound physiological basis, resulting in a lack of clear explanations for their outputs. Consequently, such limitations lead to skepticism and distrust among medical experts. In this paper, we propose hBP-Fi, a contactless BP measurement system driven by hemodynamics acquired via RF sensing. hBP-Fi has several advantages: i) it relies on hemodynamics as the key physical process of heart-pulse activities, ii) it uses beam-steerable RF devices for super-resolution scans of fine-grained pulse activities along arm arteries, and iii) it ensures trustworthy outputs through an explainable (decision-understandable) DL model. Extensive experiments with 35 subjects demonstrate that hBP-Fi achieves an error of -2.05±6.83 mmHg for systolic blood pressure and 1.99±6.30 mmHg for diastolic blood pressure.
Speaker Yetong Cao (Beijing Institute of Technology, China)

Yetong Cao is currently a research fellow at the College of Computing and Data Science, Nanyang Technological University. She received her Ph.D. degree from the School of Computer Science at Beijing Institute of Technology in 2023, advised by Prof. Fan Li. She received her B.E. degree from Shandong University in 2017. Her research interests include Smart Sensing, Mobile Computing, Mobile Health, and Security & Privacy. 


M2-Fi: Multi-person Respiration Monitoring via Handheld WiFi Devices

Jingyang Hu and Hongbo Jiang (Hunan University, China); Tianyue Zheng, Jingzhi Hu and Hongbo Wang (Nanyang Technological University, Singapore); Hangcheng Cao (City University of Hong Kong, China); Zhe Chen (Fudan University & AIWiSe Company, China); Jun Luo (Nanyang Technological University, Singapore)

Wi-Fi signals are commonly used for conventional communication, yet they can also enable low-cost and non-invasive human sensing. However, using handheld devices for Wi-Fi sensing in multi-person scenarios remains a challenging problem. In this paper, we propose M2-Fi to achieve multi-person respiration monitoring using handheld devices. M2-Fi leverages Wi-Fi BFI (beamforming feedback information) to perform multi-person respiration monitoring. As a compressed version of the uplink CSI (channel state information), BFI is transmitted unencrypted and is easily obtained via frame capture, without requiring specific firmware or WiFi chipsets. M2-Fi builds on an interesting experimental observation: when a Wi-Fi device is very close to a subject, the near-field channel changes caused by that subject significantly cancel out the changes caused by other subjects. We employ VMD (Variational Mode Decomposition) to eliminate the interference caused by hand movement in the BFI time series. Subsequently, we devise a deep learning architecture based on GANs (Generative Adversarial Networks) to recover fine-grained respiration waveforms from the respiration patterns extracted from the BFI time series. Our experiments on 50 hours of data collected from 8 subjects show that M2-Fi can accurately recover the respiration waveforms of multiple persons with a personal handheld device.
Speaker Jingyang Hu (Hunan University)

Jingyang Hu is currently a Ph.D. student with the College of Computer Science and Electronic Engineering, Hunan University, China. From 2022 to 2023, he worked as a joint Ph.D. student at the School of Computer Science and Engineering, Nanyang Technological University (NTU), Singapore. He has published papers in ACM UbiComp, ACM CCS, IEEE INFOCOM, IEEE ICDCS, IEEE TMC, IEEE JSAC, IEEE TITS, IEEE IoT-J, etc. His research interests include mobile and pervasive computing, the Internet of Things, and machine learning.


One is Enough: Enabling One-shot Device-free Gesture Recognition with COTS WiFi

Leqi Zhao, Rui Xiao and Jianwei Liu (Zhejiang University, China); Jinsong Han (Zhejiang University & School of Cyber Science and Technology, China)

In recent years, WiFi-based gesture recognition (WGR) has gained popularity due to its privacy-preserving nature and the wide availability of WiFi infrastructure. However, existing WGR systems suffer from scalability issues, i.e., they require extensive data collection and re-training for each new gesture class. To address these limitations, we propose OneSense, a one-shot WiFi-based gesture recognition system that can efficiently and easily adapt to new gesture classes. Specifically, we first propose a data enrichment approach based on the physical law of signal propagation to generate virtual gestures, enhancing the diversity of the training set without the extra overhead of collecting real samples. Then, we devise an aug-meta learning (AML) framework to enable efficient and scalable few-shot learning. This framework leverages two pre-training stages (i.e., aug-training and meta-training) to improve the model's feature extraction and generalization abilities, and ultimately achieves accurate one-shot gesture recognition through fine-tuning. Experimental results demonstrate that OneSense achieves 93% one-shot gesture recognition accuracy, outperforming state-of-the-art approaches. Moreover, it maintains high recognition accuracy when facing new environments, user locations, and user orientations. Furthermore, the proposed AML framework reduces pre-training latency by more than 86% compared to conventional meta-learning methods.
Speaker Leqi Zhao (Zhejiang University)

Leqi Zhao received the BS degree from Zhejiang University in 2023. She is currently a first-year Ph.D. student at Department of Computer Science and Technology, Zhejiang University. Her research interests include wireless sensing, mobile computing, and IoT security.


Session Chair

Hina Tabassum (York University, Canada)

Session Reflection

A Reflection with INFOCOM Achievement Award Winner

Conference
1:30 PM — 3:00 PM PDT
Local
May 22 Wed, 4:30 PM — 6:00 PM EDT
Location
Regency C

A Reflection with INFOCOM Achievement Award Winner

Guoliang Xue (Arizona State University, USA)

In this session, Prof. Guoliang Xue will have a conversation with the INFOCOM Achievement Award winner: Baochun Li (University of Toronto).
Speaker Baochun Li (University of Toronto)
Baochun Li has made pioneering and sustained contributions to computer networking and distributed systems, ranging from the theory and practice of network coding to cloud security and federated learning. He is widely respected for taking the mathematical rigor of theoretical underpinnings to design and build real-world systems of practical relevance.  His pioneering work on both theoretical advances and practical designs of network coding has led to the UUSee live streaming system in 2009, the world’s first large-scale deployment of network coding in commercial live streaming systems. His seminal work in 2014 studied the practical security challenge of sharing data over the cloud, which has become a daily routine with the prevalence of shared cloud storage services. More recently, he has made seminal contributions towards improving the performance of distributed machine learning systems and federated learning. Since 2015, he has played leadership roles towards automating review assignments in IEEE INFOCOM.  His research impact has led to the IEEE Communications Society Leonard G. Abraham Prize Paper Award (2000), the IEEE Communications Society Multimedia Communications Best Paper Award (2009), as well as the Best Paper Award in IEEE INFOCOM 2023. He is a Fellow of the Canadian Academy of Engineering, the Engineering Institute of Canada, and IEEE.

Session Chair

Guoliang (Larry) Xue (Arizona State University, USA)

Session Break-2-2

Coffee Break

Conference
3:00 PM — 3:30 PM PDT
Local
May 22 Wed, 6:00 PM — 6:30 PM EDT
Location
Regency Foyer & Hallway

Session A-7

A-7: Consensus Protocols

Conference
3:30 PM — 5:00 PM PDT
Local
May 22 Wed, 6:30 PM — 8:00 PM EDT
Location
Regency A

Tolerating Disasters with Hierarchical Consensus

Wassim Yahyaoui (University of Luxembourg & Interdisciplinary Centre for Security, Reliability and Trust (SnT), Luxembourg); Jeremie Decouchant (Delft University of Technology, The Netherlands); Marcus Völp (University of Luxembourg, Luxembourg); Joachim Bruneau-Queyreix (Bordeaux-INP, France)

Geo-replication provides disaster recovery after catastrophic accidental failures or attacks, such as fires, blackouts, or denial-of-service attacks against a data center or region. Naturally distributed data structures such as blockchains, when well designed, are immune to such disruptions, but they also benefit from leveraging locality. In this work, we consolidate the performance of geo-replicated consensus by leveraging novel insights about hierarchical consensus and a construction methodology that allows creating novel protocols from existing building blocks. In particular, we show that cluster confirmation, paired with subgroup rotation, allows protocols to safely operate through situations where all members of the global consensus group are Byzantine. We demonstrate our compositional construction by combining the recent HotStuff and Damysus protocols into a hierarchical geo-replicated blockchain with global durability guarantees. We present a compositionality proof and demonstrate the correctness of our protocol, including its ability to tolerate cluster crashes. Our protocol achieves 20% higher throughput than GeoBFT, the latest hierarchical Byzantine Fault-Tolerant (BFT) protocol.
Speaker
Speaker biography is not available.

Auncel: Fair Byzantine Consensus Protocol with High Performance

Chen Wuhui (Sun Yat-sen University, China); Yikai Feng (Sun Yat-Sen University, China); Jianting Zhang (Purdue University, USA); Zhongteng Cai (Sun Yat-Sen University, China); Hong-Ning Dai (Hong Kong Baptist University, Hong Kong); Zibin Zheng (School of Data and Computer Science, Sun Yat-sen University, China)

Since the advent of decentralized financial applications based on blockchains, new attacks that manipulate the order of transactions have emerged. To this end, order fairness protocols have been devised to prevent such order manipulation. However, existing order fairness protocols adopt time-consuming mechanisms that bring huge computation overheads and defer the finalization of transactions to subsequent rounds, eventually compromising system performance. In this work, we present Auncel, a novel consensus protocol that achieves both order fairness and high performance. Auncel leverages a weight-based strategy to order transactions, enabling all transactions in a block to be committed within one consensus round, without costly computation or further delays. Furthermore, Auncel achieves censorship resistance by integrating the consensus protocol with the fair ordering strategy, ensuring that all transactions can be ordered fairly. To reduce the overheads introduced by the fair ordering strategy, we also design optimization mechanisms, including dynamic transaction compression and an adjustable replica proposal strategy. We implement a prototype of Auncel based on HotStuff and conduct extensive experiments. Experimental results show that Auncel can increase throughput by 6 times and reduce confirmation latency by 3 times compared with state-of-the-art order fairness protocols.
Speaker Yikai Feng (Sun Yat-Sen University, China)

A graduate student at Sun Yat-sen University who has been researching blockchain architecture for about three years.


CRACKLE: A Fast Sector-based BFT Consensus with Sublinear Communication Complexity

Hao Xu, Xiulong Liu, Chenyu Zhang, Wenbin Wang, Jianrong Wang and Keqiu Li (Tianjin University, China)

Blockchain systems widely employ Byzantine fault-tolerant (BFT) protocols to ensure consistency. Improving the throughput of BFT protocols is crucial for large-scale blockchain systems. Frontier protocols face two problems: (i) the binary dilemma between the leader bottleneck of star-based linear communication and the compromised resilience of tree-based sublinear communication; and (ii) 2- or 3-round protocols restrict the number of phases per proposal, thereby limiting the scalability and parallelism of the pipeline. To overcome these problems, this paper proposes CRACKLE, the first sector-based pipelined BFT protocol with sublinear communication complexity, which improves consensus throughput while retaining the maximum resilience of (N-1)/3. We propose a sector-based communication mode that disseminates messages from the leader to a subset of replicas in each phase to accelerate consensus, and we split the traditional two-round protocol into 2κ phases to increase the basic pipeline scale. When implementing CRACKLE, we address two technical challenges: (i) ensuring QC certification across κ consecutive phases and (ii) achieving pipeline decoupling among shorter phases. We provide a comprehensive theoretical proof of the correctness of CRACKLE. Real experimental results reveal that CRACKLE achieves up to 10.36x higher throughput compared with state-of-the-art BFT protocols such as Kauri and HotStuff.
Speaker Hao Xu (Tianjin University)

Hao Xu received the B.E. degree from Tianjin University, Tianjin, China, in 2018. He is currently pursuing a Ph.D. degree in Computer Science from Tianjin University at Tianjin, China. He also worked as a research assistant at the Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China, in 2019. His research interests include blockchain technologies and distributed system technologies.


Expediting In-Network Federated Learning by Voting-Based Consensus Model Compression

Xiaoxin Su (Shenzhen University, China); Yipeng Zhou (Macquarie University, Australia); Laizhong Cui (Shenzhen University, China); Song Guo (The Hong Kong University of Science and Technology, Hong Kong)

Recently, federated learning (FL) has gained momentum because of its capability to preserve data privacy. To conduct model training with FL, multiple clients exchange model updates with a parameter server via the Internet. To accelerate communication, deploying a programmable switch (PS) in lieu of the parameter server to coordinate clients has been explored. The challenge in deploying the PS in FL lies in its scarce memory space, which prohibits running memory-consuming aggregation algorithms on the PS. To overcome this challenge, we propose the Federated Learning in-network Aggregation with Compression (FediAC) algorithm, consisting of two phases: client voting and model aggregation. In the former phase, clients report their significant model update indices to the PS to estimate the global significant model updates. In the latter phase, clients upload the global significant model updates to the PS for aggregation. FediAC consumes much less memory space and communication traffic than existing works because the first phase guarantees consensus compression across clients. The PS easily aligns model update indices to swiftly complete aggregation in the second phase. Finally, we conduct extensive experiments on public datasets to demonstrate that FediAC remarkably surpasses state-of-the-art baselines in terms of model accuracy and communication traffic.
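To illustrate the two phases, here is a toy rendering of voting-based consensus compression: clients nominate their top-k coordinates, coordinates with enough votes become the global significant set, and only those values are aggregated. The quorum rule and every name are our assumptions.

    import numpy as np

    def vote_global_indices(client_updates, k, quorum):
        # Phase 1: each client nominates its top-k coordinates by magnitude;
        # coordinates nominated by >= quorum clients form the global set.
        d = client_updates[0].size
        votes = np.zeros(d, dtype=int)
        for u in client_updates:
            topk = np.argpartition(np.abs(u), -k)[-k:]
            votes[topk] += 1
        return np.flatnonzero(votes >= quorum)

    updates = [np.random.randn(1000) for _ in range(8)]  # hypothetical updates
    idx = vote_global_indices(updates, k=100, quorum=4)
    agg = sum(u[idx] for u in updates) / len(updates)    # phase 2: aggregate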
Speaker
Speaker biography is not available.

Session Chair

Suman Banerjee (University of Wisconsin, USA)

Session B-7

B-7: Vehicular Networks

Conference
3:30 PM — 5:00 PM PDT
Local
May 22 Wed, 6:30 PM — 8:00 PM EDT
Location
Regency B

MatrixLoc: Centimeter-Level Relative Vehicle Positioning with Matrix Headlight

Wen-Hsuan Shen and Hsin-Mu Tsai (National Taiwan University, Taiwan)

Precise vehicle positioning is a fundamental technology in vehicle platooning, for sensing the driving environment, detecting hazards, and determining driving strategies. While radars offer robust performance in extreme weather, their positioning error of up to 1.8 m can be insufficient for platoons with vehicle spacings of a few meters. In contrast, LiDARs provide great accuracy, but their high cost hinders their penetration of the commercial market. To this end, this paper presents MatrixLoc, which utilizes a projector as the headlight and a customized light sensor array as the receiver to achieve low-cost, centimeter-level relative vehicle positioning in both the longitudinal and lateral axes, along with bearing information. MatrixLoc leverages differential phase shift keying (DPSK) modulation to create a fringe pattern, enabling single-shot positioning. Accurate positioning is achieved by analyzing the observed space-varying frequency and the demodulated phase information. Real-world driving results demonstrate that MatrixLoc achieves centimeter-level positioning accuracy at short range, with a positioning error of 30 cm and a bearing error of 9 degrees at a distance of 20 m.
Speaker
Speaker biography is not available.

Edge-Assisted Camera Selection in Vehicular Networks

Ruiqi Wang and Guohong Cao (The Pennsylvania State University, USA)

Camera sensors have been widely used to perceive the vehicle's surrounding environment, understand traffic conditions, and then help avoid traffic accidents. Since most sensors are limited by line of sight, the perception data collected by individual vehicles can be uploaded and shared through an edge server. To reduce the bandwidth, storage, and processing costs, we propose an edge-assisted camera selection system that selects only the necessary camera images to upload to the server. The selection is based on camera metadata that describes the coverage of the cameras in terms of GPS locations, orientations, and fields of view. Different from existing work, our metadata-based approach can detect and locate camera occlusions by leveraging LiDAR sensors, and then precisely and quickly calculate the real camera coverage and identify coverage overlaps. Based on the camera metadata, we study two camera selection problems, the Max-Coverage problem and the Min-Selection problem, and solve them with efficient algorithms. Moreover, we propose similarity-based redundancy suppression techniques to further reduce the bandwidth consumption, which becomes significant due to vehicle movement. Extensive evaluations demonstrate that the proposed algorithms can effectively select cameras to maximize coverage or minimize bandwidth consumption based on application requirements.
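For the Max-Coverage problem, the classic greedy heuristic (with its 1 - 1/e guarantee) is a natural baseline for what the paper's efficient algorithms refine; the camera IDs and grid cells below are hypothetical.

    def greedy_max_coverage(camera_cells, budget):
        # camera_cells: dict camera_id -> set of (occlusion-corrected) covered
        # grid cells; repeatedly pick the camera adding the most new coverage.
        covered, chosen = set(), []
        for _ in range(budget):
            cam = max(camera_cells, key=lambda c: len(camera_cells[c] - covered))
            if not camera_cells[cam] - covered:
                break                      # no camera adds new coverage
            chosen.append(cam)
            covered |= camera_cells[cam]
        return chosen, covered

    cams = {'v1': {1, 2, 3}, 'v2': {3, 4}, 'v3': {5}}
    print(greedy_max_coverage(cams, budget=2))  # -> (['v1', 'v2'], {1, 2, 3, 4})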
Speaker Ruiqi Wang (Pennsylvania State University)



COPILOT: Cooperative Perception using Lidar for Handoffs between Road Side Units

Suyash Sunay Pradhan (MS at Northeastern University, USA); Debashri Roy (The University of Texas Arlington, USA); Batool Salehihikouei and Kaushik Chowdhury (Northeastern University, USA)

This paper presents COPILOT, an ML-based approach that allows vehicles requiring ubiquitous high-bandwidth connectivity to identify the most suitable roadside units (RSUs) through proactive handoffs. By cooperatively exchanging the data obtained from local 3D Lidar point clouds within adjacent vehicles, together with coarse knowledge of their relative positions, COPILOT identifies transient blockages to all candidate RSUs along the path under study. Such cooperative perception is critical for choosing RSUs with the highly directional links required in mmWave bands, which degrade severely in the absence of LOS. COPILOT comprises three modules that operate in an inter-connected manner: (i) as an alternative to sending raw Lidar point clouds, it extracts and transmits low-dimensional intermediate features to lower the overhead of inter-vehicle messaging; (ii) it utilizes an attention mechanism to place greater emphasis on data collected from specific cars, as opposed to naive nearest-neighbor and distance-based selection schemes; and (iii) it experimentally validates the outcomes using an outdoor testbed composed of an autonomous car and Talon AD7200 60 GHz routers emulating the RSUs, accompanied by the public release of the datasets. Results reveal that COPILOT yields up to 69.8% and 20.42% improvements in latency and throughput, respectively, compared to traditional reactive handoffs for mmWave networks.
Speaker
Speaker biography is not available.

LoRaPCR: Long Range Point Cloud Registration through Multi-hop Relays in VANETs

Zhenxi Wang, Hongzi Zhu, Yunxiang Cai and Quan Liu (Shanghai Jiao Tong University, China); Shan Chang (Donghua University, China); Liang Zhang (Shanghai Jiao Tong University, China)

Point cloud registration (PCR) can significantly extend the visual field and enhance the point density on distant objects, thereby improving driving safety. However, it is very challenging for vehicles to perform online registration between long-range point clouds. In this paper, we propose an online long-range PCR scheme for VANETs, called LoRaPCR, in which vehicles achieve long-range registration through multi-hop, short-range, highly accurate registrations. Given the NP-hardness of the problem, a heuristic algorithm is developed to determine the best registration paths while reusing registration results to reduce computation costs. Moreover, we utilize an optimized dynamic programming algorithm to determine the transmission routes while minimizing communication overhead. Results of extensive simulations demonstrate that LoRaPCR achieves high PCR accuracy, with low relative translation and rotation errors of 0.55 meters and 1.43°, respectively, at distances of over 100 meters, and reduces computation overhead by more than 50% compared to the state-of-the-art method.
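The core of multi-hop registration is composing pairwise rigid transforms along a relay path; a minimal sketch under that reading is below, with 4x4 homogeneous matrices and toy translations of our own invention.

    import numpy as np

    def compose_path(transforms, path):
        # transforms: dict (i, j) -> 4x4 matrix T_ij mapping frame j into
        # frame i; path: vehicle ids [a, ..., z]. Returns the transform from
        # the last vehicle's frame into the first's. Caching and reusing the
        # pairwise results along shared sub-paths keeps computation low.
        T = np.eye(4)
        for i, j in zip(path, path[1:]):
            T = T @ transforms[(i, j)]
        return T

    # Hypothetical 3-hop chain a -> b -> c.
    T_ab = np.eye(4); T_ab[0, 3] = 12.0   # b sits 12 m ahead of a
    T_bc = np.eye(4); T_bc[0, 3] = 15.0   # c sits 15 m ahead of b
    T_ac = compose_path({('a', 'b'): T_ab, ('b', 'c'): T_bc}, ['a', 'b', 'c'])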
Speaker Zhenxi Wang (Shanghai Jiao Tong University)



Session Chair

Hang Qiu (UCR, USA)

Session C-7

C-7: Congestion Management

Conference
3:30 PM — 5:00 PM PDT
Local
May 22 Wed, 6:30 PM — 8:00 PM EDT
Location
Regency C

BCC: Re-architecting Congestion Control in DCNs

Qingkai Meng, Shan Zhang, Zhiyuan Wang and Tao Tong (Beihang University, China); Chaolei Hu (Tsinghua University, China); Hongbin Luo (Beihang University, China); Fengyuan Ren (Tsinghua University, China)

Datacenter traffic is characterized by a high volume of bursty tiny flows together with standing long flows, which creates coexisting transient and persistent congestion. Traditional congestion control (CC) algorithms have inherent limitations in reconciling fast response and high efficiency toward transients with stability and fairness during persistence. In this paper, we provide an insight that re-architects CC with two control laws, tailored to transient and persistent concerns, respectively. Armed with this key insight, we propose bimodal congestion control (BCC), which is founded on two core ideas: (i) quaternary network state detection, which further distinguishes transient and persistent states in switches, and (ii) a bimodal control law, which manifests as a transient controller and a persistent controller at sources. The transient controller employs a precise control paradigm that halts flows to drain backlogged packets and ramps flow rates down/up directly to the bottleneck bandwidth, striving for high efficiency. The persistent controller grounds itself in traditional CC algorithms, inheriting their stability and fairness. We implement BCC in the Linux kernel and on a P4-programmable switch. Testbed evaluations and simulations show that, compared to DCQCN, HPCC, PowerTCP, and Swift, BCC shortens convergence time by up to 96% and reduces flow completion time by 14%–99%.
Speaker Qingkai Meng (Beihang University)
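A skeleton of the two-mode idea in Python; the constants, state names, and rate update below are invented for illustration and do not reproduce the paper's controllers:

    # Illustrative bimodal control skeleton: precise control in the
    # transient state, AIMD-style increase in the persistent state.
    ADDITIVE_STEP = 0.01  # Gbps, illustrative constant

    def pkt_time(bw_gbps, pkt_bytes=1500):
        return pkt_bytes * 8 / (bw_gbps * 1e9)  # seconds per packet

    def on_feedback(state, rate, bottleneck_bw, backlog_pkts):
        if state == "TRANSIENT":
            # Pause long enough to drain the backlog, then jump the
            # rate straight to the measured bottleneck bandwidth.
            pause_s = backlog_pkts * pkt_time(bottleneck_bw)
            return pause_s, bottleneck_bw
        # PERSISTENT: fall back to a traditional additive increase
        # for stability and fairness.
        return 0.0, rate + ADDITIVE_STEP

    print(on_feedback("TRANSIENT", rate=5.0, bottleneck_bw=10.0,
                      backlog_pkts=100))
    # -> (0.00012, 10.0): pause 120 us, then send at 10 Gbps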



Reinforcement Learning-based Congestion Control: A Systematic Evaluation of Fairness, Efficiency and Responsiveness

Luca Giacomoni and George Parisis (University of Sussex, United Kingdom (Great Britain))

Reinforcement learning (RL)-based congestion control (CC) promises efficient CC in a fast-changing networking landscape, where evolving communication technologies, applications, and traffic workloads pose severe challenges to human-derived CC algorithms. RL-based CC is in its early days, and substantial research is required to understand existing limitations, identify research challenges, and yield deployable solutions. In this paper we present the first reproducible and systematic study of RL-based CC, with the aim of highlighting strengths and uncovering fundamental limitations of the state-of-the-art. We identify challenges in evaluating RL-based CC, establish a methodology for studying such approaches, and experiment with publicly available RL-based CC implementations. We show that existing approaches can acquire all available bandwidth swiftly and are resistant to non-congestive loss, at the cost of excessive packet loss in normal operation. We show that, as fairness is not embedded in their reward functions, existing approaches exhibit unfairness in almost all tested network setups. Finally, we show that existing RL-based CC approaches under-perform when the available bandwidth and end-to-end latency change dynamically. Our codebase and datasets are publicly available, with the aim of galvanising the community towards transparency and reproducibility, which have been recognised as crucial for researching and evaluating machine-generated policies.
Speaker
Speaker biography is not available.

Approximation Algorithms for Minimizing Congestion in Demand-Aware Networks

Wenkai Dai (University of Vienna, Austria); Michael Dinitz (Johns Hopkins University, USA); Klaus-Tycho Foerster (TU Dortmund, Germany); Long Luo (University of Electronic Science and Technology of China, China); Stefan Schmid (TU Berlin, Germany)

Emerging reconfigurable optical communication technologies enable demand-aware networks: networks whose static topology can be enhanced with demand-aware links optimized for the traffic pattern the network serves. This paper studies the algorithmic problem of jointly optimizing the topology and the routing in such demand-aware networks to minimize congestion. We investigate this problem along two dimensions: (1) whether flows are splittable or unsplittable, and (2) whether routing on the hybrid topology is segregated or not, i.e., whether or not flows must exclusively use either the static network or the demand-aware connections. For splittable and segregated routing, we show that the problem is \(2\)-approximable in general, but APX-hard even for uniform demands induced by a bipartite demand graph. For unsplittable and segregated routing, we show an upper bound of \(O\left(\log m/ \log\log m \right)\) and a lower bound of \(\Omega\left(\log m/ \log\log m \right)\) for polynomial-time approximation algorithms, where \(m\) is the number of static links. We further show that under splittable (resp., unsplittable) and non-segregated routing, even for demands with a single source (resp., destination), the problem cannot be approximated within a ratio better than \(\Omega\left(\frac{c_{\max}}{c_{\min}} \right)\) unless P=NP, where \(c_{\max}\) (resp., \(c_{\min}\)) denotes the maximum (resp., minimum) capacity.
Speaker Wenkai Dai (University of Vienna)

Wenkai Dai is a final-year PhD student at the Faculty of Computer Science, University of Vienna, Austria, scheduled to finish his PhD in 2024. Previously, he obtained his master's degree in computer science from the University of Saarland, Germany, with a focus on theoretical computer science.

In his doctoral research, he delves into addressing algorithmic challenges inherent to next-generation networking and distributed systems. His work spans a broad spectrum of complexities, ranging from mitigating congestion and optimizing routing lengths in reconfigurable/optical data center networks to robust failover routing protocols. Moreover, he maintains a keen interest in algorithmic problems across various domains, including complexity theory, combinatorial optimization, graph theory, and distributed/online algorithms.


Congestion-aware Routing and Content Placement in Elastic Cache Networks

Jinkun Zhang and Edmund Yeh (Northeastern University, USA)

Caching can be leveraged to significantly improve network performance and mitigate congestion. However, characterizing the optimal tradeoff between routing cost and cache deployment cost remains an open problem. In this paper, for a network with arbitrary topology and congestion-dependent nonlinear cost functions, we aim to jointly determine the cache deployment, content placement, and hop-by-hop routing strategies, so that the sum of routing cost and cache deployment cost is minimized. We tackle this NP-hard problem starting with a fixed-routing setting, and then generalize to a dynamic-routing setting. For the fixed-routing setting, a Gradient-combining Frank-Wolfe algorithm with (1,1/2)-approximation is presented. For the general dynamic-routing setting, we obtain a set of KKT conditions, and devise a distributed and adaptive online algorithm based on these conditions. We demonstrate via extensive simulation that our algorithms significantly outperform a number of baseline techniques.
Speaker
Speaker biography is not available.
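For readers unfamiliar with the algorithm family, a generic Frank-Wolfe iteration over a simplex-constrained variable looks as follows. This is the standard template on a toy objective, not the paper's gradient-combining variant or its (1,1/2) guarantee:

    # Generic Frank-Wolfe over the probability simplex: at each step,
    # a linear minimization oracle picks a vertex, and the iterate
    # moves toward it with a diminishing step size.
    import numpy as np

    def frank_wolfe(grad, x0, steps=100):
        x = x0.copy()
        for t in range(1, steps + 1):
            g = grad(x)
            s = np.zeros_like(x)
            s[np.argmin(g)] = 1.0           # best vertex of the simplex
            x += (2.0 / (t + 2)) * (s - x)  # standard step size 2/(t+2)
        return x

    # Minimize a toy convex routing-cost surrogate f(x) = ||x - target||^2.
    target = np.array([0.7, 0.2, 0.1])
    x = frank_wolfe(lambda x: 2 * (x - target), np.ones(3) / 3)
    print(np.round(x, 2))  # approaches target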

Session Chair

I-Hong Hou (Texas A&M University, USA)

Session D-7

D-7: Edge Computing

Conference
3:30 PM — 5:00 PM PDT
Local
May 22 Wed, 6:30 PM — 8:00 PM EDT
Location
Regency D

AirSLAM: Rethinking Edge-Assisted Visual SLAM with On-Chip Intelligence

Danyang Li, Yishujie Zhao and Jingao Xu (Tsinghua University, China); Shengkai Zhang (Wuhan University of Technology, China); Longfei Shangguan (University of Pittsburgh, USA); Zheng Yang (Tsinghua University, China)

Edge-assisted visual SLAM stands as a pivotal enabler for emerging mobile applications such as search-and-rescue and industrial inspection. Constrained by the limited computing capability of lightweight mobile devices, current systems balance accuracy and efficiency by allocating lightweight, time-sensitive tracking tasks to mobile devices while offloading the more resource-intensive yet delay-tolerant map optimization tasks to the edge. However, our pilot study reveals several limitations of such a tracking-optimization decoupled paradigm, arising from the disruption of inter-dependencies between the two tasks concerning data, resources, and threads.

In this paper, we design and implement AirSLAM, an innovative system that reshapes edge-assisted visual SLAM by tightly integrating tracking and partial-yet-crucial optimization on the mobile device. AirSLAM harnesses the hierarchical and heterogeneous computing units offered by the latest commercial systems-on-chip (SoCs) to enhance the computational capacity of mobile devices, which, in turn, allows AirSLAM to employ a suite of novel algorithms for map sync, optimization, and tracking that accommodate this architectural upgrade. By fully embracing on-chip intelligence, AirSLAM simultaneously enhances system accuracy and efficiency through software-hardware co-design. We deploy AirSLAM on a drone for industrial inspection. Comprehensive experiments in one of the world's largest oil fields over three months demonstrate its superior performance.
Speaker Danyang Li (Tsinghua University)

Danyang Li is currently a PhD student in Software Engineering at Tsinghua University. His research interests include the Internet of Things and mobile computing.


BREAK: A Holistic Approach for Efficient Container Deployment among Edge Clouds

Yicheng Feng and Shihao Shen (Tianjin University, China); Xiaofei Wang (Tianjin Key Laboratory of Advanced Networking, Tianjin University, China); Qiao Xiang (Xiamen University, China); Hong Xu (The Chinese University of Hong Kong, Hong Kong); Chenren Xu (Peking University, China); Wenyu Wang (Shanghai Zhuichu Networking Technologies Co., Ltd., China)

Container technology has revolutionized service deployment, offering streamlined processes and enabling container orchestration platforms to manage a growing number of container clusters. However, the deployment of containers in distributed edge clusters presents challenges due to their unique characteristics, such as bandwidth limitations and resource constraints. Existing approaches designed for cloud environments often fall short in addressing the specific requirements of edge computing. Additionally, very few edge-oriented solutions explore fundamental changes to the container design, resulting in difficulties achieving backward compatibility. In this paper, we reevaluate the fundamental layer-based structure of containers. We identify that the proliferation of redundant files and operations within image layers hinders efficient container deployment. Drawing upon the crucial insight of enhancing layer reuse and extracting benefits from it, we introduce BREAK, a holistic approach centered on layer structure throughout the entire container deployment pipeline, ensuring backward compatibility. BREAK refactors image layers and proposes an edge-oriented cache solution to enable ubiquitous and shared layers. Moreover, it addresses the complete deployment pipeline by introducing a customized scheduler and a tailored storage driver. Our results demonstrate that BREAK accelerates the deployment process by up to 2.1× and reduces redundant image size by up to 3.11× compared to state-of-the-art approaches.
Speaker Yicheng Feng (Tianjin University)

Yicheng Feng is a master's student at Tianjin University. His research focuses on edge computing, resource optimization, and scheduling.


Exploiting Storage for Computing: Computation Reuse in Collaborative Edge Computing

Xingqiu He and Chaoqun You (Fudan University, China); Tony Q. S. Quek (Singapore University of Technology and Design, Singapore)

Collaborative Edge Computing (CEC) is a new edge computing paradigm that enables neighboring edge servers to share computational resources with each other. Although CEC can enhance the utilization of computational resources, it still suffers from resource waste. The primary reason is that end-users from the same area are likely to offload similar tasks to edge servers, thereby leading to duplicate computations. To improve system efficiency, the computation results of previously executed tasks can be cached and then reused by subsequent tasks. However, most existing computation reuse algorithms only consider one edge server, which significantly limits the effectiveness of computation reuse. To address this issue, this paper applies computation reuse in CEC networks to exploit the collaboration among edge servers. We formulate an optimization problem that aims to minimize the overall task response time and decompose it into a caching subproblem and a scheduling subproblem. By analyzing the properties of optimal solutions, we show that the optimal caching decisions can be efficiently searched using the bisection method. For the scheduling subproblem, we utilize projected gradient descent and backtracking to find a local minimum. Numerical results show that our algorithm significantly reduces the response time under various situations.
Speaker
Speaker biography is not available.
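The bisection step is worth making concrete: once the optimal caching decisions are known to be monotone in a scalar threshold, a logarithmic-time search suffices. A minimal sketch with a toy feasibility test; the paper's actual subproblem structure is not reproduced here:

    # Bisection over a scalar caching threshold, assuming feasibility
    # is monotone (True up to some point, then False).
    def bisect_threshold(feasible, lo=0.0, hi=1.0, tol=1e-6):
        """Return the largest t in [lo, hi] with feasible(t) True."""
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if feasible(mid):
                lo = mid
            else:
                hi = mid
        return lo

    # Toy monotone feasibility: cache usage at threshold t fits the budget.
    print(round(bisect_threshold(lambda t: 10 * t <= 4.2), 3))  # ~0.42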

INVAR: Inversion Aware Resource Provisioning and Workload Scheduling for Edge Computing

Bin Wang (University of Massachusetts Amherst, USA); David Irwin and Prashant Shenoy (University of Massachusetts, Amherst, USA); Don Towsley (University of Massachusetts at Amherst, USA)

Edge computing is emerging as a complementary architecture to cloud computing that addresses some of its associated issues. One major advantage of edge computing is that edge data centers are usually much closer to users than traditional cloud data centers. It is therefore commonly believed that developers of latency-sensitive applications can reduce overall end-to-end latency simply by transitioning from a cloud deployment to an edge deployment. However, as recent work has shown, the performance of an edge deployment is sensitive to several factors that, in many practical scenarios, can lead to edge servers providing worse end-to-end response times than cloud servers. This phenomenon is referred to as edge performance inversion. In this paper, we propose resource allocation and workload scheduling algorithms that actively prevent edge performance inversion. Our algorithms are based on queueing-theory results and optimization techniques. Evaluation results show that INVAR finds a near-optimal solution that outperforms a cloud deployment by an adjustable margin. Simulation results based on production workloads from Akamai data centers show that INVAR outperforms common heuristic-based edge deployments by 11% to 24% in real-world scenarios.
Speaker
Speaker biography is not available.
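Edge performance inversion is easy to see with a one-line queueing model: under an M/M/1 approximation, end-to-end time is network RTT plus \(1/(\mu-\lambda)\), so a nearby but lightly provisioned edge server can lose to a farther, faster cloud. A toy numerical illustration, with all numbers invented:

    # M/M/1 sketch of edge performance inversion.
    def response_time(rtt_ms, mu, lam):
        """rtt_ms: network RTT; mu, lam: service/arrival rates (req/s)."""
        assert lam < mu, "queue must be stable"
        return rtt_ms + 1000.0 / (mu - lam)  # ms

    edge = response_time(rtt_ms=5.0, mu=120.0, lam=110.0)   # little headroom
    cloud = response_time(rtt_ms=40.0, mu=500.0, lam=110.0)
    print(f"edge {edge:.1f} ms vs cloud {cloud:.1f} ms")
    # edge 105.0 ms vs cloud 42.6 ms: the closer server is slower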

Session Chair

Li Chen (University of Louisiana at Lafayette, USA)

Session E-7

E-7: Machine Learning 1

Conference
3:30 PM — 5:00 PM PDT
Local
May 22 Wed, 6:30 PM — 8:00 PM EDT
Location
Regency E

Expediting Distributed GNN Training with Feature-only Partition and Optimized Communication Planning

Bingqian Du and Jun Liu (Huazhong University of Science and Technology, China); Ziyue Luo (The Ohio State University, USA); Chuan Wu (The University of Hong Kong, Hong Kong); Qiankun Zhang and Hai Jin (Huazhong University of Science and Technology, China)

Feature-only partition of large graph data in distributed Graph Neural Network (GNN) training offers advantages over the commonly adopted graph structure partition, such as minimal graph preprocessing cost and elimination of cross-worker subgraph sampling burdens. Nonetheless, the performance bottleneck of GNN training with feature-only partitions still lies largely in the substantial communication overhead caused by cross-worker feature fetching. To reduce this overhead and expedite distributed training, we first investigate and answer two key questions on the convergence behavior of the GNN model in feature-partition-based distributed GNN training: 1) as no worker holds a complete copy of each feature, can gradient exchange among workers compensate for the information loss due to incomplete local features? 2) if the answer to the first question is negative, is feature fetching in every training iteration necessary to ensure model convergence? Based on our theoretical findings on these questions, we derive an optimal communication plan that decides the frequency of feature fetching during training, taking into account bandwidth levels among workers and striking a balance between model loss and training time. Extensive evaluation demonstrates results consistent with our theoretical analysis and the effectiveness of our proposed design.
Speaker Bingqian Du (Huazhong University of Science and Technology)
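One way to picture such a communication plan: fetch remote features every p iterations, reuse stale local copies in between, and pick p by trading amortized fetch time against a staleness penalty. The cost model below is invented for illustration and is not the paper's derivation:

    # Sketch: choose a feature-fetch period p that minimizes
    # (amortized time per iteration) / (progress per iteration),
    # where progress degrades with staleness. All constants invented.
    def choose_period(bandwidth_gbps, fetch_bytes, iter_time_s,
                      staleness_penalty, max_period=64):
        fetch_time = fetch_bytes * 8 / (bandwidth_gbps * 1e9)
        best_p, best_cost = 1, float("inf")
        for p in range(1, max_period + 1):
            time_per_iter = iter_time_s + fetch_time / p  # amortized comm
            progress = 1.0 / (1.0 + staleness_penalty * (p - 1))
            cost = time_per_iter / progress
            if cost < best_cost:
                best_p, best_cost = p, cost
        return best_p

    # Slow links push the optimal period up; fast links pull it toward 1.
    print(choose_period(1.0, 500e6, 0.05, staleness_penalty=0.02))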



Workflow Optimization for Parallel Split Learning

Joana Tirana (University College Dublin and VistaMilk SFI, Ireland); Dimitra Tsigkari (Telefonica Research, Spain); George Iosifidis (Delft University of Technology, The Netherlands); Dimitris Chatzopoulos (University College Dublin, Ireland)

Split learning (SL) has been recently proposed as a way to enable resource-constrained devices to train multi-parameter neural networks (NNs) and participate in federated learning (FL). In a nutshell, SL splits the NN model into parts, and allows clients (devices) to offload the largest part as a processing task to a computationally powerful helper. In parallel SL, multiple helpers can process model parts of one or more clients, thus, considerably reducing the maximum training time over all clients (makespan). In this paper, we focus on orchestrating the workflow of this operation, which is critical in highly heterogeneous systems, as our experiments show. In particular, we formulate the joint problem of client-helper assignments and scheduling decisions with the goal of minimizing the training makespan, and we prove that it is NP-hard. We propose a solution method based on the decomposition of the problem by leveraging its inherent symmetry, and a second one that is fully scalable. A wealth of numerical evaluations using our testbed's measurements allow us to build a solution strategy comprising these methods. Moreover, we show that this strategy finds a near-optimal solution, and achieves a shorter makespan than the baseline scheme by up to 52.3%.
Speaker
Speaker biography is not available.
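As a point of reference for the assignment side of the problem, the classic longest-processing-time (LPT) greedy is a standard makespan heuristic. A sketch with invented processing times; this is the textbook baseline, not the paper's decomposition-based method:

    # LPT greedy: assign each client's offloaded part (longest first)
    # to the currently least-loaded helper.
    import heapq

    def lpt_assign(task_times, n_helpers):
        heap = [(0.0, h) for h in range(n_helpers)]  # (load, helper id)
        heapq.heapify(heap)
        assignment = {}
        for client, t in sorted(task_times.items(), key=lambda kv: -kv[1]):
            load, h = heapq.heappop(heap)
            assignment[client] = h
            heapq.heappush(heap, (load + t, h))
        return assignment, max(load for load, _ in heap)

    asg, makespan = lpt_assign({"c1": 4.0, "c2": 3.0, "c3": 2.0, "c4": 2.0}, 2)
    print(asg, makespan)  # loads 6.0 and 5.0 -> makespan 6.0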

Learning to Decompose Asymmetric Channel Kernels for Generalized Eigenwave Multiplexing

Zhibin Zou, Iresha Amarasekara and Aveek Dutta (University at Albany, SUNY, USA)

Learning the principal eigenfunctions of a kernel is at the core of many machine-learning problems. Common methods deal with symmetric kernels based on Mercer's Theorem. However, in communication systems the channel kernel is usually asymmetric due to inconsistencies between the uplink and downlink propagation environments. In this paper, we propose an explainable neural network for extracting eigenfunctions from generic multi-dimensional asymmetric channel kernels, based on a recent result called the High Order Generalized Mercer's Theorem (HOGMT), by decomposing them into jointly orthogonal eigenfunctions. The proposed neural network based approach is efficient and easy to implement compared to the conventional SVD-based solutions used for eigendecomposition. We also discuss the effect of different hyper-parameters on training time, constraint satisfaction, and overall performance. Finally, we show that multiplexing using these eigenfunctions mitigates interference across all available Degrees of Freedom (DoF), both mathematically and via neural network based system-level simulations.
Speaker
Speaker biography is not available.
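In Mercer form, a symmetric kernel expands over a single orthonormal family; the asymmetric channel kernels considered here instead admit an SVD-like expansion over two jointly orthogonal families. Schematically, in a simplified one-dimensional statement in the spirit of HOGMT (not its full multi-dimensional form): \(K(u,d)=\sum_{n}\sigma_n\,\psi_n(u)\,\phi_n(d)\), with \(\langle\psi_n,\psi_m\rangle=\langle\phi_n,\phi_m\rangle=\delta_{nm}\). Transmitting on \(\phi_n\) and receiving on \(\psi_n\) then keeps the \(n\)-th stream free of interference from the others, which is the multiplexing property the abstract refers to.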

META-MCS: A Meta-knowledge Based Multiple Data Inference Framework

Zijie Tian, En Wang, Wenbin Liu, Baoju Li and Funing Yang (Jilin University, China)

Mobile crowdsensing (MCS) is a paradigm for data collection under budget limitations and constrained worker availability. The central strategy of MCS is to recruit workers to sense a portion of the data and subsequently infer the unsensed data. For this inference task, prior research has proposed several algorithms that do not require historical data, but their accuracy is very limited. More effective approaches train a model on sufficient historical data; however, such methods cannot infer data for which little to no history exists. A more promising strategy is to train models on other, similar datasets that have already been sensed. However, such datasets differ in sensing locations, amounts of sensed data, and data types, and this variance raises the complex issue of integrating knowledge across datasets when training inference models. To address this, we propose a meta-knowledge based multiple data inference framework named META-MCS. In META-MCS, we propose a similarity evaluation model, TMFS; following this, we cluster similar datasets and train generalized models for each cluster. Finally, META-MCS selects an appropriate model to infer the unsensed data. We validate the proposed methods through extensive experiments on ten different datasets, which substantiate the effectiveness of our framework.
Speaker Zijie Tian (Jilin University)



Session Chair

Mariya Zheleva (UAlbany SUNY, USA)

Session F-7

F-7: Data Center Networking

Conference
3:30 PM — 5:00 PM PDT
Local
May 22 Wed, 6:30 PM — 8:00 PM EDT
Location
Regency F

RateMP: Optimizing Bandwidth Utilization with High Burst Tolerance in Data Center Networks

Jiangping Han, Kaiping Xue and Wentao Wang (University of Science and Technology of China, China); Ruidong Li (Kanazawa University, Japan); Qibin Sun and Jun Lu (University of Science and Technology of China, China)

Load balancing in data center networks (DCNs) is a crucial and complex undertaking. Multi-path TCP (MPTCP) has been proposed as a cost-effective solution that aims to distribute workloads and improve network resource utilization. However, it can escalate buffer occupancy and undermine burst tolerance, particularly in scenarios involving incast short flows.
To address these limitations, we propose a novel multi-path congestion control algorithm, RateMP, which optimizes bandwidth utilization while ensuring burst tolerance in DCNs. RateMP employs a hybrid window and rate control loop with coupled gradient-projection adjustment, enabling fast, fine-grained bandwidth allocation and accelerating convergence. Additionally, RateMP removes the cwnd limitation via under-rate pacing to protect incast and bursty flows.
We prove that RateMP is Lyapunov stable and asymptotically stable, and demonstrate its improvements through a kernel-based implementation and extended large-scale simulations. RateMP maintains high bandwidth utilization, cuts RTT by 2×, and reduces flow completion times (FCT) by 45% in incast scenarios compared to existing algorithms.
Speaker Jiangping Han (University of Science and Technology of China)

Jiangping Han received her bachelor's degree and her doctorate from the Department of Electronic Engineering and Information Science (EEIS), USTC, in 2016 and 2021, respectively. From Nov. 2019 to Oct. 2021, she was a visiting scholar with the School of Computing, Informatics, and Decision Systems Engineering, Arizona State University. She was subsequently a Post-Doctoral researcher with the School of Cyber Science and Technology, USTC, where she is currently an associate researcher. Her research interests include future Internet architecture design and transmission optimization.


Rearchitecting Datacenter Networks: A New Paradigm with Optical Core and Optical Edge

Sushovan Das, Arlei Silva and T. S. Eugene Ng (Rice University, USA)

All-optical circuit-switching (OCS) technology is the key to design energy-efficient and high-performance datacenter network (DCN) architectures for the future. However, existing round-robin based OCS cores perform poorly under realistic workloads having high traffic skewness and high volume of inter-rack traffic. To address this issue, we propose a novel DCN architecture OSSV: a combination of OCS-based core (between ToR switches) and OCS-based reconfigurable edge (between servers and ToR switches). On one hand, the OCS core is traffic agnostic and realizes reconfigurably non-blocking ToR-level connectivity. On the other hand, OCS-based edge reconfigures itself to reshape the incoming traffic in order to jointly minimize traffic skewness and inter-rack traffic volume. Our novel optimization framework can obtain the right balance between these inter-twined objectives. Our extensive simulations and testbed evaluation show that OSSV can achieve high performance under diverse DCN traffic while consuming low power and incurring low cost.
Speaker
Speaker biography is not available.

BiCC: Bilateral Congestion Control in Cross-datacenter RDMA Networks

Zirui Wan, Jiao Zhang and Mingxuan Yu (Beijing University of Posts and Telecommunications, China); Junwei Liu and Jun Yao (Chinamobile Cloud Centre, China); Xinghua Zhao (China Mobile (Suzhou) Software Technology Co., Ltd, China); Tao Huang (Beijing University of Posts and Telecommunications, China)

With the development of network-intensive applications like machine learning and cloud storage, there are two growing trends: (i) RDMA has been widely deployed to enhance underlying high-speed networks; (ii) applications are deployed on geographically distributed datacenters to meet customer demands (e.g., low access latency to services or regular data backups). To fully utilize the benefits of RDMA, we desire to support long-haul RDMA transport for cross-datacenter applications. Different from common intra-datacenter communications, the hybrid of long-haul and intra-datacenter traffic complicates the congestion state, and the considerably long control loop makes it more severe. We revisit existing congestion control methods and find they are insufficient to address the hybrid traffic congestion.

In this paper, we propose Bilateral Congestion Control (BiCC), a novel solution relying on two-side DCI-switches to bilaterally alleviate the hybrid traffic congestion in the sender-side and receiver-side datacenters while serving as a building block for existing host-driven methods. We implement BiCC on commodity P4-based switches and conduct evaluations using both testbed experiments and NS3 simulations. The extensive evaluation results show that BiCC ensures fast congestion avoidance. Thus, BiCC reduces the average FCT for intra-datacenter and inter-datacenter traffic by up to 53% and 51%, respectively, in large-scale simulations.
Speaker Zirui Wan

Zirui Wan is a fourth-year PhD student at Beijing University of Posts and Telecommunications, advised by Professor Jiao Zhang; he received his bachelor's degree there in 2020. His research interests are transport protocols in different networks, including datacenter networks and intra-host networks.


Explicit Dropping Notification in Data Centers

Qingkai Meng (Beihang University, China); Yiran Zhang (Beijing University of Posts and Telecommunication, China); Chaolei Hu, Bo Wang and Fengyuan Ren (Tsinghua University, China)

Datacenter applications increasingly demand microsecond-scale latency and tight tail latency. Despite recent advances in datacenter transport protocols, we observe that timeouts caused by packet loss are the killer of microsecond-scale latency. Moreover, refining the RTO setting is impractical due to significant fluctuations in RTT. In this paper, we propose explicit dropping notification (EDN) to avoid timeouts. EDN rekindles ICMP Source Quench: the switch notifies the source with precise packet loss information, so the source can rapidly pinpoint dropped packets for fast retransmission instead of waiting for timeouts. More importantly, fast retransmission does not mean immediate retransmission, which is prone to aggravating congestion and deteriorating latency. In light of this, we finesse the timing and sending rate of retransmission. Specifically, as a benefit of the paradigm shift to explicit notification, the source can pause for the queue draining time piggybacked on EDN messages and estimate the connection capacity to determine a proper sending rate, thus avoiding aggravated congestion. We implement EDN on a P4-programmable switching ASIC and in the Linux kernel. Evaluations show that, compared with state-of-the-art loss recovery schemes, EDN reduces latency by up to 4.1× on average and 3.6× at the 99th percentile.
Speaker Qingkai Meng (Beihang University)
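A sketch of what a source might do on receiving such a notification, assuming (hypothetically) that the message carries the switch's queue length: pause for the drain time, then retransmit at a capacity estimate. The field names and the rate heuristic below are invented, not EDN's actual message format:

    # Hypothetical handler for a loss-notification message.
    def on_notification(queue_bytes, link_gbps, inflight_bytes, rtt_s):
        drain_s = queue_bytes * 8 / (link_gbps * 1e9)  # time to drain backlog
        est_capacity = inflight_bytes / rtt_s          # crude BDP-based rate
        return drain_s, est_capacity                   # (pause, bytes/sec)

    pause, rate = on_notification(queue_bytes=150_000, link_gbps=100,
                                  inflight_bytes=500_000, rtt_s=50e-6)
    print(f"pause {pause*1e6:.0f} us, retransmit at {rate/1e9:.1f} GB/s")
    # pause 12 us, retransmit at 10.0 GB/s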



Session Chair

Chunyi Peng (Purdue University, USA)

Session G-7

G-7: Quantum Networking

Conference
3:30 PM — 5:00 PM PDT
Local
May 22 Wed, 6:30 PM — 8:00 PM EDT
Location
Prince of Wales/Oxford

Quantum BGP with Online Path Selection via Network Benchmarking

Maoli Liu and Zhuohua Li (The Chinese University of Hong Kong, Hong Kong); Kechao Cai (Sun Yat-Sen University, China); Jonathan Allcock (Tencent Quantum Laboratory, Hong Kong); Shengyu Zhang (Tencent Quantum Laboratory, China); John Chi Shing Lui (Chinese University of Hong Kong, Hong Kong)

Realizing large-scale quantum networks with thousands of nodes requires topology-oblivious routing protocols. Most existing quantum network routing protocols only consider the intra-domain scenario, where all nodes belong to a single party with complete topology knowledge. However, like the classical Internet, the quantum Internet will likely be provided by multiple quantum Internet Service Providers (qISPs). In this paper, we consider the inter-domain scenario, where the network consists of multiple subnetworks owned by mutually untrusted parties without centralized control. Under this setting, previously proposed quantum path selection mechanisms, which rely on knowledge of the network topology, are no longer applicable. We propose a Quantum Border Gateway Protocol (QBGP) for efficiently routing entanglement across qISP boundaries. To guarantee high-quality information transmission, we propose an algorithm named online top-K path selection. This algorithm utilizes the information gain introduced in this paper to adaptively decide on measurement parameters, allowing for the selection of high-fidelity paths and accurate fidelity estimates while minimizing costs. Additionally, we implement a quantum network simulator and evaluate our protocol and algorithm. Our evaluation shows that QBGP effectively distributes entanglement across different qISPs, and our path selection algorithm increases network performance by selecting high-fidelity paths with much lower resource consumption than other methods.
Speaker Zhuohua Li (The Chinese University of Hong Kong)

Zhuohua Li is a postdoctoral fellow at the Advanced Networking and System Research Laboratory (ANSRLab) at The Chinese University of Hong Kong (CUHK). He obtained his Ph.D. in Computer Science and Engineering at CUHK in 2022, under the supervision of Prof. John C.S. Lui. Before that, he completed his B.E. in Computer Science and Technology at the University of Science and Technology of China in 2017. His research focuses on the theory and applications of multi-armed bandits, quantum networks, system security, and program analysis.


Routing and Photon Source Provisioning in Quantum Key Distribution Networks

Sun Xu, Yangming Zhao and Liusheng Huang (University of Science and Technology of China, China); Chunming Qiao (University at Buffalo, USA)

Quantum Key Distribution (QKD) is considered to be an ultimate solution to communication security. However, current QKD devices, especially quantum photon sources, are expensive, and they can generate secret keys only at a low rate. In this paper, we design a system named RPSP for trusted relay-based QKD networks to not only minimize the number of photon sources needed in a network to ensure at least one feasible relay path exists for any potential QKD requests but also save the time to complete a batch of end-to-end QKD requests by jointly optimizing the routing of relay paths and the provisioning of photon sources along each relay path. Compared with existing works, RPSP focuses on a more practical scenario where only some of the nodes are equipped with photon sources and it leverages optical switching to enable dynamic photon source provisioning such that we can utilize such QKD devices in a more efficient way. Extensive simulations show that compared with baseline schemes, RPSP can save up to 87% of the photon sources needed in a trusted relay based QKD network, and 36% of the time to complete a batch of QKD requests.
Speaker Sun Xu (University of Science and Technology of China)

Sun Xu received his B.S. degree in 2022 from the University of Electronic Science and Technology of China. He is currently studying for a master's degree in the School of Computer Science and Technology, University of Science and Technology of China (USTC). His main research interest is quantum networks.


LinkSelFiE: Link Selection and Fidelity Estimation in Quantum Networks

Maoli Liu, Zhuohua Li and Xuchuang Wang (The Chinese University of Hong Kong, Hong Kong); John Chi Shing Lui (Chinese University of Hong Kong, Hong Kong)

Reliable transmission of fragile quantum information requires one to efficiently select and utilize high-fidelity links among multiple noisy quantum links. However, the fidelity, a quality metric of quantum links, is unknown a priori. Uniformly estimating the fidelity of all links can be expensive, especially in networks with numerous links. To address this challenge, we formulate the link selection and fidelity estimation problem as a best arm identification problem and propose an algorithm named LinkSelFiE. The algorithm efficiently identifies the optimal link from a set of quantum links and provides an accurate fidelity estimate of that link with low quantum resource consumption. LinkSelFiE estimates link fidelity based on the feedback of a vanilla network benchmarking subroutine, and adaptively eliminates inferior links throughout the whole fidelity estimation process. This elimination leverages a novel confidence interval derived in this paper for the estimates from the subroutine, which theoretically guarantees that LinkSelFiE outputs the optimal link correctly with high confidence. We also establish a provable upper bound of cost complexity for LinkSelFiE. Moreover, we perform extensive simulations under various scenarios to corroborate that LinkSelFiE outperforms other existing methods in terms of both identifying the optimal link and reducing quantum resource consumption.
Speaker Maoli Liu (The Chinese University of Hong Kong)

Maoli Liu is a fourth-year Ph.D. candidate in the Department of Computer Science and Engineering at the Chinese University of Hong Kong, under the supervision of Prof. John C.S. Lui. Before that, she completed her B.E. in Information Engineering at Xi'an Jiaotong University in 2020. Her research focuses on the theory and applications of multi-armed bandits, computer networks, and quantum networks.
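The elimination template underlying this style of algorithm is short enough to sketch. Below is generic successive elimination with a standard sub-Gaussian confidence radius standing in for the paper's benchmarking-derived interval; the noise level is assumed known and all numbers are illustrative:

    # Best-arm identification by successive elimination: sample all
    # surviving links each round, then drop links whose upper
    # confidence bound falls below the leader's lower bound.
    import math, random

    def successive_elimination(sample, n_links, sigma=0.02, delta=0.05,
                               max_rounds=500):
        alive = set(range(n_links))
        means, counts = [0.0] * n_links, [0] * n_links
        for t in range(1, max_rounds + 1):
            for i in list(alive):
                x = sample(i)                     # one fidelity estimate
                counts[i] += 1
                means[i] += (x - means[i]) / counts[i]
            rad = sigma * math.sqrt(
                2 * math.log(2 * n_links * t * t / delta) / t)
            best = max(alive, key=lambda i: means[i])
            alive = {i for i in alive if means[i] + rad >= means[best] - rad}
            if len(alive) == 1:
                break
        return best, means[best]

    random.seed(1)
    true_fidelity = [0.90, 0.95, 0.80]
    link, est = successive_elimination(
        lambda i: random.gauss(true_fidelity[i], 0.02), 3)
    print(link, round(est, 3))  # expected: link 1, estimate near 0.95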


Routing and Wavelength Assignment for Entanglement Swapping of Photonic Qubits

Yangyu Wang, Yangming Zhao and Liusheng Huang (University of Science and Technology of China, China); Chunming Qiao (University at Buffalo, USA)

Efficient entanglement routing in Quantum Data Networks (QDNs) is essential in order to concurrently establish as many Entanglement Connections (ECs) as possible, which in turn maximizes network throughput. In this work, we consider a new class of QDNs with wavelength-division multiplexed (WDM) quantum links, where each quantum repeater performs entanglement swapping by directly measuring two photonic qubits, coming from entangled photon sources, on the same wavelength. To address the unique challenges of achieving high network throughput in such QDNs, we propose QuRWA to jointly optimize entanglement routing and wavelength assignment. To this end, we introduce a key concept named Co-Path to improve fault tolerance: all entanglement links (ELs) in a Co-Path set are assigned the same wavelength, so that they may serve as backups for other ELs in the same Co-Path when establishing ECs. We design efficient algorithms to optimize Co-Path selection and wavelength assignment to maximize resource utilization and fault tolerance. Extensive simulations demonstrate that, compared with methods that do not use Co-Paths, QuRWA improves network throughput by up to 122%.
Speaker Yangyu Wang (University of Science and Technology of China)

Yangyu Wang received his B.S. degree in 2020 from Hubei University. He is currently studying for a master's degree in the School of Computer Science and Technology, University of Science and Technology of China (USTC). His main research interest is the design and optimization of quantum network communication protocols, including routing and transmission protocols, with a current focus on quantum data networks. The paper shared at this conference likewise tackles efficient entanglement routing in quantum data networks to improve resource utilization and network throughput. In the future, he plans to conduct further research on scheduling problems in quantum data networks and hopes to share the results with researchers in the communication field.


Session Chair

Carlee Joe-Wong (Carnegie Mellon University, USA)
