Session E-3

RL Protocols

May 12 Wed, 10:00 AM — 11:30 AM EDT

DRL-OR: Deep Reinforcement Learning-based Online Routing for Multi-type Service Requirements

Chenyi Liu, Mingwei Xu, Yuan Yang and Nan Geng (Tsinghua University, China)

Emerging applications raise critical QoS requirements for the Internet. Improvements in flow classification technologies, software-defined networking (SDN), and programmable network devices make it possible to quickly identify users' requirements and control the routing of fine-grained traffic flows. Meanwhile, the problem of optimizing the forwarding paths for traffic flows with multiple QoS requirements in an online fashion has not been addressed sufficiently. To address the problem, we propose DRL-OR, an online routing algorithm using multi-agent deep reinforcement learning. DRL-OR organizes the agents to generate routes in a hop-by-hop manner, which inherently scales well. It adopts a comprehensive reward function, an efficient learning algorithm, and a novel deep neural network structure to learn an appropriate routing policy for different types of flow requirements. To guarantee reliability and accelerate the online learning process, we further introduce a safe learning mechanism to DRL-OR. We implement DRL-OR under an SDN architecture and conduct Mininet-based experiments using real network topologies and traffic traces. The results validate that DRL-OR can satisfy the requirements of latency-sensitive, throughput-sensitive, latency-throughput-sensitive, and latency-loss-sensitive flows at the same time, while exhibiting great adaptiveness and reliability under link failure, traffic change, and partial deployment.

An Experience Driven Design for IEEE 802.11ac Rate Adaptation based on Reinforcement Learning

Syuan-Cheng Chen, Chi-Yu Li and Chui-Hao Chiu (National Chiao Tung University, Taiwan)

IEEE 802.11ac supports gigabit speeds by extending 802.11n air-interface features and increases the number of rate options by more than two times. Supporting so many rate options is a challenge for rate adaptation (RA) solutions. In particular, they need to adapt rates to various fast-changing channels, and they suffer if they do not scale. In this work, we identify three limitations of current 802.11ac RAs on commodity network interface cards (NICs): no joint rate and bandwidth adaptation, lack of scalability, and no online learning capability. To address these limitations, we apply deep reinforcement learning (DRL) to design a scalable, intelligent RA, designated as experience driven rate adaptation (EDRA). DRL gives EDRA an online learning capability, which not only automatically identifies useful correlations between important factors and performance for the rate search, but also derives low-overhead avenues to approach highest-goodput (HG) rates by learning from experience. This makes EDRA scalable enough to locate HG rates among many rate options in a timely manner. We implement and evaluate EDRA using the Intel Wi-Fi driver and Google TensorFlow on Intel 802.11ac NICs. The evaluation results show that EDRA can outperform the Intel and Linux default RAs by up to 821.4% and 242.8%, respectively, in various cases.

Owl: Congestion Control with Partially Invisible Networks via Reinforcement Learning

Alessio Sacco (Politecnico di Torino, Italy & Saint Louis University, USA); Matteo Flocco and Flavio Esposito (Saint Louis University, USA); Guido Marchetto (Politecnico di Torino, Italy)

Years of research on transport protocols have not solved the tussle between in-network and end-to-end congestion control. This debate is due to the variance of conditions and assumptions in different network scenarios, e.g., cellular versus data center networks. Recently, the community has proposed a few transport protocols driven by machine learning, nonetheless limited to end-to-end approaches.

In this paper, we present Owl, a transport protocol based on reinforcement learning, whose goal is to select the proper congestion window learning from end-to-end features and network signals, when available. We show that our solution converges to a fair resource allocation after the learning overhead. Our kernel implementation, deployed over emulated and large scale virtual network testbeds, outperforms all benchmark solutions based on end-to-end or in-network congestion control.

Leveraging Domain Knowledge for Robust Deep Reinforcement Learning in Networking

Ying Zheng, Haoyu Chen, Qingyang Duan and Lixiang Lin (Fudan University, China); Yiyang Shao and Wei Wang (Huawei, China); Xin Wang and Yuedong Xu (Fudan University, China)

The past few years have witnessed a surge of interest in deep reinforcement learning (Deep RL) in computer networks. With its extraordinary feature extraction ability, Deep RL has the potential to re-engineer the fundamental resource allocation problems in networking without relying on pre-programmed models or assumptions about dynamic environments. However, such black-box systems suffer from poor robustness, showing high performance variance and poor tail performance. In this work, we propose a unified Teacher-Student learning framework that harnesses rich domain knowledge to improve robustness. The domain-specific algorithms, less performant but more trustworthy than Deep RL, play the role of teachers providing advice at critical states; the student neural network is steered to maximize the expected reward as usual while also mimicking the teacher's advice. The Teacher-Student method comprises three modules: the confidence check module locates wrong decisions and risky decisions, the reward shaping module designs a new updating function to incentivize the learning of the student network, and the prioritized experience replay module effectively utilizes the advised actions. We further implement our Teacher-Student framework in existing systems for video streaming (Pensieve), load balancing (DeepLB), and TCP congestion control (Aurora). Experimental results show that the proposed approach reduces the performance standard deviation of DeepLB by 37%; it improves the 90th, 95th, and 99th percentile tail performance of Pensieve by 7.6%, 8.8%, and 10.7%, respectively; and it accelerates the rate of growth of Aurora by 2x at the initial stage and achieves more stable performance in dynamic environments.

Session Chair

Haiming Jin (Shanghai Jiao Tong University, China)

Session E-4

RL Networking

May 12 Wed, 12:00 PM — 1:30 PM EDT

INCdeep: Intelligent Network Coding with Deep Reinforcement Learning

Qi Wang (Institute of Computing Technology, Chinese Academy of Sciences, China); Jianmin Liu (Institute of Computing Technology, Chinese Academy of Sciences, China); Katia Jaffrès-Runser (University of Toulouse - Toulouse INP & IRIT Laboratory, France); Yongqing Wang, ChenTao He, Cunzhuang Liu and Yongjun Xu (Institute of Computing Technology, Chinese Academy of Sciences, China)

In this paper, we address the problem of building adaptive network coding coefficients under dynamic network conditions (e.g., varying link quality and a changing number of relays). In existing linear network coding solutions, including deterministic network coding and random linear network coding, coding coefficients are set by a heuristic or chosen randomly from a Galois field with equal probability, and therefore cannot adapt to dynamic network conditions with good decoding performance. We propose INCdeep, an adaptive Intelligent Network Coding scheme with Deep Reinforcement Learning. Specifically, we formulate a coding coefficient selection problem where network variations can be automatically and continuously expressed as the state transitions of a Markov decision process (MDP). The key advantage is that INCdeep is able to learn and dynamically adjust the coding coefficients for the source node and each relay node according to ongoing network conditions, instead of choosing them randomly. The results show that INCdeep generalizes well to dynamic scenarios where link quality changes quickly, and that it converges quickly during training. In simulations and experiments, INCdeep outperforms the benchmark coding algorithms, with higher decoding probability and lower coding overhead.

Bound Inference and Reinforcement Learning-based Path Construction in Bandwidth Tomography

Cuiying Feng, Jianwei An and Kui Wu (University of Victoria, Canada); Jianping Wang (City University of Hong Kong, Hong Kong)

Inferring the bandwidth of internal links from the bandwidth of end-to-end paths, so-termed bandwidth tomography, is a long-standing open problem in the network tomography literature. The difficulty is due to the fact that no existing mathematical tool is directly applicable to solve the inverse problem with a set of min-equations. We systematically tackle this challenge by designing a polynomial-time algorithm that returns the exact bandwidth value for all identifiable links and the tightest error bound for unidentifiable links for a given set of measurement paths. When measurement paths are not given in advance, we prove the hardness of building measurement paths that can be used for deriving the global tightest error bounds for unidentifiable links. Accordingly, we develop a reinforcement learning (RL) approach for measurement path construction, that utilizes the special knowledge in bandwidth tomography and integrates both offline training and online prediction. Evaluation results with real-world ISP as well as simulated networks demonstrate that compared to other path construction methods, Random and Diversity Preferred, our RL-based path construction method can build measurement paths that result in much smaller average error bound of link bandwidth.
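To make the "min-equations" in the abstract above concrete, here is a minimal Python sketch, assuming each measured path bandwidth equals the minimum bandwidth of the links on that path. It is not the authors' polynomial-time algorithm; the function name `infer_link_bounds` and the data layout are hypothetical. It derives a lower bound for every link and marks a link as exactly identified only when it is the sole possible bottleneck of some path.

```python
from typing import Dict, List, Tuple


def infer_link_bounds(
    paths: Dict[str, List[str]],
    path_bw: Dict[str, float],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Illustrative inference from min-equations (hypothetical helper).

    paths   maps a path name to the links it traverses.
    path_bw maps a path name to its measured end-to-end bandwidth,
            assumed to equal min(bandwidth of each link on the path).
    Returns (lower, exact): a lower bound per link, and the links whose
    bandwidth is pinned down exactly by this simple rule.
    """
    # Every link on a path carries at least the path's bandwidth.
    lower: Dict[str, float] = {}
    for name, links in paths.items():
        for link in links:
            lower[link] = max(lower.get(link, 0.0), path_bw[name])

    # A link whose lower bound exceeds the path bandwidth cannot be the
    # bottleneck. If only one candidate remains, it must be the bottleneck.
    exact: Dict[str, float] = {}
    for name, links in paths.items():
        candidates = {link for link in links if lower[link] == path_bw[name]}
        if len(candidates) == 1:
            exact[next(iter(candidates))] = path_bw[name]
    return lower, exact


if __name__ == "__main__":
    lower, exact = infer_link_bounds(
        {"p1": ["a", "b"], "p2": ["b", "c"]},
        {"p1": 5.0, "p2": 10.0},
    )
    print(lower)  # lower bounds for links a, b, c
    print(exact)  # links identified exactly
```

In this toy input, link `a` is identified because link `b` must carry at least the bandwidth seen on `p2`, while `b` and `c` remain unidentifiable and only have lower bounds.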

A Universal Transcoding and Transmission Method for Livecast with Networked Multi-Agent Reinforcement Learning

Xingyan Chen and Changqiao Xu (Beijing University of Posts and Telecommunications, China); Mu Wang (State Key Laboratory of Networking and Switching Technology, China); Zhonghui Wu and Shujie Yang (Beijing University of Posts and Telecommunications, China); Lujie Zhong (Capital Normal University, China); Gabriel-Miro Muntean (Dublin City University, Ireland)

Intensive video transcoding and data transmission are the most crucial tasks for large-scale Crowd-sourced Livecast Services (CLS). However, there exists no versatile model for joint optimization of computing resources (e.g., CPU) and transmission resources (e.g., bandwidth) in CLS systems, making maintaining the balance between saving resources and improving user viewing experience very challenging. In this paper, we first propose a novel universal model, called Augmented Graph Model (AGM), which converts the above joint optimization into a multi-hop routing problem. This model provides a new perspective for the analysis of resource allocation in CLS, as well as opens new avenues for problem-solving. Further, we design a decentralized Networked Multi-Agent Reinforcement Learning (MARL) approach and propose an actor-critic algorithm, allowing network nodes (agents) to distributively solve the multi-hop routing problem using AGM in a fully cooperative manner. By leveraging the computing resource of massive nodes efficiently, this approach has good scalability and can be employed in large-scale CLS. To the best of our knowledge, this work is the first attempt to apply networked MARL on CLS. Finally, we use the centralized (single-agent) RL algorithm as a benchmark to evaluate the numerical performance of our solution in a large-scale simulation. Additionally, experimental results based on a prototype system show that our solution is superior in saving resources and service performance to two alternative state-of-the-art solutions.

Reliability-aware Dynamic Service Chain Scheduling in 5G Networks based on Reinforcement Learning

Junzhong Jia and Lei Yang (South China University of Technology, China); Jiannong Cao (Hong Kong Polytechnical University, Hong Kong)

As a key enabler of future 5G networks, Service Function Chaining (SFC) forwards the traffic flow along a chain of VNFs to provide network services flexibly. One important problem in SFC is to deploy the VNFs and schedule arriving requests among the computing nodes to achieve low latency and high reliability. Existing works consider a static network and assume that all SFC requests are known in advance, which is impractical. In this paper, we focus on a dynamic 5G network environment where the SFC requests arrive randomly and the computing nodes can redeploy all types of VNFs with a time cost. We formulate the SFC scheduling problem as a mixed integer non-linear programming problem. To solve the problem, we propose an efficient algorithm to decide the redundancy of the VNFs while minimizing delay. Then we present a state-of-the-art Reinforcement Learning (RL) approach that learns an SFC scheduling policy to increase the success rate of SFC requests. We evaluate our method through extensive simulations. The results show that the proposed RL solution can increase the success rate by 18.7% over the benchmark algorithms.

Session Chair

Paolo Casari (University of Trento, Italy)

Session E-5

Federated Learning 1

May 12 Wed, 2:30 PM — 4:00 PM EDT

FAIR: Quality-Aware Federated Learning with Precise User Incentive and Model Aggregation

Yongheng Deng (Tsinghua University, China); Feng Lyu and Ju Ren (Central South University, China); Yi-Chao Chen (Shanghai Jiao Tong University, China); Peng Yang (Huazhong University of Science and Technology, China); Yuezhi Zhou and Yaoxue Zhang (Tsinghua University, China)

Federated learning enables distributed learning in a privacy-protected manner, but two challenges can affect learning performance significantly. First, mobile users are not willing to participate in learning due to computation and energy consumption. Second, due to various factors (e.g., training data size/quality), the model update quality of mobile devices can vary dramatically, and inclusively aggregating low-quality model updates can deteriorate the global model quality. In this paper, we propose a novel system named FAIR, i.e., Federated leArning with qualIty awaReness. FAIR integrates three major components: 1) learning quality estimation: we leverage historical learning records to estimate the user learning quality, where the record freshness is considered and the exponential forgetting function is utilized for weight assignment; 2) quality-aware incentive mechanism: within the recruiting budget, we model a reverse auction problem to encourage the participation of high-quality learning users, and the method is proved to be truthful, individually rational, and computationally efficient; and 3) model aggregation: we devise an aggregation algorithm that integrates the model quality into aggregation and filters out non-ideal model updates, to further optimize the global learning model. Based on real-world datasets and practical learning tasks, extensive experiments are carried out to demonstrate the efficacy of FAIR.
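The quality estimation component described above weights historical records by freshness using an exponential forgetting function. A minimal Python sketch of that idea follows, assuming a weight of exp(-decay * age) per record; the function name `estimate_quality`, the record layout, and the `decay` parameter are hypothetical and not taken from the paper.

```python
import math
from typing import List, Tuple


def estimate_quality(
    records: List[Tuple[int, float]],
    now: int,
    decay: float = 0.5,
) -> float:
    """Freshness-weighted average of past learning-quality records.

    records: (round_index, observed_quality) pairs for one user.
    now:     index of the current round.
    decay:   hypothetical forgetting rate; larger values discount old
             records faster, and 0 gives a plain average.
    """
    if not records:
        return 0.0  # assumption: no history means no estimated quality
    # Exponential forgetting: a record that is `age` rounds old gets
    # weight exp(-decay * age).
    weights = [math.exp(-decay * (now - rnd)) for rnd, _ in records]
    weighted_sum = sum(w * q for w, (_, q) in zip(weights, records))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    history = [(1, 0.2), (2, 0.4), (3, 0.9)]
    print(estimate_quality(history, now=3))             # leans toward 0.9
    print(estimate_quality(history, now=3, decay=0.0))  # plain mean
```

Normalizing by the weight sum keeps the estimate on the same scale as the recorded qualities, so users with long and short histories stay comparable.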

FedSens: A Federated Learning Approach for Smart Health Sensing with Class Imbalance in Resource Constrained Edge Computing

Daniel Zhang, Ziyi Kou and Dong Wang (University of Notre Dame, USA)

The advance of mobile sensing and edge computing has brought new opportunities for abnormal health detection (AHD) systems where edge devices such as smartphones and wearable sensors are used to collect people's health information and provide early alerts for abnormal health conditions such as stroke and depression. The recent development of federated learning (FL) allows participants to collaboratively train powerful AHD models while keeping their health data private to local devices. This paper addresses a critical challenge of adapting FL to train AHD models, where the participants' health data is highly imbalanced and contains biased class distributions. Existing FL solutions fail to address the class imbalance issue due to the strict privacy requirements of participants as well as the heterogeneous resource constraints of their edge devices. In this work, we propose FedSens, a new FL framework dedicated to addressing the class imbalance problem in AHD applications with explicit consideration of participant privacy and device resource constraints. We evaluate FedSens using a real-world edge computing testbed on two real-world AHD applications. The results show that FedSens can significantly improve the accuracy of AHD models in the presence of severe class imbalance with low energy cost to the edge devices.

Learning for Learning: Predictive Online Control of Federated Learning with Edge Provisioning

Yibo Jin (Nanjing University, China); Lei Jiao (University of Oregon, USA); Zhuzhong Qian, Sheng Zhang and Sanglu Lu (Nanjing University, China)

Operating federated learning optimally over distributed cloud-edge networks is a non-trivial task, which requires managing data transfer from user devices to edges, resource provisioning at edges, and federated learning between edges and the cloud. We formulate a non-linear mixed integer program that minimizes the long-term cumulative cost of such a federated learning system while guaranteeing the desired convergence of the machine learning models being trained. We then design a set of novel polynomial-time online algorithms that make adaptive decisions by computing continuous solutions and converting them to integers to control the system on the fly, based only on inputs about the dynamic and uncertain cloud-edge environments that are predicted via online learning. We rigorously prove the competitive ratio, capturing the multiplicative gap between our approach using predicted inputs and the offline optimum using actual inputs. Extensive evaluations with real-world training datasets and system parameters confirm the empirical superiority of our approach over multiple state-of-the-art algorithms.

Resource-Efficient Federated Learning with Hierarchical Aggregation in Edge Computing

Zhiyuan Wang, Hongli Xu and Jianchun Liu (University of Science and Technology of China, China); He Huang (Soochow University, China); Chunming Qiao and Yangming Zhao (University at Buffalo, USA)

Federated learning (FL) has emerged in edge computing to address the limited bandwidth and privacy concerns of traditional cloud-based centralized training. However, the existing FL mechanisms may lead to long training time and consume a tremendous amount of communication resources. In this paper, we propose an efficient FL mechanism, which divides the edge nodes into K clusters by balanced clustering. The edge nodes in one cluster forward their local updates to the cluster header for synchronous aggregation, called cluster aggregation, while all cluster headers perform global aggregation asynchronously. This procedure is called hierarchical aggregation. Our analysis shows that the convergence bound depends on the number of clusters and the training epochs. We formally define the resource-efficient federated learning with hierarchical aggregation (RFL-HA) problem. We propose an efficient algorithm to determine the optimal cluster structure (i.e., the optimal value of K) under resource constraints and extend it to deal with dynamic network conditions. Extensive simulation results for different models and datasets show that the proposed algorithms can reduce completion time by 34.8%-70% and communication resources by 33.8%-56.5% while achieving similar accuracy, compared with well-known FL mechanisms.
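The two aggregation levels described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the RFL-HA algorithm: the synchronous cluster step is assumed to be a sample-size-weighted average (in the style of FedAvg), and the asynchronous global step is assumed to blend one cluster model into the global model with a hypothetical mixing factor `mix`. Both function names are invented for this sketch.

```python
from typing import List, Sequence


def cluster_aggregate(
    updates: Sequence[Sequence[float]],
    sizes: Sequence[int],
) -> List[float]:
    """Synchronous step at a cluster header (assumed weighted average).

    updates: one parameter vector per edge node in the cluster.
    sizes:   number of local training samples per node, used as weights.
    """
    total = sum(sizes)
    dim = len(updates[0])
    return [
        sum(vec[i] * n for vec, n in zip(updates, sizes)) / total
        for i in range(dim)
    ]


def async_global_update(
    global_model: Sequence[float],
    cluster_model: Sequence[float],
    mix: float = 0.5,
) -> List[float]:
    """Asynchronous step: blend one cluster's model into the global model
    as soon as it arrives, without waiting for other clusters.
    `mix` is a hypothetical mixing factor in [0, 1]."""
    return [(1.0 - mix) * g + mix * c for g, c in zip(global_model, cluster_model)]


if __name__ == "__main__":
    cluster_model = cluster_aggregate([[1.0, 2.0], [3.0, 4.0]], sizes=[1, 3])
    print(cluster_model)
    print(async_global_update([0.0, 0.0], cluster_model))
```

Splitting the work this way means only cluster headers talk to the global aggregator, which is where the communication savings claimed in the abstract would come from.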

Session Chair

Ting He (Penn State University)

Session E-6

Federated Learning 2

May 12 Wed, 4:30 PM — 6:00 PM EDT

P-FedAvg: Parallelizing Federated Learning with Theoretical Guarantees

Zhicong Zhong (Sun Yat-sen University, China); Yipeng Zhou (Macquarie University, Australia); Di Wu (Sun Yat-Sen University, China); Xu Chen (Sun Yat-sen University, China); Min Chen (Huazhong University of Science and Technology, China); Chao Li (Tencent, China); Quan Z. Sheng (Macquarie University, Australia)

As the number of participating clients grows, the centralized parameter server (PS) will seriously limit the scale and efficiency of Federated Learning (FL). A straightforward approach to scale up the FL system is to construct a Parallel FL (PFL) system with multiple PSes. However, it is unclear whether PFL can really achieve a faster convergence rate. Even if the answer is yes, it is non-trivial to design a highly efficient parameter averaging algorithm for a PFL system. In this paper, we propose a completely parallelizable FL algorithm called P-FedAvg under the PFL architecture. P-FedAvg extends the well-known FedAvg algorithm by allowing multiple PSes to cooperate and train a learning model together. In P-FedAvg, each PS is only responsible for a fraction of the total clients, but PSes can mix model parameters in a specially designed way so that the FL model converges well. Unlike heuristic-based algorithms, P-FedAvg comes with theoretical guarantees. To be rigorous, we conduct a theoretical analysis of the convergence rate of P-FedAvg and derive the optimal weights for each PS to mix parameters with its neighbors. We also examine how the overlay topology formed by PSes affects the convergence rate and robustness of a PFL system. Lastly, we perform extensive experiments with real datasets to verify our analysis and demonstrate that P-FedAvg can significantly improve convergence rates over traditional FedAvg and other competitive baselines. We believe that our work can help to lay a theoretical foundation for building more efficient PFL systems.

Cost-Effective Federated Learning Design

Bing Luo (Shenzhen Institute of Artificial Intelligence and Robotics for Society & The Chinese University of Hong Kong, Shenzhen, China); Xiang Li (The Chinese University of Hong Kong, Shenzhen, China); Shiqiang Wang (IBM T. J. Watson Research Center, USA); Jianwei Huang (The Chinese University of Hong Kong, Shenzhen, China); Leandros Tassiulas (Yale University, USA)

Federated learning (FL) is a distributed learning paradigm that enables a large number of devices to collaboratively learn a model without sharing their raw data. Despite its practical efficiency and effectiveness, the iterative on-device learning process incurs a considerable cost in terms of learning time and energy consumption, which depends crucially on the number of selected clients and the number of local iterations in each training round. In this paper, we analyze how to design adaptive FL that optimally chooses these essential control variables to minimize the total cost while ensuring convergence. Theoretically, we analytically establish the relationship between the total cost and the control variables with the convergence upper bound. To efficiently solve the cost minimization problem, we develop a low-cost sampling-based algorithm to learn the convergence related unknown parameters. We derive important solution properties that effectively identify the design principles for different metric preferences. Practically, we evaluate our theoretical results both in a simulated environment and on a hardware prototype. Experimental evidence verifies our derived properties and demonstrates that our proposed solution achieves near-optimal performance for various datasets, different machine learning models, and heterogeneous system settings.

Federated Learning over Wireless Networks: A Band-limited Coordinated Descent Approach

Junshan Zhang (Arizona State University, USA); Na Li (Harvard University, USA); Mehmet Dedeoglu (Arizona State University, USA)

We consider a many-to-one wireless architecture for federated learning at the network edge, where multiple edge devices collaboratively train a model using local data. The unreliable nature of wireless connectivity, together with constraints in computing resources at edge devices, dictates that the local updates at edge devices should be carefully crafted and compressed to match the wireless communication resources available and should work in concert with the receiver. Thus motivated, we propose SGD-based bandlimited coordinate descent algorithms for such settings. Specifically, for the wireless edge employing over-the-air computing, a common subset of k coordinates of the gradient updates across edge devices is selected by the receiver in each iteration and then transmitted simultaneously over k sub-carriers, each experiencing time-varying channel conditions. We characterize the impact of communication error and compression, in terms of the resulting gradient bias and mean squared error, on the convergence of the proposed algorithms. We then study learning-driven communication error minimization via joint optimization of power allocation and learning rates. Our findings reveal that optimal power allocation across different sub-carriers should take into account both the gradient values and channel conditions, thus generalizing the widely used water-filling policy. We also develop sub-optimal distributed solutions amenable to implementation.
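For readers unfamiliar with the water-filling policy that this abstract says it generalizes, the classic version can be sketched as follows. This Python example shows only the textbook policy, which depends on channel conditions alone, not the paper's gradient-aware allocation. The function name `water_filling` and the convention that `noise[i]` is the noise-to-gain ratio of sub-carrier i are assumptions made for this sketch.

```python
from typing import List


def water_filling(noise: List[float], total_power: float) -> List[float]:
    """Classic water-filling power allocation (textbook policy).

    noise[i] is the assumed noise-to-gain ratio of sub-carrier i.
    Power p_i = max(0, mu - noise[i]) is chosen so that the allocations
    sum to total_power, where mu is the common "water level".
    """
    if total_power < 0:
        raise ValueError("total_power must be non-negative")
    if not noise:
        return []
    ordered = sorted(noise)
    active = len(ordered)
    mu = ordered[0]
    # Drop the worst sub-carriers until the water level rises above
    # every sub-carrier that is still active.
    while active > 0:
        mu = (total_power + sum(ordered[:active])) / active
        if mu > ordered[active - 1]:
            break
        active -= 1
    return [max(0.0, mu - n) for n in noise]


if __name__ == "__main__":
    # Three sub-carriers; the noisiest one ends up with no power.
    print(water_filling([1.0, 2.0, 5.0], total_power=3.0))
```

Better channels (lower noise-to-gain ratio) receive more power, and channels above the water level receive none; the abstract's finding is that gradient values should also enter this decision.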

Dual Attention-Based Federated Learning for Wireless Traffic Prediction

Chuanting Zhang and Shuping Dang (King Abdullah University of Science and Technology, Saudi Arabia); Basem Shihada (KAUST, Saudi Arabia); Mohamed-Slim Alouini (King Abdullah University of Science and Technology (KAUST), Saudi Arabia)

Wireless traffic prediction is essential for cellular networks to realize intelligent network operations, such as load-aware resource management and predictive control. Existing prediction approaches usually adopt centralized training architectures and require transferring huge amounts of traffic data, which may raise delay and privacy concerns in certain scenarios. In this work, we propose a novel wireless traffic prediction framework named Dual Attention-Based Federated Learning (FedDA), by which a high-quality prediction model is trained collaboratively by multiple edge clients. To simultaneously capture the various wireless traffic patterns and keep raw data local, FedDA first groups the clients into different clusters by using a small augmentation dataset. Then, a quasi-global model is trained and shared among clients as prior knowledge, aiming to solve the statistical heterogeneity challenge confronting federated learning. To construct the global model, a dual attention scheme is further proposed that aggregates the intra- and inter-cluster models, instead of simply averaging the weights of local models. We conduct extensive experiments on two real-world wireless traffic datasets, and the results show that FedDA outperforms state-of-the-art methods. The average mean squared error performance gains on the two datasets are up to 10% and 30%, respectively.

Session Chair

Onur Altintas (Toyota Labs)
