Session B-7

Federated Learning 1

10:00 AM — 11:30 AM EDT
May 5 Thu, 10:00 AM — 11:30 AM EDT

A Profit-Maximizing Model Marketplace with Differentially Private Federated Learning

Peng Sun (The Chinese University of Hong Kong, Shenzhen, China); Xu Chen (Sun Yat-sen University, China); Guocheng Liao (Sun Yat-Sen University, China); Jianwei Huang (The Chinese University of Hong Kong, Shenzhen, China)

Existing machine learning (ML) model marketplaces generally require data owners to share their raw data, leading to serious privacy concerns. Federated learning (FL) can partially alleviate this issue by enabling model training without raw data exchange. However, data owners are still susceptible to privacy leakage from gradient exposure in FL, which discourages their participation. In this work, we advocate a novel differentially private FL (DPFL)-based ML model marketplace. We focus on the broker-centric design. Specifically, the broker first incentivizes data owners to participate in model training via DPFL by offering privacy protection as per their privacy budgets and explicitly accounting for their privacy costs. Then, it conducts optimal model versioning and pricing to sell the obtained model versions to model buyers. In particular, we focus on the broker's profit maximization, which is challenging due to the significant difficulties in the revenue characterization of model trading and the cost estimation of DPFL model training. We propose a two-layer optimization framework to address it, i.e., revenue maximization and cost minimization under model quality constraints. The latter is still challenging due to its non-convexity and integer constraints. We hence propose efficient algorithms, and their performances are both theoretically guaranteed and empirically validated.

Communication-Efficient Device Scheduling for Federated Learning Using Stochastic Optimization

Jake Perazzone (US Army Research Lab, USA); Shiqiang Wang (IBM T. J. Watson Research Center, USA); Mingyue Ji (University of Utah, USA); Kevin S Chan (US Army Research Laboratory, USA)

Federated learning (FL) is a useful tool in distributed machine learning that utilizes users' local datasets in a privacy-preserving manner. When deploying FL in a constrained wireless environment, however, training models in a time-efficient manner can be a challenging task due to intermittent connectivity of devices, heterogeneous connection quality, and non-i.i.d. data. In this paper, we provide a novel convergence analysis of non-convex loss functions using FL on both i.i.d. and non-i.i.d. datasets with arbitrary device selection probabilities for each round. Then, using the derived convergence bound, we use stochastic optimization to develop a new client selection and power allocation algorithm that minimizes a function of the convergence bound and the average communication time under a transmit power constraint. We find the closed form solution to the minimization problem. One key feature of the algorithm is that knowledge of the channel statistics is not required and only the instantaneous channel state information needs to be known. Using the FEMNIST and CIFAR-10 datasets, we show through simulations that communication time can be significantly decreased using our algorithm, compared to uniformly random participation.

Optimal Rate Adaption in Federated Learning with Compressed Communications

Laizhong Cui and Xiaoxin Su (Shenzhen University, China); Yipeng Zhou (Macquarie University, Australia); Jiangchuan Liu (Simon Fraser University, Canada)

Federated Learning (FL) incurs high communication overhead, which can be greatly alleviated by compression for model updates. Yet the tradeoff between compression and model accuracy in the networked environment remains unclear and, for simplicity, most implementations adopt a fixed compression rate only. In this paper, we for the first time systematically examine this tradeoff, identifying the influence of the compression error on the final model accuracy with respect to the learning rate. Specifically, we factor the compression error of each global iteration into the convergence rate analysis under both strongly convex and non-convex loss functions. We then present an adaptation framework to maximize the final model accuracy by strategically adjusting the compression rate in each iteration. We have discussed the key implementation issues of our framework in practical networks with representative compression algorithms. Experiments over the popular MNIST and CIFAR-10 datasets confirm that our solution effectively reduces network traffic yet maintains high model accuracy in FL.

Towards Optimal Multi-modal Federated Learning on Non-IID Data with Hierarchical Gradient Blending

Sijia Chen and Baochun Li (University of Toronto, Canada)

Recent advances in federated learning (FL) made it feasible to train a machine learning model across multiple clients, even with non-IID data distributions. In contrast to these uni-modal models that have been studied extensively in the literature, there are few in-depth studies on how multi-modal models can be trained effectively with federated learning. Unfortunately, we empirically observed a counter-intuitive phenomenon that, compared with its uni-modal counterpart, multi-modal FL leads to a significant degradation in performance.

Our in-depth analysis of such a phenomenon shows that modality sub-networks and local models can overfit and generalize at different rates. To alleviate these inconsistencies in collaborative learning, we propose hierarchical gradient blending (HGB), which simultaneously computes the optimal blending of modalities and the optimal weighting of local models by adaptively measuring their overfitting and generalization behaviors. When HGB is applied, we present a few important theoretical insights and convergence guarantees for convex and smooth functions, and evaluate its performance in multi-modal FL. Our experimental results on an extensive array of non-IID multi-modal data have demonstrated that HGB is not only able to outperform the best uni-modal baselines but also to achieve superior accuracy and convergence speed as compared to state-of-the-art frameworks.

Session Chair

Xiaofei Wang (Tianjin University)

Session B-8

Federated Learning 2

12:00 PM — 1:30 PM EDT
May 5 Thu, 12:00 PM — 1:30 PM EDT

FLASH: Federated Learning for Automated Selection of High-band mmWave Sectors

Batool Salehihikouei, Jerry Z Gu, Debashri Roy and Kaushik Chowdhury (Northeastern University, USA)

Fast sector-steering in the mmWave band for vehicular mobility scenarios remains an open challenge. This is because standard-defined exhaustive search over predefined antenna sectors cannot be assuredly completed within short contact times. This paper proposes machine learning to speed up sector selection using data from multiple non-RF sensors, such as LiDAR, GPS, and camera images. The contributions in this paper are threefold: First, a multimodal deep learning architecture is proposed that fuses the inputs from these data sources and locally predicts the sectors for best alignment at a vehicle. Second, it studies the impact of missing data (e.g., missing LiDAR/images) during inference, which is possible due to unreliable control channels or hardware malfunction. Third, it describes the first-of-its-kind multimodal federated learning framework that combines model weights from multiple vehicles and then disseminates the final fusion architecture back to them, thus incorporating private sharing of information and reducing their individual training times. We validate the proposed architectures on a live dataset collected from an autonomous car equipped with multiple sensors (GPS, LiDAR, and camera) and roof-mounted Talon AD7200 60GHz mmWave radios. We observe 52.75% decrease in sector selection time than 802.11ad standard while maintaining 89.32% throughput with the globally optimal solution.

Joint Superposition Coding and Training for Federated Learning over Multi-Width Neural Networks

Hankyul Baek, Won Joon Yun and Yunseok Kwak (Korea University, Korea (South)); Soyi Jung (Hallym University, Korea (South)); Mingyue Ji (University of Utah, USA); Mehdi Bennis (Centre of Wireless Communications, University of Oulu, Finland); Jihong Park (Deakin University, Australia); Joongheon Kim (Korea University, Korea (South))

This paper aims to integrate two synergetic technologies, federated learning (FL) and width-adjustable slimmable neural network (SNN) architectures. FL preserves data privacy by exchanging the locally trained models of mobile devices. By adopting SNNs as local models, FL can flexibly cope with the time-varying energy capacities of mobile devices. Combining FL and SNNs is however non-trivial, particularly under wireless connections with time-varying channel conditions. Furthermore, existing multi-width SNN training algorithms are sensitive to the data distributions across devices, so are ill-suited to FL. Motivated by this, we propose a communication and energy efficient SNN-based FL (named SlimFL) that jointly utilizes superposition coding (SC) for global model aggregation and superposition training (ST) for updating local models. By applying SC, SlimFL exchanges the superposition of multiple width configurations that are decoded as many as possible for a given communication throughput. Leveraging ST, SlimFL aligns the forward propagation of different width configurations, while avoiding the inter-width interference during back propagation. We formally prove the convergence of SlimFL. The result reveals that SlimFL is not only communication-efficient but also can counteract non-IID data distributions and poor channel conditions, which is also corroborated by simulations.

Tackling System and Statistical Heterogeneity for Federated Learning with Adaptive Client Sampling

Bing Luo (Shenzhen Institute of Artificial Intelligence and Robotics for Society & The Chinese University of Hong Kong, Shenzhen, China); Wenli Xiao (The Chinese University of Hong Kong, Shenzhen, China); Shiqiang Wang (IBM T. J. Watson Research Center, USA); Jianwei Huang (The Chinese University of Hong Kong, Shenzhen, China); Leandros Tassiulas (Yale University, USA)

Federated learning (FL) algorithms usually sample a fraction of clients in each round (partial participation) when the number of participants is large and the server's communication bandwidth is limited. Recent works on the convergence analysis of FL have focused on unbiased client sampling, e.g., sampling uniformly at random, which suffers from slow wall-clock time for convergence due to high degrees of system heterogeneity and statistical heterogeneity. This paper aims to design an adaptive client sampling algorithm that tackles both system and statistical heterogeneity to minimize the wall-clock convergence time. We obtain a new tractable convergence bound for FL algorithms with arbitrary client sampling probabilities. Based on the bound, we analytically establish the relationship between the total learning time and sampling probabilities, which results in a non-convex optimization problem. We design an efficient algorithm for learning the unknown parameters in the convergence bound and develop a low-complexity algorithm to approximately solve the non-convex problem. Experimental results from both hard-ware prototype and simulation demonstrate that our proposed sampling scheme significantly reduces the convergence time compared to several baseline sampling schemes. Notably, our scheme in hardware prototype spends 73% less time than the baseline uniform sampling for reaching the same target loss.

The Right to be Forgotten in Federated Learning: An Efficient Realization with Rapid Retraining

Yi Liu (City University of Hong Kong, China); Lei Xu (Nanjing University of Science and Technology, China); Xingliang Yuan (Monash University, Australia); Cong Wang (City University of Hong Kong, Hong Kong); Bo Li (Hong Kong University of Science and Technology, Hong Kong)

In Machine Learning (ML), the emergence of the right to be forgotten gave birth to a paradigm named machine unlearning, which enables data holders to proactively erase their data from a trained model. Existing machine unlearning techniques largely focus on centralized training, where access to all holders' training data is a must for the server to conduct the unlearning process. It remains largely underexplored about how to achieve the unlearning when full access to all training data becomes unavailable. One noteworthy example is Federated Learning (FL), where each participating data holder trains locally, without sharing their training data to the central server. In this paper, we investigate the problem of machine unlearning in FL systems. We start with a formal definition of the unlearning problem in FL and propose a rapid retraining approach to fully erase the data samples from a trained FL model. The resulting design allows data holders to jointly conduct the unlearning process efficiently while keeping their training data locally. Our formal convergence and complexity analysis demonstrate that our design can preserve model utility with high efficiency. Extensive evaluations on four real-world datasets illustrate the effectiveness and performance of our proposed realization.

Session Chair

Christopher Brinton (Purdue University)

Session B-9

Graph Machine Learning

2:30 PM — 4:00 PM EDT
May 5 Thu, 2:30 PM — 4:00 PM EDT

MalGraph: Hierarchical Graph Neural Networks for Robust Windows Malware Detection

Xiang Ling (Institute of Software, Chinese Academy of Sciences & Zhejiang University, China); Lingfei Wu (JD.COM Silicon Valley Research Center, USA); Wei Deng, Zhenqing Qu, Jiangyu Zhang and Sheng Zhang (Zhejiang University, China); Tengfei Ma (IBM T. J. Watson Research Center, USA); Bin Wang (Hangzhou Hikvision Digital Technology Co., Ltd, China); Chunming Wu (College of Computer Science, Zhejiang University, China); Shouling Ji (Zhejiang University, China & Georgia Institute of Technology, USA)

With the ever-increasing malware threats, malware detection plays an indispensable role in protecting information systems. Although tremendous research efforts have been made, there are still two key challenges hindering them from being applied to accurately and robustly detect malwares. Firstly, most of them represent executable programs with shallow features, but ignore their semantic and structural information. Secondly, these methods are based on representations that can be easily modified by attackers and thus cannot provide robustness against adversarial attacks. To tackle the aforementioned challenges, we present MalGraph, which first represents executable programs with hierarchical graphs and then uses an end-to-end learning framework based on graph neural networks for malware detection. In particular, the hierarchical graph consists of a function call graph that captures the interaction semantics among different functions (at the inter-function level) and corresponding control-flow graphs for learning the structural semantics of each function (at the intra-function level). We argue the abstraction and hierarchy nature of hierarchical graphs makes them not only easy to capture rich structural information of executables, but also be immune to adversarial attacks. Experimental results demonstrate that MalGraph not only outperforms state-of-the-art malware detection, but also exhibits stronger robustness against adversarial attacks by a large margin.

Nadege: When Graph Kernels meet Network Anomaly Detection

Hicham Lesfari (Université Côte d'Azur, France); Frederic Giroire (CNRS, France)

With the continuous growing level of dynamicity, heterogeneity and complexity of traffic data, anomaly detection remains one of the most critical tasks to ensure an efficient and flexible management of a network. Recently, driven by their empirical success in many domains, especially bioinformatics and computer vision, graph kernels have attracted increasing attention. Our work aims at investigating their discrimination power for detecting vulnerabilities and distilling traffic in the field of networking. In this paper, we propose Nadege, a new graph-based learning framework which aims at preventing anomalies from disrupting the network while providing assistance for traffic monitoring. Specifically, we design a graph kernel tailored for network profiling by leveraging propagation schemes which regularly adapt to contextual patterns. Moreover, we provide provably efficient algorithms and consider both offline and online detection policies. Finally, we demonstrate the potential of kernel-based models by conducting extensive experiments on a wide variety of network environments. Under different usage scenarios, Nadege significantly outperforms all baseline approaches.

RouteNet-Erlang: A Graph Neural Network for Network Performance Evaluation

Miquel Ferriol-Galmés (Universitat Politècnica de Catalunya, Spain); Krzysztof Rusek (AGH University of Science and Technology, Poland); Jose Suarez-Varela (Universitat Politècnica de Catalunya, Spain); Shihan Xiao and Xiang Shi (Huawei Technologies, China); Xiangle Cheng (University of Exeter, United Kingdom (Great Britain)); Bo Wu (Huawei Technologies, China); Pere Barlet-Ros and Albert Cabellos-Aparicio (Universitat Politècnica de Catalunya, Spain)

Network modeling is a fundamental tool in network research, design, and operation. Arguably the most popular method for modeling is Queuing Theory (QT). Its main limitation is that it imposes strong assumptions on the packet arrival process, which typically do not hold in real networks. In the field of Deep Learning, Graph Neural Networks (GNN) have emerged as a new technique to build data-driven models that can learn complex and non-linear behavior. In this paper, we present ErlangNet, a pioneering GNN architecture designed to model computer networks. ErlangNet supports complex traffic models, multi-queue scheduling policies, routing policies and can provide accurate estimates in networks not seen in the training phase. We benchmark ErlangNet against a state-of-the-art QT model, and our results show that it outperforms QT in all the network scenarios.

xNet: Improving Expressiveness and Granularity for Network Modeling with Graph Neural Networks

Mowei Wang, Linbo Hui and Yong Cui (Tsinghua University, China); Ru Liang (Huawei Technologies Co., Ltd., China); Zhenhua Liu (Huawei Technologies, China)

Today's network is notorious for its complexity and uncertainty. Network operators often rely on network models to achieve efficient network planning, operation, and optimization. The network model is tasked to understand the complex relationships between the network performance metrics (e.g. latency) and the network characteristics (e.g. traffic). However, we still lack a systematic approach to building accurate and lightweight network models that can model the influence of network configurations (i.e. expressiveness) and support fine-grained flow-level temporal predictions (i.e. granularity).

In this paper, we propose xNet, a data-driven network modeling framework based on graph neural networks (GNN). Unlike the previous proposals, xNet is not a dedicated network model designed for specific network scenarios with constraint considerations. On the contrary, xNet provides a general approach to model the network characteristics of concern with graph representations and configurable GNN blocks. xNet learns the state transition function between time steps and rolls it out to obtain the full fine-grained prediction trajectory. We implement and instantiate xNet with three use cases. The experiment results show that xNet can accurately predict different performance metrics while achieving up to 100x speedup compared with the conventional packet-level simulator.

Session Chair

Qinghua Li (University of Arkansas)

Made with in Toronto · Privacy Policy · INFOCOM 2020 · INFOCOM 2021 · © 2022 Duetone Corp.