Session G-7

Data and Datacenters

10:00 AM — 11:30 AM EDT
May 5 Thu, 10:00 AM — 11:30 AM EDT

Constrained In-network Computing with Low Congestion in Datacenter Networks

Raz Segal, Chen Avin and Gabriel Scalosub (Ben-Gurion University of the Negev, Israel)

Distributed computing has become a common practice nowadays, where recent focus has been given to the usage of smart networking devices with in-network computing capabilities. State-of-the-art switches with near-line rate computing and aggregation capabilities enable acceleration and improved performance for various modern applications like big data analytics and large-scale distributed and federated machine learning.

In this paper, we formulate and study the theoretical algorithmic foundations of such approaches, and focus on how to deploy and use constrained in-network computing capabilities within the data center. We focus our attention on reducing the network congestion, i.e., the most congested link in the network, while supporting the given workload(s). We present an efficient optimal algorithm for tree-like network topologies and show that our solution provides as much as an x13 improvement over common alternative approaches. In particular, our results show that having merely a small fraction of network devices that support in-network aggregation can significantly reduce the network congestion, both for single and multiple workloads.

Fast and Heavy Disjoint Weighted Matchings for Demand-Aware Datacenter Topologies

Kathrin Hanauer, Monika Henzinger, Stefan Schmid and Jonathan Trummer (University of Vienna, Austria)

Reconfigurable optical topologies promise to improve the performance in datacenters by dynamically optimizing the physical network in a demand-aware manner. State-of-the-art optical technologies allow to establish and update direct connectivity (in the form of edge-disjoint matchings) between top-of-rack switches within microseconds or less. However, to fully exploit temporal structure in the demand, such fine-grained reconfigurations also require fast algorithms for optimizing the interconnecting matchings.
Motivated by the desire to offload a maximum amount of demand to the reconfigurable network, this paper initiates the study of fast algorithms to find k disjoint heavy matchings in graphs. We present and analyze six algorithms, based on iterative matchings, b-matching, edge coloring, and node-rankings. We show that the problem is generally NP-hard and study the achievable approximation ratios.
An extensive empirical evaluation of our algorithms on both real-world and synthetic traces (88 in total), including traces collected in Facebook datacenters and in HPC clusters reveals that all our algorithms provide high-quality matchings, and also very fast ones come within 95% or more of the best solution. However, the running times differ significantly and what is the best algorithm depends on k and the acceptable runtime-quality tradeoff.

Jingwei: An Efficient and Adaptable Data Migration Strategy for Deduplicated Storage Systems

Geyao Cheng, Deke Guo, Lailong Luo, Junxu Xia and Yuchen Sun (National University of Defense Technology, China)

The traditional migration methods are confronted with formidable challenges when data deduplication technologies are incorporated. Firstly, the deduplication creates data-sharing dependencies in the stored files; breaking such dependencies in migration would attach extra space overhead. Secondly, the redundancy elimination heightens the risk of data unavailability during server crushes. The existing methods fail to tackle them at one shot. To this end, we propose Jingwei, an efficient and adaptable data migration strategy for deduplicated storage systems. To be specific, Jingwei tries to minimize the extra space cost in migration for space efficiency. Meanwhile, Jingwei realizes the service adaptability by encouraging replicas of hot data to spread out their data access requirements. We first model such a problem as an integer linear programming (ILP) and solve it with a commercial solver when only one empty migration target server is allowed. We then extend this problem to a scenario wherein multiple non-empty target servers are available for migration. We solve it by an effective heuristic algorithm based on the Bloom Filter-based data sketches. Trace-driven experiments show that Jingwei fortifies the file replicas by 66.7%, while only 5.7% of the extra storage space is occupied compared with the latest "Goseed" method.

Optimal Data Placement for Stripe Merging in Locally Repairable Codes

Si Wu and Qingpeng Du (University of Science and Technology of China, China); Patrick Pak-Ching Lee (The Chinese University of Hong Kong, Hong Kong); Yongkun Li and Yinlong Xu (University of Science and Technology of China, China)

Erasure coding is a storage-efficient redundancy scheme for modern clustered storage systems by storing stripes of data and parity blocks across the nodes of multiple clusters; in particular, locally repairable codes (LRC) continue to be one popular family of practical erasure codes that achieve high repair efficiency. To efficiently adapt to the dynamic requirements of access efficiency and reliability, storage systems often perform redundancy transitioning by tuning erasure coding parameters. In this paper, we apply a stripe merging approach for redundancy transitioning of LRC in clustered storage systems, by merging multiple LRC stripes to form a large LRC stripe with low storage redundancy. We show that the random placement of multiple LRC stripes that are being merged can lead to high cross-cluster transitioning bandwidth. To this end, we design an optimal data placement scheme that provably minimizes the cross-cluster traffic for stripe merging, by judiciously placing the blocks to be merged in the same cluster while maintaining the repair efficiency of LRC. We prototype and implement our optimal data placement scheme on a local cluster. Our evaluation shows that it significantly reduces the transitioning time by up to 43.2% compared to the baseline.

Session Chair

Qiao Xiang (Xiamen University)

Session G-8

Networks Protocols 1

12:00 PM — 1:30 PM EDT
May 5 Thu, 12:00 PM — 1:30 PM EDT

Add/Drop Flexibility and System Complexity Tradeoff in ROADM Designs

Lexin Pan (Shanghai Jiao Tong University, China); Tong Ye (Shanghai JiaoTong University, China)

As a key component in dynamic wavelength-routing optical networks (WRON), the contention performance at add/drop (A/D) ports of ROADMs has attracted a lot of attention in recent years. For the first time, this paper derives the close-form solutions of the blocking probability (BP) of A/D requests to characterize the fundamental tradeoff between A/D flexibility and system complexity in the ROADM. We show that the ROADM with fiber cross-connect (FXC) can decide whether a request can be satisfied based on global transceiver-usage information, and thus an exact expression of BP can be obtained. In comparison, the ROADM without FXC needs detail transceiver-usage information to make decision, and thus is hard to be analyzed. To circumvent the difficulty in analysis, we describe the evolution of the number of busy transceivers using a one-dimensional Markov chain, where the state transition rates are estimated from global transceiver-usage information. Based on this model, we obtain an approximate BP the accuracy of which is high and increases with the number of drop ports. Our analytical results provide an interesting insight into the tradeoff between A/D flexibility and system complexity, based on which we give some suggestions for system design.

Detecting and Resolving PFC Deadlocks with ITSY Entirely in the Data Plane

Xinyu Crystal Wu and T. S. Eugene Ng (Rice University, USA)

The Priority-based Flow Control (PFC) protocol is adopted to guarantee zero packet loss in many high-performance data centers. PFC, however, can induce deadlocks and in severe cases cause the entire network to be blocked. Existing solutions have focused on deadlock avoidance; unfortunately, they are not foolproof. Therefore, deadlock detection is a necessity. We propose ITSY, a novel system that correctly detects and resolves deadlocks entirely in the data plane. It works with any network topologies and routing algorithms. Unique to ITSY is the use of deadlock initial triggers, which contributes to efficient deadlock detection, mitigation, and recurrence prevention. ITSY provides three deadlock resolution mechanisms with different trade-off options. We implement ITSY for programmable switches in the P4 language. Experiments show that ITSY detects and resolves deadlocks rapidly with minimal overheads.

Mousika: Enable General In-Network Intelligence in Programmable Switches by Knowledge Distillation

Guorui Xie (Tsinghua University, China); Qing Li (Peng Cheng Laboratory, China); Yutao Dong and Guanglin Duan (Tsinghua University, China); Yong Jiang (Graduate School at Shenzhen, Tsinghua University, China); Jingpu Duan (Southern University of Science and Technology, China)

Given the power efficiency and Tbps throughput of packet processing, several works are proposed to offload the decision tree (DT) to programmable switches, i.e., in-network intelligence. Though the DT is suitable for the switches' match-action paradigm, it has several limitations. E.g., its range match rules may not be supported well due to the hardware diversity; and its implementation also consumes lots of switch resources (e.g., stages and memory). Moreover, as learning algorithms (particularly deep learning) have shown their superior performance, some more complicated learning models are emerging for networking. However, their high computational complexity and large storage requirement are cause challenges in the deployment on switches. Therefore, we propose Mousika, an in-network intelligence framework that addresses these drawbacks successfully. First, we modify the DT to the Binary Decision Tree (BDT). Compared with the DT, our BDT supports faster training, generates fewer rules, and satisfies switch constraints better. Second, we introduce the teacher-student knowledge distillation in Mousika, which enables the general translation from other learning models to the BDT. Through the translation, we can not only utilize the super learning capabilities of complicated models, but also avoid the computation/memory constraints when deploying them on switches directly for line-speed processing.

Persistent Items Tracking in Large Data Streams Based on Adaptive Sampling

Lin Chen (Sun Yat-sen University, China); Raphael C.-W. Phan (Monash University, Malaysia); Zhili Chen (East China Normal University, China); Dan Huang (University of Central Florida, USA)

We address the problem of persistent item tracking in large-scale data streams. A persistent item refers to the one that persists to occur in the stream over a long timespan. Tracking persistent items is an important and pivotal functionality for many networking and computing applications as persistent items, though not necessarily contributing significantly to the data volume, may convey valuable information on the data pattern about the stream. The state-of-the-art solutions of tracking persistent items require to know the monitoring time horizon to set the sampling rate. This limitation is further accentuated when we need to track the persistent items in recent w slots where w can be any value between 0and T to support different monitoring granularity.
Motivated by this limitation, we develop a persistent item tracking algorithm that can function without knowing the monitoring time horizon beforehand, and can thus track persistent items up to the current time t or within a certain time window at any moment. Our central technicality is adaptively reducing the sampling rate such that the total memory overhead can be limited while still meeting the target tracking accuracy. Through both theoretical and empirical analysis, we fully characterize the performance of our proposition.

Session Chair

Damla Turgut (University of Central Florida)

Session G-9

Networks Protocols 2

2:30 PM — 4:00 PM EDT
May 5 Thu, 2:30 PM — 4:00 PM EDT

AoDNN: An Auto-Offloading Approach to Optimize Deep Inference for Fostering Mobile Web

Yakun Huang and Xiuquan Qiao (Beijing University of Posts and Telecommunications, China); Schahram Dustdar (Vienna University of Technology, Austria); Yan Li (Shanxi Transportation Planning Survey and Design Institute, China)

Employing today's deep neural network (DNN) into the cross-platform web with an offloading way has been a promising means to alleviate the tension between intensive inference and limited computing resources. However, it is still challenging to directly leverage the distributed DNN execution into web apps with the following limitations, including (1) how special computing tasks such as DNN inference can provide fine-grained and efficient offloading in the inefficient JavaScript-based environment? (2) lacking the ability to balance the latency and mobile energy to partition the inference facing various web applications' requirements. (3) and ignoring that DNN inference is vulnerable to the operating environment and mobile devices' computing capability, especially dedicated web apps. This paper designs AoDNN, an automatic offloading framework to orchestrate the DNN inference across the mobile web and the edge server, with three main contributions. First, we design the DNN offloading based on providing a snapshot mechanism and use multi-threads to monitor dynamic contexts, partition decision, trigger offloading, etc. Second, we provide a learning-based latency and mobile energy prediction framework for supporting various web browsers and platforms. Third, we establish a multi-objective optimization to solve the optimal partition by balancing the latency and mobile energy.

Muses: Enabling Lightweight Learning-Based Congestion Control for Mobile Devices

Zhiren Zhong (University of Chinese Academy of Sciences, China & Huawei, China); Wei Wang and Yiyang Shao (Huawei, China); Zhenyu Li, Heng Pan and Hongtao Guan (Institute of Computing Technology, Chinese Academy of Sciences, China); Gareth Tyson (Queen Mary, University of London, United Kingdom (Great Britain)); Gaogang Xie (CNIC Chinese Academy of Sciences & University of Chinese Academy of Sciences, China); Kai Zheng (Huawei Technologies, China)

Various congestion control (CC) algorithms have been designed to target specific scenarios. To automate this process, researchers have begun to use machine learning to automatically control the congestion window. These, however, often rely on heavyweight learning models (e.g., neural networks). This can make them unsuitable for resource-constrained mobile devices. On the other hand, lightweight models (e.g., decision trees) are often incapable of reflecting the complexity of diverse mobile wireless environments. To address this, we present Muses, a learning-based approach for generating lightweight congestion control algorithms. Muses relies on imitation learning to train a universal (heavy) LSTM model, which is then used to extract (lightweight) decision tree models that are each targeted at an individual environment. Muses then dynamically selects the most appropriate decision tree on a per-flow basis. We show that Muses can generate high throughput policies across a diverse set of environments, and it is sufficiently light to operate on mobile devices.

NMMF-Stream: A Fast and Accurate Stream-Processing Scheme for Network Monitoring Data Recovery

Kun Xie and Ruotian Xie (Hunan University, China); Xin Wang (Stony Brook University, USA); Gaogang Xie (CNIC Chinese Academy of Sciences & University of Chinese Academy of Sciences, China); Dafang Zhang (Hunan University, China); Jigang Wen (Chinese Academy of Science & Institute of Computing Technology, China)

Recovery of missing network monitoring data is of great significance for network operation and maintenance tasks such as anomaly detection and traffic prediction. To exploit historical data for more accurate missing data recovery, some recent studies combine the data together as a tensor to learn more features. However, the need of performing high cost data decomposition compromises their speed and accuracy, which makes them difficult to track dynamic features from streaming monitoring data. To ensure fast and accurate recovery of network monitoring data, this paper proposes NMMF-Stream, a stream-processing scheme with a context extraction module and a generation module. To achieve fast feature extraction and missing data filling with a low sampling rate, we propose several novel techniques, including the context extraction based on both positive and negative monitoring data, context validation via measuring the Pointwise Mutual Information, GRU-based temporal feature learning and memorization, and a new composite loss function to guide the fast and accurate data filling. We have done extensive experiments using two real network traffic monitoring data sets and one network latency data set. The experimental results demonstrate that, compared with three baselines, NMMF-Stream can fill the newly arrived monitoring data very quickly with much higher accuracy.

PACC: Proactive and Accurate Congestion Feedback for RDMA Congestion Control

Xiaolong Zhong and Jiao Zhang (Beijing University of Posts and Telecommunications, China); Yali Zhang and Zixuan Guan (Huawei, China); Zirui Wan (Beijing University of Posts and Telecommunications, China)

The rapid upgrade of link speed and the prosperity of new applications in data centers lead to a rigorous demand for ultra-low latency and high throughput in end-to-end communication. Consequently, RDMA is adopted to mitigate the overhead caused by traditional software-based packet processing at end hosts, and congestion control mechanisms designed for RDMA have attracted much attention to avoid performance deterioration when packets lose. However, through comprehensive analysis, we found that existing RDMA congestion control schemes have limitations of a sluggish response to congestion and unawareness of tiny microbursts due to the long end-to-end control loop. In this paper, we propose PACC, a switch-driven RDMA congestion control algorithm with easy deployability. Benefited from PI controller-based computation, threshold-based flow discrimination and weight-based allocation at the switch, PACC leverages real-time queue length to generate accurate congestion feedback proactively and piggybacks it to the corresponding source without modification to end hosts. We theoretically analyze the stability and key parameter settings of PACC. Then, we conduct both micro-benchmark and large-scale simulations to evaluate the performance of PACC. The results show that PACC achieves fairness, fast reaction, high throughput, and 6∼69% lower FCT (Flow Completion Time) than DCQCN, TIMELY and HPCC.

Session Chair

Aaron D Striegel (University of Notre Dame)

Made with in Toronto · Privacy Policy · INFOCOM 2020 · INFOCOM 2021 · © 2022 Duetone Corp.