Session B-7


May 13 Thu, 10:00 AM — 11:30 AM EDT

Robust Service Mapping in Multi-Tenant Clouds

Jingzhou Wang, Gongming Zhao and Hongli Xu (University of Science and Technology of China, China); He Huang (Soochow University, China); Luyao Luo (University of Science and Technology of China, China); Yongqiang Yang (Huawei Technologies Co., Ltd, China)

In a multi-tenant cloud, cloud vendors provide services (e.g., elastic load balancing, virtual private networks) on service nodes for tenants. The mapping between tenants' traffic and service nodes is therefore an important issue in multi-tenant clouds. In practice, the unreliability of service nodes and the uncertainty/dynamics of tenants' traffic are two critical challenges that affect tenants' QoS. However, prior work often ignores these two challenges, leading to poor system robustness when accidents occur.

To bridge this gap, this paper studies the problem of robust service mapping in multi-tenant clouds (RSMP). Due to traffic dynamics, we take a two-step approach: service node assignment and tenant traffic scheduling. For service node assignment, we prove NP-hardness and analyze the problem's difficulty. We then propose an efficient algorithm with bounded approximation factors based on randomized rounding and the knapsack problem. For tenant traffic scheduling, we design an approximation algorithm based on a fully polynomial-time approximation scheme (FPTAS). The proposed algorithm achieves an approximation factor of 2+ɛ, where ɛ is an arbitrarily small value. Both small-scale experimental results and large-scale simulation results show the superior performance of our algorithms compared with the alternatives.

Scalable On-Switch Rate Limiters for the Cloud

Yongchao He and Wenfei Wu (Tsinghua University, China); Xuemin Wen and Haifeng Li (Huawei, China); Yongqiang Yang (Huawei Technologies Co., Ltd, China)

While most clouds use on-server rate limiters for bandwidth allocation, we propose to implement them on switches. On-switch rate limiters can simplify network management and improve the performance of control-plane rate-limiting applications. We leverage recent progress in programmable switches to implement on-switch rate limiters, named SwRL. In designing SwRL, we make design choices according to the characteristics of the programmable hardware, deeply optimize the algorithm's memory usage so as to fit cloud-scale (one million) rate limiters in a single switch, and complement the hardware's missing computation primitives with a pre-computed approximation table. We further develop three control-plane applications and integrate them with SwRL, showing its control-plane interoperability. We prototype and evaluate SwRL in both testbed and production environments, demonstrating its precise rate control, scalability, interoperability, and manageability (execution-environment isolation).
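The pre-computed table idea can be sketched in plain Python. The following is an illustrative token-bucket model, not SwRL's actual P4 pipeline: switch ASICs lack line-rate multiplication, so the tokens-per-elapsed-time product is served from a table computed offline over quantized time steps (the step size and table bound here are assumptions).

```python
class TokenBucket:
    """Token-bucket rate limiter with a pre-computed replenishment table
    (illustrative sketch of the technique, not SwRL itself)."""
    STEP_US = 100  # assumed quantization step for elapsed time, in microseconds

    def __init__(self, rate_bps, burst_bytes):
        self.burst = burst_bytes
        # Offline table: tokens (bytes) accrued over k quantized time steps,
        # replacing the multiplication the switch cannot do at line rate.
        self.table = [int(rate_bps / 8 * k * self.STEP_US / 1e6)
                      for k in range(1001)]
        self.tokens = burst_bytes
        self.last_us = 0

    def allow(self, now_us, pkt_bytes):
        steps = min((now_us - self.last_us) // self.STEP_US, 1000)
        self.tokens = min(self.burst, self.tokens + self.table[steps])
        self.last_us += steps * self.STEP_US
        if pkt_bytes <= self.tokens:
            self.tokens -= pkt_bytes
            return True
        return False
```

With an 8 Mbit/s rate and a 1500-byte burst, the bucket admits an initial packet, rejects an immediate second one, and admits again once enough quantized time has elapsed.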

Monitoring Cloud Service Unreachability at Scale

Kapil Agrawal (Microsoft Research, India); Viral Mehta (Google, India); Sundararajan Renganathan (Stanford, USA); Sreangsu Acharyya (Microsoft Research, India); Venkata N. Padmanabhan (Microsoft Research, USA); Chakri Kotipalli (Microsoft, USA); Liting Zhao (Microsoft, China)

We consider the problem of network unreachability in a global-scale cloud-hosted service that caters to hundreds of millions of users. Even when the service itself is up, the "last mile" between users and the cloud is often the weak link that can render the service unreachable. We present NetDetector, a tool for detecting network unreachability based on measurements from a client-based HTTP-ping service. NetDetector employs two models. The first, GA (Gaussian Alerts), models the temporally averaged raw success rate of the HTTP pings as a Gaussian distribution and flags significant dips below the mean as unreachability episodes. The second, more sophisticated approach (BB, or Beta-Binomial) models the health of network connectivity as the probability of an access request succeeding, estimates health from noisy samples, and alerts on dips in health below a client-network-specific SLO (service-level objective) derived from data. These algorithms are enhanced by a drill-down technique that identifies a more precise scope of the unreachability event. We present promising results from GA, which has been in deployment, and from the experimental BB detector over a 4-month period. For instance, GA flagged 49 country-level unreachability incidents, of which 42 were labelled true positives based on investigation by on-call engineers (OCEs).
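A toy version of the GA idea fits in a few lines (the three-sigma rule and fitting over the whole window are assumptions for illustration, not NetDetector's exact parameters):

```python
from statistics import mean, stdev

def gaussian_alerts(success_rates, k=3.0):
    """Toy GA-style detector: model HTTP-ping success rates as a
    Gaussian and flag windows dipping more than k standard deviations
    below the mean as unreachability episodes. Returns the indices of
    flagged windows."""
    mu, sigma = mean(success_rates), stdev(success_rates)
    threshold = mu - k * sigma
    return [i for i, r in enumerate(success_rates) if r < threshold]
```

A single sharp dip in an otherwise healthy series is flagged, while normal fluctuation around the mean is not.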

Near Optimal and Dynamic Mechanisms Towards a Stable NFV Market in Multi-Tier Cloud Networks

Zichuan Xu and Haozhe Ren (Dalian University of Technology, China); Weifa Liang (The Australian National University, Australia); Qiufen Xia (Dalian University of Technology, China); Wanlei Zhou (University of Technology Sydney, Australia); Guowei Wu (Dalian University of Technology, China); Pan Zhou (Huazhong University of Science and Technology, China)

With the fast development of next-generation networking techniques, a Network Function Virtualization (NFV) market is emerging as a major market that allows network service providers to trade various network services with consumers. Efficient mechanisms that guarantee stable and efficient operation of the NFV market are therefore urgently needed. One fundamental problem in the NFV market is how to maximize the social welfare of all players, so that they have incentives to participate in the market's activities. In this paper, we first formulate the social welfare maximization problem, aiming to maximize the total revenue of all players in the NFV market. For this problem, we design an efficient incentive-compatible mechanism and analyze the existence of a Nash equilibrium. We also consider an online social welfare maximization problem without knowledge of future request arrivals. We devise an online learning algorithm based on Multi-Armed Bandits (MAB) that allows both customers and network service providers to make decisions under uncertainty about customers' strategies. We evaluate the proposed mechanisms through both simulations and testbed implementations, and the results show that they obtain up to 23% higher social welfare than existing approaches.

Session Chair

Ruidong Li (Kanazawa University, Japan)

Session B-8


May 13 Thu, 12:00 PM — 1:30 PM EDT

Blind Optimal User Association in Small-Cell Networks

Livia E. Chatzieleftheriou (Athens University of Economics and Business, Greece); Apostolos Destounis (Huawei Technologies France Research Center, France); Georgios Paschos (Amazon, Luxembourg); Iordanis Koutsopoulos (Athens University of Economics and Business, Greece)

We learn optimal user association policies for traffic from different locations to Access Points (APs) in the presence of unknown dynamic traffic demand. We aim to minimize a broad family of α-fair cost functions that express various load-assignment objectives in the wireless downlink, such as total load or total delay minimization. Finding an optimal user association policy in dynamic environments is challenging because traffic demand fluctuations over time are non-stationary and difficult to characterize statistically, which obstructs the computation of cost-efficient associations. Assuming arbitrary traffic patterns over time, we formulate the online learning of optimal user association policies in the Online Convex Optimization (OCO) framework. We introduce a periodic benchmark for OCO problems that generalizes state-of-the-art benchmarks. We exploit inherent properties of the online user association problem and propose PerOnE, a simple online learning scheme that dynamically adapts the association policy to arbitrary traffic demand variations. We compare PerOnE against our periodic benchmark and prove that it enjoys the no-regret property, with additional sublinear dependence on the network size. To the best of our knowledge, this is the first work to introduce a periodic benchmark for OCO problems and a no-regret algorithm for the online user association problem. Our theoretical findings are validated on a real-trace dataset.
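The OCO framing can be illustrated with textbook online gradient descent over the traffic-split fractions of one location across APs. This sketches the framework the paper builds on, not the PerOnE algorithm itself; the step size and quadratic per-round cost in the example are assumptions.

```python
def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u):
        css += ui
        if ui > (css - 1.0) / (i + 1):
            theta = (css - 1.0) / (i + 1)
    return [max(x - theta, 0.0) for x in v]

def ogd_association(grads, n_aps, eta=0.1):
    """Online gradient descent: after each round's convex cost is
    revealed, step against its gradient at the current split and
    project back onto the simplex of valid traffic fractions."""
    x = [1.0 / n_aps] * n_aps
    for grad in grads:
        g = grad(x)
        x = project_simplex([xi - eta * gi for xi, gi in zip(x, g)])
    return x
```

Against a sequence of identical quadratic costs minimized by sending all traffic to AP 0, the iterate converges to that corner of the simplex.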

Dynamically Choosing the Candidate Algorithm with Ostasos in Online Optimization

Weirong Chen, Jiaqi Zheng and Haoyu Yu (Nanjing University, China)

The increasing challenge in designing online algorithms lies in distribution uncertainty. To cope with distribution variations in online optimization, an intuitive idea is to reselect, from a candidate set, an algorithm better suited to future distributions. In this paper, we propose Ostasos, an automatic algorithm selection framework that can choose the most suitable algorithm on the fly with provable guarantees. Rigorous theoretical analysis demonstrates that the performance of Ostasos is no worse than that of any candidate algorithm in terms of competitive ratio. Finally, we apply Ostasos to the online car-hailing problem, and trace-driven experiments verify its effectiveness.

Taming Time-Varying Information Asymmetry in Fresh Status Acquisition

Zhiyuan Wang (The Chinese University of Hong Kong, Hong Kong); Lin Gao (Harbin Institute of Technology (Shenzhen), China); Jianwei Huang (The Chinese University of Hong Kong, Shenzhen, China)

Many online platforms provide valuable real-time content (e.g., traffic) by continuously acquiring the status of different Points of Interest (PoIs). In status acquisition, it is challenging to determine how frequently a PoI should upload its status to a platform, since PoIs are self-interested with private and possibly time-varying preferences. This paper considers a general multi-period status acquisition system, aiming to maximize the aggregate social welfare and ensure platform freshness. Freshness is measured by a metric termed age of information. To this end, we devise a long-term decomposition (LtD) mechanism to resolve the time-varying information asymmetry. The key idea is to construct a virtual social welfare that depends only on the current private information, and then decompose the per-period operation into multiple distributed bidding problems for the PoIs and platforms. The LtD mechanism enables the platforms to achieve a tunable trade-off between payoff maximization and freshness conditions. Moreover, the LtD mechanism retains the same social performance as the benchmark with symmetric information and asymptotically ensures the platform freshness conditions. Numerical results based on real-world data show that when the platforms pay more attention to payoff maximization, each PoI still obtains a non-negative payoff in the long term.
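The age-of-information metric used above has a simple form: age grows linearly between updates and resets on each one, so the time-average is a sum of triangle areas. A minimal sketch, assuming zero network delay (so age resets to 0 at each update) and a fresh start at time 0:

```python
def average_aoi(update_times, horizon):
    """Average age of information over [0, horizon] for a platform
    receiving status updates at the given times. Between updates the
    age grows linearly, so each inter-update gap contributes gap^2/2
    to the integral of the age curve."""
    events = sorted(t for t in update_times if 0 <= t <= horizon)
    area, last = 0.0, 0.0
    for t in events + [horizon]:
        gap = t - last
        area += gap * gap / 2.0
        last = t
    return area / horizon
```

For updates at times 1, 2, and 3 over a horizon of 4, every gap is 1, so the average age is 0.5.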

ToP: Time-dependent Zone-enhanced Points-of-interest Embedding-based Explainable Recommender system

En Wang, Yuanbo Xu, Yongjian Yang, Fukang Yang, Chunyu Liu and Yiheng Jiang (Jilin University, China)

Points-of-interest (POI) recommendation plays a vital role in introducing unexplored POIs to consumers and has drawn extensive attention from both academia and industry. Existing POI recommender systems usually learn latent vectors to represent both consumers and POIs from historical check-ins and make recommendations under spatio-temporal constraints. However, we argue that existing works still struggle to explain consumers' complicated check-in actions. In this paper, we first explore the interpretability of recommendations from the POI aspect: for a specific POI, its function usually changes over time, so representing a POI with a single fixed latent vector is not sufficient to describe its dynamic function. Besides, check-in actions at a POI are also affected by the zone it belongs to. In other words, a zone's embedding, learned from POI distributions, road segments, and historical check-ins, can be jointly utilized to enhance the accuracy of POI recommendations. Along this line, we propose a Time-dependent Zone-enhanced POI embedding model (ToP), a recommender system that integrates a knowledge graph and a topic model to introduce spatio-temporal effects into POI embeddings, strengthening the interpretability of recommendations. Specifically, ToP learns multiple latent vectors for a POI at different times to capture its dynamic functions. Jointly combining these vectors with zone representations, ToP enhances the spatio-temporal interpretability of POI recommendations. With this hybrid architecture, some existing POI recommender systems can be treated as special cases of ToP. Extensive experiments on real-world Changchun city datasets demonstrate that ToP not only achieves state-of-the-art performance on common metrics but also provides more insights into consumers' POI check-in actions.

Session Chair

Bin Li (U. Rhode Island)

Session B-9

Packets and Flows

May 13 Thu, 2:30 PM — 4:00 PM EDT

ECLAT: An ECN Marking System for Latency Guarantee in Cellular Networks

Junseon Kim (Ulsan National Institute of Science and Technology (UNIST), Korea (South)); Youngbin Im (Ulsan National Institute of Science and Technology, Korea (South)); Kyunghan Lee (Seoul National University, Korea (South))

As the importance of latency performance increases, a number of multi-bit feedback-based congestion control mechanisms have been proposed for explicit latency control in cellular networks. However, due to their reactive nature and limited access to the network queue, they could reduce latency but not guarantee it. Also, because they require end-host modifications, it was hard to provide the latency benefit to all connected devices. To this end, we propose a novel network-assisted congestion control, ECLAT, which can always bound the queuing delay within a delay budget through ECN-based single-bit feedback while maintaining high link utilization for any device. To do so, ECLAT 1) calculates a target operating point for each flow, namely the maximum allowable cwnd that meets the delay budget under time-varying cellular networks, and 2) determines its single-bit feedback policy to limit cwnd within the target operating point. Extensive experiments on our testbed demonstrate that ECLAT bounds the queuing delays of multiple flows within their delay budgets and achieves high utilization even in dynamic cellular network environments.
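The single-bit feedback rule can be caricatured as a queue-drain test: mark whenever the standing queue would take longer than the delay budget to drain at the link rate. This is a deliberate simplification of ECLAT's per-flow target-cwnd computation; the function name and units are assumptions.

```python
def ecn_mark(queue_bytes, link_rate_bytes_per_s, delay_budget_s):
    """Simplified ECN-marking decision in the spirit of ECLAT: mark
    arriving packets whenever the queue exceeds the backlog drainable
    within the delay budget, so senders back off before the queuing
    delay violates the budget."""
    target_backlog = link_rate_bytes_per_s * delay_budget_s
    return queue_bytes > target_backlog
```

On a 100 Mbit/s link (12.5 MB/s) with a 10 ms budget, the target backlog is 125 KB, so a 200 KB queue triggers marking and a 100 KB queue does not.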

PCL: Packet Classification with Limited Knowledge

Vitalii Demianiuk (Ariel University, Israel); Chen Hajaj (Ariel University & Data Science and Artificial Intelligence Research Center, Israel); Kirill Kogan (Ariel University, Israel)

We introduce a novel representation of packet classifiers that operates on partially available input data varying dynamically. Given a packet classifier, the availability of fields (or the complexity of computing them), and the free target-specific resources, the proposed infrastructure computes a classifier representation satisfying performance and robustness requirements. We show the feasibility of reconstructing a classification result in this noisy environment, improving performance and achieving additional levels of robustness in the network infrastructure. Our results are supported by extensive evaluations in various settings where only partial input is available.

Towards the Fairness of Traffic Policer

Danfeng Shan and Peng Zhang (Xi'an Jiaotong University, China); Wanchun Jiang (Central South University, China); Hao Li (Xi'an Jiaotong University, China); Fengyuan Ren (Tsinghua University, China)

Traffic policing is widely used by ISPs to limit their customers' traffic rates. It has long been believed that a well-tuned traffic policer offers satisfactory performance for TCP. However, we find this belief breaks down with the emergence of new congestion control (CC) algorithms like BBR: flows using these new CC algorithms can easily occupy the majority of the bandwidth, starving traditional TCP flows. We confirm this problem with experiments and reveal its root cause as follows. Since traffic policers have no buffer, congestion only causes packet losses, while the new CC algorithms are loss-resilient, i.e., they adjust the sending rate based on other network feedback such as delay. Thus, when policed, they do not reduce their sending rate until the loss ratio becomes unacceptable for TCP, resulting in low TCP throughput. Simply adding a buffer to the traffic policer improves fairness but incurs high latency. To this end, we propose FairPolicer, which achieves fair bandwidth allocation without sacrificing latency. FairPolicer regards a token as the basic unit of bandwidth and fairly allocates tokens to active flows in a round-robin manner. Testbed experiments show that FairPolicer significantly improves fairness and achieves much lower latency than other kinds of rate limiters.
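Round-robin token allocation amounts to max-min fair sharing of the interval's token budget among active flows. A minimal sketch (the interval-based API and demand dictionary are assumptions for illustration, not FairPolicer's actual data path):

```python
def allocate_tokens(tokens, demands):
    """Hand out a token budget round by round so each active flow gets
    an equal share capped by its remaining demand (max-min fairness).
    demands maps flow id -> tokens requested this interval."""
    alloc = {f: 0 for f in demands}
    active = {f for f, d in demands.items() if d > 0}
    while tokens > 0 and active:
        share = max(tokens // len(active), 1)
        for f in sorted(active):  # sorted() copies, so discard below is safe
            grant = min(share, demands[f] - alloc[f], tokens)
            alloc[f] += grant
            tokens -= grant
            if alloc[f] >= demands[f]:
                active.discard(f)  # satisfied flows leave the rotation
            if tokens == 0:
                break
    return alloc
```

With 90 tokens and demands of 100, 20, and 100, the small flow gets its full 20 and the remaining 70 split evenly, i.e., the max-min fair allocation 35/20/35.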

Jellyfish: Locality-sensitive Subflow Sketching

Yongquan Fu (National University of Defense Technology, China); Lun An (Beijing University of Posts and Telecommunications, China); Siqi Shen (Xiamen University, China); Kai Chen (Hong Kong University of Science and Technology, China); Pere Barlet-Ros (Universitat Politècnica de Catalunya, Spain)

To cope with increasing network rates and massive traffic volumes, sketch-based methods have been extensively studied to trade accuracy for memory scalability and lower storage cost. However, sketches are sensitive to hash collisions caused by skewed keys in real-world environments, and they need complicated performance control for line-rate packet streams. We present Jellyfish, a locality-sensitive sketching framework that addresses these issues. Jellyfish goes beyond network-flow-based sketching to fragments of network flows called subflows. First, Jellyfish splits consecutive packets of each network flow into subflow records, which not only reduces rate contention but also provides intermediate subflow representations in the form of truncated counters. Next, Jellyfish maps similar subflow records to the same bucket array and merges those from the same network flow to reconstruct the network-flow-level counters. Real-world trace-driven experiments show that Jellyfish reduces average estimation errors by up to six orders of magnitude for per-flow queries, by six orders of magnitude for entropy queries, and by up to ten times for heavy-hitter queries.
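For context, the baseline flow-level structure that subflow sketching refines is a count-min sketch; this plain version illustrates the hash-collision overestimation problem the paper targets (it is not Jellyfish itself, and the dimensions are arbitrary):

```python
import hashlib

class CountMinSketch:
    """Plain count-min sketch: depth hash rows of width counters.
    Collisions can only inflate counts, so the minimum over rows is an
    upper bound on the true count -- the error Jellyfish-style designs
    work to reduce under skewed keys."""
    def __init__(self, depth=4, width=1024):
        self.depth, self.width = depth, width
        self.rows = [[0] * width for _ in range(depth)]

    def _idx(self, key, row):
        h = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8)
        return int.from_bytes(h.digest(), "big") % self.width

    def add(self, key, count=1):
        for r in range(self.depth):
            self.rows[r][self._idx(key, r)] += count

    def query(self, key):
        return min(self.rows[r][self._idx(key, r)] for r in range(self.depth))
```

Queries never undercount: a flow inserted five times always reads back at least five.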

Session Chair

Shuochao Yao (George Mason University, USA)

Session B-10

Programmable Switches

May 13 Thu, 4:30 PM — 6:00 PM EDT

Programmable Switches for in-Networking Classification

Bruno Missi Xavier and Rafael Silva Guimaraes (Federal Institute of Espirito Santo - Campus Cachoeiro de Itapemirim, Brazil); Giovanni Comarela (Universidade Federal do Espírito Santo, Brazil); Magnos Martinello (Federal University of Espirito Santo, Brazil)

Deploying accurate machine learning algorithms in a high-throughput networking environment is a challenging task. On the one hand, machine learning has proved useful for traffic classification in many contexts (e.g., intrusion detection, application classification, and early heavy-hitter identification). On the other hand, most work in the area involves post-processing (i.e., training and testing performed offline on previously collected samples) or scenarios where the traffic has to leave the data plane to be classified (i.e., high latency). In this work, we tackle the problem of creating simple and reasonably accurate machine learning models that can be deployed in the data plane with acceptable performance degradation. To that end, we introduce a framework and discuss issues related to translating simple models, for handling individual packets or flows, into the P4 language. We validate our framework on an intrusion detection use case by deploying a single decision tree onto a Netronome SmartNIC (Agilio CX 2x10GbE). Our results show that high accuracy (above 95%) is achievable with minor performance degradation, even for a large number of flows.
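Translating a decision tree into match-action entries boils down to enumerating root-to-leaf paths, one rule per leaf. A minimal sketch (the tuple-based node encoding and field names are assumptions; real P4 targets additionally need range-to-ternary expansion and table layout work):

```python
def tree_to_rules(node, path=None):
    """Flatten a binary decision tree into (conditions, action) rules,
    one per root-to-leaf path, the way a tree can be encoded as switch
    table entries. A node is either ('leaf', label) or
    (feature, threshold, left, right), taking left when
    value <= threshold."""
    path = path or []
    if node[0] == "leaf":
        return [(list(path), node[1])]
    feat, thr, left, right = node
    rules = tree_to_rules(left, path + [(feat, "<=", thr)])
    rules += tree_to_rules(right, path + [(feat, ">", thr)])
    return rules
```

A depth-2 tree over hypothetical `pkt_len` and `proto` fields yields three rules, each a conjunction of range conditions with a class label.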

Fix with P6: Verifying Programmable Switches at Runtime

Apoorv Shukla (Huawei Munich Research Center, Germany); Kevin Hudemann (SAP, Germany); Zsolt Vági (SWISSCOM, Switzerland); Lily Hügerich (TU Berlin, Germany); Georgios Smaragdakis (TU Berlin and Max Planck Institute for Informatics, Germany); Artur Hecker (Huawei, Germany); Stefan Schmid (University of Vienna, Austria); Anja Feldmann (Max Planck Institute for Informatics & Saarland Informatics Campus / TU Berlin, Germany)

We design, develop, and evaluate P6, an automated approach to (a) detect, (b) localize, and (c) patch software bugs in P4 programs. Bugs are reported via a violation of pre-specified expected behavior captured by P6. P6 is based on machine-learning-guided fuzzing that tests a P4 switch non-intrusively, i.e., without modifying the P4 program, to detect runtime bugs. This enables automated, real-time localization and patching of bugs. We used a P6 prototype to detect and patch existing bugs in various publicly available P4 application programs deployed on two different switch platforms: the behavioral model (bmv2) and Tofino. Our evaluation shows that P6 significantly outperforms bug-detection baselines while generating fewer packets, and that it patches bugs in large P4 programs such as switch.p4 without triggering any regressions.

Making Multi-String Pattern Matching Scalable and Cost-Efficient with Programmable Switching ASICs

Shicheng Wang, Menghao Zhang, Guanyu Li, Chang Liu and Ying Liu (Tsinghua University, China); Xuya Jia (Huawei Technologies Co. Ltd., China); Mingwei Xu (Tsinghua University, China)

Multi-string pattern matching is a crucial building block for many network security applications and thus of great importance. Since every byte of a packet has to be inspected against a large set of patterns, it often becomes the bottleneck of these applications and dominates the performance of the entire system. Many existing works have tried to alleviate this bottleneck through algorithmic optimization or hardware acceleration. However, neither provides scalability and costs that keep pace with today's dramatic increases in network bandwidth and traffic. In this paper, we present BOLT, a scalable and cost-efficient multi-string pattern matching system that leverages the capability of emerging programmable switches. BOLT combines two techniques: a smart state-encoding scheme to fit a large number of strings into the limited memory of a programmable switch, and a variable k-stride transition mechanism to increase throughput significantly at the same level of memory cost. We implement a prototype of BOLT and make its source code publicly available. Extensive evaluations demonstrate that BOLT provides orders-of-magnitude improvements in throughput, scales with pattern sets and workloads, and significantly decreases the number of entries and the memory requirement.
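The classic automaton underlying multi-string matching is Aho-Corasick, which scans each byte once regardless of how many patterns are loaded. BOLT's contributions (the state encoding and variable k-stride transitions) sit on top of a DFA like this sketch; this is the textbook algorithm, not BOLT's version:

```python
from collections import deque

def build_ac(patterns):
    """Build the Aho-Corasick goto/fail/output tables."""
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    q = deque(goto[0].values())          # depth-1 states keep fail = 0
    while q:
        s = q.popleft()
        for ch, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]       # inherit matches ending here
    return goto, fail, out

def match(text, patterns):
    """Report (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_ac(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        hits += [(i - len(p) + 1, p) for p in out[s]]
    return sorted(hits)
```

On the textbook example, scanning "ushers" against {he, she, his, hers} finds the three overlapping matches in one pass.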

Traffic-aware Buffer Management in Shared Memory Switches

Sijiang Huang, Mowei Wang and Yong Cui (Tsinghua University, China)

The switch buffer serves an important role in the modern Internet. For efficiency, today's switches often use on-chip shared memory and rely on buffer management policies to allocate it among ports. To avoid wasting buffer resources or letting a few ports occupy too much buffer, existing policies tend to maximize overall buffer utilization and pursue queue-length fairness. However, blind pursuit of utilization and a misleading fairness definition based on queue length lead to buffer occupancy that does not benefit throughput but extends queuing delay and undermines the burst absorption of other ports. We contend that a buffer management policy should proactively detect port traffic and adjust buffer allocation accordingly. In this paper, we propose the Traffic-aware Dynamic Threshold (TDT) policy. Building on the classic dynamic threshold policy, TDT proactively raises or lowers per-port thresholds to absorb burst traffic or evacuate meaningless buffer occupancy. We present detailed designs of the port control state transition and the state decision module, which detect real-time traffic and change port thresholds accordingly. Simulations and a DPDK-based real testbed demonstrate that TDT simultaneously optimizes throughput, loss, and delay, reducing flow completion time by up to 50%.
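The classic dynamic threshold policy that TDT extends fits in a few lines: a port may enqueue only while its queue is below alpha times the remaining free buffer. This sketch omits TDT's contribution of adjusting alpha per port based on the detected traffic state (burst vs. steady):

```python
def dt_admit(port, queue_lens, buffer_size, alpha=1.0):
    """Classic dynamic-threshold (DT) admission test: accept a packet
    for `port` only while its queue is below alpha * (free buffer).
    As the shared buffer fills, free space shrinks and the threshold
    falls for every port, throttling heavy queues automatically."""
    free = buffer_size - sum(queue_lens)
    return queue_lens[port] < alpha * free
```

With a 100-unit buffer and two 30-unit queues, 40 units remain free, so port 0 (at 30 < 40) may still enqueue; once both queues reach 45, only 10 units remain free and admission stops.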

Session Chair

Qiao Xiang (Xiamen University, China)

INFOCOM 2020 · © 2021 Duetone Corp.