Session E-1

## Video Streaming 1

Time: May 17 Wed, 11:00 AM — 12:30 PM EDT
Location: Babbio 219

### Buffer Awareness Neural Adaptive Video Streaming for Avoiding Extra Buffer Consumption

Tianchi Huang (Tsinghua University, China); Chao Zhou (Beijing Kuaishou Technology Co., Ltd, China); Rui-Xiao Zhang, Chenglei Wu and Lifeng Sun (Tsinghua University, China)

Adaptive video streaming has become the dominant scheme for delivering video with a high quality of experience (QoE). However, growing network capacity and highly efficient video compression allow clients to accumulate large buffers, which can waste a colossal amount of data if users quit before a session ends. In this paper, we consider buffer-aware adaptive bitrate (ABR) mechanisms to address this concern. Formulating buffer-aware rate adaptation as a multi-objective optimization problem, we propose DeepBuffer, a deep reinforcement learning-based approach that jointly selects the proper bitrate and controls the maximum buffer. To deal with the challenges of learning-based buffer-aware ABR, such as the infinite space of possible plans, multiple bitrate levels, and a complex action space, we design preference-driven inputs and separate action outputs, and devise sample-efficient training methodologies. We train DeepBuffer on a broad set of real-world network traces and provide a comprehensive evaluation across various network scenarios and video types. Experimental results indicate that DeepBuffer rivals or outperforms recent heuristic and learning-based ABR schemes in terms of QoE while reducing average buffer consumption by up to 90%. Extensive real-world experiments further demonstrate the substantial superiority of DeepBuffer.
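To make the multi-objective formulation concrete, here is a minimal illustrative sketch of a scalarized reward that trades video quality against rebuffering, quality switches, and buffer accumulated beyond a cap. The weights and metric names are assumptions for illustration, not DeepBuffer's actual reward design.

```python
# Illustrative sketch (NOT DeepBuffer's actual reward): a scalarized
# multi-objective signal for buffer-aware bitrate adaptation.
# Weights w_rebuf, w_smooth, w_waste are assumed values.

def qoe_reward(bitrate_kbps, rebuffer_s, prev_bitrate_kbps,
               buffer_s, max_buffer_s,
               w_rebuf=4.3, w_smooth=1.0, w_waste=0.5):
    """Reward = quality - rebuffering penalty - smoothness penalty
    - penalty for buffer accumulated beyond the chosen cap (potential waste)."""
    quality = bitrate_kbps / 1000.0                       # Mbps as quality proxy
    rebuf_pen = w_rebuf * rebuffer_s                      # stalls hurt most
    smooth_pen = w_smooth * abs(bitrate_kbps - prev_bitrate_kbps) / 1000.0
    waste_pen = w_waste * max(0.0, buffer_s - max_buffer_s)
    return quality - rebuf_pen - smooth_pen - waste_pen
```

An RL agent maximizing such a reward is pushed toward high, stable bitrates while keeping the playback buffer under the chosen cap.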
##### Speaker
Speaker biography is not available.

### From Ember to Blaze: Swift Interactive Video Adaptation via Meta-Reinforcement Learning

Xuedou Xiao, Mingxuan Yan and Yingying Zuo (Huazhong University of Science and Technology, China); Boxi Liu and Paul Ruan (Tencent Technology Co. Ltd, China); Yang Cao and Wei Wang (Huazhong University of Science and Technology, China)

Maximizing quality of experience (QoE) for interactive video streaming has been a long-standing challenge, as its delay-sensitive nature makes it more vulnerable to bandwidth fluctuations. While reinforcement learning (RL) has demonstrated great potential in optimizing video streaming, recent advances are either limited by fixed models or require enormous data and time for online adaptation, and thus struggle to fit time-varying and diverse network states. Driven by these practical concerns, we perform large-scale measurements on WeChat for Business's interactive video service to study real-world network fluctuations. Surprisingly, our analysis shows that, compared to time-varying network metrics, network sequences exhibit noticeable short-term continuity, sufficient to meet the requirements of few-shot learning. We thus propose Fiammetta, the first meta-RL-based bitrate adaptation algorithm for interactive video streaming. Building on this short-term continuity, Fiammetta accumulates learning experience through meta-training and enables fast online adaptation to changing network states through a few gradient updates. Moreover, Fiammetta incorporates a probing mechanism for real-time monitoring of network states and proposes an adaptive meta-testing mechanism for seamless adaptation. We implement Fiammetta on a testbed whose end-to-end network follows real-world WeChat for Business traces. The results show that Fiammetta outperforms prior algorithms significantly, improving video bitrate by 3.6%-16.2% without increasing the stalling rate.
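The "few gradient updates" idea can be illustrated with a toy example: a scalar throughput predictor adapted online with a handful of SGD steps on recent samples. This is only a sketch of the adaptation mechanism under assumed values, not Fiammetta's model or training procedure.

```python
# Illustrative sketch of few-gradient-step online adaptation (NOT
# Fiammetta's model): a scalar predictor next = w * prev, adapted with
# a few SGD steps on recently observed throughput pairs.

def adapt(w, samples, lr=0.1, steps=5):
    """samples: list of (prev_throughput, next_throughput) pairs.
    Runs `steps` passes of SGD on squared error and returns the new w."""
    for _ in range(steps):
        for prev, nxt in samples:
            grad = 2 * (w * prev - nxt) * prev   # d/dw of (w*prev - nxt)^2
            w -= lr * grad
    return w
```

Starting from a meta-trained initialization, a few such updates on fresh probes suffice when consecutive network states are similar, which is exactly the short-term continuity the measurement study observes.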

### RDladder: Resolution-Duration Ladder for VBR-Encoded Videos via Imitation Learning

Lianchen Jia (Tsinghua University, China); Chao Zhou (Beijing Kuaishou Technology Co., Ltd, China); Tianchi Huang, Chaoyang Li and Lifeng Sun (Tsinghua University, China)

With the rapid development of streaming systems, a large number of videos must be transcoded into multiple copies according to an encoding ladder, which significantly increases storage overhead. This scenario presents new challenges in balancing better quality for users against lower storage cost. In our work, we make two key observations. First, selecting proper resolutions under certain network conditions can reduce storage costs while maintaining a high quality of experience. Second, segment duration is critical, especially for VBR-encoded videos. Based on these observations, we propose RDladder, a resolution-duration ladder for VBR-encoded videos built via imitation learning. We jointly optimize resolution and duration using neural networks, determining the combination of these two parameters in light of network capacity, video information, and storage cost. To obtain faithful results, we use over 500 videos, encoded into over 2,000,000 chunks, and collect more than 50 hours of real-world network traces. We test RDladder in simulation, emulation, and real-world environments under various network conditions, and our method achieves near-optimal performance. Furthermore, we discuss the interplay between RDladder and ABR algorithms and summarize some characteristics of RDladder.

### Energy-Efficient 360-Degree Video Streaming on Multicore-Based Mobile Devices

Xianda Chen and Guohong Cao (The Pennsylvania State University, USA)

Streaming (downloading and processing) 360-degree video consumes a large amount of energy on mobile devices, but little work has been done to address this problem, especially in light of recent advances in mobile architecture. Through real measurements, we found that existing systems activate all processor cores during video streaming, which causes high energy consumption. This is unnecessary, since most heavy computations in 360-degree video processing are handled by hardware accelerators such as the hardware decoder and the GPU. To address this problem, we propose to save energy by selectively activating the proper processor cluster and adaptively adjusting the CPU frequency based on the video quality. We model the impact of video resolution and CPU frequency on power consumption, and the impact of video features and network effects on quality of experience (QoE). Based on the QoE model and the power model, we formulate energy- and QoE-aware 360-degree video streaming as an optimization problem. We first present an optimal algorithm that maximizes QoE and minimizes energy. Since the optimal algorithm requires future knowledge, we then propose a heuristic-based algorithm. Evaluation results show that our heuristic-based algorithm significantly reduces energy consumption while maintaining QoE.
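The cluster/frequency selection idea can be sketched as a tiny feasibility search: among the available (cluster, frequency) operating points, pick the lowest-power one that still processes each frame before its deadline. All operating points and power numbers below are assumptions for illustration, not the paper's measured model.

```python
# Illustrative sketch (assumed numbers, NOT the paper's power model):
# choose the processor cluster and CPU frequency that minimize power
# while still finishing each frame's work before its deadline.

def pick_config(work_cycles, deadline_s, configs):
    """configs: list of (name, freq_hz, power_watts).
    Returns the lowest-power config that finishes work_cycles in time."""
    feasible = [(p, name, f) for name, f, p in configs
                if work_cycles / f <= deadline_s]
    if not feasible:
        raise ValueError("no configuration meets the deadline")
    p, name, f = min(feasible)          # lowest power among feasible points
    return name, f, p

# Hypothetical big.LITTLE operating points.
configs = [
    ("little@1.0GHz", 1.0e9, 0.4),
    ("little@1.8GHz", 1.8e9, 0.9),
    ("big@2.4GHz",    2.4e9, 2.0),
]
```

Lower-quality (lower-resolution) video needs fewer cycles per frame, so this kind of search naturally lands on the little cluster at a reduced frequency, which is the source of the energy savings.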

Session E-2

## Video Streaming 2

Time: May 17 Wed, 2:00 PM — 3:30 PM EDT
Location: Babbio 219

### OmniSense: Towards Edge-Assisted Online Analytics for 360-Degree Videos

Miao Zhang (Simon Fraser University, Canada); Yifei Zhu (Shanghai Jiao Tong University, China); Linfeng Shen (Simon Fraser University, Canada); Fangxin Wang (The Chinese University of Hong Kong, Shenzhen, China); Jiangchuan Liu (Simon Fraser University, Canada)

With the reduced hardware costs of omnidirectional cameras and the proliferation of various extended reality applications, more and more 360° videos are being captured. To fully unleash their potential, advanced video analytics is expected to extract actionable insights and situational knowledge, without blind spots, from these videos. In this paper, we present OmniSense, a novel edge-assisted framework for online immersive video analytics. OmniSense achieves both low latency and high accuracy, combating the significant computation and network resource challenges of analyzing 360° videos. Motivated by our measurement insights into 360° videos, OmniSense introduces a lightweight spherical region of interest (SRoI) prediction algorithm to prune redundant information in 360° frames. Incorporating the video content and network dynamics, it then smartly scales vision models to analyze the predicted SRoIs with optimized resource utilization. We implement a prototype of OmniSense on commodity devices and evaluate it on diverse real-world 360° videos. Extensive evaluation results show that, compared to resource-agnostic baselines, it improves accuracy by 19.8%-114.6% at similar end-to-end latencies, and achieves 2.0x-2.4x speedups while keeping accuracy on par with the most accurate baseline.

### Meta Reinforcement Learning for Rate Adaptation

Abdelhak Bentaleb (Concordia University, Canada); May Lim (National University of Singapore, Singapore); Mehmet N Akcay and Ali C. Begen (Ozyegin University, Turkey); Roger Zimmermann (National University of Singapore, Singapore)

The goal of an adaptive bitrate (ABR) scheme is to enable streaming clients to adapt to time-varying network and device conditions and deliver a stall-free viewing experience. Today, most ABR schemes use manually tuned heuristics or learning-based methods. Heuristics are easy to implement but do not always perform well, whereas learning-based methods generally perform well but are difficult to deploy on low-resource devices. To get the best of both worlds, we develop Ahaggar, a learning-based scheme running on the server side that provides quality-aware bitrate guidance to streaming clients running their own heuristics. The novelty of Ahaggar lies in its meta reinforcement learning approach, which takes network conditions, client status, device resolution, and streamed content as input features to perform bitrate guidance. Ahaggar uses the emerging CMCD/SD (Common Media Client/Server Data) protocols to exchange the necessary metadata between servers and clients. Experiments on a full (open-source) system show that Ahaggar adapts to unseen conditions quickly and outperforms its competitors on several viewer experience metrics.

### Cross-Camera Inference on the Constrained Edge

Jingzong Li (City University of Hong Kong, Hong Kong); Libin Liu (Zhongguancun Laboratory, China); Hong Xu (The Chinese University of Hong Kong, Hong Kong); Shudeng Wu (Tsinghua University, China); Chun Xue (City University of Hong Kong, Hong Kong)

The proliferation of edge devices has pushed computing from the cloud to the data sources, and video analytics is among the most promising applications of edge computing. Video analytics is compute-intensive and latency-sensitive, as video frames are analyzed by complex deep neural networks (DNNs) that put severe pressure on resource-constrained edge devices. To resolve the tension between inference latency and resource cost, we present Polly, a cross-camera inference system that enables co-located cameras with different but overlapping fields of view (FoVs) to share inference results with each other, eliminating redundant inference for objects in the same physical area. Polly's design solves two fundamental challenges of cross-camera inference: how to identify overlapping FoVs automatically, and how to share inference results accurately across cameras. Evaluation on an NVIDIA Jetson Nano with a real-world traffic surveillance dataset shows that Polly reduces inference latency by up to 71.6% while achieving almost the same detection accuracy as state-of-the-art systems.
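A common primitive for reasoning about overlapping coverage is intersection-over-union (IoU) of two regions. The sketch below computes IoU for axis-aligned rectangles in a shared coordinate frame; Polly's actual FoV-identification method may differ, and this only illustrates the overlap test itself.

```python
# Illustrative sketch: measuring overlap between two cameras' coverage
# regions as intersection-over-union (IoU) of axis-aligned rectangles in
# a shared plane. This is a generic overlap primitive, NOT Polly's
# automatic FoV-identification algorithm.

def iou(a, b):
    """a, b: (x1, y1, x2, y2) rectangles with x1 < x2 and y1 < y2."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0
```

Only camera pairs whose regions score a nonzero (or sufficiently high) IoU are candidates for sharing inference results, since objects outside the overlap must still be detected locally.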
##### Speaker Jingzong Li (City University of Hong Kong)

### AdaptSLAM: Edge-Assisted Adaptive SLAM with Resource Constraints via Uncertainty Minimization

Ying Chen (Duke University, USA); Hazer Inaltekin (Macquarie University, Australia); Maria Gorlatova (Duke University, USA)

Edge computing is increasingly proposed as a solution for reducing the resource consumption of mobile devices running simultaneous localization and mapping (SLAM) algorithms. However, most edge-assisted SLAM systems either assume that the communication resources between the mobile device and the edge server are unlimited, or rely on heuristics to choose the information transmitted to the edge. This paper presents AdaptSLAM, an edge-assisted visual (V) and visual-inertial (VI) SLAM system that adapts to the available communication and computation resources, based on a theoretically grounded method we developed to select the subset of keyframes (representative frames) for constructing the best local and global maps on the mobile device and the edge server under resource constraints. We implemented AdaptSLAM to work with the state-of-the-art open-source V- and VI-SLAM framework ORB-SLAM3, and demonstrated that, under constrained network bandwidth, AdaptSLAM reduces tracking error by 62% compared to the best baseline.

Session E-3

## Video Streaming 3

Time: May 17 Wed, 4:00 PM — 5:30 PM EDT
Location: Babbio 219

### Who is the Rising Star? Demystifying the Promising Streamers in Crowdsourced Live Streaming

Rui-Xiao Zhang, Tianchi Huang, Chenglei Wu and Lifeng Sun (Tsinghua University, China)

Streamers are the core competency of crowdsourced live streaming (CLS) platforms. However, little work has explored how different factors relate to their popularity evolution patterns. In this paper, we investigate a critical problem: *how can we discover promising streamers in their early stage?* We find that streamers can indeed be clustered into two evolution types (a rising type and a normal type), and that the two types differ in certain inherent properties. Traditional time-sequential models cannot handle this problem, because they are unable to capture the complicated interactivity and extensive heterogeneity of CLS scenarios. To address their shortcomings, we propose Niffler, a novel heterogeneous attention temporal graph (HATG) framework for predicting the evolution types of CLS streamers. Specifically, through its graph neural network (GNN) and gated recurrent unit (GRU) structure, Niffler captures both interactive features and evolutionary dynamics. Moreover, by integrating an attention mechanism into the model design, Niffler intelligently preserves heterogeneity when learning node representations at different levels. We systematically compare Niffler against multiple baselines from different categories, and the experimental results show that our proposed model achieves the best prediction performance.

### StreamSwitch: Fulfilling Latency Service-Layer Agreement for Stateful Streaming

Zhaochen She, Yancan Mao, Hailin Xiang, Xin Wang and Richard T. B. Ma (National University of Singapore, Singapore)

Distributed stream systems provide low latency by processing data as it arrives. However, existing systems do not provide latency guarantees, a critical requirement of real-time analytics, especially for stateful operators under bursty and skewed workloads. We present StreamSwitch, a control plane for stream systems that bounds operator latency while optimizing resource usage. Based on a novel stream switch abstraction that unifies dynamic scaling and load balancing into a holistic control framework, our design incorporates reactive and predictive metrics to assess the health of executors and prescribes practically optimal scaling and load balancing decisions in time. We implement a prototype of StreamSwitch and integrate it with Apache Flink and Samza. Experimental evaluations on real-world applications and benchmarks show that StreamSwitch provides cost-effective solutions for bounding latency and outperforms state-of-the-art alternatives.
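A latency-bounding control loop can be illustrated with a minimal reactive rule: scale an operator out when observed tail latency approaches its bound, and scale in when there is ample headroom. The thresholds below are assumptions for illustration; StreamSwitch's actual controller combines reactive and predictive metrics and is more sophisticated.

```python
# Illustrative sketch (NOT StreamSwitch's controller): a reactive scaling
# rule for one stateful operator. Thresholds are assumed values.

def scaling_decision(p99_latency_ms, bound_ms, executors,
                     scale_out_ratio=0.9, scale_in_ratio=0.5):
    """Return the target executor count for the next control interval."""
    if p99_latency_ms > scale_out_ratio * bound_ms:
        return executors + 1          # approaching the bound: scale out
    if p99_latency_ms < scale_in_ratio * bound_ms and executors > 1:
        return executors - 1          # ample headroom: reclaim resources
    return executors                  # within band: hold steady
```

The gap between the scale-out and scale-in thresholds acts as hysteresis, preventing the controller from oscillating when latency hovers near a single threshold.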

### Latency-Oriented Elastic Memory Management at Task-Granularity for Stateful Streaming Processing

Rengan Dou and Richard T. B. Ma (National University of Singapore, Singapore)
