Delivery of omnidirectional video using saliency prediction and optimal bitrate allocation



ORIGINAL PAPER

Cagri Ozcinar¹ · Nevrez İmamoğlu² · Weimin Wang² · Aljosa Smolic¹

Received: 2 May 2020 / Revised: 10 August 2020 / Accepted: 25 August 2020 © Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract

In this work, we propose and investigate a user-centric framework for the delivery of omnidirectional video (ODV) on VR systems that exploits visual attention (saliency) models in the bitrate allocation module. To this end, we formulate a new bitrate allocation algorithm that takes the saliency map and the nonlinear sphere-to-plane mapping into account for each ODV, and we solve the formulated problem using linear integer programming. For the visual attention models, we use both image- and video-based saliency prediction results; moreover, we explore two types of attention model: (i) salient object detection with transfer learning using pre-trained networks, and (ii) saliency prediction with supervised networks trained on an eye-fixation dataset. Experimental evaluations of the saliency integration of the models are discussed, with interesting findings on the transfer learning and supervised saliency approaches.

Keywords 360° Video streaming · Attention-based bitrate allocation · Saliency maps with transfer learning and supervision
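The allocation idea in the abstract can be illustrated with a small sketch. The paper solves the problem with linear integer programming; the brute-force enumeration below is only a stand-in for a solver on a toy instance, and the tile count, bitrate ladder, saliency values, cos-latitude area weight, and the quality proxy (weight × bitrate) are all illustrative assumptions rather than the paper's actual formulation.

```python
# Hypothetical sketch: saliency- and area-weighted bitrate allocation
# for ODV tiles. One bitrate is picked per tile from a discrete ladder,
# maximizing a weighted-quality proxy under a total bitrate budget.
import math
from itertools import product

def erp_area_weight(tile_center_lat_deg):
    """Approximate spherical area weight of an ERP tile: pixel area
    on the sphere shrinks with cos(latitude) toward the poles."""
    return math.cos(math.radians(tile_center_lat_deg))

def allocate(saliency, lat_deg, ladders, budget):
    """Exhaustive search standing in for the integer program;
    it only scales to tiny instances."""
    weights = [s * erp_area_weight(lat) for s, lat in zip(saliency, lat_deg)]
    best, best_util = None, -1.0
    for choice in product(*ladders):  # one candidate rate per tile
        if sum(choice) <= budget:
            util = sum(w * r for w, r in zip(weights, choice))
            if util > best_util:
                best, best_util = choice, util
    return best

# Three tiles: equatorial salient, equatorial non-salient, near-polar.
saliency = [0.9, 0.3, 0.2]
lat_deg = [0.0, 0.0, 75.0]
ladders = [(1, 2, 4)] * 3  # candidate bitrates in Mbit/s
print(allocate(saliency, lat_deg, ladders, budget=7))  # → (4, 2, 1)
```

As expected, the salient equatorial tile receives the highest rate, while the near-polar tile, whose ERP pixels cover little spherical area, is starved first.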

This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under Grant Number 15/RP/27760, V-SENSE, Trinity College Dublin, Ireland. This paper is partly based on results obtained from a project commissioned by the Public/Private R&D Investment Strategic Expansion Program (PRISM), AIST, Japan. Cagri Ozcinar and Nevrez İmamoğlu contributed equally to this work.

1 Introduction

Recent technological advancements in media streaming networks and virtual reality (VR) devices have made it feasible to deliver omnidirectional video (ODV) with high quality. A live ODV streaming service over the 5G network at the Olympic Winter Games [10] is decisive proof of its relevance. ODV technology provides an interactive VR video

Cagri Ozcinar
[email protected]

Nevrez İmamoğlu
[email protected]

Weimin Wang
[email protected]

Aljosa Smolic
[email protected]

1 V-SENSE, Trinity College Dublin, Dublin, Ireland

2 Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan

experience beyond that available through traditional 2D video displayed on a flat screen. ODV covers 360° of a scene and can be viewed through a head-mounted display (HMD) that allows viewers to look around the scene from a central point of view in VR. ODVs are stored in a 2D planar representation, typically the equirectangular projection (ERP), to remain compatible with existing video technology systems. Thanks to its immersive and interactive nature, ODV can be used in applications such as entertainment, e-commerce, social media, and even job training. Despite recent technological improvements an