Christian Timmerer – Bitmovin

144th MPEG Meeting Takeaways: Understanding Quality Impacts of Learning-based Codecs and Enhancing Green Metadata

Christian Timmerer — Sun, 07 Jan 2024 21:01:46 +0000

Preface

Bitmovin has been “Shaping the Future of Video” for over 10 years now and, in addition to our own innovations, we’ve been actively taking part in standardization activities to improve the quality of video technologies for the wider industry. I have been a member and attendee of the Moving Picture Experts Group (MPEG) for 15+ years and have been documenting the progress since early 2010. Recently, we’ve been working on several new initiatives, including the use of learning-based codecs and enhancing support for more energy-efficient media consumption.

The 144th MPEG meeting highlights

The 144th MPEG meeting was held in Hannover, Germany! For those interested, the press release with all the details is available. It’s always great to see and hear about progress being made in person.

Attendees of the 144th MPEG meeting in Hannover, Germany.

The main outcomes of this meeting are as follows:

MPEG issues Call for Learning-Based Video Codecs for Study of Quality Assessment
MPEG evaluates Call for Proposals on Feature Compression for Video Coding for Machines
MPEG progresses ISOBMFF-related Standards for the Carriage of Network Abstraction Layer Video Data
MPEG enhances the Support of Energy-Efficient Media Consumption
MPEG ratifies the Support of Temporal Scalability for Geometry-based Point Cloud Compression
MPEG reaches the First Milestone for the Interchange of 3D Graphics Formats
MPEG announces Completion of Coding of Genomic Annotations

This post will focus on MPEG Systems-related standards and visual quality assessment. As usual, the column will end with an update on MPEG-DASH.

Visual Quality Assessment

MPEG does not create standards in the visual quality assessment domain. However, it conducts visual quality assessments for its standards during various stages of the standardization process. For instance, it evaluates responses to calls for proposals, conducts verification tests of its final standards, and so on.

MPEG Visual Quality Assessment (AG 5) issued an open call to study quality assessment for learning-based video codecs. AG 5 has been conducting subjective quality evaluations for coded video content and studying their correlation with objective quality metrics. Most of these studies have focused on the High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC) standards. To facilitate the study of visual quality, MPEG maintains the Compressed Video for the study of Quality Metrics (CVQM) dataset.
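To make the idea of correlating subjective scores with objective metrics more concrete, here is a minimal Python sketch that computes the Pearson (linear) and Spearman (rank) correlation between mean opinion scores and an objective metric. The score values are made-up placeholders, not data from the CVQM dataset, and the function names are hypothetical; AG 5's actual methodology may differ.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical example: five test clips, subjective MOS vs. PSNR in dB.
mos = [1.8, 2.6, 3.4, 4.1, 4.6]
psnr_db = [28.0, 31.5, 34.0, 37.2, 40.1]
print("Pearson:", round(pearson(mos, psnr_db), 3))
print("Spearman:", round(spearman(mos, psnr_db), 3))
```

A metric that tracks perceived quality well should score close to 1 on both measures. Learning-based codecs may introduce distortions for which an established metric correlates less well.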

With the recent advancements in learning-based video compression algorithms, MPEG is now studying compression using these codecs. It is expected that reconstructed videos compressed using learning-based codecs will have different types of distortion compared to those induced by traditional block-based motion-compensated video coding designs. To gain a deeper understanding of these distortions and their impact on visual quality, MPEG has issued a public call related to learning-based video codecs. MPEG is open to inputs in response to the call and will invite respondents who meet the call’s requirements to submit compressed bitstreams for further study of their subjective quality and potential inclusion into the CVQM dataset.

Considering the rapid advancements in the development of learning-based video compression algorithms, MPEG will keep this call open and anticipates future updates to the call.

Interested parties are kindly requested to contact the MPEG AG 5 Convenor Mathias Wien (wien@lfb.rwth-aachen.de) and submit responses for review at the 145th MPEG meeting in January 2024. Further details are given in the call, issued as AG 5 document N 104 and available from the mpeg.org website.

Learning-based data compression (e.g., for image, audio, video content) is a hot research topic. Research on this topic relies on datasets offering a set of common test sequences, sometimes also common test conditions, that are publicly available and allow for comparison across different schemes. MPEG’s Compressed Video for the study of Quality Metrics (CVQM) dataset is such a dataset, available here, and ready to be used also by researchers and scientists outside of MPEG. The call mentioned above is open for everyone inside/outside of MPEG and allows researchers to participate in international standards efforts (note: to attend meetings, one must become a delegate of a national body).

Bitmovin and the ATHENA research lab have been working together on ML-based enhancements to boost visual quality and improve QoE. You can read more about our published research in this blog post.

At the 144th MPEG meeting, MPEG Systems (WG 3) produced three newsworthy items as follows:

Progression of ISOBMFF-related standards for the carriage of Network Abstraction Layer (NAL) video data.
Enhancement of the support of energy-efficient media consumption.
Support of temporal scalability for geometry-based Point Cloud Compression (G-PCC).

ISO/IEC 14496-15, a part of the family of ISOBMFF-related standards, defines the carriage of Network Abstraction Layer (NAL) unit structured video data such as Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), Essential Video Coding (EVC), and Low Complexity Enhancement Video Coding (LCEVC). This standard has been further improved with the approval of the Final Draft Amendment (FDAM), which adds support for enhanced features such as Picture-in-Picture (PiP) use cases enabled by VVC.
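As a small illustration of what "NAL unit structured" means for storage: in this file format family, each NAL unit inside a sample is preceded by a length field rather than a start code. The Python sketch below splits such a sample into its NAL units. The function name is hypothetical, the default 4-byte length field is an assumption (the length size is signaled in the track's decoder configuration), and the payload bytes are placeholders rather than a valid bitstream.

```python
def split_length_prefixed_nal_units(sample: bytes, length_size: int = 4):
    """Split one sample into NAL units, each preceded by a big-endian length."""
    if length_size not in (1, 2, 4):
        raise ValueError("length_size must be 1, 2, or 4 bytes")
    units = []
    pos = 0
    while pos < len(sample):
        if pos + length_size > len(sample):
            raise ValueError("truncated length field")
        nal_len = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        if pos + nal_len > len(sample):
            raise ValueError("truncated NAL unit")
        units.append(sample[pos:pos + nal_len])
        pos += nal_len
    return units


# Placeholder sample: two NAL units of 2 and 3 bytes.
sample = b"\x00\x00\x00\x02\x09\xf0" + b"\x00\x00\x00\x03\x65\x88\x84"
for unit in split_length_prefixed_nal_units(sample):
    print(len(unit), unit.hex())
```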

In addition to the improvements made to ISO/IEC 14496-15, separately developed amendments have been consolidated in the 7th edition of the standard. This edition has been promoted to Final Draft International Standard (FDIS), marking the final milestone of the formal standard development.

Another important standard in development is the 2nd edition of ISO/IEC 14496-32 (file format reference software and conformance). This standard, currently at the Committee Draft (CD) stage of development, is planned to be completed and reach the status of Final Draft International Standard (FDIS) by the beginning of 2025. This standard will be essential for industry professionals who require a reliable and standardized method of verifying the conformance of their implementation.

MPEG Systems (WG 3) also promoted ISO/IEC 23001-11 (energy-efficient media consumption (green metadata)) Amendment 1 to Final Draft Amendment (FDAM). This amendment introduces energy-efficient media consumption (green metadata) for Essential Video Coding (EVC) and defines metadata that enables a reduction in decoder power consumption. At the same time, ISO/IEC 23001-11 Amendment 2 has been promoted to the Committee Draft Amendment (CDAM) stage of development. This amendment introduces a novel way to carry metadata about display power reduction encoded as a video elementary stream interleaved with the video it describes. The amendment is expected to be completed and reach the status of Final Draft Amendment (FDAM) by the beginning of 2025.

Finally, MPEG Systems (WG 3) promoted ISO/IEC 23090-18 (carriage of geometry-based point cloud compression data) Amendment 1 to Final Draft Amendment (FDAM). This amendment enables the compression of a single elementary stream of point cloud data using ISO/IEC 23090-9 (geometry-based point cloud compression) and storing it in more than one track of ISO Base Media File Format (ISOBMFF)-based files. This enables support for applications that require multiple frame rates within a single file and introduces a track grouping mechanism to indicate multiple tracks carrying a specific temporal layer of a single elementary stream separately.
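The following Python sketch illustrates the general idea of temporal scalability across tracks. It is a simplified model, not the ISO/IEC 23090-18 syntax: the dyadic assignment of frames to layers, the one-track-per-layer mapping, and all function names are assumptions made for illustration only.

```python
def temporal_layer(frame_index: int, num_layers: int) -> int:
    """Dyadic layering: layer 0 holds every 2**(num_layers - 1)-th frame."""
    if frame_index < 0 or num_layers < 1:
        raise ValueError("frame_index must be >= 0 and num_layers >= 1")
    for layer in range(num_layers):
        step = 2 ** (num_layers - 1 - layer)
        if frame_index % step == 0:
            return layer
    raise AssertionError("unreachable: the last layer has step 1")


def split_into_tracks(num_frames: int, num_layers: int) -> dict:
    """Assign each frame index to the (hypothetical) track of its layer."""
    tracks = {layer: [] for layer in range(num_layers)}
    for i in range(num_frames):
        tracks[temporal_layer(i, num_layers)].append(i)
    return tracks


def frames_for_playback(tracks: dict, highest_layer: int) -> list:
    """Merge tracks 0..highest_layer back into presentation order."""
    selected = []
    for layer in range(highest_layer + 1):
        selected.extend(tracks[layer])
    return sorted(selected)


# Eight frames of a hypothetical 60 fps point cloud sequence in 3 layers.
tracks = split_into_tracks(8, 3)
print(tracks)                          # frames stored per track
print(frames_for_playback(tracks, 0))  # quarter frame rate
print(frames_for_playback(tracks, 1))  # half frame rate
print(frames_for_playback(tracks, 2))  # full frame rate
```

In this model, a player that only needs a lower frame rate reads fewer tracks, which is the kind of use case the track grouping mechanism is meant to signal.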

MPEG Systems usually provides standards on top of existing compression standards, enabling efficient storage and delivery of media data (among others). Researchers may use these standards (including reference software and conformance bitstreams) to conduct research in the general area of multimedia systems (cf. ACM MMSys) or, specifically, on green multimedia systems (cf. ACM GMSys).

Enhancements to green metadata are welcome and necessary additions to the toolkit for everyone working on reducing the carbon footprint of video streaming workflows. Bitmovin and the GAIA project have been conducting focused research in this area for over a year now and through testing, benchmarking and developing new methods, hope to significantly improve our industry’s environmental sustainability. You can read more about our progress in this report.

MPEG-DASH Updates

The current status of MPEG-DASH is shown in the figure below with only minor updates compared to the last meeting.

MPEG-DASH Status, October 2023.

In particular, the 6th edition of MPEG-DASH is scheduled for 2024 but may not include all amendments under development. An overview of existing amendments can be found in the blog post from the last meeting. Current amendments have been (slightly) updated and are progressing toward completion at upcoming meetings. The signaling of haptics in DASH has been discussed and accepted for inclusion in the Technologies under Consideration (TuC) document. The TuC document comprises candidate technologies for possible future amendments to the MPEG-DASH standard and is publicly available here.

MPEG-DASH has been heavily researched in the multimedia systems, quality, and communications research communities. Adding haptics to MPEG-DASH would provide another dimension worth considering within research, including, but not limited to, performance aspects and Quality of Experience (QoE).

The 145th MPEG meeting will be online from January 22-26, 2024. Click here for more information about MPEG meetings and their developments.

Want to learn more about the latest research from the ATHENA lab and its potential applications? Check out this post summarizing the projects from the first cohort of finishing PhD candidates.

Notes and highlights from previous MPEG meetings can be found here.

The post 144th MPEG Meeting Takeaways: Understanding Quality Impacts of Learning-based Codecs and Enhancing Green Metadata appeared first on Bitmovin.

143rd MPEG Meeting Takeaways: Green metadata support added to VVC for improved energy efficiency

Christian Timmerer — Tue, 22 Aug 2023 15:11:18 +0000

Preface

Bitmovin is a proud member and contributor to several organizations working to shape the future of video, including the Moving Picture Experts Group (MPEG), where I, along with a few senior developers at Bitmovin, am an active member. Personally, I have been a member and attendee of MPEG for 20+ years and have been documenting the progress since early 2010. Today, we’re working hard to further improve the capabilities and energy efficiency of the industry’s newest standards, such as VVC, while maintaining and modernizing older codecs like HEVC and AVC to take advantage of advancements in neural network post-processing.

The 143rd MPEG Meeting Highlights

The official press release of the 143rd MPEG meeting can be found here and comprises the following items:

MPEG finalizes the Carriage of Uncompressed Video and Images in ISOBMFF
MPEG reaches the First Milestone for two ISOBMFF Enhancements
MPEG ratifies Third Editions of VVC and VSEI
MPEG reaches the First Milestone of AVC (11th Edition) and HEVC Amendment
MPEG Genomic Coding extended to support Joint Structured Storage and Transport of Sequencing Data, Annotation Data, and Metadata
MPEG completes Reference Software and Conformance for Geometry-based Point Cloud Compression

In this report, I’d like to focus on ISOBMFF and video codecs and, as always, I will conclude with an update on MPEG-DASH.

ISOBMFF Enhancements

The ISO Base Media File Format (ISOBMFF) supports the carriage of a wide range of media data such as video, audio, point clouds, haptics, etc., which has now been further extended to uncompressed video and images.

ISO/IEC 23001-17 – Carriage of uncompressed video and images in ISOBMFF – specifies how uncompressed 2D image and video data is carried in files that comply with the ISOBMFF family of standards. This encompasses a range of data types, including monochromatic and colour data, transparency (alpha) information, and depth information. The standard enables the industry to effectively exchange uncompressed video and image data while utilizing all additional information provided by the ISOBMFF, such as timing, color space, and sample aspect ratio for interoperable interpretation and/or display of uncompressed video and image data.
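For readers less familiar with ISOBMFF, files in this family are built from nested "boxes", each starting with a 32-bit size and a four-character type. The Python sketch below walks the top-level boxes of a byte buffer. It is a minimal illustration, not a full parser: the function names are hypothetical, nested boxes and the 'uuid' extended type are not handled, and the sample bytes are synthetic.

```python
import struct


def iter_boxes(data: bytes):
    """Yield (box_type, payload) for each top-level box in data."""
    pos = 0
    while pos < len(data):
        if pos + 8 > len(data):
            raise ValueError("truncated box header")
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit "largesize" follows the type.
            if pos + 16 > len(data):
                raise ValueError("truncated largesize field")
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header_len = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the data.
            size = len(data) - pos
        if size < header_len or pos + size > len(data):
            raise ValueError("invalid box size")
        yield box_type.decode("latin-1"), data[pos + header_len:pos + size]
        pos += size


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a box with a 32-bit size field (helper for the example)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic example: a tiny 'ftyp' box followed by an empty 'free' box.
data = make_box(b"ftyp", b"isom\x00\x00\x00\x00") + make_box(b"free")
for box_type, payload in iter_boxes(data):
    print(box_type, len(payload))
```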

ISO/IEC 14496-15, formerly known as the AVC file format (and based on ISOBMFF), provides the basis for “network abstraction layer (NAL) unit structured video coding formats” such as AVC, HEVC, and VVC. The current version is the 6th edition, which has been amended to support neural-network post-filter supplemental enhancement information (SEI) messages. This amendment defines the carriage of the neural-network post-filter characteristics (NNPFC) SEI messages and the neural-network post-filter activation (NNPFA) SEI messages to enable the delivery of (i) a base post-processing filter and (ii) a series of neural network updates synchronized with the input video pictures/frames.

Bitmovin has supported ISOBMFF in our encoding pipeline and API from day 1 and will continue to do so. For more details and information about container file formats, check out this blog.

Video Codec Enhancements

MPEG finalized the specifications of the third editions of the Versatile Video Coding (VVC, ISO/IEC 23090-3) and the Versatile Supplemental Enhancement Information (VSEI, ISO/IEC 23002-7) standards. Additionally, MPEG issued the Committee Draft (CD) text of the eleventh edition of the Advanced Video Coding (AVC, ISO/IEC 14496-10) standard and the Committee Draft Amendment (CDAM) text on top of the High Efficiency Video Coding standard (HEVC, ISO/IEC 23008-2).

The SEI messages added in these new editions include two systems-related SEI messages: (a) one for signaling green metadata as specified in ISO/IEC 23001-11 and (b) another for signaling an alternative video decoding interface for immersive media as specified in ISO/IEC 23090-13. Furthermore, the neural-network post-filter characteristics (NNPFC) SEI message and the neural-network post-filter activation (NNPFA) SEI message have been added to AVC, HEVC, and VVC.

The two SEI messages for describing and activating post-filters using neural network technology in video bitstreams could, for example, be used for reducing coding noise, spatial and temporal upsampling (i.e., super-resolution and frame interpolation), color improvement, or general denoising of the decoder output. The description of the neural network architecture itself is based on MPEG’s neural network representation standard (ISO/IEC 15938-17). As results from an exploration experiment have shown, neural network-based post-filters can deliver better results than conventional filtering methods. Processes for invoking these new post-filters have already been tested in a software framework and will be made available in an upcoming version of the VVC reference software (ISO/IEC 23090-16).

Bitmovin and our partner ATHENA research lab have been exploring several applications of neural networks to improve the quality of experience for video streaming services. You can read the summaries with links to full publications in this blog post.

The latest MPEG-DASH Update

The current status of MPEG-DASH is depicted in the figure below:

The latest edition of MPEG-DASH is the 5th edition (ISO/IEC 23009-1:2022) which is publicly/freely available here. There are currently three amendments under development:

ISO/IEC 23009-1:2022 Amendment 1: Preroll, nonlinear playback, and other extensions. This amendment has been ratified already and is currently being integrated into the 5th edition of part 1 of the MPEG-DASH specification.
ISO/IEC 23009-1:2022 Amendment 2: EDRAP streaming and other extensions. EDRAP stands for Extended Dependent Random Access Point and at this meeting the Draft Amendment (DAM) has been approved. EDRAP increases the coding efficiency for random access and has been adopted within VVC.
ISO/IEC 23009-1:2022 Amendment 3: Segment sequences for random access and switching. This amendment is at Committee Draft Amendment (CDAM) stage, the first milestone of the formal standardization process. This amendment aims at improving tune-in time for low latency streaming.

Additionally, MPEG Technologies under Consideration (TuC) comprises a few new work items, such as content selection and adaptation logic based on device orientation and signaling of haptics data within DASH.

Finally, part 9 of MPEG-DASH — redundant encoding and packaging for segmented live media (REAP) — has been promoted to Draft International Standard (DIS). It is expected to be finalized in the upcoming MPEG meetings.

Bitmovin recently announced its new Player Web X which was reimagined and built from the ground up with structured concurrency. You can read more about it and why structured concurrency matters in this recent blog series.

The next meeting will be held in Hannover, Germany, from October 16-20, 2023. Further details can be found here.

Click here for more information about MPEG meetings and their developments.

Are you currently using the ISOBMFF or CMAF as a container format for fragmented MP4 files? Do you prefer hard-parted fMP4 or single-file MP4 with byte-range addressing? Vote in our poll and check out the Bitmovin Community to learn more.

Looking for more info on streaming formats and codecs? Here are some useful resources:

[Blog] VVC: Benefits, Supported Devices, and Bitmovin’s Implementation
[Blog] Live Low Latency Streaming Tech Deep Dive
[Demo] Low Latency ABR player demo

The post 143rd MPEG Meeting Takeaways: Green metadata support added to VVC for improved energy efficiency appeared first on Bitmovin.

142nd MPEG Meeting Takeaways: MPEG issues Call for Proposals for Feature Coding for Machines

Christian Timmerer — Wed, 24 May 2023 14:49:18 +0000

Preface

Bitmovin is a proud member and contributor to several organizations working to shape the future of video, none for longer than the Moving Picture Experts Group (MPEG), where I, along with a few senior developers at Bitmovin, am an active member. Personally, I have been a member and attendee of MPEG for 20+ years and have been documenting the progress since early 2010. Today, we’re working hard to further improve the capabilities and efficiency of the industry’s newest standards, while exploring the potential applications of machine learning and neural networks.

The 142nd MPEG Meeting – MPEG issues Call for Proposals for Feature Coding for Machines

The official press release of the 142nd MPEG meeting can be found here and comprises the following items:

MPEG issues Call for Proposals for Feature Coding for Machines
MPEG finalizes the 9th Edition of MPEG-2 Systems
MPEG reaches the First Milestone for Storage and Delivery of Haptics Data
MPEG completes 2nd Edition of Neural Network Coding (NNC)
MPEG completes Verification Test Report and Conformance and Reference Software for MPEG Immersive Video
MPEG finalizes work on metadata-based MPEG-D DRC Loudness Leveling

In this report, I’d like to focus on Feature Coding for Machines, MPEG-2 Systems, Haptics, Neural Network Coding (NNC), MPEG Immersive Video, and a brief update about DASH (as usual).

Feature Coding for Machines

At the 142nd MPEG meeting, MPEG Technical Requirements (WG 2) issued a Call for Proposals (CfP) for technologies and solutions enabling efficient feature compression for video coding for machine vision tasks. This work on “Feature Coding for Video Coding for Machines (FCVCM)” aims at compressing intermediate features within neural networks for machine tasks. As applications for neural networks become more prevalent and the neural networks increase in complexity, use cases such as computational offload become more relevant to facilitate widespread deployment of applications utilizing such networks. Initially as part of the “Video Coding for Machines” activity, over the last four years, MPEG has investigated potential technologies for efficient compression of feature data encountered within neural networks. This activity has resulted in establishing a set of ‘feature anchors’ that demonstrate the achievable performance for compressing feature data using state-of-the-art standardized technology. These feature anchors include tasks performed on four datasets.

9th Edition of MPEG-2 Systems

MPEG-2 Systems was first standardized in 1994, defining two container formats: program stream (e.g., used for DVDs) and transport stream. The latter, also known as MPEG-2 Transport Stream (M2TS), is used for broadcast and internet TV applications and services. MPEG-2 Systems was awarded a Technology and Engineering Emmy® in 2013, and at the 142nd MPEG meeting, MPEG Systems (WG 3) ratified the 9th edition of ISO/IEC 13818-1 MPEG-2 Systems. The new edition includes support for Low Complexity Enhancement Video Coding (LCEVC), the youngest in the MPEG family of video coding standards, on top of more than 50 media stream types, including, but not limited to, 3D Audio and Versatile Video Coding (VVC). The new edition also supports new options for signaling different kinds of media, which can aid the selection of the best audio or other media tracks for specific purposes or user preferences. As an example, it can indicate that a media track provides information about a current emergency.
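For context, a transport stream is a sequence of fixed-size 188-byte packets, each starting with the sync byte 0x47 and a 4-byte header that carries, among other fields, the 13-bit packet identifier (PID). The Python sketch below decodes that header. The function name and dictionary keys are hypothetical, and adaptation fields and payload parsing are left out.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def parse_ts_header(packet: bytes) -> dict:
    """Decode the 4-byte header of one MPEG-2 transport stream packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a transport stream packet is 188 bytes long")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "transport_error": bool(b1 & 0x80),
        "payload_unit_start": bool(b1 & 0x40),
        "transport_priority": bool(b1 & 0x20),
        "pid": ((b1 & 0x1F) << 8) | b2,  # 13-bit packet identifier
        "scrambling_control": (b3 >> 6) & 0x03,
        "adaptation_field_control": (b3 >> 4) & 0x03,
        "continuity_counter": b3 & 0x0F,
    }


# Synthetic packet: PID 0 with the payload-unit-start flag set,
# followed by 184 zero bytes standing in for the payload.
packet = bytes([0x47, 0x40, 0x00, 0x10]) + bytes(184)
print(parse_ts_header(packet))
```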

Storage and Delivery of Haptics Data

At the 142nd MPEG meeting, MPEG Systems (WG 3) reached the first milestone for ISO/IEC 23090-32 entitled “Carriage of haptics data” by promoting the text to Committee Draft (CD) status. This specification enables the storage and delivery of haptics data (defined by ISO/IEC 23090-31) in the ISO Base Media File Format (ISOBMFF; ISO/IEC 14496-12). Considering the nature of haptics data composed of spatial and temporal components, a data unit with various spatial or temporal data packets is used as a basic entity like an access unit of audio-visual media. Additionally, an explicit indication of a silent period, considering the sparse nature of haptics data, has been introduced in this draft. The standard is planned to be completed, i.e., to reach the status of Final Draft International Standard (FDIS), by the end of 2024.

Neural Network Coding (NNC)

Many applications of artificial neural networks for multimedia analysis and processing (e.g., visual and acoustic classification, extraction of multimedia descriptors, or image and video coding) utilize edge-based content processing or federated training. The trained neural networks for these applications contain many parameters (weights), resulting in a considerable size. Therefore, the MPEG standard for the compressed representation of neural networks for multimedia content description and analysis (NNC, ISO/IEC 15938-17, published in 2022) was developed, which provides a broad set of technologies for parameter reduction and quantization to compress entire neural networks efficiently.

Recently, an increasing number of artificial intelligence applications, such as edge-based content processing, content-adaptive video post-processing filters, or federated training, need to exchange updates of neural networks (e.g., after training on additional data or fine-tuning to specific content). Such updates include changes of the neural network parameters but may also involve structural changes in the neural network (e.g., when extending a classification method with a new class). In scenarios like federated training, these updates must be exchanged frequently, such that much more bandwidth over time is required, e.g., in contrast to the initial deployment of trained neural networks.

The second edition of NNC addresses these applications through efficient representation and coding of incremental updates and extending the set of compression tools that can be applied to both entire neural networks and updates. Trained models can be compressed to at least 10-20% and, for several architectures, even below 3% of their original size without performance loss. Higher compression rates are possible at moderate performance degradation. In a distributed training scenario, a model update after a training iteration can be represented at 1% or less of the base model size on average without sacrificing the classification performance of the neural network. NNC also provides synchronization mechanisms, particularly for distributed artificial intelligence scenarios, e.g., if clients in a federated learning environment drop out and later rejoin.
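To put these ratios into perspective, the short Python sketch below estimates the data sent to a single client over a training run. The 100 MB model size and the 50 update rounds are hypothetical figures chosen for illustration; only the ratios (20% for the compressed model, 1% per update) are taken from the numbers quoted above, and the function name is made up.

```python
def transfer_megabytes(base_model_mb, rounds, model_ratio, update_ratio):
    """Total MB sent to one client: one initial model plus one update per round."""
    initial = base_model_mb * model_ratio
    updates = base_model_mb * update_ratio * rounds
    return initial + updates


BASE_MODEL_MB = 100  # hypothetical uncompressed model size
ROUNDS = 50          # hypothetical number of update rounds

# Baseline assumption: the full uncompressed model is resent every round.
uncompressed = transfer_megabytes(BASE_MODEL_MB, ROUNDS, 1.0, 1.0)
# With NNC: model at 20% of its size, each update at 1% of the base model.
with_nnc = transfer_megabytes(BASE_MODEL_MB, ROUNDS, 0.20, 0.01)

print(f"uncompressed: {uncompressed:.0f} MB")
print(f"with NNC:     {with_nnc:.0f} MB")
print(f"reduction:    {100 * (1 - with_nnc / uncompressed):.1f}%")
```

Under these assumptions, the repeated updates dominate the total traffic, which is why coding incremental updates efficiently matters more than compressing the initial model alone.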

Verification Test Report and Conformance and Reference Software for MPEG Immersive Video

At the 142nd MPEG meeting, MPEG Video Coding (WG 4) issued the verification test report of ISO/IEC 23090-12 MPEG immersive video (MIV) and completed the development of the conformance and reference software for MIV (ISO/IEC 23090-23), promoting it to the Final Draft International Standard (FDIS) stage.

MIV was developed to support the compression of immersive video content, in which multiple real or virtual cameras capture a real or virtual 3D scene. The standard enables the storage and distribution of immersive video content over existing and future networks for playback with 6 degrees of freedom (6DoF) of view position and orientation. MIV is a flexible standard for multi-view video plus depth (MVD) and multi-planar video (MPI) that leverages strong hardware support for commonly used video formats to compress volumetric video.

ISO/IEC 23090-23 specifies how to conduct conformance tests and provides reference encoder and decoder software for MIV. This draft includes 23 verified and validated conformance bitstreams spanning all profiles and encoding and decoding reference software based on version 15.1.1 of the test model for MPEG immersive video (TMIV). The test model, objective metrics, and other tools are publicly available at https://gitlab.com/mpeg-i-visual.

The latest MPEG-DASH Update

Finally, I’d like to provide a quick update regarding MPEG-DASH, which has a new part, namely redundant encoding and packaging for segmented live media (REAP; ISO/IEC 23009-9). The following figure provides the reference workflow for redundant encoding and packaging of live segmented media.

The reference workflow comprises (i) Ingest Media Presentation Description (I-MPD), (ii) Distribution Media Presentation Description (D-MPD), and (iii) Storage Media Presentation Description (S-MPD), among others; each defining constraints on the MPD and tracks of ISO base media file format (ISOBMFF).

Additionally, the MPEG-DASH Breakout Group discussed various technologies under consideration, such as (a) combining HTTP GET requests, (b) signaling common media client data (CMCD) and common media server data (CMSD) in an MPEG-DASH MPD, (c) image and video overlays in DASH, and (d) updates on lower latency.

An updated overview of DASH standards/features can be found in the figure below.

MPEG-DASH Status – April 2023

The next meeting will be held in Geneva, Switzerland, from July 17-21, 2023. Further details can be found here.

Click here for more information about MPEG meetings and their developments.

Have any thoughts or questions about neural networks or the other updates described above? Check out Bitmovin’s Video Developer Community and join the conversation!

Looking for more info on video streaming formats and codecs? Here are some useful resources:

[Guide] The Definitive Guide to Video Codecs
[Tutorial] Encoding VR and 360 video for Meta Quest Headsets
[Blog] The 20 Best Live Streaming Encoders

The post 142nd MPEG Meeting Takeaways: MPEG issues Call for Proposals for Feature Coding for Machines appeared first on Bitmovin.

GAIA Research Project : A 3 Month Look Back

Christian Timmerer — Tue, 14 Feb 2023 11:03:00 +0000

A few months ago, Bitmovin and the University of Klagenfurt announced a new collaboration on a research project called GAIA, which aims to make video streaming more sustainable. Project ‘GAIA’ is co-funded by the Austrian Research Promotion Agency (FFG) and will focus on helping the video-streaming industry reduce its carbon footprint.

Dr Christian Timmerer is a Professor at the Institute of Information Technology (ITEC) at the University of Klagenfurt and one of the co-founders of Bitmovin. We had a quick chat with him to see how Project GAIA is progressing.

For those who aren’t aware, why is it important to reduce video streaming’s carbon footprint?

Climate change is the biggest threat facing this generation, requiring urgent action. We are already seeing the impact of climate change around the world, with record-breaking temperatures and more natural disasters. Everyone has to work together in the coming years if we are to turn the tide against it, including everyone working in the video streaming industry.

Currently, internet data traffic is responsible for more than half of digital technology’s global impact, accounting for 55% of energy consumption annually. Video processing and streaming generate 306 million tons of CO2, which is 20% of digital technology’s total GHG emissions and nearly 1% of worldwide GHG emissions. That is why Bitmovin and the University of Klagenfurt are working together on Project GAIA to enable more climate-friendly video streaming solutions that provide better energy awareness and efficiency throughout the end-to-end video workflow.

Combining Bitmovin’s history of innovation in video streaming with the University of Klagenfurt’s strong academic background in technology research means we are well-primed to help make video streaming more sustainable.

We are now three months into Project GAIA. What have been some of the key learnings?

Over the last three months, we have been deeply focused on investigating the challenges and opportunities associated with reducing emissions in video streaming. One thing we have been examining is data centres, which handle the video encoding process and storage of video content. They consume huge amounts of power, but there are ways to make them more sustainable, including selecting energy-optimized and sustainable cloud services to help reduce CO2 emissions; identifying cloud computing regions with a low carbon footprint; using more efficient and faster transcoders and encoders; and optimizing the video encoding parameters to reduce the bitrates of encoded videos without affecting quality.

We have also identified challenges and opportunities in video delivery within heterogeneous networks. Ways of reducing carbon emissions centre around energy-efficient network technology for video streaming and lower data transmission to reduce energy consumption. Lastly, we have also examined challenges and opportunities in end-user devices. Research actually shows that end-user devices and decoding hardware make up the greatest amount of energy consumption and CO2 emissions in the video delivery chain. We believe the best carbon emission reduction strategies lie in improving the energy efficiency of end users’ devices by improving screen display technologies or shifting from desktops to using more energy-efficient laptops, tablets, and smartphones.

What role will GAIA play in helping reduce video streaming’s carbon footprint?

I am incredibly excited about Project GAIA and the results it will yield. Our aim is to design a climate-friendly adaptive video streaming platform that provides complete energy awareness and accountability, including energy consumption and GHG emissions along the entire delivery chain, from content creation and server-side encoding to video transmission and client-side rendering; and reduced energy consumption and GHG emissions through advanced analytics and optimizations on all phases of the video delivery chain.

Our research will focus on providing benchmarking, energy-aware and machine learning-based modelling, optimization algorithms, monitoring, and auto-tuning, which will provide more quantifiable data on energy consumption in video streaming through the video delivery chain. Eventually, we hope to be able to use our findings to optimize encoding, streaming and playback concerning energy consumption.

The post GAIA Research Project : A 3 Month Look Back appeared first on Bitmovin.

The GAIA Research Project: Creating a Climate-Friendly Video Streaming Platform

Christian Timmerer — Thu, 06 Oct 2022 19:54:47 +0000

We’re excited to share that Bitmovin and the University of Klagenfurt are collaborating on a new research project with the goal of making video streaming more sustainable. Project ‘GAIA’ is co-funded by the Austrian Research Promotion Agency (FFG) and will help enable more climate-friendly video streaming solutions by providing better energy awareness and efficiency through the end-to-end video workflow.

Dr. Christian Timmerer is an Associate Professor at the Institute of Information Technology (ITEC) at the University of Klagenfurt and one of the co-founders of Bitmovin. We asked him a few questions to learn more about the goals and motivation behind the ‘GAIA’ project.

When you co-founded Bitmovin back in 2013, was there any focus on sustainability? What changes have you seen over the last 10+ years?

Christian: With our research background, we always tried to utilize the latest technology and research results which includes our focus on video codecs. For example, our first FFG-funded project termed “AdvUHD-DASH” aimed at integrating HEVC into our video encoding workflow; later, we were among the first to showcase AV1 live streaming (2017 NAB award); and now we’re already successfully experimenting with VVC (Collaboration with Fraunhofer HHI).

Each new generation of video codec reduces the amount of storage by approximately 50%, which contributes to sustainability goals. Over the past 10+ years, there has been a shift to focus on more efficient usage of the available resources, where in the beginning of video streaming over the internet, much was solved using massive over-provisioning. I think this is no longer the case, and people are starting to think about environmental and climate-friendly video streaming solutions in the industry.

GAIA is a two-year joint research project between Bitmovin and the University of Klagenfurt. What is the end goal, and how soon do you think there will be actionable results and recommendations?

Christian: The results of the GAIA project will (i) enable complete awareness and accountability of the energy consumption and GHG emissions and (ii) provide efficient strategies from encoding and streaming to playback and analytics that will minimize average energy consumption.

In the beginning, we will mainly focus on collecting data and benchmarking systems with regard to energy consumption that will hopefully lead to publicly available datasets useful for both industry and academia at large, like a baseline for later improvements. Later we will use those findings to optimize encoding, streaming and playback concerning energy consumption by following and repeating the traditional “design – implement – analyze” work cycles to iteratively devise and improve solutions.

Will the results of this research be exclusive to Bitmovin?

Christian: We will showcase results at the usual trade shows like NAB and IBC, while scientific results and findings will be published in renowned conferences and journals. We will try to make them publicly available as much as possible to increase the impact and adoption of these technologies within the industry and academia.

This is the fourth time Bitmovin and the University of Klagenfurt have collaborated on a research project. What makes this one unique?

Christian: Environmental friendliness was always implicitly addressed within the scope of previous research projects; GAIA is unique as it makes this an explicit goal, allowing us to address these issues as the top priority.

You can read more details about the GAIA research project here

Ready to make your own video workflows more eco-friendly with HEVC or AV1 encoding? Sign up for a free Bitmovin trial today!

The post The GAIA Research Project: Creating a Climate-Friendly Video Streaming Platform appeared first on Bitmovin.

139th MPEG Meeting Takeaways: MPEG issues Call for Evidence for Video Coding for Machines

Christian Timmerer — Wed, 24 Aug 2022 14:28:00 +0000

Preface

Bitmovin is a proud member and contributor to several organizations working to shape the future of video, none for longer than the Moving Picture Experts Group (MPEG), where I along with a few senior developers at Bitmovin are active members. Personally, I have been a member and attendant of MPEG for 15+ years and have been documenting the progress since early 2010. Today, we’re working hard to further improve the capabilities and efficiency of the industry’s newest standards, such as VVC, LCEVC, and MIV.

The 139th MPEG Meeting – MPEG issues Call for Evidence to drive the future of computer vision and smart transportation

The past few months of research and progression in the world of video standards setting at MPEG (and Bitmovin alike) have been quite busy, and though we didn’t publish a quarterly blog for the 138th MPEG meeting, it’s worth sharing again that MPEG was awarded two Technology & Engineering Emmy® Awards for its MPEG-DASH and Open Font Format standards. As expected, the latest developments in the standards space have focused on improvements to VVC & LCEVC; however, there have also been recent updates to CMAF and progress with energy efficiency standards and immersive media codecs. I’ve addressed most of the recent updates. The official press release of the 139th MPEG meeting can be found here and comprises the following items:

MPEG Issues Call for Evidence for Video Coding for Machines (VCM)
MPEG Ratifies the Third Edition of Green Metadata, a Standard for Energy-Efficient Media Consumption
MPEG Completes the Third Edition of the Common Media Application Format (CMAF) by adding Support for 8K and High Frame Rate for High Efficiency Video Coding
MPEG Scene Descriptions adds Support for Immersive Media Codecs
MPEG Starts New Amendment of VSEI containing Technology for Neural Network-based Post Filtering
MPEG Starts New Edition of Video Coding-Independent Code Points Standard
MPEG White Paper on the Third Edition of the Common Media Application Format

In this report, I’d like to focus on VCM, Green Metadata, CMAF, VSEI, and a brief update about DASH (as usual).

Video Coding for Machines (VCM)

MPEG’s exploration work on Video Coding for Machines (VCM) aims at compressing features for machine-performed tasks such as video object detection and event analysis. As neural networks increase in complexity, architectures such as collaborative intelligence, whereby a network is distributed across an edge device and the cloud, become advantageous. With the rise of newer network architectures being deployed amongst a heterogeneous population of edge devices, such architectures bring flexibility to systems implementers. Due to such architectures, there is a need to efficiently compress intermediate feature information for transport over wide area networks (WANs). As feature information differs substantially from conventional image or video data, coding technologies and solutions for machine usage could differ from conventional human-viewing-oriented applications to achieve optimized performance. With the rise of machine learning technologies and machine vision applications, the amount of video and images consumed by machines has rapidly grown.

Typical use cases include intelligent transportation, smart city technology, intelligent content management, etc., which incorporate machine vision tasks such as object detection, instance segmentation, and object tracking. Due to the large volume of video data, extracting and compressing features from a video is essential for efficient transmission and storage. Feature compression technology solicited in this Call for Evidence (CfE) can also be helpful in other regards, such as computational offloading and privacy protection.

Over the last three years, MPEG has investigated potential technologies for efficiently compressing feature data for machine vision tasks and established an evaluation mechanism that includes feature anchors, rate-distortion-based metrics, and evaluation pipelines. The evaluation framework of VCM depicted below comprises neural network tasks (typically informative) at both ends as well as VCM encoder and VCM decoder, respectively. The normative part of VCM typically includes the bitstream syntax which implicitly defines the decoder whereas other parts are usually left open for industry competition and research.

Further details about the CfE and how interested parties are able to respond can be found in the official press release here.

Green Metadata

MPEG Systems has been working on Green Metadata for the last ten years to enable the adaptation of the client’s power consumption according to the complexity of the bitstream. Many modern implementations of video decoders can adjust their operating voltage or clock speed to match the power consumption level to the required computational power. Thus, if the decoder implementation knows the variation in the complexity of the incoming bitstream, then the decoder can adjust its power consumption level to the complexity of the bitstream. This will allow less energy use in general and extended video playback for battery-powered devices.
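
As a rough illustration of this idea, the following Python sketch picks the lowest decoder clock speed that still covers the signalled complexity of the upcoming video. The clock table, the cycles-per-frame complexity figure, and the function name are all hypothetical; none of them are taken from the Green Metadata specification.

```python
# Hypothetical decoder clock speeds in MHz (illustrative, not from the standard).
AVAILABLE_CLOCKS_MHZ = [400, 800, 1200, 1600]


def pick_clock_mhz(cycles_per_frame: float, fps: float,
                   clocks=AVAILABLE_CLOCKS_MHZ) -> int:
    """Return the lowest clock that can decode the signalled workload in real time."""
    # Required decoding speed: cycles per second, expressed in MHz.
    required_mhz = cycles_per_frame * fps / 1e6
    for clock in sorted(clocks):
        if clock >= required_mhz:
            return clock
    # The workload exceeds every option: fall back to the fastest clock.
    return max(clocks)
```

Running at the lowest sufficient clock is what saves energy in this sketch; a real decoder would also have to account for voltage scaling and switching overhead.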

The third edition enables support for Versatile Video Coding (VVC, ISO/IEC 23090-3, a.k.a. ITU-T H.266) encoded bitstreams and enhances the capability of this standard for real-time communication applications and services. While finalizing the support of VVC, MPEG Systems has also started the development of a new amendment to the Green Metadata standard, adding the support of Essential Video Coding (EVC, ISO/IEC 23094-1) encoded bitstreams.

Making video coding and systems sustainable and environmentally friendly will become a major issue in the years to come, specifically since more and more video services become available. However, we need a holistic approach considering all entities from production to consumption, and Bitmovin is committed to contributing its share to these efforts.

Third Edition of Common Media Application Format (CMAF)

The third edition of CMAF adds two new media profiles for High Efficiency Video Coding (HEVC, ISO/IEC 23008-2, a.k.a. ITU-T H.265), namely for (i) 8K and (ii) High Frame Rate (HFR). Regarding the former, the media profile supporting 8K resolution video encoded with HEVC (Main 10 profile, Main Tier with 10 bits per colour component) has been added to the list of CMAF media profiles for HEVC. The profile will be branded as ‘c8k0’ and will support videos with up to 7680×4320 pixels (8K) and up to 60 frames per second. Regarding the latter, another media profile has been added to the list of CMAF media profiles, branded as ‘c8k1’ and supports HEVC encoded video with up to 8K resolution and up to 120 frames per second. Finally, chroma location indication support has been added to the 3rd edition of CMAF.
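
The limits quoted above can be expressed as a small lookup. This Python sketch only checks the resolution and frame-rate bounds mentioned in this post; the function name is hypothetical, and the other constraints of the actual media profiles (codec profile, tier, bit depth) are deliberately ignored.

```python
def cmaf_8k_hevc_brand(width: int, height: int, fps: float):
    """Return the brand named in the text above, or None if the video is out of range."""
    # Both profiles are described as supporting up to 7680x4320 pixels.
    if width > 7680 or height > 4320:
        return None
    if fps <= 60:
        return "c8k0"  # 8K profile, up to 60 frames per second
    if fps <= 120:
        return "c8k1"  # High Frame Rate profile, up to 120 frames per second
    return None
```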

CMAF is an integral part of the video streaming system and enabler for (live) low-latency streaming. Bitmovin and its co-funded research lab ATHENA significantly contributed to enable (live) low latency streaming use cases through our joint solution with Akamai for chunked CMAF low latency delivery as well as our research projects exploring the challenges of real-world deployments and the best methods to optimize those implementations.

New Amendment for Versatile Supplemental Enhancement Information (VSEI) containing Technology for Neural Network-based Post Filtering

At the 139th MPEG meeting, the MPEG Joint Video Experts Team with ITU-T SG 16 (WG 5; JVET) issued a Committee Draft Amendment (CDAM) text for the Versatile Supplemental Enhancement Information (VSEI) standard (ISO/IEC 23002-7, a.k.a. ITU-T H.274). Beyond the SEI message for shutter interval indication, which is already known from its specification in Advanced Video Coding (AVC, ISO/IEC 14496-10, a.k.a. ITU-T H.264) and High Efficiency Video Coding (HEVC, ISO/IEC 23008-2, a.k.a. ITU-T H.265), and a new indicator for subsampling phase indication which is relevant for variable-resolution video streaming, this new amendment contains two Supplemental Enhancement Information (SEI) messages for describing and activating post filters using neural network technology in video bitstreams. Such filters could be used for reducing coding noise, upsampling, colour improvement, or denoising. The description of the neural network architecture itself is based on MPEG’s neural network coding standard (ISO/IEC 15938-17). Results from an exploration experiment have shown that neural network-based post filters can deliver better performance than conventional filtering methods. Processes for invoking these new post-processing filters have already been tested in a software framework and will be made available in an upcoming version of the Versatile Video Coding (VVC, ISO/IEC 23090-3, a.k.a. ITU-T H.266) reference software (ISO/IEC 23090-16, a.k.a. ITU-T H.266.2).

Neural network-based video processing (incl. coding) is gaining momentum and end user devices are becoming more and more powerful for such complex operations. Bitmovin and its co-funded research lab ATHENA have investigated such options and recently proposed LiDeR, a lightweight dense residual network for video super resolution on mobile devices that can compete with other state-of-the-art neural networks, while executing ~300% faster.

The latest MPEG-DASH Update

Finally, I’d like to provide a brief update on MPEG-DASH! At the 139th MPEG meeting, MPEG Systems issued a new working draft related to Extended Dependent Random Access Point (EDRAP) streaming and other extensions, which will be further discussed during the Ad-hoc Group (AhG) period (please join the dash email list for further details/announcements). Furthermore, Defects under Investigation (DuI) and Technologies under Consideration (TuC) have been updated. Finally, a new part has been added (ISO/IEC 23009-9), called encoder and packager synchronization, for which a working draft has also been produced. Publicly available documents (if any) can be found here.

An updated overview of DASH standards/features can be found in the Figure below.

The next meeting will be face-to-face in Mainz, Germany from October 24-28, 2022. Further details can be found here.

Click here for more information about MPEG meetings and their developments.

Have any questions about the formats and standards described above? Do you think MPEG is taking the first step toward enabling Skynet and Terminators by advancing video coding for machines? Check out Bitmovin’s Video Developer Community and let us know your thoughts.

Looking for more info on streaming formats and codecs? Here are some useful resources:

[E-Book] Ultimate Guide to Container Formats
[Blog] Live Low Latency Streaming Tech Deep Dive
[Guide] Practical Guide to HDR

The post 139th MPEG Meeting Takeaways: MPEG issues Call for Evidence for Video Coding for Machines appeared first on Bitmovin.

A Brief History of MPEG-DASH: From Early Development to Emmy® Award Win

Christian Timmerer — Thu, 19 May 2022 09:59:00 +0000

Video streaming is ubiquitous. It permeates every aspect of our lives. We watch viral videos on TikTok, attend work conferences via Zoom, use it to supplement our education at academic institutions and even use it to work up a sweat via connected gym equipment. Netflix had a transformative impact on how we access our favorite content by delivering it over the Internet and providing consumers with the flexibility to watch their favorite films and TV shows from anywhere and on any device. What makes the impact of video streaming on our day-to-day lives even more astonishing is that it’s still a nascent industry that only took off at the turn of the century. However, it wouldn’t have advanced so quickly without the Moving Picture Experts Group (MPEG), which recently won a Technology & Engineering Emmy® Award for its groundbreaking MPEG-DASH standard.
The development of MPEG-DASH began in 2010 when the likes of YouTube and Netflix laid the framework for the popularization of video streaming among consumers. However, the quality of streams was often sub-par and plagued with stalls, buffering, missing/wrong plug-ins, and poor image quality. MPEG-DASH aimed to create a new video streaming standard to deliver high-quality streams to users with minimal issues. MPEG-DASH uses adaptive bitrate technology to break down videos into smaller chunks and encode them at different quality levels. Adaptive bitrate streaming detects the user’s bandwidth in real-time and adjusts the quality of the stream.
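
To make the adaptation step concrete, here is a minimal Python sketch of throughput-based rendition selection. The bitrate ladder, the safety margin, and the function name are illustrative assumptions; MPEG-DASH itself does not mandate any particular adaptation logic, and real players typically also consider buffer levels.

```python
# Hypothetical bitrate ladder in kbit/s, from lowest to highest quality.
LADDER_KBPS = [400, 1200, 2500, 5000, 8000]


def select_rendition(measured_kbps: float, ladder=LADDER_KBPS,
                     safety: float = 0.8) -> int:
    """Pick the highest bitrate that fits within a safety margin of the throughput."""
    budget = measured_kbps * safety
    candidates = [rate for rate in sorted(ladder) if rate <= budget]
    # If even the lowest rendition is too large, still return it to keep playing.
    return candidates[-1] if candidates else min(ladder)
```

A player would call a function like this before requesting each segment, so the quality follows the measured bandwidth up and down.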
MPEG-DASH was standardized in 2012, and it is the first adaptive bitrate streaming solution that is an international standard. What makes MPEG-DASH groundbreaking is that it allows internet-connected devices to receive high-quality streams, regardless of bandwidth quality. Its standardization was significant because it gave the industry confidence that it could universally adopt its capabilities compared to proprietary solutions. Furthermore, the fact it is codec agnostic means content can be encoded with any encoding format – making it possible for the entire media industry to improve the quality of their streams. The first live MPEG-DASH demonstration took place in August 2012. VRT offered its audience the chance to experience the Olympic Games broadcast on their devices via the newly standardized streaming standard.
The impact of MPEG-DASH is far-reaching and completely transformed the entire video streaming industry, including on-demand, live and low latency streaming – even 5G. It’s relied on by Hulu, Netflix and YouTube to empower them to deliver superior viewing experiences and accounts for more than 50% of the world-wide internet traffic today. Currently, MPEG is working on its 5th edition to address and meet the needs of the constantly evolving video streaming ecosystem and ensure its compatibility with new technologies.
MPEG-DASH is also deeply embedded in the DNA of Bitmovin, which was founded in 2013, and it provided the springboard for the company’s success. MPEG-DASH was co-created by my fellow Bitmovin co-founders, Stefan Lederer and Chris Mueller, which sparked the development of the Bitmovin Player and Bitmovin Encoder – the first commercial solutions made for this video streaming standard. Bitmovin’s solutions were, and continue to be, backed by strong academic research, and it is one of the primary drivers behind our rapid growth. We have outpaced our competitors in under ten years and become the category leader for video streaming infrastructure. The competitiveness of our solutions is exemplified by the fact we are powering the world’s largest OTT online video providers, including the BBC, ClassPass, discovery+, Globo, The New York Times, Red Bull Media House, and many more.
MPEG’s Technology & Engineering Emmy® Award win is the culmination of years of hard work dedicated to optimizing video streams and providing audiences worldwide with superior viewing experiences. MPEG has been instrumental in some of the most significant technological advancements in the video streaming ecosystem. It is a fantastic achievement for the team, comprising over 90 researchers and engineers from around 60 companies worldwide, to receive this tremendous accolade. Congratulations again to the team!

The post A Brief History of MPEG-DASH: From Early Development to Emmy® Award Win appeared first on Bitmovin.

Efficiently Predicting Quality with ATHENA’s Video Complexity Analyzer (VCA) project

Christian Timmerer — Fri, 25 Mar 2022 19:30:34 +0000

For online prediction in live streaming applications, selecting low-complexity features is critical to ensure low-latency video streaming without disruptions. For each frame/ video/ video segment, two features, i.e., the average texture energy and the average gradient of the texture energy, are determined. A DCT-based energy function is introduced to determine the block-wise texture of each frame. The spatial and temporal features of the video/ video segment are derived from the DCT-based energy function. The Video Complexity Analyzer (VCA) project was launched in 2022, aiming to provide the most efficient, highest performance spatial and temporal complexity prediction of each frame/ video/ video segment, which can be used for a variety of applications like shot/scene detection and online per-title encoding.
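
The exact energy function used by VCA is not given in this post, so the following Python sketch is only a simplified stand-in for the idea of a DCT-based block texture measure: it computes a naive 2D DCT of a block and sums the absolute values of all coefficients except the DC term. The function names are hypothetical, and VCA's optimized SIMD implementation will differ.

```python
import math


def dct2(block):
    """Naive, unnormalized 2D DCT-II of a square block given as a list of rows."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = total
    return out


def block_texture_energy(block):
    """Sum of absolute AC coefficients; the DC term at (0, 0) is excluded."""
    coeffs = dct2(block)
    n = len(block)
    return sum(abs(coeffs[u][v])
               for u in range(n) for v in range(n) if (u, v) != (0, 0))
```

With this measure a flat block has essentially zero energy, while a block with fine detail such as a checkerboard scores high. Averaging the block energies over a frame would give a per-frame spatial feature, and comparing them between consecutive frames would give a temporal one.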

What is the Video Complexity Analyzer

The primary objective of the Video Complexity Analyzer is to become the best spatial and temporal complexity predictor for every frame/ video segment/ video which aids in predicting encoding parameters for applications like scene-cut detection and online per-title encoding. VCA leverages x86 SIMD and multi-threading optimizations for effective performance. While VCA is primarily designed as a video complexity analyzer library, a command-line executable is provided to facilitate testing and development. We expect VCA to be utilized in many leading video encoding solutions in the coming years.

VCA is available as an open-source library, published under the GPLv3 license. For more details, please visit the software online documentation here. The source code can be found here.

Heatmap of spatial complexity (E)

Heatmap of temporal complexity (h)

A performance comparison (frames analyzed per second) of VCA (with different levels of threading enabled) compared to Spatial Information/Temporal Information (SITI) [Github] is shown below

Video Complexity Analyzer vs Spatial Information/Temporal Information

How to Build a Video Complexity Analyzer

The software is tested mostly in Linux and Windows OS. It requires some pre-requisite software to be installed before compiling. The steps to build the project in Linux and Windows are explained below.

Prerequisites

CMake version 3.13 or higher.
Git.
C++ compiler with C++11 support
NASM assembly compiler (for x86 SIMD support)

The following C++11 compilers have been known to work:

Visual Studio 2015 or later
GCC 4.8 or later
Clang 3.3 or later

Execute Build

The following commands will check out the project source code and create a directory called ‘build’ where the compiler output will be placed. CMake is then used for generating build files and compiling the VCA binaries.

$ git clone https://github.com/cd-athena/VCA.git
$ cd VCA
$ mkdir build
$ cd build
$ cmake ../
$ cmake --build .

This will create VCA binaries in the VCA/build/source/apps/ folder.

Command-Line Options

General

Displaying Help Text:

--help, -h

Displaying version details:

--version, -v

Logging/Statistic Options

--complexity-csv

Write the spatial (E) and temporal complexity (h), epsilon, brightness (L) statistics to a Comma Separated Values log file. Creates the file if it doesn’t already exist. The following statistics are available:

POC Picture Order Count – The display order of the frames
E Spatial complexity of the frame
h Temporal complexity of the frame
epsilon Gradient of the temporal complexity of the frame
L Brightness of the frame

Unless --no-chroma is used, the following chroma statistics are also available:

avgU Average U chroma component of the frame
energyU Average U chroma texture of the frame
avgV Average V chroma component of the frame
energyV Average V chroma texture of the frame
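
A log written with --complexity-csv can be post-processed with a few lines of Python. This sketch assumes the file starts with a header row whose column names match the statistic names listed above (POC, E, h, epsilon, L); that layout is an assumption and should be checked against an actual output file. The function names are hypothetical.

```python
import csv
import io


def load_complexity(csv_text: str):
    """Parse complexity rows into dicts keyed by the statistic names listed above."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "POC": int(record["POC"]),
            "E": float(record["E"]),
            "h": float(record["h"]),
            "epsilon": float(record["epsilon"]),
            "L": float(record["L"]),
        })
    return rows


def mean_spatial_complexity(rows) -> float:
    """Average spatial complexity E over all parsed frames."""
    return sum(row["E"] for row in rows) / len(rows)
```

Any chroma columns that may be present are simply ignored here, since only the named luma statistics are read.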

--shot-csv <filename>

Write the shot id, the first POC of every shot to a Comma Separated Values log file. Creates the file if it doesn’t already exist.

--yuvview-stats

Write the per block results (L, E, h) to a stats file that can be visualized using YUView.

Performance Options

--no-chroma

Disable analysis of chroma planes (which is enabled by default).

--no-simd

The Video Complexity Analyzer will use all detected CPU SIMD architectures by default. This will disable that detection.

--threads

Specify the number of threads to use. Default: 0 (autodetect).

Input/Output

--input

Input filename. Raw YUV or Y4M supported. Use stdin for stdin. For example piping input from ffmpeg works like this:

ffmpeg.exe -i Sintel.2010.1080p.mkv -f yuv4mpegpipe - | vca.exe --input stdin

--y4m

Parse input stream as YUV4MPEG2 regardless of file extension. Primarily intended for use with stdin. This option is implied if the input filename has a “.y4m” extension

--input-depth

Bit-depth of input file or stream. Any value between 8 and 16. Default is 8. For Y4M files, this is read from the Y4M header.

--input-res

Source picture size [w x h]. For Y4M files, this is read from the Y4M header.

--input-csp

Chroma Subsampling. 4:0:0(monochrome), 4:2:0, 4:2:2, and 4:4:4 are supported. For Y4M files, this is read from the Y4M header.

--input-fps

The framerate of the input. For Y4M files, this is read from the Y4M header.

--skip

Number of frames to skip at start of input file. Default 0.

--frames, -f

Number of frames of input sequence to be analyzed. Default 0 (all).

Analyzer Configuration

--block-size <8/16/32>

Size of the non-overlapping blocks used to determine the E, h features. Default: 32.

--min-thresh

Minimum threshold of epsilon for shot detection.

--max-thresh

Maximum threshold of epsilon for shot detection.

Using the VCA API

VCA is written primarily in C++ and x86 assembly language. This API is wholly defined within vcaLib.h in the source/lib/ folder of our source tree. All of the functions, variables, and enumerations meant to be used by the end-user are present in this header.

vca_analyzer *vca_analyzer_open(vca_param param)

Create a new analyzer handle; all parameters from vca_param are copied. The returned pointer is then passed to all of the functions pertaining to this analyzer. Since vca_param is copied internally, the user may release their copy after allocating the analyzer. Changes made to their copy of the param structure have no effect on the analyzer after it has been allocated.

vca_result vca_analyzer_push(vca_analyzer *enc, vca_frame *frame)

Push a frame to the analyzer and start the analysis. Note that only the pointers will be copied but no ownership of the memory is transferred to the library. The caller must make sure that the pointers are valid until the frame has been analyzed. Once the result for a frame has been pulled, the library will not use the pointers anymore. This call may block until there is a slot available to work on. The number of frames that will be processed in parallel can be set using nrFrameThreads.

bool vca_result_available(vca_analyzer *enc)

Check if a result is available to pull.

vca_result vca_analyzer_pull_frame_result(vca_analyzer *enc, vca_frame_results *result)

Pull a result from the analyzer. This may block until a result is available. Use vca_result_available() if you want to only check if a result is ready.

void vca_analyzer_close(vca_analyzer *enc)

Finally, the analyzer must be closed in order to free all of its resources. An analyzer that has been flushed cannot be restarted and reused. Once vca_analyzer_close() has been called, the analyzer handle must be discarded.
Try out the video complexity analyzer for yourself, amongst other exciting innovations both at https://athena.itec.aau.at/ and bitmovin.com

The post Efficiently Predicting Quality with ATHENA’s Video Complexity Analyzer (VCA) project appeared first on Bitmovin.

137th MPEG Meeting Takeaways: MPEG Wins Two More Emmy® Awards

Christian Timmerer — Tue, 08 Feb 2022 10:48:47 +0000

Preface

Bitmovin isn’t the only organization whose sole purpose is to shape the future of video – a few senior developers at Bitmovin along with me are active members of the Moving Picture Experts Group (MPEG). Personally, I have been a member and attendant of MPEG for 15+ years and have been documenting the progress since early 2010. Today, we’re working hard to further improve the capabilities and efficiency of the industry’s newest standards, such as VVC, LCEVC, and MIV.

The 137th MPEG Meeting – Immersive Experiences Move Forward

It’s been a long six months of research and progression in the world of video standards-setting – as MPEG (and Bitmovin alike) have had their heads down to run new efficiency experiments to improve codecs such as VVC/H.266 in collaboration with Fraunhofer HHI, I haven’t had the chance to publish one of my quarterly meeting reports for the 136th MPEG meeting. So, this month’s report will cover both the 136th and 137th MPEG Meetings. As expected, the latest developments in the standards space have focused on improvements to VVC & LCEVC; however, what’s progressing faster than usual are technologies that focus on both audio & visual immersive experiences.
I’ve addressed most of the recent updates. The official press release of the 137th MPEG meeting can be found here and comprises the following items:

- MPEG Systems Wins Two More Technology & Engineering Emmy® Awards
- MPEG Audio Coding selects 6DoF Technology for MPEG-I Immersive Audio
- MPEG Requirements issues Call for Proposals for Encoder and Packager Synchronization
- MPEG Systems promotes MPEG-I Scene Description to the Final Stage
- MPEG Systems promotes Smart Contracts for Media to the Final Stage
- MPEG Systems further enhanced the ISOBMFF Standard
- MPEG Video Coding completes Conformance and Reference Software for LCEVC
- MPEG Video Coding issues Committee Draft of Conformance and Reference Software for MPEG Immersive Video
- JVET produces Second Editions of VVC & VSEI and finalizes VVC Reference Software
- JVET promotes Tenth Edition of AVC to Final Draft International Standard
- JVET extends HEVC for High-Capability Applications up to 16K and Beyond
- MPEG Genomic Coding evaluated Responses on New Advanced Genomics Features and Technologies

- MPEG White Papers on Neural Network Coding (NNC), Low Complexity Enhancement Video Coding (LCEVC), and MPEG Immersive Video

In this report, I’d like to focus on the Emmy® Awards, video coding updates (AVC, HEVC, VVC, and beyond), and a brief update about DASH (as usual).

MPEG Systems Wins Two More Technology & Engineering Emmy® Awards

MPEG Systems is pleased to report that MPEG is being recognized this year by the National Academy of Television Arts and Sciences (NATAS) with two Technology & Engineering Emmy® Awards, for (i) “standardization of font technology for custom downloadable fonts and typography for Web and TV devices” and for (ii) “standardization of HTTP encapsulated protocols”, respectively.
The first of these Emmys is related to MPEG’s Open Font Format (ISO/IEC 14496-22) and the second of these Emmys is related to MPEG Dynamic Adaptive Streaming over HTTP (i.e., MPEG DASH, ISO/IEC 23009). The MPEG DASH standard is the only commercially deployed international standard technology for media streaming over HTTP and it is widely used in many products. MPEG developed the first edition of the DASH standard in 2012 in collaboration with 3GPP and since then has produced four more editions amending the core specification by adding new features and extended functionality. Furthermore, MPEG has developed six other standards as additional “parts” of ISO/IEC 23009 enabling the effective use of the MPEG DASH standards with reference software and conformance testing tools, guidelines, and enhancements for additional deployment scenarios. MPEG DASH has dramatically changed the streaming industry by providing a standard that is widely adopted by various consortia such as 3GPP, ATSC, DVB, and HbbTV, and across different sectors. The success of this standard is due to its technical excellence, large participation of the industry in its development, addressing the market needs, and working with all sectors of industry all under ISO/IEC JTC 1/SC 29 MPEG Systems’ standard development practices and leadership.
These are MPEG’s fifth and sixth Technology & Engineering Emmy® Awards (after MPEG-1 and MPEG-2 together with JPEG in 1996, Advanced Video Coding (AVC) in 2008, MPEG-2 Transport Stream in 2013, and ISO Base Media File Format in 2021) and MPEG’s seventh and eighth overall Emmy® Awards (including the Primetime Engineering Emmy® Awards for Advanced Video Coding (AVC) High Profile in 2008 and High-Efficiency Video Coding (HEVC) in 2017).
Bitmovin and its founders have been actively contributing to the MPEG DASH standard since its inception. My initial blog post dates back to 2010 and the first edition of MPEG DASH was published in 2012. A more detailed MPEG DASH timeline provides many pointers to the Institute of Information Technology (ITEC) at the Alpen-Adria-Universität Klagenfurt and its DASH activities, which are now continued within the Christian Doppler Laboratory ATHENA. In the end, the MPEG DASH community of contributors to and users of the standard can be very proud of this achievement, which comes only 10 years after the first edition was published. So, happy 10th birthday, MPEG DASH, and what a nice birthday gift.

Video Coding Updates

In terms of video coding, there have been many updates across various standards’ projects at the 137th MPEG Meeting.

Advanced Video Coding

Starting with AVC, the 10th edition of Advanced Video Coding (AVC, ISO/IEC 14496-10 | ITU-T H.264) has been promoted to Final Draft International Standard (FDIS), which is the final stage of the standardization process. Beyond various text improvements, this edition specifies a new SEI message for describing the shutter interval applied during video capture. The shutter interval can vary in video cameras, and conveying this information can be valuable for analysis and post-processing of the decoded video.

High-Efficiency Video Coding

The High-Efficiency Video Coding (HEVC, ISO/IEC 23008-2 | ITU-T H.265) standard has been extended to support high-capability applications. It defines new levels and tiers providing support for very high bit rates and video resolutions up to 16K, as well as defining an unconstrained level. This will enable the usage of HEVC in new application domains, including professional, scientific, and medical video sectors.

Versatile Video Coding

The second editions of Versatile Video Coding (VVC, ISO/IEC 23090-3 | ITU-T H.266) and Versatile supplemental enhancement information messages for coded video bitstreams (VSEI, ISO/IEC 23002-7 | ITU-T H.274) have reached FDIS status. The new VVC version defines profiles and levels supporting larger bit depths (up to 16 bits), including some low-level coding tool modifications to obtain improved compression efficiency with high bit-depth video at high bit rates. VSEI version 2 adds SEI messages giving additional support for scalability, multi-view, display adaptation, improved stream access, and other use cases. Furthermore, a Committee Draft Amendment (CDAM) for the next amendment of VVC was issued to begin the formal approval process to enable linking VVC with the Green Metadata (ISO/IEC 23001-11) and Video Decoding Interface (ISO/IEC 23090-13) standards and add a new unconstrained level for exceptionally high capability applications such as certain uses in professional, scientific, and medical application scenarios. Finally, the reference software package for VVC (ISO/IEC 23090-16) was also completed with its achievement of FDIS status. Reference software is extremely helpful for developers of VVC devices, helping them in testing their implementations for conformance to the video coding specification.

Beyond VVC

In terms of video coding beyond VVC capabilities, the Enhanced Compression Model (ECM 3.1) shows an improvement of close to 15% over VTM-11.0 + JVET-V0056 (i.e., VVC reference software) for Random Access Main 10. This is indeed encouraging and, in general, these activities are currently managed within two exploration experiments (EEs). The first is on neural network-based (NN) video coding technology (EE1) and the second is on enhanced compression beyond VVC capability (EE2). EE1 currently plans to further investigate (i) enhancement filters (loop and post) and (ii) super-resolution (JVET-Y2023). It will further investigate selected NN technologies on top of ECM 4 and the implementation of selected NN technologies in the software library, for platform-independent cross-checking and integerization. Enhanced Compression Model 4 (ECM 4) comprises new elements on MRL for intra, various GPM/affine/MV-coding improvements including TM, adaptive intra MTS, coefficient sign prediction, CCSAO improvements, bug fixes, and encoder improvements (JVET-Y2025). EE2 will investigate intra prediction improvements, inter prediction improvements, improved screen content tools, and improved entropy coding (JVET-Y2024).
Bitmovin Encoding has supported AVC and HEVC for many years and is currently investigating the integration of VVC. Bitmovin Player is coding format agnostic: it relies on the underlying hardware/software platform for decoding while efficiently handling multi-codec use cases.

The latest MPEG-DASH Update

Finally, I’d like to provide a brief update on MPEG-DASH! At the 137th MPEG meeting, MPEG Systems issued a draft amendment to the core MPEG-DASH specification (i.e., ISO/IEC 23009-1) about Extended Dependent Random Access Point (EDRAP) streaming and other extensions, which will be further discussed during the Ad-hoc Group (AhG) period (please join the dash email list for further details/announcements). Furthermore, Defects under Investigation (DuI) and Technologies under Consideration (TuC) are available here.
An updated overview of DASH standards/features can be found in the Figure below.

MPEG DASH Status – January 2022

The next meeting will be again an online meeting in April 2022.

The post 137th MPEG Meeting Takeaways: MPEG Wins Two More Emmy® Awards appeared first on Bitmovin.

Efficient Multi-Encoding Algorithms for HTTP Adaptive Bitrate Streaming

Christian Timmerer — Thu, 23 Dec 2021 16:25:36 +0000

The Future of HTTP Adaptive Streaming (HAS)

According to multiple reports, video viewing will account for as much as 82% of all internet traffic by the end of 2022. As a result, the popularity of HTTP Adaptive Streaming (HAS) is steadily increasing as a way to support this demand efficiently. Furthermore, improvements in video characteristics such as frame rate, resolution, and bit depth raise the need to develop a large-scale, highly efficient video encoding environment. This is even more crucial for DASH-based content provisioning as it requires encoding multiple representations of the same video content. Each video is encoded at multiple bitrates and spatial resolutions (i.e., representations) to adapt to the heterogeneity of network conditions, device characteristics, and end-user preferences as shown in Figure 1. However, encoding the same content at multiple representations requires substantial resources and costs for content providers.

Figure 1: A systematic representation of encoding scheme in HAS

As seen in Figure 2, as resolution doubles, encoding time complexity also doubles! To address this challenge, we must employ multi-encoding schemes to accelerate the encoding process of multiple representations without impacting quality. This is achieved by exploiting a high correlation of encoder analysis decisions (like block partitioning and prediction mode decisions) across multiple representations.

Figure 2: Relative time complexity of encoding representations in x265 HEVC encoding.

What is Multi-Encoding?

To encode multiple renditions of the same video at multiple bitrates and resolutions, we reuse encoder analysis information across the renditions. This works because encoder decisions are strongly correlated across bitrate and resolution renditions. Sharing analysis information across multiple bitrates within one resolution is termed “multi-rate encoding”, while sharing the information across multiple resolutions is termed “multi-resolution encoding”. “Multi-encoding” is a generalized term that combines both multi-rate and multi-resolution encoding schemes.
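To make the terminology concrete, here is a minimal sketch that labels the kind of sharing between a reference rendition and a dependent one. The `Rendition` type, its field names, and the bitrate values are assumptions made for illustration only; they are not part of any encoder API.

```python
from typing import NamedTuple


class Rendition(NamedTuple):
    height: int        # vertical resolution, e.g. 540 for 540p
    bitrate_kbps: int  # target bitrate (illustrative unit and values)


def sharing_type(reference: Rendition, dependent: Rendition) -> str:
    """Name the kind of analysis sharing between two renditions."""
    if reference == dependent:
        raise ValueError("a rendition cannot reuse its own analysis")
    if reference.height == dependent.height:
        return "multi-rate"        # same resolution, different bitrate
    return "multi-resolution"      # different resolution


print(sharing_type(Rendition(540, 2000), Rendition(540, 1000)))
print(sharing_type(Rendition(540, 2000), Rendition(1080, 6000)))
```

A multi-encoding pipeline would use both kinds of sharing within the same bitrate ladder.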

Proposed Heuristics:

To aid the encoding process of the dependent renditions in HEVC, the ATHENA Labs research team proposes two sets of encoder decision heuristics: Prediction Mode heuristics and Motion Estimation heuristics.
Prediction Mode Heuristics:
Prediction Mode heuristics are those where the selected Coding Unit (CU) size for the dependent renditions is the same as the reference representation – this can be further broken down into the following modes:

If the SKIP mode was chosen in the highest bitrate rendition, rate-distortion optimization is evaluated for only MERGE/SKIP modes.
If the 2Nx2N mode was chosen in the highest bitrate rendition, RDO is skipped for AMP modes.
If the inter-prediction mode was chosen in the highest bitrate rendition, RDO is skipped for intra-prediction modes.
If the intra-prediction mode was chosen for the highest and lowest bitrate rendition, RDO is evaluated for only intra-prediction modes in the intermediate renditions.
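The four rules above can be sketched as a function that returns the modes for which rate-distortion optimization (RDO) is still evaluated in a dependent rendition. This is an illustrative sketch, not the authors' implementation: the `Mode` enum, the grouping of modes, the treatment of 2Nx2N as an inter mode, and the function name are all assumptions.

```python
from __future__ import annotations

from enum import Enum, auto
from typing import Optional


class Mode(Enum):
    SKIP = auto()
    MERGE = auto()
    INTER_2Nx2N = auto()
    INTER_AMP = auto()    # asymmetric motion partitions
    INTER_OTHER = auto()  # remaining inter partitions (hypothetical grouping)
    INTRA = auto()


# Assumption: SKIP and MERGE count as inter-prediction modes.
INTER_MODES = {Mode.SKIP, Mode.MERGE, Mode.INTER_2Nx2N,
               Mode.INTER_AMP, Mode.INTER_OTHER}


def candidate_modes(highest: Mode, lowest: Optional[Mode] = None) -> set[Mode]:
    """Modes still evaluated by RDO, given the reference decisions.

    highest: mode chosen in the highest bitrate rendition.
    lowest:  mode chosen in the lowest bitrate rendition, if available.
    """
    if highest is Mode.SKIP:
        return {Mode.SKIP, Mode.MERGE}        # rule 1
    if highest is Mode.INTRA and lowest is Mode.INTRA:
        return {Mode.INTRA}                   # rule 4
    candidates = set(Mode)
    if highest is Mode.INTER_2Nx2N:
        candidates.discard(Mode.INTER_AMP)    # rule 2
    if highest in INTER_MODES:
        candidates.discard(Mode.INTRA)        # rule 3
    return candidates


print(sorted(m.name for m in candidate_modes(Mode.INTER_2Nx2N)))
```

When no rule applies (for example, intra in the highest bitrate rendition but not in the lowest), the sketch falls back to evaluating every mode.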

Motion Estimation Heuristics:
Motion Estimation heuristics are those where the CU size and PU selected for the dependent representations are the same as the reference representation:

The same reference frame is forced as that of the highest bitrate rendition.
The Motion Vector Predictor (MVP) is set to be the Motion Vector (MV) of the highest bitrate rendition.
The motion search range is decreased to a smaller window if the MVs of the highest and the lowest bitrate renditions are close to each other.
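As a rough sketch of how these three rules could combine, the function below derives a motion search configuration for a dependent rendition. The data classes, the closeness threshold, and both search range values are hypothetical choices made for illustration; the text above does not specify them.

```python
from dataclasses import dataclass
from typing import Tuple

MV = Tuple[int, int]  # motion vector as (x, y)


@dataclass(frozen=True)
class MotionInfo:
    ref_idx: int  # index of the chosen reference frame
    mv: MV        # chosen motion vector


@dataclass(frozen=True)
class SearchConfig:
    ref_idx: int
    mvp: MV
    search_range: int


def dependent_search_config(highest: MotionInfo, lowest: MotionInfo,
                            default_range: int = 64,
                            reduced_range: int = 8,
                            closeness: int = 2) -> SearchConfig:
    """Apply the three heuristics; all numeric defaults are illustrative."""
    dx = abs(highest.mv[0] - lowest.mv[0])
    dy = abs(highest.mv[1] - lowest.mv[1])
    mvs_are_close = max(dx, dy) <= closeness
    return SearchConfig(
        ref_idx=highest.ref_idx,  # force the same reference frame
        mvp=highest.mv,           # use the reference MV as the predictor
        search_range=reduced_range if mvs_are_close else default_range,
    )


print(dependent_search_config(MotionInfo(0, (4, -2)), MotionInfo(0, (5, -1))))
```

How "close" is measured (here, the largest per-component difference) is a design choice of this sketch; other distance measures would fit the description equally well.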

Based on the above-mentioned heuristics, two multi-encoding schemes are proposed.

Proposed Multi-encoding Schemes

In our first proposed multi-encoding approach we perform the following steps:

The first resolution tier (i.e., 540p in our example) is encoded using the combination of double-bound for CU depth estimation (cf. the previous blog post), Prediction Mode Heuristics, and Motion Estimation Heuristics.
The CU depth from the highest bit rate representation of the first resolution tier (i.e., 540p) is shared with the highest bit rate representation of the next resolution tier (i.e., 1080p in our example). In particular, the information is used as a lower bound, i.e., the CU is forced to split if the current encode depth is lower than the reference encode CU depth. The remaining bitrate representations of this resolution tier are encoded using the multi-rate scheme as used in Step 1.
Repeat Step 2 for the remaining resolution tiers in ascending order with respect to the resolution until no more resolution tiers are left (i.e., only for 2160p in our example).
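The three steps above can be sketched as an encoding plan that records, for each representation, how it is encoded and which representation it borrows analysis from. The ladder layout, the bitrate values, and the method labels are assumptions for illustration. The sketch also assumes that the highest bitrate representation of the first tier serves as the reference encode for that tier's multi-rate scheme.

```python
from typing import Dict, List, Optional, Tuple

Rep = Tuple[str, int]                      # (resolution tier, bitrate in kbps)
PlanEntry = Tuple[Rep, str, Optional[Rep]]  # (representation, method, reference)


def plan_first_scheme(ladder: Dict[str, List[int]]) -> List[PlanEntry]:
    """ladder maps tiers, inserted in ascending resolution order, to bitrates."""
    plan: List[PlanEntry] = []
    prev_top: Optional[Rep] = None
    for tier, bitrates in ladder.items():
        top = (tier, max(bitrates))
        if prev_top is None:
            # Step 1: the first tier starts from its own reference encode.
            plan.append((top, "reference encode", None))
        else:
            # Step 2: CU depth of the previous tier's top representation
            # acts as a lower bound for this tier's top representation.
            plan.append((top, "CU depth lower bound", prev_top))
        for bitrate in sorted(bitrates, reverse=True):
            if bitrate != top[1]:
                plan.append(((tier, bitrate), "multi-rate", top))
        prev_top = top  # Step 3: repeat for the next tier
    return plan


def must_split(current_depth: int, reference_depth: int) -> bool:
    """Lower bound rule: force a split while below the reference CU depth."""
    return current_depth < reference_depth


ladder = {"540p": [1000, 2000], "1080p": [3000, 6000], "2160p": [12000, 20000]}
for entry in plan_first_scheme(ladder):
    print(entry)
```

The plan only captures dependencies between representations; the actual CU depth data and its scaling between resolutions are outside the scope of this sketch.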

Figure 3: An example of the first proposed multi-encoding scheme.

The second proposed multi-encoding scheme is a minor variation of the first scheme which aims to extend the double-bound for CU depth estimation scheme across resolution tiers. It is performed in the following steps:

The first resolution tier (i.e., 540p in our example) is encoded using the combination of double-bound for CU depth estimation, Prediction Mode Heuristics, and Motion Estimation Heuristics.
The CU depth from the lowest bit rate representation of the first resolution tier (i.e., 540p) is shared with the highest bit rate representation of the next resolution tier (i.e., 1080p in our example). In particular, the information is used as a lower bound, i.e., the CU is forced to split if the current encode depth is lower than the reference encode CU depth.
The scaled CU depth from the lowest bit rate representation of the previous resolution tier (i.e., 540p) and CU depth information from the highest bit rate representation of the current resolution tier are shared with the lowest bit rate representation of the current resolution tier (i.e., 1080p in our example) and are used as the lower bound and upper bound respectively for CU depth search.
The remaining bit rate representations of this resolution tier (i.e., 1080p) are encoded using the multi-rate scheme as used in Step 1.
Repeat Step 2 for the remaining resolution tiers in ascending order with respect to the resolution until no more resolution tiers are left (i.e., only for 2160p in our example).
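The distinguishing step of the second scheme is the double bound applied to the lowest bitrate representation of each higher tier. A minimal sketch of the resulting CU depth search window follows. It assumes the scaled depth from the previous tier is already available as an input, that depths run from 0 to 3 (a 64x64 CTU with 8x8 minimum CU size), and that the window collapses to the lower bound when the two bounds conflict. None of these details are stated in the text above.

```python
def depth_search_window(scaled_prev_lowest_depth: int,
                        current_highest_depth: int,
                        max_depth: int = 3) -> range:
    """CU depths to evaluate for the lowest bitrate representation of a tier.

    scaled_prev_lowest_depth: scaled CU depth from the lowest bitrate
        representation of the previous resolution tier (lower bound).
    current_highest_depth: CU depth from the highest bitrate representation
        of the current resolution tier (upper bound).
    """
    lower = max(0, min(scaled_prev_lowest_depth, max_depth))
    # Hypothetical fallback: never let the upper bound drop below the lower.
    upper = max(lower, min(current_highest_depth, max_depth))
    return range(lower, upper + 1)


print(list(depth_search_window(1, 3)))
print(list(depth_search_window(2, 2)))
```

The narrower the window, the fewer CU depths the encoder has to evaluate, which is where the additional time saving of the second scheme would come from.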

Figure 4: An example of the second proposed multi-encoding scheme.

Results

It is observed that the state-of-the-art scheme yields the highest average encoding time-saving, i.e., 80.05%, but it comes with a bitrate increase of 13.53% and 9.59% to maintain the same PSNR and VMAF respectively as compared to the stand-alone encodings. The first proposed multi-encoding scheme has the lowest increase in bitrate to maintain the same PSNR and VMAF (2.32% and 1.55%) respectively as compared to the stand-alone encodings. The second proposed multi-encoding scheme improves the encoding time savings of the first proposed multi-encoding scheme by 11% with a negligible increase in bitrate to maintain the same PSNR and VMAF. This result is shown in Table 1, where Delta T represents the overall encoding time-savings compared to the stand-alone encodings, BDR_P and BDR_V refer to the average difference in bitrate with respect to stand-alone encodings to maintain the same PSNR and VMAF, respectively.

Results of the proposed multi-encoding schemes
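For readers who want to compute a time-saving figure for their own encodes, the sketch below uses the common definition of relative time saving against stand-alone encoding. Whether the paper computes Delta T in exactly this way is an assumption, and the timings in the example are made up purely for illustration.

```python
def time_saving_percent(standalone_seconds: float,
                        multi_seconds: float) -> float:
    """Relative encoding time saving versus stand-alone encoding, in percent."""
    if standalone_seconds <= 0:
        raise ValueError("stand-alone encoding time must be positive")
    return 100.0 * (standalone_seconds - multi_seconds) / standalone_seconds


# Hypothetical timings: 200 s stand-alone versus 50 s with multi-encoding.
print(time_saving_percent(200.0, 50.0))
```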

View the full multi-encoding research paper from ATHENA here.

The post Efficient Multi-Encoding Algorithms for HTTP Adaptive Bitrate Streaming appeared first on Bitmovin.
