<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	 xmlns:media="http://search.yahoo.com/mrss/" >

<channel>
	<title>Mohammad Ghanbari &#8211; Bitmovin</title>
	<atom:link href="https://bitmovin.com/author/mohammad-ghanbari/feed" rel="self" type="application/rss+xml" />
	<link>https://bitmovin.com</link>
	<description>Bitmovin provides adaptive streaming infrastructure for video publishers and integrators. Fastest cloud encoding and HTML5 Player. Play Video Anywhere.</description>
	<lastBuildDate>Mon, 09 Jan 2023 10:49:15 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://bitmovin.com/wp-content/uploads/2023/11/bitmovin_favicon.svg</url>
	<title>Mohammad Ghanbari &#8211; Bitmovin</title>
	<link>https://bitmovin.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Efficiently Predicting Quality with ATHENA&#8217;s Video Complexity Analyzer (VCA) project</title>
		<link>https://bitmovin.com/video-complexity-analyzer-vca</link>
		
		<dc:creator><![CDATA[Christian Timmerer]]></dc:creator>
		<pubDate>Fri, 25 Mar 2022 19:30:34 +0000</pubDate>
				<category><![CDATA[Innovation]]></category>
		<category><![CDATA[athena]]></category>
		<guid isPermaLink="false">https://bitmovin.com/?p=224553</guid>

					<description><![CDATA[<p>For online prediction in live streaming applications, selecting low-complexity features is critical to ensure low-latency video streaming without disruptions. For each frame/ video/ video segment, two features, i.e., the average texture energy and the average gradient of the texture energy are determined. A DCT-based energy function is introduced to determine the block-wise texture of each...</p>
<p>The post <a rel="nofollow" href="https://bitmovin.com/video-complexity-analyzer-vca">Efficiently Predicting Quality with ATHENA&#8217;s Video Complexity Analyzer (VCA) project</a> appeared first on <a rel="nofollow" href="https://bitmovin.com">Bitmovin</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><span style="font-weight: 400;">For online prediction in live streaming applications, selecting low-complexity features is critical to ensure low-latency video streaming without disruptions. For each frame, video segment, or video, two features are determined: the average texture energy and the average gradient of the texture energy. A DCT-based energy function is introduced to determine the block-wise texture of each frame, and the spatial and temporal features of the video or video segment are derived from it. The Video Complexity Analyzer (VCA) project was launched in 2022. It aims to provide the most efficient, highest-performance prediction of the spatial and temporal complexity of each frame, video segment, or video, which can be used for a variety of applications such as shot/scene detection and online per-title encoding.</span></p>
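<p><span style="font-weight: 400;">To make the idea of a DCT-based block texture more concrete, here is a minimal Python sketch. It is not VCA's implementation: it assumes a plain sum of absolute AC coefficients as the block texture (VCA's actual energy function may weight coefficients differently), a tiny 4x4 block size, and hypothetical function names. The spatial feature is the block texture averaged over the frame, and the temporal feature is the averaged absolute change in block texture between consecutive frames.</span></p>

```python
import math


def dct2(block):
    """Naive orthonormal 2-D DCT-II of a square block (list of rows)."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def block_texture(block):
    """Assumed texture measure: sum of absolute AC coefficients (DC excluded)."""
    coeffs = dct2(block)
    n = len(block)
    return sum(abs(coeffs[u][v])
               for u in range(n) for v in range(n) if (u, v) != (0, 0))


def frame_features(frame, prev_textures=None, w=4):
    """Return (E, h, textures) for a luma frame given as a list of rows.

    E averages the block textures over the frame; h averages the absolute
    texture difference to the previous frame (0.0 for the first frame).
    """
    rows, cols = len(frame), len(frame[0])
    textures = []
    for r in range(0, rows - w + 1, w):
        for c in range(0, cols - w + 1, w):
            block = [row[c:c + w] for row in frame[r:r + w]]
            textures.append(block_texture(block))
    norm = len(textures) * w * w
    spatial = sum(textures) / norm
    temporal = 0.0
    if prev_textures is not None:
        temporal = sum(abs(a - b) for a, b in zip(textures, prev_textures)) / norm
    return spatial, temporal, textures
```

<p><span style="font-weight: 400;">In this sketch, a flat frame yields a spatial value close to zero, while a checkerboard frame yields a large one. Passing the previous frame's block textures into the next call produces the temporal value.</span></p>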
<h2><span style="font-weight: 400;">What is the Video Complexity Analyzer</span></h2>
<p style="text-align: left;"><span style="font-weight: 400;">The primary objective of the Video Complexity Analyzer is to become the best spatial and temporal complexity predictor for every frame, video segment, and video, which aids in predicting encoding parameters for applications like scene-cut detection and online per-title encoding. VCA leverages x86 SIMD and multi-threading optimizations to achieve high analysis speed. While VCA is primarily designed as a video complexity analyzer library, a command-line executable is provided to facilitate testing and development. We expect VCA to be utilized in many leading video encoding solutions in the coming years.</span></p>
<p><span style="font-weight: 400;">VCA is available as an open-source library, published under the GPLv3 license. For more details, please visit the software online documentation </span><a href="https://cd-athena.github.io/VCA/" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">here</span></a><span style="font-weight: 400;">. The source code can be found </span><a href="https://github.com/cd-athena/VCA" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">here</span></a><span style="font-weight: 400;">.</span><br />
&nbsp;</p>
<p style="text-align: center;"><img fetchpriority="high" decoding="async" class="aligncenter" src="https://athena.itec.aau.at/wp-content/uploads/sites/12/2022/02/beauty_E.gif" width="426" height="480" alt="- Bitmovin"><br />
<strong>Heatmap of spatial complexity (E)</strong></p>
<p style="text-align: center;"><img decoding="async" class="" src="https://athena.itec.aau.at/wp-content/uploads/sites/12/2022/02/beauty_h.gif" width="426" height="480" alt="- Bitmovin"><br />
<strong>Heatmap of temporal complexity (h)</strong></p>
<p><span style="font-weight: 400;">A performance comparison (frames analyzed per second) between VCA (with different levels of threading enabled) and Spatial Information/Temporal Information (SITI) [</span><a href="https://github.com/Telecommunication-Telemedia-Assessment/SITI" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">GitHub</span></a><span style="font-weight: 400;">] is shown below.</span></p>
<figure id="attachment_224831" aria-describedby="caption-attachment-224831" style="width: 512px" class="wp-caption aligncenter"><img decoding="async" class="wp-image-224831" src="https://bitmovin.com/wp-content/uploads/2022/03/SITI-EH2-1-300x197.png" alt="Video Complexity Analyzer vs Spatial Information/Temporal Information_Bar Chart" width="512" height="337"><figcaption id="caption-attachment-224831" class="wp-caption-text">Video Complexity Analyzer vs Spatial Information/Temporal Information</figcaption></figure>
<h2><span style="font-weight: 400;">How to Build the Video Complexity Analyzer</span></h2>
<p><span style="font-weight: 400;">The software has been tested mostly on Linux and Windows. Some prerequisite software must be installed before compiling. The steps to build the project on Linux and Windows are explained below.</span></p>
<h3><span style="font-weight: 400;">Prerequisites</span></h3>
<ol>
<li style="font-weight: 400;" aria-level="1"><a href="https://cmake.org/" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">CMake</span></a><span style="font-weight: 400;"> version 3.13 or higher.</span></li>
<li style="font-weight: 400;" aria-level="1"><a href="https://git-scm.com/" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">Git</span></a><span style="font-weight: 400;">.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">C++ compiler with C++11 support</span></li>
<li style="font-weight: 400;" aria-level="1"><a href="https://nasm.us/" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">NASM</span></a><span style="font-weight: 400;"> assembly compiler (for x86 SIMD support)</span></li>
</ol>
<p><span style="font-weight: 400;">The following C++11 compilers have been known to work:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Visual Studio 2015 or later</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">GCC 4.8 or later</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Clang 3.3 or later</span></li>
</ul>
<h3><span style="font-weight: 400;">Execute Build</span></h3>
<p><span style="font-weight: 400;">The following commands will check out the project source code and create a directory called ‘build’ where the compiler output will be placed. CMake is then used for generating build files and compiling the VCA binaries.</span></p>
<pre><span style="font-weight: 400;">$ git clone https://github.com/cd-athena/VCA.git</span>
<span style="font-weight: 400;">$ cd VCA</span>
<span style="font-weight: 400;">$ mkdir build</span>
<span style="font-weight: 400;">$ cd build</span>
<span style="font-weight: 400;">$ cmake ../</span>
<span style="font-weight: 400;">$ cmake --build .</span></pre>
<p><span style="font-weight: 400;">This will create VCA binaries in the VCA/build/source/apps/ folder.</span></p>
<h3><span style="font-weight: 400;">Command-Line Options</span></h3>
<h4><strong>General</strong></h4>
<p>Displaying Help Text:</p>
<pre><span style="font-weight: 400;">--help, -h</span>
</pre>
<p>Displaying version details:</p>
<pre><span style="font-weight: 400;">--version, -v</span></pre>
<h4><strong>Logging/Statistic Options</strong></h4>
<pre><span style="font-weight: 400;">--complexity-csv &lt;filename&gt;</span></pre>
<p><span style="font-weight: 400;">Write the spatial complexity (E), temporal complexity (h), epsilon, and brightness (L) statistics to a comma-separated values (CSV) log file. Creates the file if it doesn’t already exist. The following statistics are available:</span></p>
<ul>
<li><span style="font-weight: 400;">POC</span><span style="font-weight: 400;"> Picture Order Count &#8211; The display order of the frames</span></li>
<li><span style="font-weight: 400;">E</span><span style="font-weight: 400;"> Spatial complexity of the frame</span></li>
<li><span style="font-weight: 400;">h</span><span style="font-weight: 400;"> Temporal complexity of the frame</span></li>
<li><span style="font-weight: 400;">epsilon</span><span style="font-weight: 400;"> Gradient of the temporal complexity of the frame</span></li>
<li><span style="font-weight: 400;">L</span><span style="font-weight: 400;"> Brightness of the frame</span></li>
</ul>
<p><span style="font-weight: 400;">Unless the </span><span style="font-weight: 400;">--no-chroma</span><span style="font-weight: 400;"> option is used, the following chroma statistics are also available:</span></p>
<ul>
<li><span style="font-weight: 400;">avgU</span><span style="font-weight: 400;"> Average U chroma component of the frame</span></li>
<li><span style="font-weight: 400;">energyU</span><span style="font-weight: 400;"> Average U chroma texture of the frame</span></li>
<li><span style="font-weight: 400;">avgV</span><span style="font-weight: 400;"> Average V chroma component of the frame</span></li>
<li><span style="font-weight: 400;">energyV</span><span style="font-weight: 400;"> Average V chroma texture of the frame</span></li>
</ul>
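<p><span style="font-weight: 400;">As a rough illustration of how such a log could be consumed, the following Python sketch reads the luma statistics with the standard csv module. It assumes the file starts with a header row using the column names listed above (POC, E, h, epsilon, L). The exact layout written by VCA may differ, and the sample values are invented purely for illustration.</span></p>

```python
import csv
import io

# Made-up sample content; the header layout is an assumption, not VCA output.
SAMPLE_CSV = """POC,E,h,epsilon,L
0,52.1,0.0,0.0,101.3
1,52.4,3.2,0.0,101.1
2,80.9,41.7,0.92,97.8
"""


def load_complexity(stream):
    """Parse per-frame statistics into a list of dictionaries."""
    rows = []
    for record in csv.DictReader(stream):
        rows.append({
            "POC": int(record["POC"]),
            "E": float(record["E"]),
            "h": float(record["h"]),
            "epsilon": float(record["epsilon"]),
            "L": float(record["L"]),
        })
    return rows


def mean_of(rows, key):
    """Average one statistic over all parsed frames."""
    return sum(row[key] for row in rows) / len(rows)


frames = load_complexity(io.StringIO(SAMPLE_CSV))
average_spatial = mean_of(frames, "E")
```

<p><span style="font-weight: 400;">For a real log, the in-memory sample would be replaced by a file opened with open(path, newline="").</span></p>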
<pre><span style="font-weight: 400;">--shot-csv &lt;filename&gt;</span></pre>
<p><span style="font-weight: 400;">Write the shot ID and the first POC of every shot to a comma-separated values (CSV) log file. Creates the file if it doesn’t already exist.</span></p>
<pre><span style="font-weight: 400;">--yuview-stats &lt;filename&gt;</span></pre>
<p><span style="font-weight: 400;">Write the per block results (L, E, h) to a stats file that can be visualized using YUView.</span></p>
<h4><strong>Performance Options</strong></h4>
<pre><span style="font-weight: 400;">--no-chroma</span></pre>
<p><span style="font-weight: 400;">Disable analysis of chroma planes (which is enabled by default).</span></p>
<pre><span style="font-weight: 400;">--no-simd</span>
</pre>
<p><span style="font-weight: 400;">By default, the Video Complexity Analyzer uses all detected CPU SIMD architectures. This option disables that detection, so no SIMD optimizations are used.</span></p>
<pre><span style="font-weight: 400;">--threads &lt;integer&gt;</span></pre>
<p><span style="font-weight: 400;">Specify the number of threads to use. Default: 0 (autodetect).</span></p>
<h4><strong>Input/Output</strong></h4>
<pre><span style="font-weight: 400;">--input &lt;filename&gt;</span></pre>
<p><span style="font-weight: 400;">Input filename. Raw YUV and Y4M are supported. Use </span><span style="font-weight: 400;">stdin</span><span style="font-weight: 400;"> as the filename to read from standard input. For example, piping input from ffmpeg works like this:</span></p>
<pre><span style="font-weight: 400;">ffmpeg.exe -i Sintel.2010.1080p.mkv -f yuv4mpegpipe - | vca.exe --input stdin</span></pre>
<pre><span style="font-weight: 400;">--y4m</span>
</pre>
<p><span style="font-weight: 400;">Parse the input stream as YUV4MPEG2 regardless of file extension. Primarily intended for use with stdin. This option is implied if the input filename has a “.y4m” extension.</span></p>
<pre><span style="font-weight: 400;">--input-depth &lt;integer&gt;</span>
</pre>
<p><span style="font-weight: 400;">Bit-depth of input file or stream. Any value between 8 and 16. Default is 8. For Y4M files, this is read from the Y4M header.</span></p>
<pre><span style="font-weight: 400;">--input-res &lt;wxh&gt;</span>
</pre>
<p><span style="font-weight: 400;">Source picture size [w x h]. For Y4M files, this is read from the Y4M header.</span></p>
<pre><span style="font-weight: 400;">--input-csp &lt;integer or string&gt;</span>
</pre>
<p><span style="font-weight: 400;">Chroma Subsampling. 4:0:0(monochrome), 4:2:0, 4:2:2, and 4:4:4 are supported. For Y4M files, this is read from the Y4M header.</span></p>
<pre><span style="font-weight: 400;">--input-fps &lt;double&gt;</span>
</pre>
<p><span style="font-weight: 400;">The framerate of the input. For Y4M files, this is read from the Y4M header.</span></p>
<pre><span style="font-weight: 400;">--skip &lt;integer&gt;</span>
</pre>
<p><span style="font-weight: 400;">Number of frames to skip at start of input file. Default 0.</span></p>
<pre><span style="font-weight: 400;">--frames, -f &lt;integer&gt;</span>
</pre>
<p><span style="font-weight: 400;">Number of frames of input sequence to be analyzed. Default 0 (all).</span></p>
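<p><span style="font-weight: 400;">The input options above can be assembled programmatically. The Python sketch below builds an argument list from the documented options only. The executable name vca, the helper name, and the rule of adding raw-YUV parameters only when the filename does not end in .y4m are assumptions made for illustration.</span></p>

```python
def build_vca_command(input_path, csv_path, width=None, height=None,
                      bit_depth=None, fps=None, threads=0, executable="vca"):
    """Build an argument list for the VCA command-line tool.

    Y4M inputs carry resolution, bit depth, and frame rate in their header,
    so those options are only added for raw YUV inputs.
    """
    cmd = [executable,
           "--input", input_path,
           "--complexity-csv", csv_path,
           "--threads", str(threads)]
    if not input_path.lower().endswith(".y4m"):
        if width is None or height is None:
            raise ValueError("raw YUV input needs an explicit resolution")
        cmd += ["--input-res", f"{width}x{height}"]
        if bit_depth is not None:
            cmd += ["--input-depth", str(bit_depth)]
        if fps is not None:
            cmd += ["--input-fps", str(fps)]
    return cmd


y4m_cmd = build_vca_command("clip.y4m", "clip.csv")
raw_cmd = build_vca_command("clip.yuv", "clip.csv", width=1920, height=1080,
                            bit_depth=8, fps=30.0)
```

<p><span style="font-weight: 400;">The resulting list can be handed to subprocess.run() once the VCA binary has been built and is on the search path.</span></p>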
<h4><strong>Analyzer Configuration</strong></h4>
<pre><span style="font-weight: 400;">--block-size &lt;8/16/32&gt;</span>
</pre>
<p><span style="font-weight: 400;">Size of the non-overlapping blocks used to determine the E, h features. Default: 32.</span></p>
<pre><span style="font-weight: 400;">--min-thresh &lt;double&gt;</span>
</pre>
<p><span style="font-weight: 400;">Minimum threshold of epsilon for shot detection.</span></p>
<pre><span style="font-weight: 400;">--max-thresh &lt;double&gt;</span>
</pre>
<p><span style="font-weight: 400;">Maximum threshold of epsilon for shot detection.</span></p>
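<p><span style="font-weight: 400;">One way to picture how two epsilon thresholds can drive shot detection is sketched below in Python. This is an illustration only and not VCA's algorithm: it assumes that an epsilon at or above the maximum threshold always starts a new shot, that an epsilon at or below the minimum threshold never does, and that values in between count only when they form a local peak. The function name and the in-between rule are hypothetical.</span></p>

```python
def detect_shot_starts(epsilons, min_thresh, max_thresh):
    """Return the POCs at which a new shot is assumed to start."""
    if not epsilons:
        return []
    starts = [0]  # the first frame always opens a shot
    last = len(epsilons) - 1
    for poc in range(1, len(epsilons)):
        eps = epsilons[poc]
        if eps >= max_thresh:
            # Clear cut: the change is large enough on its own.
            starts.append(poc)
        elif eps > min_thresh:
            # Ambiguous zone (assumed rule): accept only local peaks.
            prev_eps = epsilons[poc - 1]
            next_eps = epsilons[poc + 1] if poc < last else 0.0
            if eps > prev_eps and eps > next_eps:
                starts.append(poc)
    return starts


shot_starts = detect_shot_starts([0.0, 0.1, 0.9, 0.1, 0.3, 0.1, 0.05], 0.2, 0.5)
```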
<h2><span style="font-weight: 400;">Using the VCA API</span></h2>
<p><span style="font-weight: 400;">VCA is written primarily in C++ and x86 assembly language. The API is wholly defined within </span><span style="font-weight: 400;">vcaLib.h</span><span style="font-weight: 400;"> in the source/lib/ folder of the source tree. All of the functions, variables, and enumerations meant to be used by the end user are present in this header.</span></p>
<pre><span style="font-weight: 400;">vca_analyzer *vca_analyzer_open(vca_param param)</span>
</pre>
<p><span style="font-weight: 400;">Create a new analyzer handle. All parameters from vca_param are copied. The returned pointer is then passed to all of the functions pertaining to this analyzer. Since </span><span style="font-weight: 400;">vca_param</span><span style="font-weight: 400;"> is copied internally, the user may release their copy after allocating the analyzer. Changes made to their copy of the param structure have no effect on the analyzer after it has been allocated.</span></p>
<pre><span style="font-weight: 400;">vca_result vca_analyzer_push(vca_analyzer *enc, vca_frame *frame)</span></pre>
<p><span style="font-weight: 400;">Push a frame to the analyzer and start the analysis. Note that only the pointers are copied; ownership of the memory is not transferred to the library. The caller must make sure that the pointers remain valid until the frame has been analyzed. Once the result for a frame has been pulled, the library no longer uses the pointers. This call may block until there is a slot available to work on. The number of frames that are processed in parallel can be set using nrFrameThreads.</span></p>
<pre><span style="font-weight: 400;">bool vca_result_available(vca_analyzer *enc)</span>
</pre>
<p><span style="font-weight: 400;">Check if a result is available to pull.</span></p>
<pre><span style="font-weight: 400;">vca_result vca_analyzer_pull_frame_result(vca_analyzer *enc, vca_frame_results *result)</span></pre>
<p><span style="font-weight: 400;">Pull a result from the analyzer. This may block until a result is available. Use </span><span style="font-weight: 400;">vca_result_available()</span><span style="font-weight: 400;"> if you want to only check if a result is ready.</span></p>
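<p><span style="font-weight: 400;">The interplay of pushing and pulling with a limited number of slots can be modelled with a small Python toy. This is not the VCA library and not a binding to it: the class, its methods, and the per-frame "analysis" (a plain pixel mean) are hypothetical stand-ins. Where the real push call may block, the toy returns False so that the caller pulls a result first.</span></p>

```python
from collections import deque


class ToyAnalyzer:
    """Single-threaded stand-in with a fixed number of in-flight slots."""

    def __init__(self, nr_frame_threads):
        self.capacity = nr_frame_threads
        self.in_flight = deque()

    def push(self, frame):
        """Accept a frame if a slot is free; the real call would block instead."""
        if len(self.in_flight) >= self.capacity:
            return False
        self.in_flight.append(frame)
        return True

    def result_available(self):
        return bool(self.in_flight)

    def pull(self):
        """Return the oldest pending result and free its slot."""
        frame = self.in_flight.popleft()
        pixels = frame["pixels"]
        return {"poc": frame["poc"], "mean": sum(pixels) / len(pixels)}


def analyze(frames, nr_frame_threads=2):
    """Push every frame, pulling whenever all slots are busy, then drain."""
    analyzer = ToyAnalyzer(nr_frame_threads)
    results = []
    for frame in frames:
        while not analyzer.push(frame):
            results.append(analyzer.pull())
    while analyzer.result_available():
        results.append(analyzer.pull())
    return results


demo = analyze([{"poc": i, "pixels": [i, i + 2]} for i in range(5)])
```

<p><span style="font-weight: 400;">The point of the toy is the call order: frames stay owned by the caller while they are in flight, and results come back in the order in which the frames were pushed.</span></p>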
<pre><span style="font-weight: 400;">void vca_analyzer_close(vca_analyzer *enc)</span>
</pre>
<p><span style="font-weight: 400;">Finally, the analyzer must be closed in order to free all of its resources. An analyzer that has been flushed cannot be restarted and reused. Once </span><span style="font-weight: 400;">vca_analyzer_close()</span><span style="font-weight: 400;"> has been called, the analyzer handle must be discarded.</span><br />
Try out the video complexity analyzer for yourself, amongst other exciting innovations both at <a href="https://athena.itec.aau.at/" rel="nofollow noopener" target="_blank">https://athena.itec.aau.at/</a> and <a href="https://bitmovin.com">bitmovin.com</a></p>
<p>The post <a rel="nofollow" href="https://bitmovin.com/video-complexity-analyzer-vca">Efficiently Predicting Quality with ATHENA&#8217;s Video Complexity Analyzer (VCA) project</a> appeared first on <a rel="nofollow" href="https://bitmovin.com">Bitmovin</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Efficient Multi-Encoding Algorithms for HTTP Adaptive Bitrate Streaming</title>
		<link>https://bitmovin.com/multi-encoding-has</link>
		
		<dc:creator><![CDATA[Christian Timmerer]]></dc:creator>
		<pubDate>Thu, 23 Dec 2021 16:25:36 +0000</pubDate>
				<category><![CDATA[Innovation]]></category>
		<category><![CDATA[athena]]></category>
		<category><![CDATA[video encoding]]></category>
		<guid isPermaLink="false">https://bitmovin.com/?p=209148</guid>

					<description><![CDATA[<p>The Future of HTTP Adaptive Streaming (HAS) According to multiple reports, video viewing will account for as much as 82% of all internet traffic by the end of 2022, as such the popularity of HTTP Adaptive Streaming (HAS) is steadily increasing to efficiently support modern demand. Furthermore, improvements in video characteristics such as frame rate,...</p>
<p>The post <a rel="nofollow" href="https://bitmovin.com/multi-encoding-has">Efficient Multi-Encoding Algorithms for HTTP Adaptive Bitrate Streaming</a> appeared first on <a rel="nofollow" href="https://bitmovin.com">Bitmovin</a>.</p>
]]></description>
										<content:encoded><![CDATA[<h2><b>The Future of HTTP Adaptive Streaming (HAS)</b></h2>
<p><span style="font-weight: 400;">According to multiple reports, video viewing will account for as much as <a href="https://www.streamingmedia.com/Articles/ReadArticle.aspx?ArticleID=144177" rel="nofollow noopener" target="_blank">82% of all internet traffic by the end of 2022</a>. As a result, <a href="https://bitmovin.com/adaptive-streaming/">HTTP Adaptive Streaming (HAS)</a> is steadily growing in popularity as an efficient way to meet this demand. Furthermore, improvements in video characteristics such as frame rate, resolution, and bit depth raise the need to develop a large-scale, highly efficient video encoding environment. This is even more crucial for DASH-based content provisioning as it requires encoding multiple representations of the same video content. Each video is encoded at multiple bitrates and spatial resolutions (i.e., representations) to adapt to the heterogeneity of network conditions, device characteristics, and end-user preferences, as shown in Figure 1. However, encoding the same content at multiple representations requires substantial resources and incurs significant costs for content providers.</span></p>
<figure id="attachment_209151" aria-describedby="caption-attachment-209151" style="width: 300px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="size-medium wp-image-209151 zoooom" src="https://bitmovin.com/wp-content/uploads/2021/12/A-systematic-representation-of-encoding-scheme-in-HTTP-Adaptive-Streaming-HAS_decision-tree-300x161.png" alt="A systematic representation of encoding scheme in HTTP Adaptive Streaming (HAS)_decision tree" width="300" height="161" srcset="https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/12/A-systematic-representation-of-encoding-scheme-in-HTTP-Adaptive-Streaming-HAS_decision-tree-300x161.png?lossy=2&amp;strip=1&amp;webp=1 300w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/12/A-systematic-representation-of-encoding-scheme-in-HTTP-Adaptive-Streaming-HAS_decision-tree.jpg?lossy=2&amp;strip=1&amp;webp=1 512w" sizes="(max-width: 300px) 100vw, 300px" /><figcaption id="caption-attachment-209151" class="wp-caption-text">Figure 1: A systematic representation of encoding scheme in HAS</figcaption></figure>
<p><span style="font-weight: 400;">As seen in Figure 2</span><span style="font-weight: 400;">, as resolution doubles, encoding time complexity also doubles! To address this challenge, we must employ multi-encoding schemes</span><span style="font-weight: 400;">&nbsp;to accelerate the encoding process of multiple representations without impacting quality. This is achieved by exploiting the high correlation of encoder analysis decisions (like block partitioning and prediction mode decisions) across multiple representations.</span></p>
<figure id="attachment_209152" aria-describedby="caption-attachment-209152" style="width: 300px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="size-medium wp-image-209152 zoooom" src="https://bitmovin.com/wp-content/uploads/2021/12/Relative-time-complexity-of-encoding-representations-in-x265-HEVC-encoding_Bar-Chart-300x90.png" alt="Relative time complexity of encoding representations in x265 HEVC encoding_Bar Chart" width="300" height="90" srcset="https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/12/Relative-time-complexity-of-encoding-representations-in-x265-HEVC-encoding_Bar-Chart-300x90.png?lossy=2&amp;strip=1&amp;webp=1 300w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/12/Relative-time-complexity-of-encoding-representations-in-x265-HEVC-encoding_Bar-Chart.jpg?lossy=2&amp;strip=1&amp;webp=1 512w" sizes="(max-width: 300px) 100vw, 300px" /><figcaption id="caption-attachment-209152" class="wp-caption-text">Figure 2: Relative time complexity of encoding representations in x265 HEVC encoding.</figcaption></figure>
<h2>What is Multi-Encoding?</h2>
<p><span style="font-weight: 400;">To encode multiple renditions of the same video at multiple bitrates and resolutions, we reuse encoder analysis information across the renditions. This works because encoder decisions are strongly correlated across the various bitrate and resolution renditions. Sharing analysis information across multiple bitrates within a resolution is termed “multi-rate encoding”, while sharing the information across multiple resolutions is termed “multi-resolution encoding”. &#8220;Multi-encoding&#8221; is a generalized term that combines both <a href="https://bitmovin.com/multi-rate-encoding-fares-ml/">multi-rate</a> and multi-resolution encoding schemes.</span></p>
<h3>Proposed Heuristics:</h3>
<p><span style="font-weight: 400;">To aid the encoding process of the dependent renditions in HEVC, the ATHENA Labs research team proposes two groups of new encoder decision heuristics: Prediction Mode and Motion Estimation.</span><br />
<i><span style="font-weight: 400;">Prediction Mode Heuristics:</span></i><br />
<span style="font-weight: 400;">Prediction Mode heuristics apply when the Coding Unit (CU) size selected for the dependent renditions is the same as in the reference representation. They comprise the following rules:</span></p>
<ol>
<li><span style="font-weight: 400;"> If the <a href="https://www.researchgate.net/publication/276133857_Early_Skip_Mode_Decision_for_HEVC_Encoder_with_Emphasis_on_Coding_Quality" rel="nofollow noopener" target="_blank">SKIP mode</a> was chosen in the highest bitrate rendition, rate-distortion optimization is evaluated for only MERGE/SKIP modes.</span></li>
<li><span style="font-weight: 400;"> If the 2Nx2N mode was chosen in the highest bitrate rendition, RDO is skipped for AMP modes.</span></li>
<li><span style="font-weight: 400;"> If the inter-prediction mode was chosen in the highest bitrate rendition, RDO is skipped for intra-prediction modes.</span></li>
<li><span style="font-weight: 400;"> If the intra-prediction mode was chosen for the highest and lowest bitrate rendition, RDO is evaluated for only intra-prediction modes in the intermediate renditions.</span></li>
</ol>
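<p><span style="font-weight: 400;">The four rules above can be read as a filter on the set of modes that go through rate-distortion optimization (RDO). The Python sketch below expresses them under a deliberately simplified, hypothetical mode taxonomy with five labels. It is not encoder code, and a real HEVC encoder distinguishes more partition modes than shown here.</span></p>

```python
# Simplified, hypothetical mode labels used only for this illustration.
ALL_MODES = {"SKIP", "MERGE", "INTER_2Nx2N", "INTER_AMP", "INTRA"}
INTER_MODES = {"INTER_2Nx2N", "INTER_AMP"}


def candidate_modes(highest_mode, lowest_mode=None):
    """Return the modes still evaluated with RDO for a dependent rendition.

    highest_mode: mode chosen for the co-located CU in the highest bitrate
                  rendition.
    lowest_mode:  mode chosen in the lowest bitrate rendition, if it has
                  already been encoded (needed for rule 4 only).
    """
    # Rule 1: SKIP in the reference restricts RDO to MERGE/SKIP.
    if highest_mode == "SKIP":
        return {"SKIP", "MERGE"}
    # Rule 4: intra in both bounding renditions restricts RDO to intra.
    if highest_mode == "INTRA" and lowest_mode == "INTRA":
        return {"INTRA"}
    modes = set(ALL_MODES)
    # Rule 2: 2Nx2N in the reference skips the AMP modes.
    if highest_mode == "INTER_2Nx2N":
        modes.discard("INTER_AMP")
    # Rule 3: inter prediction in the reference skips the intra modes.
    if highest_mode in INTER_MODES:
        modes.discard("INTRA")
    return modes


remaining = candidate_modes("INTER_2Nx2N")
```

<p><span style="font-weight: 400;">Every mode removed from the candidate set is one RDO evaluation the dependent encode does not have to perform, which is where the time saving comes from.</span></p>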
<p><i><span style="font-weight: 400;">Motion Estimation Heuristics:</span></i><br />
<span style="font-weight: 400;">Motion Estimation heuristics apply when the CU size and the Prediction Unit (PU) selected for the dependent representations are the same as in the reference representation:</span></p>
<ol>
<li><span style="font-weight: 400;"> The same reference frame is forced as that of the highest bitrate rendition.</span></li>
<li><span style="font-weight: 400;"> The Motion Vector Predictor (MVP) is set to be the Motion Vector (MV) of the highest bitrate rendition.</span></li>
<li><span style="font-weight: 400;"> The motion search range is decreased to a smaller window if the MVs of the highest and the lowest bitrate renditions are close to each other.</span></li>
</ol>
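<p><span style="font-weight: 400;">A compact way to see the three motion estimation rules together is the Python sketch below. The function name, the search range values, and the closeness threshold that decides when two motion vectors count as "close" are illustrative assumptions and are not taken from the proposal.</span></p>

```python
def motion_search_setup(high_mv, high_ref, low_mv=None,
                        full_range=57, small_range=8, closeness=2):
    """Derive motion search settings for a dependent rendition.

    high_mv / high_ref: motion vector (x, y) and reference frame index of the
                        highest bitrate rendition.
    low_mv:             motion vector of the lowest bitrate rendition, if known.
    """
    search_range = full_range
    if low_mv is not None:
        dx = abs(high_mv[0] - low_mv[0])
        dy = abs(high_mv[1] - low_mv[1])
        # Rule 3: shrink the window when both bounding MVs nearly agree.
        if max(dx, dy) <= closeness:
            search_range = small_range
    return {
        "ref_frame": high_ref,   # Rule 1: force the same reference frame
        "mvp": high_mv,          # Rule 2: start the search from the reference MV
        "search_range": search_range,
    }


setup = motion_search_setup((4, -2), 0, low_mv=(5, -1))
```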
<p><span style="font-weight: 400;">Based on the above-mentioned heuristics, two multi-encoding schemes are proposed.</span></p>
<h3><b>Proposed Multi-encoding Schemes</b></h3>
<p><span style="font-weight: 400;">In our first proposed <a href="https://athena.itec.aau.at/2021/04/efficient-multi-encoding-algorithms-for-http-adaptive-bitrate-streaming/" rel="nofollow noopener" target="_blank">multi-encoding</a> approach we perform the following steps:</span></p>
<ol>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The first resolution tier (i.e., 540p in our example) is encoded using the combination of double-bound for CU depth estimation (cf. the previous blog post), </span><i><span style="font-weight: 400;">Prediction Mode Heuristics, </span></i><span style="font-weight: 400;">and</span><i><span style="font-weight: 400;"> Motion Estimation Heuristics</span></i><span style="font-weight: 400;">.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The CU depth from the highest bit rate representation of the first resolution tier (i.e., 540p) is shared with the highest bit rate representation of the next resolution tier (i.e., 1080p in our example). In particular, the information is used as a lower bound, i.e., the CU is forced to split if the current encode depth is lower than the reference encode CU depth. The remaining bitrate representations of this resolution tier are encoded using the multi-rate scheme as used in Step 1.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Repeat Step 2 for the remaining resolution tiers in ascending order with respect to the resolution until no more resolution tiers are left (i.e., only for 2160p in our example).</span></li>
</ol>
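<p><span style="font-weight: 400;">The order of encodes and the information passed between them in this first scheme can be laid out as a simple plan. The Python sketch below is a hypothetical illustration: the data layout, the dictionary keys, and the bitrate values are assumptions, and the intra-tier multi-rate step is only tagged rather than modelled.</span></p>

```python
def plan_scheme_one(tiers):
    """Build an encoding plan for the first multi-encoding scheme.

    tiers: list of (resolution_label, [bitrates]) in ascending resolution.
    Each plan entry names the rendition to encode and where its shared
    analysis information comes from.
    """
    plan = []
    previous_top = None
    for label, bitrates in tiers:
        ordered = sorted(bitrates, reverse=True)
        top = (label, ordered[0])
        # The highest bitrate rendition of a tier takes the CU depth of the
        # previous tier's highest bitrate rendition as a lower bound.
        plan.append({"encode": top, "cu_depth_lower_bound_from": previous_top})
        # The remaining renditions of the tier use the multi-rate scheme.
        for bitrate in ordered[1:]:
            plan.append({"encode": (label, bitrate), "multi_rate_reference": top})
        previous_top = top
    return plan


example_plan = plan_scheme_one([
    ("540p", [1000, 2000]),
    ("1080p", [3000, 6000]),
    ("2160p", [12000, 20000]),
])
```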
<figure id="attachment_209153" aria-describedby="caption-attachment-209153" style="width: 300px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="size-medium wp-image-209153 zoooom" src="https://bitmovin.com/wp-content/uploads/2021/12/Multi-Encoding-Scheme-Proposal-1_Decision-Tree-2-300x255.png" alt="Multi-Encoding Scheme Proposal 1_Decision Tree" width="300" height="255"><figcaption id="caption-attachment-209153" class="wp-caption-text">Figure 3: An example of the first proposed multi-encoding scheme.</figcaption></figure>
<p><span style="font-weight: 400;">The second proposed multi-encoding scheme is a minor variation of the first scheme which aims to extend the double-bound for CU depth estimation scheme across resolution tiers. It is performed in the following steps:</span></p>
<ol>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The first resolution tier (i.e., 540p in our example) is encoded using the combination of double-bound for CU depth estimation, </span><i><span style="font-weight: 400;">Prediction Mode Heuristics, </span></i><span style="font-weight: 400;">and</span><i><span style="font-weight: 400;"> Motion Estimation Heuristics</span></i><span style="font-weight: 400;">.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The CU depth from the lowest bit rate representation of the first resolution tier (i.e., 540p) is shared with the highest bit rate representation of the next resolution tier (i.e., 1080p in our example). In particular, the information is used as a lower bound, i.e., the CU is forced to split if the current encode depth is lower than the reference encode CU depth.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The scaled CU depth from the lowest bit rate representation of the previous resolution tier (i.e., 540p) and CU depth information from the highest bit rate representation of the current resolution tier are shared with the lowest bit rate representation of the current resolution tier (i.e., 1080p in our example) and are used as the lower bound and upper bound respectively for CU depth search.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The remaining bit rate representations of this resolution tier (i.e., 1080p) are encoded using the multi-rate scheme as used in Step 1.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Repeat Steps 2 to 4 for the remaining resolution tiers in ascending order with respect to the resolution until no more resolution tiers are left (i.e., only for 2160p in our example).</span></li>
</ol>
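<p><span style="font-weight: 400;">The effect of the lower and upper bounds on the CU depth search can be sketched in a few lines of Python. The sketch assumes CU depths from 0 to 3 (a 64x64 coding tree unit split down to 8x8) and falls back to the full range when the two bounds contradict each other. Both the function name and the fallback rule are assumptions made for illustration.</span></p>

```python
def cu_depth_candidates(lower_bound=None, upper_bound=None, max_depth=3):
    """Return the CU depths that still need to be searched.

    lower_bound: depth taken from the (scaled) lower-resolution reference;
                 shallower depths are skipped, i.e. the CU is forced to split.
    upper_bound: depth taken from the highest bitrate rendition of the same
                 resolution tier; deeper splits are not evaluated.
    """
    lowest = 0 if lower_bound is None else lower_bound
    highest = max_depth if upper_bound is None else upper_bound
    if lowest > highest:
        # Contradictory bounds (assumed handling): search every depth.
        lowest, highest = 0, max_depth
    return list(range(lowest, highest + 1))


both_bounds = cu_depth_candidates(lower_bound=1, upper_bound=2)
lower_only = cu_depth_candidates(lower_bound=2)
```

<p><span style="font-weight: 400;">With only a lower bound, as in Step 2, the search still runs down to the maximum depth. Adding the upper bound in Step 3 narrows it further, which is what the second scheme relies on for its additional time savings.</span></p>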
<figure id="attachment_209154" aria-describedby="caption-attachment-209154" style="width: 300px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="size-medium wp-image-209154 zoooom" src="https://bitmovin.com/wp-content/uploads/2021/12/Multi-Encoding-Scheme-Proposal-2_Decision-Tree-1-300x243.png" alt="Multi-Encoding Scheme Proposal 2_Decision Tree" width="300" height="243"><figcaption id="caption-attachment-209154" class="wp-caption-text">Figure 4: An example of the second proposed multi-encoding scheme.</figcaption></figure>
<h2><b>Results</b></h2>
<p><span style="font-weight: 400;">The state-of-the-art scheme yields the highest average encoding time saving (80.05%), but it comes with a bitrate increase of 13.53% and 9.59% to maintain the same PSNR and VMAF, respectively, compared to the stand-alone encodings. The first proposed multi-encoding scheme has the lowest bitrate increase to maintain the same PSNR and VMAF (2.32% and 1.55%, respectively) compared to the stand-alone encodings. The second proposed multi-encoding scheme improves the encoding time savings of the first proposed scheme by 11% with a negligible increase in bitrate to maintain the same PSNR and VMAF. These results are shown in Table 1, where Delta T represents the overall encoding time savings compared to the stand-alone encodings, and BDR_P and BDR_V refer to the average difference in bitrate with respect to the stand-alone encodings to maintain the same PSNR and VMAF, respectively.</span></p>
<figure id="attachment_209155" aria-describedby="caption-attachment-209155" style="width: 300px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="size-medium wp-image-209155 zoooom" src="https://bitmovin.com/wp-content/uploads/2021/12/Quality-Results-of-the-Proposed-multi-encoding-schemes_Chart-1-300x73.png" alt="Quality Results of the Proposed multi-encoding schemes_Chart" width="300" height="73"><figcaption id="caption-attachment-209155" class="wp-caption-text">Table 1: Results of the proposed multi-encoding schemes</figcaption></figure>
<p>View the full multi-encoding research paper from <a href="https://athena.itec.aau.at/2021/04/efficient-multi-encoding-algorithms-for-http-adaptive-bitrate-streaming/" rel="nofollow noopener" target="_blank">ATHENA here</a>.<br />
If you liked this article, check out some of our other great ATHENA content at the following links:</p>
<ul>
<li aria-level="1"><a href="https://bitmovin.com/scalable-light-field-coding/">Scalable Light Field Coding – Improving the Quality of Experience (QoE) | Bitmovin x ATHENA Labs</a></li>
<li aria-level="1"><a href="https://bitmovin.com/multicast-live-video-streaming-oscar/">Replacing the Multicast Live Video Streaming Approach with OSCAR | Bitmovin x ATHENA Labs</a></li>
<li aria-level="1"><a href="https://bitmovin.com/multi-rate-encoding-fares-ml/">Multi-Rate Encoding for HTTP Adaptive Streaming | Bitmovin x ATHENA Labs</a></li>
</ul>
<p>The post <a rel="nofollow" href="https://bitmovin.com/multi-encoding-has">Efficient Multi-Encoding Algorithms for HTTP Adaptive Bitrate Streaming</a> appeared first on <a rel="nofollow" href="https://bitmovin.com">Bitmovin</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>ATHENA Lab: Fast Multi-Resolution and Multi-Rate Encoding for HTTP Adaptive Streaming Using Machine Learning (FaRes-ML)</title>
		<link>https://bitmovin.com/multi-rate-encoding-fares-ml</link>
		
		<dc:creator><![CDATA[Christian Timmerer]]></dc:creator>
		<pubDate>Wed, 14 Jul 2021 11:49:13 +0000</pubDate>
				<category><![CDATA[Innovation]]></category>
		<category><![CDATA[athena lab]]></category>
		<category><![CDATA[video encoding]]></category>
		<guid isPermaLink="false">https://bitmovin.com/?p=179400</guid>

					<description><![CDATA[<p>The heterogeneity of the devices on the Internet and the difference among the network conditions of the users make designing a video delivery tool that can adapt to all these differences while maximizing the quality of experience (QoE) for each user a tricky problem. HTTP Adaptive Streaming (HAS) is the de-facto solution for video delivery...</p>
<p>The post <a rel="nofollow" href="https://bitmovin.com/multi-rate-encoding-fares-ml">ATHENA Lab: Fast Multi-Resolution and Multi-Rate Encoding for HTTP Adaptive Streaming Using Machine Learning (FaRes-ML)</a> appeared first on <a rel="nofollow" href="https://bitmovin.com">Bitmovin</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><img loading="lazy" decoding="async" class="aligncenter size-large wp-image-179408" src="https://bitmovin.com/wp-content/uploads/2021/07/BLOG-POST_FaRes-ML-1024x537.png" alt="- Bitmovin" width="1024" height="537" srcset="https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/07/BLOG-POST_FaRes-ML-300x157.png?lossy=2&amp;strip=1&amp;webp=1 300w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/07/BLOG-POST_FaRes-ML.png?size=384x201&amp;lossy=2&amp;strip=1&amp;webp=1 384w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/07/BLOG-POST_FaRes-ML-768x402.png?lossy=2&amp;strip=1&amp;webp=1 768w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/07/BLOG-POST_FaRes-ML-1024x537.png?lossy=2&amp;strip=1&amp;webp=1 1024w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/07/BLOG-POST_FaRes-ML.png?lossy=2&amp;strip=1&amp;webp=1 1080w" sizes="(max-width: 1024px) 100vw, 1024px" /><br />
<span style="font-weight: 400;">The heterogeneity of the devices on the Internet and the difference among the network conditions of the users make designing a video delivery tool that can adapt to all these differences while maximizing the quality of experience (QoE) for each user a tricky problem. </span><a href="https://bitmovin.com/adaptive-streaming/"><span style="font-weight: 400;">HTTP Adaptive Streaming</span></a><span style="font-weight: 400;"> (HAS) is the de-facto solution for video delivery over the Internet. In HAS, multiple representations are stored for each video, with each representation having a different quality level and/or resolution. This way, HAS streaming sessions can alternate between different quality options based on the network and viewing conditions while delivering the content. However, the requirement to store multiple representations for a single video in HAS brings additional encoding challenges since the source video needs to be encoded efficiently at multiple bitrates and resolutions. Multi-Rate encoding aims to tackle this problem.&nbsp;</span><br />
<span style="font-weight: 400;">This blog post introduces our new approach to multi-rate encoding, FaRes-ML: </span><b>Fa</b><span style="font-weight: 400;">st Multi-</span><b>Res</b><span style="font-weight: 400;">olution and Multi-Rate Encoding for HTTP Adaptive Streaming Using </span><b>M</b><span style="font-weight: 400;">achine</span><b> L</b><span style="font-weight: 400;">earning. But first&#8230;</span></p>
<h2><span style="font-weight: 400;">What is Multi-Rate Encoding?</span></h2>
<p><span style="font-weight: 400;">In multi-rate encoding, a single source video needs to be encoded at multiple bitrates and resolutions in order to provide a suitable representation for a variety of network and viewing conditions. The quality level of the encoded video is controlled by the quantization parameter (QP) in the encoder. An example multi-rate encoding scheme is given in Fig.1.</span></p>
<figure id="attachment_179405" aria-describedby="caption-attachment-179405" style="width: 800px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="size-full wp-image-179405" src="https://bitmovin.com/wp-content/uploads/2021/07/MultiRate.gif" alt="Multi-Rate Encoding workflow_animated gif" width="800" height="450"><figcaption id="caption-attachment-179405" class="wp-caption-text">Multi-Rate Encoding workflow</figcaption></figure>
<p><span style="font-weight: 400;">This is a computationally expensive process due to the large data size of videos and the high complexity of video codecs. However, since all of these representations contain the same content, there is a considerable amount of redundancy between them. Multi-rate encoding approaches exploit this redundancy to speed up the encoding process.</span><br />
<span style="font-weight: 400;">In multi-rate encoding, one representation is chosen as the </span><i><span style="font-weight: 400;">reference representation </span></i><span style="font-weight: 400;">(usually the highest [1] or the lowest quality [2] representation),</span> <span style="font-weight: 400;">and its encoding information is used to speed up the encoding of the remaining </span><i><span style="font-weight: 400;">dependent</span></i><span style="font-weight: 400;"> representations. Since block partitioning is one of the most time-consuming processes in the encoding pipeline, most multi-rate encoding approaches focus on speeding up this part of the process.</span><br />
<span style="font-weight: 400;">In block partitioning, each frame is divided into smaller pieces called </span><b><i>blocks</i></b><span style="font-weight: 400;"> to achieve more precise motion compensation. Smaller block sizes are used for motion-intense areas, while larger block sizes are used for stationary areas.&nbsp;</span><br />
<span style="font-weight: 400;">The </span><a href="https://bitmovin.com/developer-network/lesson-1-9-high-efficiency-video-coding-hevc"><i><span style="font-weight: 400;">High-Efficiency Video Coding</span></i></a><span style="font-weight: 400;"> (HEVC) standard uses a Coding Tree Unit (CTU) for block partitioning. By default, each CTU covers a square region of 64&#215;64 pixels and can be divided recursively up to three times, with the smallest block size being 8&#215;8 pixels. Each split operation increases the depth level by 1 (</span><i><span style="font-weight: 400;">i.e., </span></i><b><i>depth 0</i></b><span style="font-weight: 400;"> for </span><b>64&#215;64</b><span style="font-weight: 400;"> pixels and </span><b><i>depth 3</i></b><span style="font-weight: 400;"> for </span><b>8&#215;8 </b><span style="font-weight: 400;">pixels). An example of block partitioning for a frame is illustrated in Fig.2.</span></p>
<figure id="attachment_179410" aria-describedby="caption-attachment-179410" style="width: 800px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="size-full wp-image-179410" src="https://bitmovin.com/wp-content/uploads/2021/07/Block-partioning-in-Multi-rate-Encoding_animated-gif-example.gif" alt="Block partioning in Multi-rate Encoding_animated gif example" width="800" height="450"><figcaption id="caption-attachment-179410" class="wp-caption-text">Block partitioning using a CTU</figcaption></figure>
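<p>The relationship between depth level and block size described above can be written down in a few lines. The following is a minimal sketch, assuming the default 64&#215;64 CTU and the maximum of three splits mentioned in the text; the function names are ours and are not part of any encoder API.</p>

```python
CTU_SIZE = 64   # default CTU width/height in pixels (from the text above)
MAX_DEPTH = 3   # a CTU can be split recursively up to three times


def block_size(depth):
    """Return the side length in pixels of a block at the given depth."""
    if depth not in range(MAX_DEPTH + 1):
        raise ValueError("depth must be between 0 and 3")
    # Every split halves the block in both dimensions.
    return CTU_SIZE // (2 ** depth)


def blocks_per_ctu(depth):
    """Return how many blocks of this depth fit into one CTU."""
    # Every split turns one block into four.
    return 4 ** depth


for d in range(MAX_DEPTH + 1):
    print(d, block_size(d), blocks_per_ctu(d))
```

<p>Depth 0 corresponds to the whole 64&#215;64 CTU and depth 3 to the 8&#215;8 blocks, matching the two cases named in the paragraph above.</p>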
<h2><span style="font-weight: 400;">Introducing the FaRes-ML</span></h2>
<p><span style="font-weight: 400;">FaRes-ML uses </span><a href="https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">Convolutional Neural Networks</span></a><span style="font-weight: 400;"> (CNNs) to predict the CTU split decisions for the dependent representations. The highest-quality representation at the lowest resolution is chosen as the reference representation. The reference is taken from the lowest resolution to improve parallel encoding performance: in parallel encoding, the highest-complexity representation bounds the overall encoding time, so choosing a low-resolution reference can increase parallel encoding performance.&nbsp;</span><br />
<span style="font-weight: 400;">The encoding process in FaRes-ML consists of three main steps:</span></p>
<ol>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The reference representation is encoded with the HEVC reference encoder. Then, the encoding information obtained is stored to be used while encoding the dependent representations.&nbsp;</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Once the encoding information is obtained, the pixel values from the source video at the corresponding resolution and the encoding information from the reference representation are fed into the CNN for the given quality level and resolution.&nbsp;</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The output from the CNN is the split decision for the given depth level. This decision is used to speed up the encoding of the dependent representation.</span></li>
</ol>
<p><span style="font-weight: 400;">The overall encoding scheme of FaRes-ML is given in Fig.3.</span></p>
<figure id="attachment_179406" aria-describedby="caption-attachment-179406" style="width: 800px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="size-full wp-image-179406" src="https://bitmovin.com/wp-content/uploads/2021/07/FaRes.gif" alt="Fast Multi-rate encoding scheme_animated workflow" width="800" height="450"><figcaption id="caption-attachment-179406" class="wp-caption-text">FaRes-ML Encoding Scheme Workflow</figcaption></figure>
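<p>To give a feeling for why a predicted split decision saves time, the toy sketch below counts how many blocks an encoder would have to visit inside one CTU with and without a split predictor. It is an illustration under simplifying assumptions, not the FaRes-ML implementation: <code>predict_split</code> is a hand-written stand-in for the CNN, every block at a given depth receives the same decision, and each visited block is counted as one unit of work.</p>

```python
def count_evaluations(depth, predict_split=None, max_depth=3):
    """Count the blocks visited when partitioning one block at `depth`.

    predict_split: hypothetical callable taking a depth and returning
    True if the block should be split. If None, the search is exhaustive
    and every possible split is tried.
    """
    evaluations = 1  # the current block itself is always evaluated
    if depth == max_depth:
        return evaluations  # smallest block size reached, no further split
    if predict_split is None or predict_split(depth):
        # A split produces four sub-blocks, each handled recursively.
        evaluations += sum(
            count_evaluations(depth + 1, predict_split, max_depth)
            for _ in range(4)
        )
    return evaluations


exhaustive = count_evaluations(0)
# Stand-in predictor: split the 64x64 CTU once, then stop.
pruned = count_evaluations(0, predict_split=lambda d: d == 0)
print(exhaustive, pruned)
```

<p>In this toy model the exhaustive search visits 85 blocks per CTU, while the pruned search visits 5. The real saving depends on the content and on the accuracy of the prediction, which is why the measured results are reported separately.</p>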
<p><span style="font-weight: 400;">To measure the encoding performance of the FaRes-ML approach, we compared its results to the HEVC reference software (</span><a href="https://vcgit.hhi.fraunhofer.de/jvet/HM/-/tags/HM-16.21" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">HM 16.21</span></a><span style="font-weight: 400;">) and the lower bound approach [3]. FaRes-ML achieves a </span><b>27.71%</b><span style="font-weight: 400;"> time saving for </span><b>parallel</b><span style="font-weight: 400;"> encoding and </span><b>46.27%</b><span style="font-weight: 400;"> for </span><b>overall</b><span style="font-weight: 400;"> encoding while maintaining a minimal bitrate increase (</span><b>2.05%</b><span style="font-weight: 400;">).</span> <span style="font-weight: 400;">The resulting normalized encoding times are shown in Fig.4.</span></p>
<figure id="attachment_179404" aria-describedby="caption-attachment-179404" style="width: 512px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="size-full wp-image-179404" src="https://bitmovin.com/wp-content/uploads/2021/07/Fast-Multi-Rate-Encoding-efficiency-comparison_FaRes-ML-vs-Lower-Bound-vs-HEVC_Bar-Graph.jpg" alt="Fast Multi-Rate Encoding efficiency comparison_FaRes-ML vs Lower Bound vs HEVC_Bar Graph" width="512" height="288" srcset="https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/07/Fast-Multi-Rate-Encoding-efficiency-comparison_FaRes-ML-vs-Lower-Bound-vs-HEVC_Bar-Graph-300x169.png?lossy=2&amp;strip=1&amp;webp=1 300w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/07/Fast-Multi-Rate-Encoding-efficiency-comparison_FaRes-ML-vs-Lower-Bound-vs-HEVC_Bar-Graph.jpg?size=384x216&amp;lossy=2&amp;strip=1&amp;webp=1 384w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/07/Fast-Multi-Rate-Encoding-efficiency-comparison_FaRes-ML-vs-Lower-Bound-vs-HEVC_Bar-Graph.jpg?lossy=2&amp;strip=1&amp;webp=1 512w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption id="caption-attachment-179404" class="wp-caption-text">Fast Multi-Rate Encoding efficiency comparison vs Lower Bound vs HEVC</figcaption></figure>
<h2><span style="font-weight: 400;">Conclusion</span></h2>
<p><span style="font-weight: 400;">As 4K+ resolutions become the norm, organizations and researchers are finding new ways to improve the back-end delivery technologies that match content to each device. One of the latest approaches to improving the speed of encoding is the FaRes-ML method, a machine learning-based approach that handles multiple representations in different qualities and resolutions. By applying CNNs to exploit the redundant information in the multi-rate encoding pipeline, FaRes-ML speeds up overall encoding by about 46% and parallel encoding by about 28% in ATHENA’s early-stage experiments, all while maintaining a minimal bitrate increase.&nbsp;</span><br />
<span style="font-weight: 400;">Although the FaRes-ML method has been proven in lab environments for single and parallel encodes, its potential can be extended to cover even more encoding decisions (e.g., reference frame selection) to further improve the encoding performance in the near future. Furthermore, extending the proposed method to recent video codecs such as </span><a href="https://bitmovin.com/compression-standards-vvc-2020/"><i><span style="font-weight: 400;">Versatile Video Coding</span></i><span style="font-weight: 400;"> (VVC)</span></a><span style="font-weight: 400;"> is of interest because of the increased encoding complexity of recent video encoding standards; a faster encoder could significantly reduce the time that organizations operating a back-end workflow need to adopt the new codec.</span><br />
<span style="font-weight: 400;">The team at ATHENA will work closely with Bitmovin in the coming months to determine how FaRes-ML works in real-world applications. If you’re interested in learning more about the Fast Multi-Resolution and Multi-Rate Encoding approach, you can find the full study published in the </span><i><span style="font-weight: 400;">IEEE Open Journal of Signal Processing </span></i><span style="font-weight: 400;">as an open-access article. More information about the full study can be found at the following links:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><a href="https://ieeexplore.ieee.org/document/9427195" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">Full paper</span></a><span style="font-weight: 400;"> [PDF]</span></li>
<li style="font-weight: 400;" aria-level="1"><a href="https://athena.itec.aau.at/2021/05/ieee-oj-sp-fast-multi-resolution-and-multi-rate-encoding-for-http-adaptive-streaming-using-machine-learning/" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">Blog post&nbsp;</span></a></li>
<li style="font-weight: 400;" aria-level="1"><a href="https://athena.itec.aau.at/" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">More information about the ATHENA project</span></a></li>
</ul>
<p><span style="font-weight: 400;">If you liked this article, check out some of our other great ATHENA content at the following links:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><a href="https://bitmovin.com/scalable-light-field-coding/">Scalable Light Field Coding &#8211; Improving the Quality of Experience (QoE) | Bitmovin x ATHENA Labs</a></li>
<li aria-level="1"><a href="https://bitmovin.com/multicast-live-video-streaming-oscar/">Replacing the Multicast Live Video Streaming Approach with OSCAR | Bitmovin x ATHENA Labs</a></li>
</ul>
<h2><span style="font-weight: 400;">Sources</span></h2>
<p><span style="font-weight: 400;">[1]&nbsp;D. Schroeder, A. Ilangovan, M. Reisslein, and E. Steinbach, “Efficient multi-rate video encoding for HEVC-based adaptive HTTP streaming,” </span><i><span style="font-weight: 400;">IEEE Trans. Circuits Syst. Video Technol.</span></i><span style="font-weight: 400;">, vol. 28, no. 1, pp. 143–157, Jan. 2018.</span><br />
<span style="font-weight: 400;">[2] K. Goswami et al., “Adaptive multi-resolution encoding for ABR streaming,” in </span><i><span style="font-weight: 400;">Proc. 25th IEEE Int. Conf. Image Process.</span></i><span style="font-weight: 400;">, 2018, pp. 1008–1012.</span><br />
<span style="font-weight: 400;">[3] H. Amirpour, E. Çetinkaya, C. Timmerer, and M. Ghanbari, “Fast multi-rate encoding for adaptive HTTP streaming,” in </span><i><span style="font-weight: 400;">Proc. Data Compression Conf.</span></i><span style="font-weight: 400;">, 2020, pp. 358–358.</span></p>
<p>The post <a rel="nofollow" href="https://bitmovin.com/multi-rate-encoding-fares-ml">ATHENA Lab: Fast Multi-Resolution and Multi-Rate Encoding for HTTP Adaptive Streaming Using Machine Learning (FaRes-ML)</a> appeared first on <a rel="nofollow" href="https://bitmovin.com">Bitmovin</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>ATHENA Lab: Improving Viewer Experiences with Scalable Light Field Coding (SLFC)</title>
		<link>https://bitmovin.com/scalable-light-field-coding</link>
		
		<dc:creator><![CDATA[Christian Timmerer]]></dc:creator>
		<pubDate>Tue, 15 Jun 2021 08:58:33 +0000</pubDate>
				<category><![CDATA[Innovation]]></category>
		<category><![CDATA[athena lab]]></category>
		<guid isPermaLink="false">https://bitmovin.com/?p=173882</guid>

					<description><![CDATA[<p>Immersive Viewer Experiences with Light Field Imaging&#160; Light field imaging is a promising technology that will provide a more immersive viewing experience. It enables some post-processing tasks like depth estimation, changing the viewport, refocusing, etc. To this end, a huge amount of data needs to be collected, processed, stored, and transmitted, which leaves the challenging...</p>
<p>The post <a rel="nofollow" href="https://bitmovin.com/scalable-light-field-coding">ATHENA Lab: Improving Viewer Experiences with Scalable Light Field Coding (SLFC)</a> appeared first on <a rel="nofollow" href="https://bitmovin.com">Bitmovin</a>.</p>
]]></description>
										<content:encoded><![CDATA[<h2><img loading="lazy" decoding="async" class="aligncenter size-large wp-image-173885" src="https://bitmovin.com/wp-content/uploads/2021/06/BLOG-POST_Scalable-Light-Field-Coding-1-1024x537.png" alt="- Bitmovin" width="1024" height="537"></h2>
<h2><span style="font-weight: 400;">Immersive Viewer Experiences with Light Field Imaging&nbsp;</span></h2>
<p><span style="font-weight: 400;">Light field imaging is a promising technology that will provide a more immersive viewing experience. It enables post-processing tasks such as depth estimation, changing the viewport, and refocusing. To this end, a huge amount of data needs to be collected, processed, stored, and transmitted, which makes the compression and transmission of light field images a challenging task [1]. Unlike conventional photography, which integrates the rays from all directions into a pixel, light field imaging collects the rays from all directions, resulting in a multiview representation of the scene. An example of a multiview representation of a light field image is shown in Fig 1 below and in </span><a href="https://augmentedperception.github.io/deepviewvideo/simple_viewer/" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">an interactive format here</span></a><span style="font-weight: 400;">:</span><br />
<figure id="attachment_174018" aria-describedby="caption-attachment-174018" style="width: 512px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="size-full wp-image-174018" src="https://bitmovin.com/wp-content/uploads/2021/06/Scalable-Light-Field-Coding_Multiview-Representation-of-an-Image_Multiple-Images-1.jpeg" alt="Scalable Light Field Coding_Multiview Representation of an Image_Multiple Images" width="512" height="292"><figcaption id="caption-attachment-174018" class="wp-caption-text">Fig 1. Multiview representation of a light field image. (u,v) represents the view number, and (x,y) denotes pixels inside each view. [2]</figcaption></figure><span style="font-weight: 400;">Light field image coding solutions exploit the high redundancy that exists between the multiple views of a light field. Pseudo Video Sequence-based (PVS) solutions convert the views of a light field into a sequence of pictures and encode the resulting pseudo video using an advanced video encoder. This methodology increases the dependency between views, which reduces the redundancy across them and improves the encoding efficiency of light field compression. In other words, PVS optimizes efficiency in a similar way to per-title encoding: similar features are identified and carried over from view to view so that recurring information does not have to be encoded again. However, as the technology behind PVS solutions develops further, new challenges arise for other important functionalities of light field coding, such as viewport scalability, quality scalability, viewport random access, and uniform quality distribution among viewports.</span><br />
<span style="font-weight: 400;">In this post, we introduce a novel light field coding solution, namely, Scalable Light field Coding (SLFC), which addresses the above-mentioned functionalities in addition to the encoding efficiency.&nbsp;</span></p>
<h2><span style="font-weight: 400;">Functionalities of Light field Coding</span></h2>
<p><span style="font-weight: 400;">Aside from the baseline function of reducing redundancies by collecting and comparing images from multiple views, the complexity of Light field Coding is affected by four key factors:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><b>Viewport scalability:</b><span style="font-weight: 400;"> Unlike conventional 2D displayed images (and media in general), light field image coding solutions require that all views are encoded, transmitted, and decoded to alleviate high dependency between views, thereby enabling more arbitrary views (such as the standard 2D central view). By contrast, conventional 2D displays </span><i><span style="font-weight: 400;">only</span></i><span style="font-weight: 400;"> display a central view, which by comparison is a significantly less immersive experience. The scalability limitation of these multi-viewports is that, in order to increase the compatibility of Light field Coding solutions with capturing devices, displays, network conditions, processing power, and storage capacity, viewports must be grouped into different layers [3] so that they can be encoded, transmitted, decoded, and displayed one after another, a significantly more complex task than conventional coding.&nbsp;</span></li>
<li style="font-weight: 400;" aria-level="1"><b>Quality scalability</b><span style="font-weight: 400;">: To increase compatibility with network conditions and processing power, light field images can be provided in two (or more) quality levels. As the available bandwidth and/or processing power increases, the quality of light field images can be improved by transmitting the remaining layers.</span></li>
<li style="font-weight: 400;" aria-level="1"><b>Viewport random access: </b><span style="font-weight: 400;">To avoid decoding delay, high bandwidth requirement, and huge processing power while navigating between various viewports, random access (the number of views required to access a specific view) to the image views should be considered in light field image coding.&nbsp;</span></li>
<li style="font-weight: 400;" aria-level="1"><b>Uniform quality distribution: </b><span style="font-weight: 400;">To avoid facing quality fluctuation when navigating between viewports, light field image views should have similar qualities at each bitrate.</span></li>
</ul>
<h2><span style="font-weight: 400;">Introducing SLFC: Scalable Light Field Coding</span></h2>
<p><span style="font-weight: 400;">To address the additional complexities that come with standard Light field Coding, we propose the Scalable Light Field Coding (SLFC) solution. The first functionality that SLFC addresses is viewport scalability, by dividing the views into seven layers and encoding them efficiently.&nbsp;</span></p>
<figure id="attachment_174019" aria-describedby="caption-attachment-174019" style="width: 719px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="wp-image-174019" src="https://bitmovin.com/wp-content/uploads/2021/06/Scalable-Light-Field-Coding_Seven-Multiview-Encoding-Layers_Graphs-1.jpg" alt="Scalable Light Field Coding_Seven Multiview Encoding Layers_Graphs" width="719" height="118"><figcaption id="caption-attachment-174019" class="wp-caption-text">Fig 2. Seven layers of multiview encoding</figcaption></figure>
<p><span style="font-weight: 400;">In each layer, views shown in red belong to that layer, gray views belong to the previous layers, and black views belong to the next layers. To provide compatibility with 2D displays, the first layer contains the central view. The second layer contains the four corner views. For the remaining layers, the available horizontal and vertical intermediate views are added.</span></p>
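<p>As a rough illustration of this layer structure, the sketch below lists the view indices (u, v) of the first three layers. It assumes a square grid of n&#215;n views with an odd n, which the post does not state, and it takes layer 3 to be the four views halfway along the outer edges, between neighbouring corner views. The function name is hypothetical.</p>

```python
def first_three_layers(n):
    """Return the (u, v) view indices of layers 1 to 3 for an n x n grid.

    Assumption: n is odd and at least 3, so that a central view and
    exact midpoints between the corner views exist.
    """
    if n % 2 == 0 or n < 3:
        raise ValueError("n must be an odd number of at least 3")
    last = n - 1
    mid = n // 2
    layer1 = [(mid, mid)]                                  # central view
    layer2 = [(0, 0), (0, last), (last, 0), (last, last)]  # four corners
    # Horizontal and vertical intermediate views between the corners.
    layer3 = [(0, mid), (mid, 0), (mid, last), (last, mid)]
    return layer1, layer2, layer3


for number, views in enumerate(first_three_layers(9), start=1):
    print(number, views)
```

<p>A 2D display only needs layer 1, while each further layer adds views on top of the ones that have already been decoded.</p>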
<h3><span style="font-weight: 400;">Encoding the views</span></h3>
<p><span style="font-weight: 400;">Encoding the layers is a three-step process that is defined by the horizontal and vertical relationships between the layers and views:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">First, the central view (the first layer), shown as the red central dot, is independently intra-coded.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Second, the views of the second layer are encoded independently of each other, using the central view as their reference image.&nbsp;</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The remaining layers are made of horizontal and vertical intermediate views of previously encoded views. For example, in layer 3, four possible horizontal and vertical intermediate views are added. In each layer (3 to 7), two views from the previously encoded layers are used to synthesize their intermediate view. </span><a href="https://github.com/sniklaus/sepconv-slomo" rel="nofollow noopener" target="_blank"><i><span style="font-weight: 400;">Sepconv</span></i></a><span style="font-weight: 400;"> [4], which was designed for video interpolation, is used for view synthesis. You can see an example of this process in the image below:</span></li>
</ul>
<p>&nbsp;</p>
<figure id="attachment_174017" aria-describedby="caption-attachment-174017" style="width: 720px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="wp-image-174017" src="https://bitmovin.com/wp-content/uploads/2021/06/Scalable-Light-Field-Coding_Synthesizing-Encoding-Layers-using-Speconv_Workflow-e1623747051191-1.jpg" alt="Scalable Light Field Coding_Synthesizing Encoding Layers using Speconv_Workflow" width="720" height="509"><figcaption id="caption-attachment-174017" class="wp-caption-text">Fig 3. The rightmost view in layer 3 is synthesized using the top-right and bottom-right views from layer 2.</figcaption></figure>
<p><span style="font-weight: 400;">In the example above, the rightmost view of layer 3 is synthesized from the top-right and bottom-right views. The synthesized view leaves less residual data than either the top-right or the bottom-right view alone, so it is added to the reference list in the video encoder as a virtual reference frame. In all, four reference views are used for encoding each view in layers 3 to 7: (i) the central view, (ii, iii) the two views used for synthesizing the virtual reference frame, and (iv) the synthesized view.</span></p>
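<p>The four-entry reference list can be sketched as follows. This is a simplified illustration rather than the SLFC implementation: views are identified only by their (u, v) index, the Sepconv synthesis step is represented by a label instead of an actual interpolated image, and the 9&#215;9 grid in the usage lines is an assumption made for the example.</p>

```python
def midpoint(view_a, view_b):
    """Return the index of the view halfway between two encoded views."""
    return ((view_a[0] + view_b[0]) // 2, (view_a[1] + view_b[1]) // 2)


def reference_list(parent_a, parent_b, central):
    """Build the reference list for the view between two parent views.

    Returns the target view index and its four references: the central
    view, both parent views, and the view synthesized from the parents
    (a placeholder label here, not real image data).
    """
    target = midpoint(parent_a, parent_b)
    references = [
        ("decoded", central),      # (i) central view
        ("decoded", parent_a),     # (ii) first parent view
        ("decoded", parent_b),     # (iii) second parent view
        ("synthesized", target),   # (iv) virtual reference frame
    ]
    return target, references


# The case from Fig 3, on an assumed 9x9 grid: top-right and bottom-right views.
target, refs = reference_list((0, 8), (8, 8), central=(4, 4))
print(target)
for kind, view in refs:
    print(kind, view)
```

<p>The synthesized entry is the one expected to be closest to the target view, which is why it is worth adding as a virtual reference frame.</p>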
<h2><span style="font-weight: 400;">Experimental results of Applied SLFC&nbsp;</span></h2>
<p><b>Encoding efficiency:</b><span style="font-weight: 400;"> The encoding efficiency for the </span><i><span style="font-weight: 400;">Table </span></i><span style="font-weight: 400;">light field test image [5], compared to the <a href="https://jpeg.org/jpegpleno/documentation.html" rel="nofollow noopener" target="_blank">JPEG Pleno anchor</a> [6], <a href="https://ieeexplore.ieee.org/document/8611756" rel="nofollow noopener" target="_blank">WaSP</a> [7], <a href="https://ieeexplore.ieee.org/document/8451684" rel="nofollow noopener" target="_blank">MuLE</a> [8], and <a href="https://ieeexplore.ieee.org/document/7972959" rel="nofollow noopener" target="_blank">PSB</a> [9], is shown in Fig. 4. BD-Rate and BD-PSNR for the other test images against the best competitor (PSB) are given in Table 1.</span></p>
<figure id="attachment_174016" aria-describedby="caption-attachment-174016" style="width: 512px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="size-full wp-image-174016" src="https://bitmovin.com/wp-content/uploads/2021/06/Scalable-Light-Field-Coding_encoding-efficiency-results-comparison_linear-graph-1.png" alt="Scalable Light Field Coding_encoding efficiency results comparison_linear graph" width="512" height="341"><figcaption id="caption-attachment-174016" class="wp-caption-text">Fig. 4: RD-curves for the Table test image.</figcaption></figure>
<p>&nbsp;</p>
<figure id="attachment_174015" aria-describedby="caption-attachment-174015" style="width: 512px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="size-full wp-image-174015" src="https://bitmovin.com/wp-content/uploads/2021/06/Scalable-Light-Field-Coding_BD-Rate-BD-PSNR-of-SLFC_Table-1.png" alt="Scalable Light Field Coding_BD Rate &amp; BD PSNR of SLFC vs PSB_Table" width="512" height="103"><figcaption id="caption-attachment-174015" class="wp-caption-text">SLFC vs PSB BD-Rate %</figcaption></figure>
<p><b>Scalability:</b><span style="font-weight: 400;"> The number of views inside each layer and the bitrate allocated to each layer at bpp = 0.75 are shown in Fig. 5.</span></p>
<figure id="attachment_174014" aria-describedby="caption-attachment-174014" style="width: 512px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="size-full wp-image-174014" src="https://bitmovin.com/wp-content/uploads/2021/06/Scalable-Light-Field-Coding_Number-of-Views-with-each-Encoding-layer_Bar-Chart.jpg" alt="Scalable Light Field Coding_Number of Views with each Encoding layer_Bar Chart" width="512" height="238" srcset="https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/06/Scalable-Light-Field-Coding_Number-of-Views-with-each-Encoding-layer_Bar-Chart-300x139.jpg?lossy=2&amp;strip=1&amp;webp=1 300w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/06/Scalable-Light-Field-Coding_Number-of-Views-with-each-Encoding-layer_Bar-Chart.jpg?size=384x179&amp;lossy=2&amp;strip=1&amp;webp=1 384w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/06/Scalable-Light-Field-Coding_Number-of-Views-with-each-Encoding-layer_Bar-Chart.jpg?lossy=2&amp;strip=1&amp;webp=1 512w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption id="caption-attachment-174014" class="wp-caption-text">Fig. 5: (left) the number of views inside each layer, (right) allocated bitrate to each layer at bpp=0.75 for different layers.</figcaption></figure>
<p><b>Random Access: </b><span style="font-weight: 400;">The bitrate required to access each view at bpp = 0.75 for the </span><i><span style="font-weight: 400;">Table</span></i><span style="font-weight: 400;"> test image is shown in Fig. 6.</span></p>
<figure id="attachment_174013" aria-describedby="caption-attachment-174013" style="width: 512px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="size-full wp-image-174013" src="https://bitmovin.com/wp-content/uploads/2021/06/Scalable-Light-Field-Coding_Required-Bitrate-for-each-view_Light-Grapg.jpg" alt="Scalable Light Field Coding_Required Bitrate for each view_Light Graph" width="512" height="427" srcset="https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/06/Scalable-Light-Field-Coding_Required-Bitrate-for-each-view_Light-Grapg-300x250.png?lossy=2&amp;strip=1&amp;webp=1 300w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/06/Scalable-Light-Field-Coding_Required-Bitrate-for-each-view_Light-Grapg.jpg?size=384x320&amp;lossy=2&amp;strip=1&amp;webp=1 384w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/06/Scalable-Light-Field-Coding_Required-Bitrate-for-each-view_Light-Grapg.jpg?lossy=2&amp;strip=1&amp;webp=1 512w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption id="caption-attachment-174013" class="wp-caption-text">Fig. 6: The required bitrate to access each view at bpp=0.75.</figcaption></figure>
<p><b>Quality Scalability: </b><span style="font-weight: 400;">The synthesized view serves as quality layer 1, and </span><span style="font-weight: 400;">using the synthesized view for inter-coding yields quality layer 2.</span><br />
<b>Quality Distribution: </b><span style="font-weight: 400;">The PSNR heatmap for the </span><i><span style="font-weight: 400;">Table</span></i><span style="font-weight: 400;"> light field test image at bpp = 0.005 is shown in Fig. 7.</span></p>
<figure id="attachment_174012" aria-describedby="caption-attachment-174012" style="width: 512px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="size-full wp-image-174012" src="https://bitmovin.com/wp-content/uploads/2021/06/Scalable-Light-Field-Coding_PSNR-Heatmap-by-ImageView_Heatmap.jpg" alt="Scalable Light Field Coding_PSNR Heatmap by Image/View_Heatmap" width="512" height="427" srcset="https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/06/Scalable-Light-Field-Coding_PSNR-Heatmap-by-ImageView_Heatmap-300x250.png?lossy=2&amp;strip=1&amp;webp=1 300w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/06/Scalable-Light-Field-Coding_PSNR-Heatmap-by-ImageView_Heatmap.jpg?size=384x320&amp;lossy=2&amp;strip=1&amp;webp=1 384w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/06/Scalable-Light-Field-Coding_PSNR-Heatmap-by-ImageView_Heatmap.jpg?lossy=2&amp;strip=1&amp;webp=1 512w" sizes="(max-width: 512px) 100vw, 512px" /><figcaption id="caption-attachment-174012" class="wp-caption-text">Fig. 7: PSNR Heatmap plot for Table light field test image at bpp = 0.005.</figcaption></figure>
<h2><span style="font-weight: 400;">Conclusion</span></h2>
<p><span style="font-weight: 400;">The Scalable Light Field Coding (SLFC) study set out to optimize “standard” light field coding by improving the applied compression. Our method adds several critical compression features: viewport scalability (how many views are delivered), quality scalability, random access, and uniform quality distribution (only small quality differences between views). Our results indicate that SLFC improves the quality of experience (QoE) for multiview content by a significant margin. In the future, applying SLFC to video and image workflows could help create more immersive, higher-quality VR/AR experiences, conceivably allowing consumers to truly feel that they are inside the environment being simulated.</span><br />
<a href="https://athena.itec.aau.at/2020/12/dcc21-slfc-scalable-light-field-coding/" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">Check out our full study and more here</span></a></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">[PDF] </span><a href="https://ieeexplore.ieee.org/document/9418753" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">Full study</span></a></li>
<li style="font-weight: 400;" aria-level="1"><a href="https://www.slideshare.net/christian.timmerer/slfc-scalable-light-field-coding" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">Slide deck</span></a></li>
<li style="font-weight: 400;" aria-level="1"><a href="https://www.youtube.com/watch?v=21QJKmvRxZM" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">Presentation</span></a></li>
</ul>
<h2>Sources:</h2>
<p><span style="font-weight: 400;">[1] C. Conti, L. D. Soares, and P. Nunes, &#8220;Dense Light Field Coding: A Survey,&#8221; in </span><i><span style="font-weight: 400;">IEEE Access</span></i><span style="font-weight: 400;">, vol. 8, pp. 49244-49284, 2020, DOI: 10.1109/ACCESS.2020.2977767.</span><br />
<span style="font-weight: 400;">[2] G. Wu </span><i><span style="font-weight: 400;">et al</span></i><span style="font-weight: 400;">., &#8220;Light Field Image Processing: An Overview,&#8221; in </span><i><span style="font-weight: 400;">IEEE Journal of Selected Topics in Signal Processing</span></i><span style="font-weight: 400;">, vol. 11, no. 7, pp. 926-954, Oct. 2017, DOI: 10.1109/JSTSP.2017.2747126.</span><br />
<span style="font-weight: 400;">[3] Ricardo Jorge Santos Monteiro, “Scalable light field representation and coding,” 2020.</span><br />
<span style="font-weight: 400;">[4] S. Niklaus, L. Mai, and F. Liu, &#8220;Video Frame Interpolation via Adaptive Separable Convolution,&#8221; </span><i><span style="font-weight: 400;">2017 IEEE International Conference on Computer Vision (ICCV)</span></i><span style="font-weight: 400;">, Venice, 2017, pp. 261-270, DOI: 10.1109/ICCV.2017.37.</span><br />
<span style="font-weight: 400;">[5] Katrin Honauer, Ole Johannsen, Daniel Kondermann, and Bastian Goldluecke, “A dataset and evaluation methodology for depth estimation on 4D light fields,” in Computer Vision – ACCV 2016, Shang-Hong Lai, Vincent Lepetit, Ko Nishino, and Yoichi Sato, Eds., Cham, 2017, pp. 19–34, Springer International Publishing.</span><br />
<span style="font-weight: 400;">[6] F. Pereira, C. Pagliari, E. A. B. da Silva, I. Tabus, H. Amirpour, M. Bernardo, and A. Pinheiro, “JPEG Pleno light field coding common test conditions v3.2,” Doc. ISO/IEC JTC, vol. 1.</span><br />
<span style="font-weight: 400;">[7] P. Astola and I. Tabus, “WaSP: Hierarchical warping, merging, and sparse prediction for light field image compression,” in The 7th European Workshop on Visual Information Processing (EUVIP), Oct 2018, pp. 435–439.</span><br />
<span style="font-weight: 400;">[8] M. B. de Carvalho, M. P. Pereira, G. Alves, E. A. B. da Silva, C. L. Pagliari, F. Pereira, and V. Testoni, “A 4D DCT-Based lenslet light field codec,” in 2018 25th IEEE International Conference on Image Processing (ICIP), Oct 2018, pp. 435–439.</span><br />
<span style="font-weight: 400;">[9] L. Li, Z. Li, B. Li, D. Liu, and H. Li, “Pseudo-Sequence-Based 2-D Hierarchical Coding Structure for Light-Field Image Compression,” in 2017 Data Compression Conference (DCC), April 2017, pp. 131–140.</span></p>
<p>The post <a rel="nofollow" href="https://bitmovin.com/scalable-light-field-coding">ATHENA Lab: Improving Viewer Experiences with Scalable Light Field Coding (SLFC)</a> appeared first on <a rel="nofollow" href="https://bitmovin.com">Bitmovin</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
