<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	 xmlns:media="http://search.yahoo.com/mrss/" >

<channel>
	<title>Vignesh V Menon &#8211; Bitmovin</title>
	<atom:link href="https://bitmovin.com/author/vignesh-v-menon/feed" rel="self" type="application/rss+xml" />
	<link>https://bitmovin.com</link>
	<description>Bitmovin provides adaptive streaming infrastructure for video publishers and integrators. Fastest cloud encoding and HTML5 Player. Play Video Anywhere.</description>
	<lastBuildDate>Tue, 16 May 2023 16:06:22 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://bitmovin.com/wp-content/uploads/2023/11/bitmovin_favicon.svg</url>
	<title>Vignesh V Menon &#8211; Bitmovin</title>
	<link>https://bitmovin.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Bringing Per-Title Encoding to Live Streaming Workflows</title>
		<link>https://bitmovin.com/per-title-encoding-for-live-streaming</link>
					<comments>https://bitmovin.com/per-title-encoding-for-live-streaming#respond</comments>
		
		<dc:creator><![CDATA[Vignesh V Menon]]></dc:creator>
		<pubDate>Sat, 15 Apr 2023 14:37:36 +0000</pubDate>
				<category><![CDATA[Innovation]]></category>
		<category><![CDATA[per-title encoding]]></category>
		<guid isPermaLink="false">https://bitmovin.com/?p=258036</guid>

					<description><![CDATA[<p>Introduction Bitmovin was born from research performed by our co-founders at Alpen-Adria-Universität Klagenfurt (AAU) and many of the innovations powering our products today are the direct result of the ongoing ATHENA project collaboration between Bitmovin engineers and the Christian Doppler Laboratory at AAU. This post takes a closer look at our recent combined efforts researching...</p>
<p>The post <a rel="nofollow" href="https://bitmovin.com/per-title-encoding-for-live-streaming">Bringing Per-Title Encoding to Live Streaming Workflows</a> appeared first on <a rel="nofollow" href="https://bitmovin.com">Bitmovin</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading">Introduction</h2>



<p>Bitmovin was born from research performed by our co-founders at Alpen-Adria-Universität Klagenfurt (AAU), and many of the innovations powering our products today are the direct result of the ongoing <a href="https://athena.itec.aau.at/" rel="nofollow noopener" target="_blank">ATHENA project</a>, a collaboration between Bitmovin engineers and the Christian Doppler Laboratory at AAU. This post takes a closer look at our recent joint research into applying real-time quality and efficiency optimizations to live encoding workflows.</p>



<div style="height:63px" aria-hidden="true" class="wp-block-spacer"></div>



<h2 class="wp-block-heading">Taking Per-Title Encoding Beyond VOD Workflows</h2>



<p><a href="https://bitmovin.com/encoding-service/per-title-encoding/" data-type="page" data-id="259745">Per-Title Encoding</a> is a video encoding technique that analyzes each individual video and customizes its encoding settings based on its content and complexity. Compared to a one-size-fits-all bitrate ladder, Per-Title Encoding aims to deliver higher visual quality at the same bitrate, or the same quality at a lower bitrate. This allows content providers to save on bandwidth and storage costs without compromising the viewing experience.</p>



<p>Bitmovin’s co-founders have been presenting research on the topic since 2011, and Netflix first used the term ‘Per-Title Encoding’ in 2015. So while it’s not a new concept, until now its benefits have been mostly limited to Video-on-Demand (VOD) workflows. This is largely because of the latency added by the complexity analysis stage, something ATHENA began to address last year with the open-source <a href="https://vca.itec.aau.at/" rel="nofollow noopener" target="_blank">Video Complexity Analyzer</a> (VCA) project.</p>



<p>Without the ability to optimize settings in real time, live encoding workflows have to use a fixed adaptive bitrate ladder for the duration of the stream, which leaves providers with a choice. They can either set the average bitrate they think they’ll need, knowing that periods of high motion will end up looking blocky and pixelated, or set the bitrate high enough to handle the peaks, knowing they’ll waste data whenever the stream has less motion and visual complexity. That is, until now…</p>



<div style="height:63px" aria-hidden="true" class="wp-block-spacer"></div>



<h2 class="wp-block-heading">Per-Title Live Encoding: Bitmovin + ATHENA collaboration</h2>



<p>In addition to the VCA project, ATHENA has been researching its potential applications to live streaming. Hadi Amirpour presented this work at Demuxed 2022 in his talk “Live is Life: Per-Title Encoding for Live Video Workflows”, which you can <a href="https://www.youtube.com/watch?v=leQKq7x77rw" target="_blank" rel="noreferrer noopener nofollow">watch on YouTube</a>.</p>



<p>The initial results from the research were promising enough to move into the experimental phase, in which ATHENA and Bitmovin’s engineering team collaborated to measure the performance and viability of Live Variable Bitrate (VBR) techniques in real-world applications. The proposed approach combines input parameters with features and complexity information extracted from the source video in real time to create a variable, perceptually aware bitrate ladder. Presented below is a summary of the methodology and results prepared by lead author Vignesh V Menon.</p>



<figure class="wp-block-image size-full"><img fetchpriority="high" decoding="async" width="1400" height="411" src="https://bitmovin.com/wp-content/uploads/2023/04/livePTEmodel.png" alt="- Bitmovin" class="wp-image-258037" srcset="https://b3148424.smushcdn.com/3148424/wp-content/uploads/2023/04/livePTEmodel-300x88.png?lossy=2&amp;strip=1&amp;webp=1 300w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2023/04/livePTEmodel.png?size=384x113&amp;lossy=2&amp;strip=1&amp;webp=1 384w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2023/04/livePTEmodel-768x225.png?lossy=2&amp;strip=1&amp;webp=1 768w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2023/04/livePTEmodel.png?size=1152x338&amp;lossy=2&amp;strip=1&amp;webp=1 1152w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2023/04/livePTEmodel.png?lossy=2&amp;strip=1&amp;webp=1 1400w" sizes="(max-width: 1400px) 100vw, 1400px" /></figure>



<div style="height:63px" aria-hidden="true" class="wp-block-spacer"></div>



<h3 class="wp-block-heading">Perceptually-Aware Bitrate Ladder</h3>



<p>One common inefficiency of a fixed bitrate ladder is that it often contains renditions that, to the human eye, have equivalent visual quality. Some of them are therefore redundant, wasting storage space without improving QoE for the viewer. In a perfect world, each rung of the ladder would offer a perceivable quality improvement over the one below it, i.e., at least one Just-Noticeable Difference (JND), up to the point beyond which the eye can no longer detect any improvement. By setting a maximum VMAF score (<em>v</em><sub>max</sub>) and a target VMAF difference between renditions (<em>v</em><sub>J</sub>), you can create an ideal theoretical bitrate ladder for the best QoE.</p>
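<p>To make the idea concrete, here is a minimal Python sketch of the quality targets such a ladder aims for, assuming the VMAF of the lowest rung is already known. The function name target_vmaf_levels and the example values (lowest rung at VMAF 40, a VMAF step of 6, a maximum VMAF of 95) are illustrative assumptions, not part of Bitmovin's implementation; the step and maximum correspond to v<sub>J</sub> and v<sub>max</sub> in the figure below.</p>

```python
def target_vmaf_levels(v_lowest, v_j, v_max):
    """Target VMAF of each rung in an idealized perceptual ladder.

    Rungs start at v_lowest and are spaced v_j apart (one assumed JND),
    stopping before a rung would exceed v_max, where the stream is treated
    as perceptually lossless.
    """
    if v_j <= 0:
        raise ValueError("v_j must be positive")
    levels = []
    k = 0
    while v_lowest + k * v_j <= v_max:
        # Multiply instead of accumulating so float error does not build up.
        levels.append(v_lowest + k * v_j)
        k += 1
    return levels


print(target_vmaf_levels(40, 6, 95))  # [40, 46, 52, 58, 64, 70, 76, 82, 88, 94]
```
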



<div style="height:18px" aria-hidden="true" class="wp-block-spacer"></div>



<figure class="wp-block-image aligncenter size-full is-resized"><img decoding="async" src="https://bitmovin.com/wp-content/uploads/2023/04/perceptual_ladder.png" alt="- Bitmovin" class="wp-image-258038" width="290" height="197" srcset="https://b3148424.smushcdn.com/3148424/wp-content/uploads/2023/04/perceptual_ladder.png?size=58x39&amp;lossy=2&amp;strip=1&amp;webp=1 58w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2023/04/perceptual_ladder.png?size=116x79&amp;lossy=2&amp;strip=1&amp;webp=1 116w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2023/04/perceptual_ladder.png?size=174x118&amp;lossy=2&amp;strip=1&amp;webp=1 174w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2023/04/perceptual_ladder.png?lossy=2&amp;strip=1&amp;webp=1 245w" sizes="(max-width: 290px) 100vw, 290px" /></figure>



<p class="has-text-align-center"><em>The ideal Live VBR bitrate ladder targeted in this paper. The blue line denotes the corresponding rate-distortion (RD) curve, while the red dotted line indicates VMAF = v<sub>max</sub>. When the VMAF value is greater than v<sub>max</sub>, the video stream is deemed to be perceptually lossless. v<sub>J</sub> represents the target VMAF difference.</em></p>



<div style="height:21px" aria-hidden="true" class="wp-block-spacer"></div>






<p>For this project, we experimented with the following input parameters:</p>



<ul>
<li>Set of predefined resolutions</li>



<li>Minimum and maximum target bitrates</li>



<li>Target VMAF difference between renditions</li>



<li>Maximum VMAF of the highest quality rendition</li>
</ul>



<div style="height:21px" aria-hidden="true" class="wp-block-spacer"></div>



<p>These parameters were then combined with the extracted complexity information to predict the bitrates needed to hit the VMAF quality targets. First, we predict the VMAF score of the lowest rung from the pre-configured minimum bitrate. Next, we use the configured target VMAF difference to determine the VMAF scores of the remaining renditions (up to the maximum VMAF) and predict the corresponding bitrate-resolution pairs to encode. We also predict the optimal Constant Rate Factor (CRF) for each of these pairs to maximize compression efficiency at the target bitrate. This further reduces file sizes while delivering higher visual quality than our reference HLS bitrate ladder, encoded in CBR mode with the x264 AVC encoder.</p>
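<p>The steps above can be sketched in Python as a toy model, under loudly stated assumptions: each resolution's rate-distortion curve is approximated as VMAF = slope × ln(kbps) + offset, the coefficients in MODELS are made up and stand in for the per-segment predictions that the real system derives from VCA features, and CRF prediction is omitted. Only the ladder-construction order (lowest rung from the minimum bitrate, then rungs spaced by the target VMAF difference up to the maximum VMAF, capped by the maximum bitrate) follows the description above.</p>

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RDModel:
    """Hypothetical RD curve for one resolution: VMAF = slope * ln(kbps) + offset."""
    height: int
    slope: float
    offset: float

    def vmaf(self, kbps):
        return min(100.0, self.slope * math.log(kbps) + self.offset)

    def kbps_for(self, vmaf):
        # Inverse of the (uncapped) curve: bitrate needed to reach `vmaf`.
        return math.exp((vmaf - self.offset) / self.slope)


# Made-up coefficients; a real system would predict these per segment.
MODELS = [RDModel(360, 18.0, -60.0), RDModel(720, 24.0, -99.0), RDModel(1080, 30.0, -146.0)]


def build_ladder(models, min_kbps, max_kbps, v_j, v_max):
    """Return [(height, kbps, vmaf), ...] following the steps described above."""
    # Step 1: predicted VMAF of the lowest rung at the minimum bitrate.
    v = max(m.vmaf(min_kbps) for m in models)
    ladder = []
    # Step 2: add rungs spaced v_j apart until v_max is reached.
    while v <= v_max:
        # Step 3: pick the resolution that reaches this VMAF at the lowest bitrate.
        best = min(models, key=lambda m: m.kbps_for(v))
        kbps = best.kbps_for(v)
        if kbps > max_kbps:
            break  # respect the configured maximum bitrate
        ladder.append((best.height, kbps, v))
        v += v_j
    return ladder


for height, kbps, vmaf in build_ladder(MODELS, 200, 8000, 6.0, 93.0):
    print(f"{height}p  {kbps:7.0f} kbps  VMAF {vmaf:.1f}")
```

<p>With these toy coefficients, the lowest rungs land at 360p and the top rung at 1080p, mirroring how a content-aware ladder switches resolution where the per-resolution RD curves cross.</p>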



<div style="height:7px" aria-hidden="true" class="wp-block-spacer"></div>



<figure class="wp-block-image size-full"><img decoding="async" width="2584" height="612" src="https://bitmovin.com/wp-content/uploads/2023/04/Screenshot-2023-04-15-at-7.15.19-AM.png" alt="- Bitmovin" class="wp-image-258040" srcset="https://b3148424.smushcdn.com/3148424/wp-content/uploads/2023/04/Screenshot-2023-04-15-at-7.15.19-AM-300x71.png?lossy=2&amp;strip=1&amp;webp=1 300w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2023/04/Screenshot-2023-04-15-at-7.15.19-AM.png?size=384x91&amp;lossy=2&amp;strip=1&amp;webp=1 384w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2023/04/Screenshot-2023-04-15-at-7.15.19-AM-768x182.png?lossy=2&amp;strip=1&amp;webp=1 768w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2023/04/Screenshot-2023-04-15-at-7.15.19-AM.png?size=1152x273&amp;lossy=2&amp;strip=1&amp;webp=1 1152w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2023/04/Screenshot-2023-04-15-at-7.15.19-AM-1536x364.png?lossy=2&amp;strip=1&amp;webp=1 1536w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2023/04/Screenshot-2023-04-15-at-7.15.19-AM.png?size=1920x455&amp;lossy=2&amp;strip=1&amp;webp=1 1920w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2023/04/Screenshot-2023-04-15-at-7.15.19-AM-2048x485.png?lossy=2&amp;strip=1&amp;webp=1 2048w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2023/04/Screenshot-2023-04-15-at-7.15.19-AM.png?lossy=2&amp;strip=1&amp;webp=1 2584w" sizes="(max-width: 1920px) 100vw, 1920px" /><figcaption class="wp-element-caption"><em>Comparison of RD curves of 3 different videos with varying spatiotemporal complexities using the HLS CBR encoding (green line), and Live VBR encoding (red line). Please note that the target VMAF difference is six in these plots.</em></figcaption></figure>



<div style="height:36px" aria-hidden="true" class="wp-block-spacer"></div>



<h3 class="wp-block-heading">Results</h3>



<p>The initial results of our Live VBR Per-Title Encoding have been extremely promising, achieving average bitrate savings of 7.21% and 13.03% while maintaining the same PSNR and VMAF, respectively, compared to the reference HLS bitrate ladder CBR encoding. Even more impressive, it delivered those improvements without introducing noticeable latency to the streams, thanks to the real-time complexity analysis from <a href="https://bitmovin.com/video-complexity-analyzer-vca/">ATHENA’s VCA</a>. Furthermore, by eliminating redundant renditions, we’ve seen a 52.59% cumulative decrease in storage space and a 28.78% cumulative decrease in energy consumption*, considering a JND of six VMAF points.</p>



<p><em>*Energy consumption was estimated using CodeCarbon on a dual-processor server with Intel Xeon Gold 5218R CPUs (80 hardware threads in total, 2.10 GHz). The renditions were encoded concurrently to measure encoding time and energy.</em></p>



<div style="height:29px" aria-hidden="true" class="wp-block-spacer"></div>
]]></content:encoded>
					
					<wfw:commentRss>https://bitmovin.com/per-title-encoding-for-live-streaming/feed</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Efficiently Predicting Quality with ATHENA&#8217;s Video Complexity Analyzer (VCA) project</title>
		<link>https://bitmovin.com/video-complexity-analyzer-vca</link>
		
		<dc:creator><![CDATA[Christian Timmerer]]></dc:creator>
		<pubDate>Fri, 25 Mar 2022 19:30:34 +0000</pubDate>
				<category><![CDATA[Innovation]]></category>
		<category><![CDATA[athena]]></category>
		<guid isPermaLink="false">https://bitmovin.com/?p=224553</guid>

					<description><![CDATA[<p>For online prediction in live streaming applications, selecting low-complexity features is critical to ensure low-latency video streaming without disruptions. For each frame, video, or video segment, two features are determined: the average texture energy and the average gradient of the texture energy. A DCT-based energy function is introduced to determine the block-wise texture of each...</p>
<p>The post <a rel="nofollow" href="https://bitmovin.com/video-complexity-analyzer-vca">Efficiently Predicting Quality with ATHENA&#8217;s Video Complexity Analyzer (VCA) project</a> appeared first on <a rel="nofollow" href="https://bitmovin.com">Bitmovin</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><span style="font-weight: 400;">For online prediction in live streaming applications, selecting low-complexity features is critical to ensure low-latency video streaming without disruptions. For each frame, video, or video segment, two features are determined: the average texture energy and the average gradient of the texture energy. A DCT-based energy function is introduced to determine the block-wise texture of each frame, and the spatial and temporal features of the video or video segment are derived from it. The Video Complexity Analyzer (VCA) project was launched in 2022 with the aim of providing the most efficient, highest-performance spatial and temporal complexity prediction for each frame, video, or video segment, which can be used in a variety of applications such as shot/scene detection and online per-title encoding.</span></p>
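<p>The DCT-based texture energy can be illustrated with a simplified, pure-Python sketch. It assumes luma frames given as lists of rows, computes an orthonormal 2-D DCT-II for each non-overlapping n×n block, and uses the unweighted sum of absolute AC coefficients as the block energy. VCA's actual energy function applies its own frequency weighting and is heavily SIMD-optimized, so treat this only as an illustration of the idea (E as the average block energy, h as the average change in co-located block energy between consecutive frames), not as VCA's formula.</p>

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis: row u holds alpha(u) * cos((2i + 1) * u * pi / 2n)."""
    return [[(math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n))
             * math.cos((2 * i + 1) * u * math.pi / (2 * n))
             for i in range(n)] for u in range(n)]


def block_energy(block, c):
    """Sum of |AC coefficients| of the block's 2-D DCT (DC term excluded)."""
    n = len(block)
    # Separable 2-D DCT: coeffs = C * block * C^T, computed with plain loops.
    rows = [[sum(c[u][i] * block[i][j] for i in range(n)) for j in range(n)] for u in range(n)]
    coeffs = [[sum(rows[u][j] * c[v][j] for j in range(n)) for v in range(n)] for u in range(n)]
    return sum(abs(coeffs[u][v]) for u in range(n) for v in range(n) if u or v)


def block_energies(frame, n=8):
    """Energy of every complete n x n block in raster order (partial edge blocks ignored)."""
    c = dct_matrix(n)
    height, width = len(frame), len(frame[0])
    return [block_energy([row[x0:x0 + n] for row in frame[y0:y0 + n]], c)
            for y0 in range(0, height - n + 1, n)
            for x0 in range(0, width - n + 1, n)]


def spatial_temporal(frames, n=8):
    """Per-frame E (mean block energy) and h (mean |change| of co-located block energy)."""
    E, h, prev = [], [], None
    for frame in frames:
        cur = block_energies(frame, n)
        E.append(sum(cur) / len(cur))
        h.append(0.0 if prev is None else sum(abs(a - b) for a, b in zip(cur, prev)) / len(cur))
        prev = cur
    return E, h


flat = [[128] * 16 for _ in range(16)]
checker = [[255 if (x + y) % 2 else 0 for x in range(16)] for y in range(16)]
E, h = spatial_temporal([flat, flat, checker])
print([round(v, 1) for v in E], [round(v, 1) for v in h])
```

<p>A flat frame has (numerically) zero texture energy, while a fine checkerboard has a large one; the jump from flat to checkerboard shows up in h.</p>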
<h2><span style="font-weight: 400;">What is the Video Complexity Analyzer</span></h2>
<p style="text-align: left;"><span style="font-weight: 400;">The primary objective of the Video Complexity Analyzer is to become the best spatial and temporal complexity predictor for every frame, video segment, or video, aiding the prediction of encoding parameters for applications like scene-cut detection and online per-title encoding. VCA leverages x86 SIMD and multi-threading optimizations for high performance. While VCA is primarily designed as a video complexity analyzer library, a command-line executable is provided to facilitate testing and development. We expect VCA to be utilized in many leading video encoding solutions in the coming years.</span></p>
<p><span style="font-weight: 400;">VCA is available as an open-source library, published under the GPLv3 license. For more details, please see the online </span><a href="https://cd-athena.github.io/VCA/" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">documentation</span></a><span style="font-weight: 400;">. The source code is available on </span><a href="https://github.com/cd-athena/VCA" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">GitHub</span></a><span style="font-weight: 400;">.</span></p>
<p style="text-align: center;"><img loading="lazy" decoding="async" class="aligncenter" src="https://athena.itec.aau.at/wp-content/uploads/sites/12/2022/02/beauty_E.gif" width="426" height="480" alt="- Bitmovin"><br />
<strong>Heatmap of spatial complexity (E)</strong></p>
<p style="text-align: center;"><img loading="lazy" decoding="async" class="" src="https://athena.itec.aau.at/wp-content/uploads/sites/12/2022/02/beauty_h.gif" width="426" height="480" alt="- Bitmovin"><br />
<strong>Heatmap of temporal complexity (h)</strong></p>
<p><span style="font-weight: 400;">A performance comparison (frames analyzed per second) of VCA, with different levels of threading enabled, and Spatial Information/Temporal Information (SITI) [</span><a href="https://github.com/Telecommunication-Telemedia-Assessment/SITI" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">GitHub</span></a><span style="font-weight: 400;">] is shown below.</span></p>
<p><figure id="attachment_224831" aria-describedby="caption-attachment-224831" style="width: 512px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="wp-image-224831" src="https://bitmovin.com/wp-content/uploads/2022/03/SITI-EH2-1-300x197.png" alt="Visual Complexity Analyzer vs Spatial Information/Temporal Information_Bar Chart" width="512" height="337"><figcaption id="caption-attachment-224831" class="wp-caption-text">Video Complexity Analyzer vs Spatial Information/Temporal Information</figcaption></figure></p>
<h2><span style="font-weight: 400;">How to Build the Video Complexity Analyzer</span></h2>
<p><span style="font-weight: 400;">VCA has mostly been tested on Linux and Windows. Some prerequisite software must be installed before compiling. The steps to build the project on Linux and Windows are explained below.</span></p>
<h3><span style="font-weight: 400;">Prerequisites</span></h3>
<ol>
<li style="font-weight: 400;" aria-level="1"><a href="https://cmake.org/" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">CMake</span></a><span style="font-weight: 400;"> version 3.13 or higher.</span></li>
<li style="font-weight: 400;" aria-level="1"><a href="https://git-scm.com/" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">Git</span></a><span style="font-weight: 400;">.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">C++ compiler with C++11 support</span></li>
<li style="font-weight: 400;" aria-level="1"><a href="https://nasm.us/" rel="nofollow noopener" target="_blank"><span style="font-weight: 400;">NASM</span></a><span style="font-weight: 400;"> assembly compiler (for x86 SIMD support)</span></li>
</ol>
<p><span style="font-weight: 400;">The following C++11 compilers have been known to work:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Visual Studio 2015 or later</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">GCC 4.8 or later</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Clang 3.3 or later</span></li>
</ul>
<h3><span style="font-weight: 400;">Execute Build</span></h3>
<p><span style="font-weight: 400;">The following commands will check out the project source code and create a directory called ‘build’ where the compiler output will be placed. CMake is then used for generating build files and compiling the VCA binaries.</span></p>
<pre><span style="font-weight: 400;">$ git clone https://github.com/cd-athena/VCA.git</span>
<span style="font-weight: 400;">$ cd VCA</span>
<span style="font-weight: 400;">$ mkdir build</span>
<span style="font-weight: 400;">$ cd build</span>
<span style="font-weight: 400;">$ cmake ../</span>
<span style="font-weight: 400;">$ cmake --build .</span></pre>
<p><span style="font-weight: 400;">This will create VCA binaries in the VCA/build/source/apps/ folder.</span></p>
<h3><span style="font-weight: 400;">Command-Line Options</span></h3>
<h4><strong>General</strong></h4>
<p>Displaying Help Text:</p>
<pre><span style="font-weight: 400;">--help, -h</span>
</pre>
<p>Displaying version details:</p>
<pre><span style="font-weight: 400;">--version, -v</span></pre>
<h4><strong>Logging/Statistic Options</strong></h4>
<pre><span style="font-weight: 400;">--complexity-csv &lt;filename&gt;</span></pre>
<p><span style="font-weight: 400;">Write the spatial complexity (E), temporal complexity (h), epsilon, and brightness (L) statistics to a comma-separated values (CSV) log file. The file is created if it doesn’t already exist. The following statistics are available:</span></p>
<ul>
<li><span style="font-weight: 400;">POC</span><span style="font-weight: 400;">: Picture Order Count &#8211; the display order of the frames</span></li>
<li><span style="font-weight: 400;">E</span><span style="font-weight: 400;">: Spatial complexity of the frame</span></li>
<li><span style="font-weight: 400;">h</span><span style="font-weight: 400;">: Temporal complexity of the frame</span></li>
<li><span style="font-weight: 400;">epsilon</span><span style="font-weight: 400;">: Gradient of the temporal complexity of the frame</span></li>
<li><span style="font-weight: 400;">L</span><span style="font-weight: 400;">: Brightness of the frame</span></li>
</ul>
<p><span style="font-weight: 400;">Unless the </span><span style="font-weight: 400;">--no-chroma</span><span style="font-weight: 400;"> option is used, the following chroma statistics are also available:</span></p>
<ul>
<li><span style="font-weight: 400;">avgU</span><span style="font-weight: 400;">: Average U chroma component of the frame</span></li>
<li><span style="font-weight: 400;">energyU</span><span style="font-weight: 400;">: Average U chroma texture of the frame</span></li>
<li><span style="font-weight: 400;">avgV</span><span style="font-weight: 400;">: Average V chroma component of the frame</span></li>
<li><span style="font-weight: 400;">energyV</span><span style="font-weight: 400;">: Average V chroma texture of the frame</span></li>
</ul>
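<p>Once VCA has written a complexity CSV, it can be post-processed with a few lines of Python. The sketch below assumes the file has a header row with columns named POC, E, and h, matching the statistics listed above; the exact header written by your VCA version may differ, so check it and adjust the keys. The function name summarize_complexity and the sample data are made up for illustration.</p>

```python
import csv
import io


def summarize_complexity(csv_text):
    """Mean E and h over all frames, plus the POC with the highest temporal complexity.

    Assumes columns named POC, E and h; adjust the keys if your VCA build differs.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text), skipinitialspace=True))
    if not rows:
        raise ValueError("CSV contains no frames")
    e = [float(r["E"]) for r in rows]
    h = [float(r["h"]) for r in rows]
    peak = max(rows, key=lambda r: float(r["h"]))
    return {"mean_E": sum(e) / len(e), "mean_h": sum(h) / len(h), "peak_h_poc": int(peak["POC"])}


# Made-up sample in the assumed layout.
SAMPLE = "POC,E,h,epsilon,L\n0,40.0,0.0,0.0,100\n1,42.0,6.0,0.0,101\n2,44.0,3.0,0.5,99\n"
print(summarize_complexity(SAMPLE))  # {'mean_E': 42.0, 'mean_h': 3.0, 'peak_h_poc': 1}
```

<p>For a real file, read it with open(path, newline="") and pass its contents to the same function.</p>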
<pre><span style="font-weight: 400;">--shot-csv &lt;filename&gt;</span></pre>
<p><span style="font-weight: 400;">Write the shot ID and the first POC of every shot to a comma-separated values (CSV) log file. The file is created if it doesn’t already exist.</span></p>
<pre><span style="font-weight: 400;">--yuvview-stats &lt;filename&gt;</span></pre>
<p><span style="font-weight: 400;">Write the per-block results (L, E, h) to a stats file that can be visualized using YUView.</span></p>
<h4><strong>Performance Options</strong></h4>
<pre><span style="font-weight: 400;">--no-chroma</span></pre>
<p><span style="font-weight: 400;">Disable analysis of chroma planes (which is enabled by default).</span></p>
<pre><span style="font-weight: 400;">--no-simd</span>
</pre>
<p><span style="font-weight: 400;">By default, the Video Complexity Analyzer uses all detected CPU SIMD architectures. This option disables that detection.</span></p>
<pre><span style="font-weight: 400;">--threads &lt;integer&gt;</span></pre>
<p><span style="font-weight: 400;">Specify the number of threads to use. Default: 0 (autodetect).</span></p>
<h4><strong>Input/Output</strong></h4>
<pre><span style="font-weight: 400;">--input &lt;filename&gt;</span></pre>
<p><span style="font-weight: 400;">Input filename. Raw YUV and Y4M are supported. Pass </span><span style="font-weight: 400;">stdin</span><span style="font-weight: 400;"> as the filename to read from standard input. For example, input can be piped from FFmpeg as follows:</span></p>
<pre><span style="font-weight: 400;">ffmpeg.exe -i Sintel.2010.1080p.mkv -f yuv4mpegpipe - | vca.exe --input stdin --y4m</span></pre>
<pre><span style="font-weight: 400;">--y4m</span>
</pre>
<p><span style="font-weight: 400;">Parse the input stream as YUV4MPEG2 regardless of file extension. Primarily intended for use with stdin. This option is implied if the input filename has a “.y4m” extension.</span></p>
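<p>The FFmpeg-to-VCA pipe can also be driven from Python, for example in a batch script. This sketch assumes that ffmpeg and a VCA binary named vca are on the PATH (ffmpeg.exe and vca.exe on Windows, as in the example above), and it uses only the options documented in this section (--input stdin, --y4m, --complexity-csv). The helper names build_commands and run_pipeline are hypothetical.</p>

```python
import subprocess


def build_commands(source, csv_path, ffmpeg="ffmpeg", vca="vca"):
    """Return (ffmpeg_argv, vca_argv) for decoding `source` into VCA over a pipe."""
    ffmpeg_argv = [ffmpeg, "-i", source, "-f", "yuv4mpegpipe", "-"]  # Y4M to stdout
    vca_argv = [vca, "--input", "stdin", "--y4m", "--complexity-csv", csv_path]
    return ffmpeg_argv, vca_argv


def run_pipeline(source, csv_path):
    """Equivalent of `ffmpeg ... - | vca --input stdin --y4m ...` without a shell."""
    ffmpeg_argv, vca_argv = build_commands(source, csv_path)
    decoder = subprocess.Popen(ffmpeg_argv, stdout=subprocess.PIPE)
    analyzer = subprocess.Popen(vca_argv, stdin=decoder.stdout)
    decoder.stdout.close()  # so ffmpeg gets SIGPIPE if VCA exits early
    analyzer_rc = analyzer.wait()
    decoder_rc = decoder.wait()
    if analyzer_rc != 0 or decoder_rc != 0:
        raise RuntimeError(f"pipeline failed (ffmpeg={decoder_rc}, vca={analyzer_rc})")
```

<p>Passing argument lists instead of a shell string avoids quoting problems with file names that contain spaces.</p>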
<pre><span style="font-weight: 400;">--input-depth &lt;integer&gt;</span>
</pre>
<p><span style="font-weight: 400;">Bit-depth of input file or stream. Any value between 8 and 16. Default is 8. For Y4M files, this is read from the Y4M header.</span></p>
<pre><span style="font-weight: 400;">--input-res &lt;wxh&gt;</span>
</pre>
<p><span style="font-weight: 400;">Source picture size [w x h]. For Y4M files, this is read from the Y4M header.</span></p>
<pre><span style="font-weight: 400;">--input-csp &lt;integer or string&gt;</span>
</pre>
<p><span style="font-weight: 400;">Chroma subsampling. 4:0:0 (monochrome), 4:2:0, 4:2:2, and 4:4:4 are supported. For Y4M files, this is read from the Y4M header.</span></p>
<pre><span style="font-weight: 400;">--input-fps &lt;double&gt;</span>
</pre>
<p><span style="font-weight: 400;">The framerate of the input. For Y4M files, this is read from the Y4M header.</span></p>
<pre><span style="font-weight: 400;">--skip &lt;integer&gt;</span>
</pre>
<p><span style="font-weight: 400;">Number of frames to skip at the start of the input file. Default: 0.</span></p>
<pre><span style="font-weight: 400;">--frames, -f &lt;integer&gt;</span>
</pre>
<p><span style="font-weight: 400;">Number of frames of the input sequence to analyze. Default: 0 (all).</span></p>
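<p>Raw YUV files have no header, which is why resolution, chroma subsampling, and bit depth must be given on the command line: together they determine how many bytes make up one frame. The arithmetic can be sketched as follows, assuming planar YUV with samples above 8 bits stored in 16-bit words. The helper names and the "420"-style format labels are illustrative and are not VCA option values. The sketch also shows the byte offset that skipping frames implies.</p>

```python
def chroma_plane_samples(width, height, csp):
    """Samples in ONE chroma plane for planar YUV (odd sizes rounded up)."""
    half_w, half_h = (width + 1) // 2, (height + 1) // 2
    sizes = {"400": 0, "420": half_w * half_h, "422": half_w * height, "444": width * height}
    if csp not in sizes:
        raise ValueError(f"unsupported chroma format: {csp}")
    return sizes[csp]


def raw_frame_bytes(width, height, csp="420", depth=8):
    """Bytes per planar raw YUV frame (Y plane plus two chroma planes)."""
    bytes_per_sample = 1 if depth <= 8 else 2  # >8-bit samples occupy 16-bit words
    return (width * height + 2 * chroma_plane_samples(width, height, csp)) * bytes_per_sample


def skip_offset(skip, width, height, csp="420", depth=8):
    """Byte position of the first analyzed frame after skipping `skip` frames."""
    return skip * raw_frame_bytes(width, height, csp, depth)


print(raw_frame_bytes(1920, 1080))  # 3110400 bytes for 8-bit 4:2:0 1080p
```
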
<h4><strong>Analyzer Configuration</strong></h4>
<pre><span style="font-weight: 400;">--block-size &lt;8/16/32&gt;</span>
</pre>
<p><span style="font-weight: 400;">Size of the non-overlapping blocks used to determine the E, h features. Default: 32.</span></p>
<pre><span style="font-weight: 400;">--min-thresh &lt;double&gt;</span>
</pre>
<p><span style="font-weight: 400;">Minimum threshold of epsilon for shot detection.</span></p>
<pre><span style="font-weight: 400;">--max-thresh &lt;double&gt;</span>
</pre>
<p><span style="font-weight: 400;">Maximum threshold of epsilon for shot detection.</span></p>
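<p>This section does not spell out how VCA combines the two thresholds, so the following is a hypothetical two-threshold rule for illustration only: a frame whose epsilon reaches the maximum threshold always starts a new shot, while a frame between the minimum and maximum thresholds starts a shot only if it is also a local peak of epsilon. The function name detect_shots and the sample epsilon values are made up; the output mirrors the (shot ID, first POC) pairs written by --shot-csv.</p>

```python
def detect_shots(epsilon, min_thresh, max_thresh):
    """Hypothetical two-threshold shot detector over per-frame epsilon values.

    Returns [(shot_id, first_poc), ...]; POC 0 always opens shot 0.
    """
    if not epsilon:
        return []
    starts = [0]
    for k in range(1, len(epsilon)):
        e = epsilon[k]
        next_e = epsilon[k + 1] if k + 1 < len(epsilon) else float("-inf")
        if e >= max_thresh:
            starts.append(k)  # strong change: always a cut
        elif e >= min_thresh and e > epsilon[k - 1] and e > next_e:
            starts.append(k)  # weaker change: a cut only at a local peak
    return list(enumerate(starts))


eps = [0.0, 0.1, 0.9, 0.2, 0.1, 0.4, 0.3, 0.1, 0.35, 0.36]
print(detect_shots(eps, min_thresh=0.3, max_thresh=0.8))  # [(0, 0), (1, 2), (2, 5), (3, 9)]
```
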
<h2><span style="font-weight: 400;">Using the VCA API</span></h2>
<p><span style="font-weight: 400;">VCA is written primarily in C++ and x86 assembly language. The public API is wholly defined within </span><span style="font-weight: 400;">vcaLib.h</span><span style="font-weight: 400;"> in the source/lib/ folder of the source tree. All functions, variables, and enumerations meant to be used by the end user are declared in this header.</span></p>
<pre><span style="font-weight: 400;">vca_analyzer *vca_analyzer_open(vca_param param)</span>
</pre>
<p><span style="font-weight: 400;">Create a new analyzer handle; all parameters from vca_param are copied. The returned pointer is then passed to all of the functions pertaining to this analyzer. Since </span><span style="font-weight: 400;">vca_param</span><span style="font-weight: 400;"> is copied internally, the user may release their copy after allocating the analyzer. Changes made to their copy of the param structure have no effect on the analyzer after it has been allocated.</span></p>
<pre><span style="font-weight: 400;">vca_result vca_analyzer_push(vca_analyzer *enc, vca_frame *frame)</span></pre>
<p><span style="font-weight: 400;">Push a frame to the analyzer and start the analysis. Note that only the pointers are copied; no ownership of the memory is transferred to the library. The caller must make sure that the pointers remain valid until the frame has been analyzed. Once the result for a frame has been pulled, the library no longer uses its pointers. This call may block until a slot is available to work on. The number of frames processed in parallel can be set using nrFrameThreads.</span></p>
<pre><span style="font-weight: 400;">bool vca_result_available(vca_analyzer *enc)</span>
</pre>
<p><span style="font-weight: 400;">Check if a result is available to pull.</span></p>
<pre><span style="font-weight: 400;">vca_result vca_analyzer_pull_frame_result(vca_analyzer *enc, vca_frame_results *result)</span></pre>
<p><span style="font-weight: 400;">Pull a result from the analyzer. This may block until a result is available. Use </span><span style="font-weight: 400;">vca_result_available()</span><span style="font-weight: 400;"> if you only want to check whether a result is ready.</span></p>
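<p>The push and pull calls above form an asynchronous producer/consumer loop. The following hypothetical Python model sketches that pattern; ToyAnalyzer is not part of VCA, and its "analysis" is just a mean sample value. In this model, push blocks while all frame slots are busy, result_available checks without blocking, and pull blocks until the oldest frame is done, so results come back in push order. Python keeps the frame buffers alive automatically; with the real C API, the caller must keep the frame memory valid until that frame's result has been pulled.</p>

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class ToyAnalyzer:
    """Hypothetical Python model of a push/pull analyzer; NOT the VCA API."""

    def __init__(self, frame_threads=2):
        # frame_threads plays the role of VCA's nrFrameThreads in this model.
        self._pool = ThreadPoolExecutor(max_workers=frame_threads)
        self._slots = threading.Semaphore(frame_threads)
        self._pending = deque()  # futures in push order

    @staticmethod
    def _analyze(poc, samples):
        # Stand-in for real analysis: mean sample value as a "brightness".
        return poc, sum(samples) / len(samples)

    def push(self, poc, samples):
        self._slots.acquire()  # blocks while every frame slot is busy
        future = self._pool.submit(self._analyze, poc, samples)
        future.add_done_callback(lambda _f: self._slots.release())  # free the slot
        self._pending.append(future)

    def result_available(self):
        # Non-blocking: is the oldest pushed frame finished?
        return bool(self._pending) and self._pending[0].done()

    def pull(self):
        # Blocks until the oldest pushed frame is finished (IndexError if none pushed).
        return self._pending.popleft().result()

    def close(self):
        self._pool.shutdown(wait=True)


analyzer = ToyAnalyzer(frame_threads=2)
results = []
for poc in range(5):
    analyzer.push(poc, [poc, poc + 2])
    while analyzer.result_available():  # drain finished frames without blocking
        results.append(analyzer.pull())
while len(results) < 5:
    results.append(analyzer.pull())  # block for whatever is still in flight
analyzer.close()
print(results)  # [(0, 1.0), (1, 2.0), (2, 3.0), (3, 4.0), (4, 5.0)]
```
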
<pre><span style="font-weight: 400;">void vca_analyzer_close(vca_analyzer *enc)</span>
</pre>
<p><span style="font-weight: 400;">Finally, the analyzer must be closed to free all of its resources. An analyzer that has been flushed cannot be restarted and reused. Once </span><span style="font-weight: 400;">vca_analyzer_close()</span><span style="font-weight: 400;"> has been called, the analyzer handle must be discarded.</span></p>
<p>Try out the Video Complexity Analyzer for yourself, along with other exciting innovations, at <a href="https://athena.itec.aau.at/" rel="nofollow noopener" target="_blank">https://athena.itec.aau.at/</a> and <a href="https://bitmovin.com">bitmovin.com</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Efficient Multi-Encoding Algorithms for HTTP Adaptive Bitrate Streaming</title>
		<link>https://bitmovin.com/multi-encoding-has</link>
		
		<dc:creator><![CDATA[Christian Timmerer]]></dc:creator>
		<pubDate>Thu, 23 Dec 2021 16:25:36 +0000</pubDate>
				<category><![CDATA[Innovation]]></category>
		<category><![CDATA[athena]]></category>
		<category><![CDATA[video encoding]]></category>
		<guid isPermaLink="false">https://bitmovin.com/?p=209148</guid>

					<description><![CDATA[<p>The Future of HTTP Adaptive Streaming (HAS) According to multiple reports, video viewing will account for as much as 82% of all internet traffic by the end of 2022. As a result, HTTP Adaptive Streaming (HAS) is becoming increasingly popular as a way to efficiently meet this demand. Furthermore, improvements in video characteristics such as frame rate,...</p>
<p>The post <a rel="nofollow" href="https://bitmovin.com/multi-encoding-has">Efficient Multi-Encoding Algorithms for HTTP Adaptive Bitrate Streaming</a> appeared first on <a rel="nofollow" href="https://bitmovin.com">Bitmovin</a>.</p>
]]></description>
										<content:encoded><![CDATA[<h2><b>The Future of HTTP Adaptive Streaming (HAS)</b></h2>
<p><span style="font-weight: 400;">According to multiple reports, video viewing will account for as much as <a href="https://www.streamingmedia.com/Articles/ReadArticle.aspx?ArticleID=144177" rel="nofollow noopener" target="_blank">82% of all internet traffic by the end of 2022</a>. As a result, <a href="https://bitmovin.com/adaptive-streaming/">HTTP Adaptive Streaming (HAS)</a> is becoming increasingly popular as a way to efficiently meet this demand. Furthermore, improvements in video characteristics such as frame rate, resolution, and bit depth raise the need for a large-scale, highly efficient video encoding environment. This is even more crucial for DASH-based content provisioning, which requires encoding multiple representations of the same video content. Each video is encoded at multiple bitrates and spatial resolutions (i.e., representations) to adapt to heterogeneous network conditions, device characteristics, and end-user preferences, as shown in Figure 1. However, encoding the same content at multiple representations requires substantial resources and costs for content providers.</span></p>
<p><figure id="attachment_209151" aria-describedby="caption-attachment-209151" style="width: 300px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="size-medium wp-image-209151 zoooom" src="https://bitmovin.com/wp-content/uploads/2021/12/A-systematic-representation-of-encoding-scheme-in-HTTP-Adaptive-Streaming-HAS_decision-tree-300x161.png" alt="A systematic representation of encoding scheme in HTTP Adaptive Streaming (HAS)_decision tree" width="300" height="161" srcset="https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/12/A-systematic-representation-of-encoding-scheme-in-HTTP-Adaptive-Streaming-HAS_decision-tree-300x161.png?lossy=2&amp;strip=1&amp;webp=1 300w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/12/A-systematic-representation-of-encoding-scheme-in-HTTP-Adaptive-Streaming-HAS_decision-tree.jpg?lossy=2&amp;strip=1&amp;webp=1 512w" sizes="(max-width: 300px) 100vw, 300px" /><figcaption id="caption-attachment-209151" class="wp-caption-text">Figure 1: A systematic representation of encoding scheme in HAS</figcaption></figure></p>
<p><span style="font-weight: 400;">As seen in Figure 2, as resolution doubles, encoding time complexity also doubles! To address this challenge, we employ multi-encoding schemes to accelerate the encoding of multiple representations with minimal impact on quality. They achieve this by exploiting the high correlation of encoder analysis decisions (such as block partitioning and prediction mode decisions) across representations.</span></p>
<p><figure id="attachment_209152" aria-describedby="caption-attachment-209152" style="width: 300px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="size-medium wp-image-209152 zoooom" src="https://bitmovin.com/wp-content/uploads/2021/12/Relative-time-complexity-of-encoding-representations-in-x265-HEVC-encoding_Bar-Chart-300x90.png" alt="Relative time complexity of encoding representations in x265 HEVC encoding_Bar Chart" width="300" height="90" srcset="https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/12/Relative-time-complexity-of-encoding-representations-in-x265-HEVC-encoding_Bar-Chart-300x90.png?lossy=2&amp;strip=1&amp;webp=1 300w, https://b3148424.smushcdn.com/3148424/wp-content/uploads/2021/12/Relative-time-complexity-of-encoding-representations-in-x265-HEVC-encoding_Bar-Chart.jpg?lossy=2&amp;strip=1&amp;webp=1 512w" sizes="(max-width: 300px) 100vw, 300px" /><figcaption id="caption-attachment-209152" class="wp-caption-text">Figure 2: Relative time complexity of encoding representations in x265 HEVC encoding.</figcaption></figure></p>
<h2>What is Multi-Encoding?</h2>
<p><span style="font-weight: 400;">To encode multiple renditions of the same video at multiple bitrates and resolutions, we reuse encoder analysis information across renditions. This works because encoder decisions are strongly correlated across renditions of different bitrates and resolutions. Sharing analysis information across multiple bitrates within a resolution is termed &#8220;multi-rate encoding&#8221;, while sharing it across multiple resolutions is termed &#8220;multi-resolution encoding&#8221;. &#8220;Multi-encoding&#8221; is a generalized term that combines both <a href="https://bitmovin.com/multi-rate-encoding-fares-ml/">multi-rate</a> and multi-resolution encoding schemes.</span></p>
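<p>To make the terminology concrete, here is a minimal Python sketch that labels which scheme applies when one representation reuses the analysis of another. It is an illustration only, not Bitmovin or ATHENA code: the <code>Representation</code> class, the <code>sharing_scheme</code> helper, and the ladder bitrates are hypothetical; only the 540p/1080p/2160p tiers come from the examples in this post.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    """One rendition in the ladder (fields chosen for illustration)."""
    height: int        # vertical resolution, e.g. 540, 1080 or 2160
    bitrate_kbps: int  # target bitrate in kbps (hypothetical values below)


def sharing_scheme(reference: Representation, dependent: Representation) -> str:
    """Name the scheme used when `dependent` reuses analysis from `reference`."""
    if reference == dependent:
        raise ValueError("a representation cannot reuse its own analysis")
    if reference.height == dependent.height:
        return "multi-rate"        # same resolution, different bitrate
    return "multi-resolution"      # analysis is shared across resolutions


# Hypothetical ladder: the post's three resolution tiers, two bitrates each.
ladder = [Representation(h, b) for h, b in [
    (540, 1000), (540, 2000),
    (1080, 3000), (1080, 6000),
    (2160, 10000), (2160, 16000),
]]

print(sharing_scheme(ladder[1], ladder[0]))  # multi-rate
print(sharing_scheme(ladder[1], ladder[3]))  # multi-resolution
```

<p>A multi-encoding scheme is then simply a plan that mixes both kinds of reuse across the whole ladder.</p>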
<h3>Proposed Heuristics:</h3>
<p><span style="font-weight: 400;">To accelerate the encoding of the dependent renditions in HEVC, the ATHENA Labs research team proposes two groups of encoder decision heuristics: Prediction Mode heuristics and Motion Estimation heuristics.</span><br />
<i><span style="font-weight: 400;">Prediction Mode Heuristics:</span></i><br />
<span style="font-weight: 400;">Prediction Mode heuristics apply when the Coding Unit (CU) size selected for a dependent rendition is the same as in the reference representation. They can be broken down into the following rules:</span></p>
<ol>
<li><span style="font-weight: 400;">If the <a href="https://www.researchgate.net/publication/276133857_Early_Skip_Mode_Decision_for_HEVC_Encoder_with_Emphasis_on_Coding_Quality" rel="nofollow noopener" target="_blank">SKIP mode</a> was chosen in the highest bitrate rendition, rate-distortion optimization (RDO) is evaluated only for the MERGE/SKIP modes.</span></li>
<li><span style="font-weight: 400;">If the 2Nx2N mode was chosen in the highest bitrate rendition, RDO is skipped for the asymmetric motion partitioning (AMP) modes.</span></li>
<li><span style="font-weight: 400;">If an inter-prediction mode was chosen in the highest bitrate rendition, RDO is skipped for the intra-prediction modes.</span></li>
<li><span style="font-weight: 400;">If an intra-prediction mode was chosen in both the highest and the lowest bitrate renditions, RDO is evaluated only for intra-prediction modes in the intermediate renditions.</span></li>
</ol>
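<p>The Prediction Mode heuristics amount to pruning the list of modes that RDO evaluates for a dependent CU. The Python sketch below expresses the four rules under simplifying assumptions: the mode labels are hypothetical (all asymmetric partitions are lumped into a single <code>AMP</code> label and intra prediction into a single <code>INTRA</code> label), and the function only returns the candidate list; it does not perform RDO itself.</p>

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Simplified, hypothetical mode labels; real HEVC encoders distinguish more partitions.
ALL_MODES = ["MERGE", "SKIP", "2Nx2N", "2NxN", "Nx2N", "AMP", "INTRA"]
INTER_MODES = {"MERGE", "SKIP", "2Nx2N", "2NxN", "Nx2N", "AMP"}


@dataclass(frozen=True)
class RefDecision:
    mode: str  # mode chosen for the co-located CU in a reference rendition


def candidate_modes(highest: RefDecision,
                    lowest: Optional[RefDecision] = None,
                    same_cu_size: bool = True) -> list[str]:
    """Modes left for RDO in a dependent rendition, following the four rules."""
    if not same_cu_size:
        return list(ALL_MODES)      # the heuristics only apply when CU sizes match
    # Rule 4 (intermediate renditions): intra at both extremes -> test intra only.
    if lowest is not None and highest.mode == "INTRA" and lowest.mode == "INTRA":
        return ["INTRA"]
    # Rule 1: SKIP at the highest bitrate -> evaluate MERGE/SKIP only.
    if highest.mode == "SKIP":
        return ["MERGE", "SKIP"]
    modes = list(ALL_MODES)
    # Rule 2: 2Nx2N at the highest bitrate -> skip the AMP modes.
    if highest.mode == "2Nx2N":
        modes.remove("AMP")
    # Rule 3: any inter mode at the highest bitrate -> skip the intra modes.
    if highest.mode in INTER_MODES:
        modes.remove("INTRA")
    return modes


print(candidate_modes(RefDecision("SKIP")))   # ['MERGE', 'SKIP']
print(candidate_modes(RefDecision("2Nx2N")))  # ['MERGE', 'SKIP', '2Nx2N', '2NxN', 'Nx2N']
```

<p>Rule 4 is checked first because it needs decisions from both the highest and the lowest bitrate renditions, so it only fires for intermediate renditions.</p>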
<p><i><span style="font-weight: 400;">Motion Estimation Heuristics:</span></i><br />
<span style="font-weight: 400;">Motion Estimation heuristics apply when the CU size and Prediction Unit (PU) selected for a dependent representation are the same as in the reference representation:</span></p>
<ol>
<li><span style="font-weight: 400;">The reference frame chosen in the highest bitrate rendition is forced.</span></li>
<li><span style="font-weight: 400;">The Motion Vector Predictor (MVP) is set to the Motion Vector (MV) of the highest bitrate rendition.</span></li>
<li><span style="font-weight: 400;">The motion search range is reduced to a smaller window if the MVs of the highest and the lowest bitrate renditions are close to each other.</span></li>
</ol>
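<p>Similarly, the Motion Estimation heuristics can be read as constraints on the motion search of a dependent CU. In this minimal sketch, the <code>MotionInfo</code> type and <code>me_settings</code> function are hypothetical, and "close to each other" is interpreted as a maximum per-component MV difference of <code>closeness</code>. The threshold and both search-range values are placeholder assumptions, not the values used in the research.</p>

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MotionInfo:
    ref_idx: int            # reference frame index chosen for the PU
    mv: tuple[int, int]     # motion vector (x, y) in the encoder's MV units


def me_settings(highest: MotionInfo,
                lowest: Optional[MotionInfo] = None,
                same_cu_and_pu: bool = True,
                default_range: int = 64,   # placeholder value
                reduced_range: int = 16,   # placeholder value
                closeness: int = 4) -> Optional[dict]:
    """Motion search settings for a dependent rendition (None = unconstrained)."""
    if not same_cu_and_pu:
        return None                              # fall back to a normal search
    settings = {
        "ref_idx": highest.ref_idx,              # heuristic 1: force reference frame
        "mvp": highest.mv,                       # heuristic 2: MVP = reference MV
        "search_range": default_range,
    }
    if lowest is not None:
        dx = abs(highest.mv[0] - lowest.mv[0])
        dy = abs(highest.mv[1] - lowest.mv[1])
        if max(dx, dy) <= closeness:             # heuristic 3: MVs agree closely
            settings["search_range"] = reduced_range
    return settings


print(me_settings(MotionInfo(0, (12, -3)), MotionInfo(0, (10, -4))))
```

<p>Here the MVs (12, -3) and (10, -4) differ by at most 2 per component, so the sketch shrinks the search window to the reduced range.</p>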
<p><span style="font-weight: 400;">Based on the above-mentioned heuristics, two multi-encoding schemes are proposed.</span></p>
<h3><b>Proposed Multi-encoding Schemes</b></h3>
<p><span style="font-weight: 400;">In our first proposed <a href="https://athena.itec.aau.at/2021/04/efficient-multi-encoding-algorithms-for-http-adaptive-bitrate-streaming/" rel="nofollow noopener" target="_blank">multi-encoding</a> approach, we perform the following steps:</span></p>
<ol>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The first resolution tier (i.e., 540p in our example) is encoded using the combination of double-bound CU depth estimation (cf. our previous blog post), </span><i><span style="font-weight: 400;">Prediction Mode Heuristics, </span></i><span style="font-weight: 400;">and</span><i><span style="font-weight: 400;"> Motion Estimation Heuristics</span></i><span style="font-weight: 400;">.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The CU depth from the highest bitrate representation of the first resolution tier (i.e., 540p) is shared with the highest bitrate representation of the next resolution tier (i.e., 1080p in our example). In particular, it is used as a lower bound, i.e., the CU is forced to split if its current depth is lower than the CU depth of the reference encode. The remaining bitrate representations of this resolution tier are encoded using the multi-rate scheme as in Step 1.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Repeat Step 2 for the remaining resolution tiers in ascending order of resolution until no resolution tiers are left (i.e., only 2160p in our example).</span></li>
</ol>
<p><figure id="attachment_209153" aria-describedby="caption-attachment-209153" style="width: 300px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="size-medium wp-image-209153 zoooom" src="https://bitmovin.com/wp-content/uploads/2021/12/Multi-Encoding-Scheme-Proposal-1_Decision-Tree-2-300x255.png" alt="Multi-Encoding Scheme Proposal 1_Decision Tree" width="300" height="255"><figcaption id="caption-attachment-209153" class="wp-caption-text">Figure 3: An example of the first proposed multi-encoding scheme.</figcaption></figure></p>
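<p>The steps of the first scheme can be sketched as an encoding plan, together with the lower-bound rule from Step 2. This is a minimal illustration under stated assumptions: the helper names and the ladder bitrates are hypothetical, and treating the highest bitrate of the first tier as the reference encode reflects our reading of the double-bound multi-rate scheme rather than a detail spelled out above.</p>

```python
from __future__ import annotations


def force_split(current_depth: int, ref_depth: int) -> bool:
    """Lower-bound rule: a CU shallower than the reference CU depth must split."""
    return current_depth < ref_depth


def label(height: int, rate_kbps: int) -> str:
    return f"{height}p@{rate_kbps}k"


def scheme1_plan(tiers: dict[int, list[int]]) -> list[tuple[str, str]]:
    """Encoding order for the first scheme: (representation, source of reused info)."""
    plan = []
    prev_top = None
    for height in sorted(tiers):                      # ascending resolution tiers
        rates = sorted(tiers[height], reverse=True)   # highest bitrate first
        top = label(height, rates[0])
        if prev_top is None:
            plan.append((top, "reference encode"))    # step 1 (assumed reference)
        else:
            plan.append((top, f"CU depth lower bound from {prev_top}"))  # step 2
        for rate in rates[1:]:
            plan.append((label(height, rate), f"multi-rate within {height}p"))
        prev_top = top                                # step 3: move to next tier
    return plan


for rep, how in scheme1_plan({540: [1000, 2000], 1080: [3000, 6000], 2160: [10000, 16000]}):
    print(rep, "->", how)
```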
<p><span style="font-weight: 400;">The second proposed multi-encoding scheme is a minor variation of the first that extends the double-bound CU depth estimation scheme across resolution tiers. It consists of the following steps:</span></p>
<ol>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The first resolution tier (i.e., 540p in our example) is encoded using the combination of double-bound CU depth estimation, </span><i><span style="font-weight: 400;">Prediction Mode Heuristics, </span></i><span style="font-weight: 400;">and</span><i><span style="font-weight: 400;"> Motion Estimation Heuristics</span></i><span style="font-weight: 400;">.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The CU depth from the lowest bitrate representation of the first resolution tier (i.e., 540p) is shared with the highest bitrate representation of the next resolution tier (i.e., 1080p in our example). In particular, it is used as a lower bound, i.e., the CU is forced to split if its current depth is lower than the CU depth of the reference encode.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The scaled CU depth from the lowest bitrate representation of the previous resolution tier (i.e., 540p) and the CU depth from the highest bitrate representation of the current resolution tier are shared with the lowest bitrate representation of the current resolution tier (i.e., 1080p in our example), where they serve as the lower and upper bounds, respectively, for the CU depth search.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The remaining bitrate representations of this resolution tier (i.e., 1080p) are encoded using the multi-rate scheme as in Step 1.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Repeat Steps 2 to 4 for the remaining resolution tiers in ascending order of resolution until no resolution tiers are left (i.e., only 2160p in our example).</span></li>
</ol>
<p><figure id="attachment_209154" aria-describedby="caption-attachment-209154" style="width: 300px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="size-medium wp-image-209154 zoooom" src="https://bitmovin.com/wp-content/uploads/2021/12/Multi-Encoding-Scheme-Proposal-2_Decision-Tree-1-300x243.png" alt="Multi-Encoding Scheme Proposal 2_Decision Tree" width="300" height="243"><figcaption id="caption-attachment-209154" class="wp-caption-text">Figure 4: An example of the second proposed multi-encoding scheme.</figcaption></figure></p>
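<p>The second scheme's key change is the double bound in Step 3. The sketch below shows how a lower and an upper bound narrow the CU depth search, plus the resulting encoding plan, following the steps as written above. It is a minimal illustration, not the authors' implementation: <code>MAX_DEPTH = 3</code> assumes a 64x64 CTU with an 8x8 minimum CU (a common HEVC configuration), the fallback for conflicting bounds is hypothetical, and the depth scaling across resolutions is not detailed in this post, so the sketch leaves it out.</p>

```python
from __future__ import annotations

MAX_DEPTH = 3  # assumed: 64x64 CTU with 8x8 minimum CU gives depths 0..3


def depth_search_range(lower: int, upper: int) -> range:
    """CU depths still evaluated between a lower and an upper bound (inclusive)."""
    lower = max(0, lower)
    upper = min(MAX_DEPTH, upper)
    if lower > upper:          # hypothetical fallback when the bounds conflict
        lower = upper
    return range(lower, upper + 1)


def label(height: int, rate_kbps: int) -> str:
    return f"{height}p@{rate_kbps}k"


def scheme2_plan(tiers: dict[int, list[int]]) -> list[tuple[str, str]]:
    """Encoding order for the second scheme: (representation, reused info)."""
    plan = []
    prev_low = None
    for height in sorted(tiers):                               # ascending tiers
        rates = sorted(tiers[height], reverse=True)
        if len(rates) < 2:
            raise ValueError("each tier needs at least two bitrates")
        top, low = label(height, rates[0]), label(height, rates[-1])
        if prev_low is None:                                   # step 1
            plan.append((top, "reference encode"))
            plan.append((low, "multi-rate within tier"))
        else:
            plan.append((top, f"lower bound: {prev_low}"))    # step 2
            plan.append((low, f"lower bound: scaled {prev_low}; "
                              f"upper bound: {top}"))          # step 3
        for rate in rates[1:-1]:                                # steps 1 and 4
            plan.append((label(height, rate), "multi-rate within tier"))
        prev_low = low                                          # step 5: next tier
    return plan


print(list(depth_search_range(1, 2)))  # [1, 2]
for rep, how in scheme2_plan({540: [1000, 1500, 2000], 1080: [3000, 6000]}):
    print(rep, "->", how)
```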
<h2><b>Results</b></h2>
<p><span style="font-weight: 400;">The state-of-the-art scheme yields the highest average encoding time savings (80.05%), but it requires a bitrate increase of 13.53% and 9.59% to maintain the same PSNR and VMAF, respectively, compared to the stand-alone encodings. The first proposed multi-encoding scheme has the lowest bitrate increase for the same PSNR and VMAF (2.32% and 1.55%, respectively). The second proposed multi-encoding scheme improves on the encoding time savings of the first by 11%, with a negligible increase in bitrate at the same PSNR and VMAF. These results are shown in Table 1, where ΔT is the overall encoding time savings compared to the stand-alone encodings, and BDR_P and BDR_V are the average bitrate differences relative to the stand-alone encodings at the same PSNR and VMAF, respectively.</span></p>
<p><figure id="attachment_209155" aria-describedby="caption-attachment-209155" style="width: 300px" class="wp-caption aligncenter"><img loading="lazy" decoding="async" class="size-medium wp-image-209155 zoooom" src="https://bitmovin.com/wp-content/uploads/2021/12/Quality-Results-of-the-Proposed-multi-encoding-schemes_Chart-1-300x73.png" alt="Quality Results of the Proposed multi-encoding schemes_Chart" width="300" height="73"><figcaption id="caption-attachment-209155" class="wp-caption-text">Table 1: Results of the proposed multi-encoding schemes</figcaption></figure></p>
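<p>For readers who want to compute metrics of this kind on their own encodes, the sketch below derives an encoding time saving and a Bjøntegaard delta rate (BD-rate). It assumes ΔT is the relative time saving versus stand-alone encoding, and it uses the common cubic-fit BD-rate formulation (log bitrate fitted as a cubic polynomial of quality and integrated over the overlapping quality range); the exact evaluation setup behind Table 1 is described in the paper. The sample numbers are made up and only serve as a sanity check.</p>

```python
import numpy as np


def time_saving(t_standalone: float, t_multi: float) -> float:
    """Encoding time saved relative to stand-alone encoding, in percent."""
    return 100.0 * (t_standalone - t_multi) / t_standalone


def bd_rate(rate_ref, quality_ref, rate_test, quality_test) -> float:
    """Bjøntegaard delta rate in percent (positive = test needs more bitrate)."""
    log_ref = np.log(np.asarray(rate_ref, dtype=float))
    log_test = np.log(np.asarray(rate_test, dtype=float))
    # Fit log-bitrate as a cubic polynomial of quality (PSNR or VMAF).
    fit_ref = np.polyfit(quality_ref, log_ref, 3)
    fit_test = np.polyfit(quality_test, log_test, 3)
    # Integrate both fits over the quality range covered by both curves.
    q_lo = max(min(quality_ref), min(quality_test))
    q_hi = min(max(quality_ref), max(quality_test))
    int_ref = np.polyint(fit_ref)
    int_test = np.polyint(fit_test)
    area_ref = np.polyval(int_ref, q_hi) - np.polyval(int_ref, q_lo)
    area_test = np.polyval(int_test, q_hi) - np.polyval(int_test, q_lo)
    avg_log_diff = (area_test - area_ref) / (q_hi - q_lo)
    return 100.0 * (np.exp(avg_log_diff) - 1.0)


# Made-up sample: the test encodes need 10% more bitrate at every PSNR point.
psnr = [34.0, 37.0, 40.0, 43.0]
ref_rates = [1000.0, 2000.0, 4000.0, 8000.0]
test_rates = [1.1 * r for r in ref_rates]
print(round(bd_rate(ref_rates, psnr, test_rates, psnr), 2))  # 10.0
print(time_saving(100.0, 20.0))                              # 80.0
```

<p>Working in the log-rate domain turns a constant multiplicative bitrate overhead into a constant offset, which is why the sample's uniform 10% increase comes back as a BD-rate of 10%.</p>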
<p>View the full multi-encoding research paper from <a href="https://athena.itec.aau.at/2021/04/efficient-multi-encoding-algorithms-for-http-adaptive-bitrate-streaming/" rel="nofollow noopener" target="_blank">ATHENA here</a>.<br />
If you liked this article, check out some of our other great ATHENA content at the following links:</p>
<ul>
<li aria-level="1"><a href="https://bitmovin.com/scalable-light-field-coding/">Scalable Light Field Coding – Improving the Quality of Experience (QoE) | Bitmovin x ATHENA Labs</a></li>
<li aria-level="1"><a href="https://bitmovin.com/multicast-live-video-streaming-oscar/">Replacing the Multicast Live Video Streaming Approach with OSCAR | Bitmovin x ATHENA Labs</a></li>
<li aria-level="1"><a href="https://bitmovin.com/multi-rate-encoding-fares-ml/">Multi-Rate Encoding for HTTP Adaptive Streaming | Bitmovin x ATHENA Labs</a></li>
</ul>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
