In 2008, Microsoft announced the release of Internet Information Services (IIS) 7.0 with a new adaptive HTTP streaming feature called Smooth Streaming, aimed at the smooth delivery of HD content to the client. Its Silverlight-based client continually monitors the available bandwidth as well as CPU utilization and playback window resolution to decide which representation of the content best fits the current situation. Microsoft demonstrated the capabilities of the new streaming system at several sports events, such as the 2008 Summer Olympic Games in Beijing, the 2009 Wimbledon Championships and the 2010 Winter Olympics in Vancouver. Taking the 2010 Winter Olympics as an example, the TV broadcaster NBC Sports used Smooth Streaming to provide online streams to 15.8 million individual users, who watched 50.5 million streams in 720p resolution and generated 3.6 petabytes of traffic in total. These success stories show that adaptive HTTP streaming, not only Smooth Streaming but also MPEG-DASH and Apple HLS, works very well for huge live events with millions of viewers.
Considering CPU in Adaptation Decisions
Interestingly, Microsoft also uses CPU utilization as an input to the stream switching decision, which is especially valuable for mobile devices such as smartphones and tablets. If the CPU utilization is high, the client reduces the stream quality and resolution, which lowers the CPU demands of the decoding process and helps keep decoding continuous, without stalls. For mobile devices, CPU-adaptive streaming is useful in several ways. First and foremost, they have limited CPU and GPU capabilities and sometimes restricted decoding features, e.g., support for the H.264 Baseline profile only. In such cases, the client can choose the representation that best fits its resources and gives the user the best quality of experience.
Mobile devices are becoming more and more powerful thanks to high-end mobile CPUs, but the energy consumption of mobile video playback remains high. This leads to the second use case for considering the CPU in the stream switching decision: because most mobile devices have limited battery capacity, it may be necessary to switch to a lower-complexity stream during playback to reduce the energy consumption of the decoding process and thus extend the remaining battery life. Such lower-complexity streams can be produced simply by reducing the bitrate and/or resolution, but video coding offers more sophisticated options as well. In the case of H.264/AVC, it is conceivable to choose a lower-complexity entropy coding method (e.g., CAVLC instead of CABAC), to use a lower sub-pixel precision, or to disable the deblocking filter. Such measures may significantly reduce the CPU utilization of the video decoding process and, as a consequence, make the battery last longer.
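To make the idea of a CPU-aware switching decision more concrete, here is a minimal sketch in Python. It is not Microsoft's actual heuristic, which is not described here. The names `Representation` and `choose_representation`, the 85% CPU threshold and the 0.8 bandwidth safety factor are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    bitrate_bps: int  # average media bitrate in bits per second
    height: int       # vertical resolution in pixels


def choose_representation(representations, bandwidth_bps, cpu_load,
                          cpu_high=0.85, safety=0.8):
    """Pick a representation from bandwidth and CPU load (hypothetical heuristic)."""
    ordered = sorted(representations, key=lambda r: r.bitrate_bps)
    # Keep only the bitrates that fit into a safety fraction of the bandwidth.
    affordable = [r for r in ordered if r.bitrate_bps <= bandwidth_bps * safety]
    # Fall back to the lowest representation if nothing fits.
    index = len(affordable) - 1 if affordable else 0
    # Under high CPU load, step one level down to reduce the decoding cost.
    if cpu_load >= cpu_high and index > 0:
        index -= 1
    return ordered[index]


# Illustrative bitrate ladder (values are made up).
ladder = [
    Representation(350_000, 240),
    Representation(1_000_000, 480),
    Representation(3_000_000, 720),
]

print(choose_representation(ladder, bandwidth_bps=5_000_000, cpu_load=0.40))
print(choose_representation(ladder, bandwidth_bps=5_000_000, cpu_load=0.95))
```

With the same bandwidth, the second call selects the next lower representation because the CPU load exceeds the assumed threshold. A real client would likely smooth both measurements over time to avoid switching back and forth.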
File Formats
Microsoft Smooth Streaming uses three different file types, which are described below:
- Fragmented MP4 files for media content: *.ismv and *.isma
- Server manifest file: *.ism
- Client manifest file: *.ismc
The media files of Microsoft Smooth Streaming are based on fragmented MP4. MP4, which is based on the ISO Base Media File Format (ISOBMFF), is organized in boxes that carry both data and metadata, and it allows these boxes to be arranged in a file in many different ways. For streaming scenarios in particular, MP4 makes it possible to split the metadata and media data of a continuous stream into several fragments, each consisting of a metadata block and a media data block; in the context of Smooth Streaming this is referred to as fragmented MP4 (fMP4). It is therefore possible to store separate media segments, each corresponding to one or more Groups of Pictures (GOPs), as a Movie Fragment Box (moof) followed by a Media Data Box (mdat).
All chunks of the same representation are stored together in one MP4 file that allows random access. This file starts with a Movie Box (moov) containing the metadata for the whole file, followed by the individual fragments. At the end of the file, a Movie Fragment Random Access Box (mfra) enables fast random access to the individual fragments.
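The box layout described above can be inspected with a few lines of code. The following is a minimal sketch that walks the top-level boxes of an ISO Base Media File Format buffer, relying on the standard box header (a 32-bit big-endian size followed by a four-character type). The helper `make_box` and the sample buffer are hypothetical: they only imitate the moov / moof+mdat / mfra layout and do not form a playable file.

```python
import struct


def iter_top_level_boxes(data: bytes):
    """Yield (type, offset, size) for each top-level box in the buffer."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit "largesize" follows the type field.
            if offset + 16 > len(data):
                break
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the file.
            size = len(data) - offset
        if size < header:
            raise ValueError(f"invalid box size {size} at offset {offset}")
        yield box_type.decode("latin-1"), offset, size
        offset += size


def make_box(box_type: str, payload: bytes = b"") -> bytes:
    """Build a box with a plain 32-bit size header (test helper only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


# Synthetic buffer imitating the described layout; payloads are dummy bytes.
sample = (
    make_box("moov", b"\x00" * 24)
    + make_box("moof", b"\x00" * 16) + make_box("mdat", b"\x00" * 64)
    + make_box("moof", b"\x00" * 16) + make_box("mdat", b"\x00" * 64)
    + make_box("mfra", b"\x00" * 8)
)

for box_type, offset, size in iter_top_level_boxes(sample):
    print(f"{box_type} offset={offset} size={size}")
```

To inspect a real media file, the same function could be applied to its contents, although reading a large file fully into memory is only reasonable for a quick experiment.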
These chunks are requested from the IIS web server, which translates the incoming URL and responds with the appropriate movie fragment. The URLs of the segment requests have the following form, where <bitrate> identifies the representation and <segment time offset> identifies the requested segment:
http://cdn.bitmovin.net/content.ism/QualityLevels(<bitrate>)/Fragments(video=<segment time offset>)
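As a small illustration of the URL pattern above, the following sketch fills in the two placeholders. The function name `fragment_url`, its parameters and the concrete bitrate and time offset values are assumptions made for this example only.

```python
def fragment_url(base_url: str, bitrate: int, start_time: int,
                 stream: str = "video") -> str:
    """Build a fragment request URL following the pattern shown above."""
    return (
        f"{base_url.rstrip('/')}"
        f"/QualityLevels({bitrate})"
        f"/Fragments({stream}={start_time})"
    )


# Placeholder values: a 1 Mbit/s representation and an arbitrary time offset.
url = fragment_url("http://cdn.bitmovin.net/content.ism", 1000000, 20000000)
print(url)
```

Switching the quality then only means changing the `bitrate` argument for the next request, while the time offset keeps advancing from segment to segment.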
This is the point at which an IIS web server becomes necessary. HTTP/1.1 range requests would be an alternative that works without any address translation on the server side. Unfortunately, the address translation seems intended to limit the use of Smooth Streaming to CDNs that run Microsoft products.
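The following sketch illustrates, under simplifying assumptions, what such an address translation involves and how the same bytes could be addressed with an HTTP/1.1 range request instead. It is not how IIS is implemented. The two lookup tables are hypothetical stand-ins for the server manifest and for a per-file fragment index, and all file names, offsets and lengths are made up.

```python
import re

# Hypothetical stand-in for the server manifest: bitrate -> media file.
SERVER_MANIFEST = {
    1_000_000: "content_1000000.ismv",
    3_000_000: "content_3000000.ismv",
}

# Hypothetical fragment index: (file, start time) -> (byte offset, byte length)
# of the moof+mdat pair inside that file.
FRAGMENT_INDEX = {
    ("content_1000000.ismv", 0): (1_200, 250_000),
    ("content_1000000.ismv", 20_000_000): (251_200, 248_500),
    ("content_3000000.ismv", 0): (1_200, 750_000),
}

REQUEST_PATTERN = re.compile(
    r"/QualityLevels\((?P<bitrate>\d+)\)"
    r"/Fragments\((?P<stream>\w+)=(?P<start>\d+)\)$"
)


def translate(path: str):
    """Map a fragment request path to (file, byte offset, byte length)."""
    match = REQUEST_PATTERN.search(path)
    if match is None:
        raise ValueError(f"not a fragment request: {path}")
    media_file = SERVER_MANIFEST[int(match["bitrate"])]
    offset, length = FRAGMENT_INDEX[(media_file, int(match["start"]))]
    return media_file, offset, length


def range_header(offset: int, length: int) -> str:
    """Range header value for the same bytes (the end position is inclusive)."""
    return f"bytes={offset}-{offset + length - 1}"


media_file, offset, length = translate(
    "/content.ism/QualityLevels(1000000)/Fragments(video=20000000)"
)
print(media_file, range_header(offset, length))
```

If the client knew the file name and byte positions itself, it could send the `Range` header directly to any standard HTTP server, which is the alternative mentioned above.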
In addition to the media files, Microsoft Smooth Streaming uses a client manifest file that contains the metadata of the different representations and basic chunk descriptions of the video stream. This XML-based file describes the codec, bitrate, resolution, etc. of each available quality version of the content. There is also a server manifest file, which describes the relationship between representations, segments and the contained media tracks; it is not transferred to the client and is used only by IIS. It contains the bitrates, the associated file names/paths on the server, track IDs and further metadata, and it is used to translate the URLs of the HTTP requests to media files. This is necessary because the HTTP requests of Microsoft Smooth Streaming contain neither file names nor byte ranges for the segments, so IIS has to translate each request into a file read operation using the server manifest. As already mentioned, this mechanism could in principle be avoided by using HTTP/1.1 range requests.
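To give an impression of how a client might read the client manifest described above, here is a minimal parsing sketch. The sample manifest is heavily simplified and hand-written for this example. Its element and attribute names (`StreamIndex`, `QualityLevel`, `c`, `Bitrate`, `FourCC`, `MaxWidth`, `MaxHeight`, `d`) reflect a common reading of the Smooth Streaming client manifest format and should be treated as assumptions to verify against the official specification.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written sample; not taken from a real stream.
SAMPLE_MANIFEST = """<SmoothStreamingMedia MajorVersion="2" MinorVersion="0" Duration="40000000">
  <StreamIndex Type="video" Chunks="2" QualityLevels="2"
               Url="QualityLevels({bitrate})/Fragments(video={start time})">
    <QualityLevel Index="0" Bitrate="1000000" FourCC="H264" MaxWidth="854" MaxHeight="480"/>
    <QualityLevel Index="1" Bitrate="3000000" FourCC="H264" MaxWidth="1280" MaxHeight="720"/>
    <c d="20000000"/>
    <c d="20000000"/>
  </StreamIndex>
</SmoothStreamingMedia>"""


def parse_client_manifest(xml_text: str):
    """Return a list of streams with their quality levels and chunk start times."""
    root = ET.fromstring(xml_text)
    streams = []
    for stream in root.findall("StreamIndex"):
        levels = [
            {
                "bitrate": int(q.get("Bitrate")),
                "codec": q.get("FourCC"),
                "width": int(q.get("MaxWidth")),
                "height": int(q.get("MaxHeight")),
            }
            for q in stream.findall("QualityLevel")
        ]
        # Assumption: each chunk only carries a duration, so start times
        # are accumulated from zero.
        start_times, current = [], 0
        for chunk in stream.findall("c"):
            start_times.append(current)
            current += int(chunk.get("d"))
        streams.append({
            "type": stream.get("Type"),
            "levels": levels,
            "start_times": start_times,
        })
    return streams


streams = parse_client_manifest(SAMPLE_MANIFEST)
for level in streams[0]["levels"]:
    print(level)
print(streams[0]["start_times"])
```

The bitrates and start times obtained this way are exactly the two values a client needs to fill into the segment request URLs.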