CMAF facilitates backwards-compatible ultra-low latency streaming. This article explains why and how this is achieved and discusses exemplary practical challenges that must be overcome when implementing CMAF-compatible pipelines that provide ultra-low latency consumer experiences.
IBC365 – 28 July 2020
Why use CMAF?
Many competing formats for multimedia streaming exist today, but only a few are widely used. Two of the most prominent are HTTP Live Streaming (HLS) and Dynamic Adaptive Streaming over HTTP (DASH). While both deliver multimedia data in a very similar manner, they are not compatible with one another. To deliver the same source audio and video data via both formats, it is necessary to store it twice in slightly different representations.
The Common Media Application Format (CMAF) solves this issue by defining a more abstract format that enables different manifests for the same encoded data. This way, the encoded data needs to be stored only once, with one manifest for HLS and one for DASH. Some additional restrictions still apply to enable full compatibility with legacy consumer HLS or DASH players, e.g., the use of fragmented MP4 to package the data. This is especially important for ultra-low latency use cases, where end-to-end delays in the second or even sub-second range are desired.
Why is latency important?
In live broadcasts, such as sporting events and concerts, minimizing latency is critical. Imagine paying for the live stream of a soccer game and hearing your neighbor cheer “goal” five seconds before you even see it happening on your screen. Such scenarios can be avoided by reducing the end-to-end latency of the live stream to a minimum.
Latency is introduced by different components within the streaming pipeline between the broadcaster and the consumer. Some components such as the data link protocol are usually already operating very close to their physical limit, e.g., the speed of light, with sub-millisecond processing delays at the sending and the receiving end. Other components such as the video encoder come in different flavors, many of which include low-latency options to reduce the delay between the last input pixel and the first encoded byte.
One component of the pipeline, however, is sometimes overlooked although it implicitly adds significant latency: the restrictions of the streaming format. While neither HLS nor DASH explicitly adds any practically relevant delay on its own, the way in which they require the sender and the receiver (player) to communicate encoded data at the live edge can add substantial delays of several seconds or even tens of seconds, depending on the size of the data segments.
Why is chunked transfer helpful?
Segments are the groups of frames (or audio samples) that can be fetched at once. They are listed in the manifest and are typically several seconds long. Legacy HLS implementations require that each segment be added to the playlist only after all its frames are fully encoded. This adds a latency equal to the duration of the segment, i.e., typically several seconds. In addition, the player can only start processing data after it has become aware of the new segment in the playlist and has fetched the segment, which adds further latency.
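The delay contributions described above can be added up in a few lines. The following is a minimal sketch, not a measurement: the function name and the example values (a 6-second segment, 1 second until the player notices the playlist update, 0.5 seconds to fetch the segment) are assumptions chosen for illustration only.

```python
def segment_mode_delay(segment_s, playlist_wait_s, fetch_s):
    """Format-induced delay for the first frame of a segment (hypothetical model).

    segment_s:       segment duration; the first frame waits until the
                     last frame of the segment has been encoded
    playlist_wait_s: time until the player sees the segment in the playlist
    fetch_s:         time to download the complete segment
    """
    return segment_s + playlist_wait_s + fetch_s


# Illustrative values only: 6 s segments, 1 s playlist wait, 0.5 s fetch.
print(segment_mode_delay(6.0, 1.0, 0.5))
```

Under these assumed values, the streaming format alone accounts for several seconds of delay before any encoder, network or decoder latency is considered.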
CMAF enables segments published on the encoder side to be split into chunks, which can be as small as one single frame. Once a chunk is encoded, it can be published immediately through HTTP Chunked Transfer Encoding, where each chunk as a part of the whole segment is transmitted one after another. This allows the manifest to reference a still incomplete segment that the player can already start fetching if it also supports chunked transfer. This reduces the latency on both sides.
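To illustrate the mechanism, the sketch below frames a sequence of already encoded chunks using the HTTP/1.1 chunked transfer coding (hexadecimal length, CRLF, payload, CRLF, terminated by a zero-length chunk). It is a simplified illustration under stated assumptions: the helper names and the placeholder payloads are hypothetical, real servers handle this framing internally, and HTTP chunk boundaries are not required to coincide with CMAF chunk boundaries.

```python
LAST_CHUNK = b"0\r\n\r\n"  # zero-length chunk ends the body (no trailers)


def encode_chunk(payload):
    """Frame one payload as an HTTP/1.1 transfer-coding chunk."""
    if not payload:
        raise ValueError("an empty payload would look like the final chunk")
    return f"{len(payload):x}".encode("ascii") + b"\r\n" + payload + b"\r\n"


def stream_segment(cmaf_chunks):
    """Yield wire data as soon as each encoded chunk becomes available."""
    for chunk in cmaf_chunks:
        yield encode_chunk(chunk)
    yield LAST_CHUNK  # the segment is complete


def decode_body(body):
    """Split a complete chunked body back into its payloads."""
    payloads, pos = [], 0
    while True:
        eol = body.index(b"\r\n", pos)
        # Ignore optional chunk extensions after ';'.
        size = int(body[pos:eol].split(b";")[0], 16)
        pos = eol + 2
        if size == 0:
            return payloads
        payloads.append(body[pos:pos + size])
        pos += size + 2  # skip payload and its trailing CRLF


# Placeholder payloads standing in for encoded CMAF chunks.
frames = [b"moof+mdat #1", b"moof+mdat #2 longer"]
wire = b"".join(stream_segment(frames))
print(decode_body(wire) == frames)
```

Because `stream_segment` is a generator, each framed chunk can be handed to the network as soon as the encoder produces it, without waiting for the rest of the segment.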
While several updates to HLS, such as the recent Low Latency HLS (LLHLS), enable chunked transfer, they are still mostly incompatible with DASH and not widely implemented. With CMAF compatibility, it is possible to use chunked transfer of the data with two different manifests: one for LLHLS and one for DASH, each with its respective chunk-level playlist representation. Even legacy HLS can be supported this way, as players unaware of chunked transfer can be served the segment as usual once it is complete. Moreover, they still benefit from the latency reduction on the encoder side.
Why does chunked transfer sometimes fail in practice?
Nonetheless, for all latency reductions to fully manifest in real-world use cases, every component of the streaming pipeline must support chunked transfer reliably. Experience from several practical sporting event use cases, collected while building a streaming infrastructure for Austrian solutions provider NativeWaves, has shown that even a single component that does not reliably support chunked transfer can increase the end-to-end delay by multiple seconds. One particularly surprising source of such problems is the implementation specifics of software players.
While many players already support chunked transfer, some of them fetch chunks one after another with a separate request each, instead of fetching them in parallel or within one connection (via keep-alive). With very small chunks, the data may then not be fetched fast enough, so that the buffer constantly underflows and playback stutters. If the player implementation cannot be modified, such issues can usually only be fixed by moving further back from the live edge and/or by resorting to fetching full segments instead of chunks, which increases latency.
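The underflow can be reproduced with a toy model. The sketch below assumes that the player fetches strictly one chunk per request, that every request costs a fixed overhead (round trip plus any connection setup) and that transfer time is negligible. The function name and all numbers are hypothetical and only illustrate the effect.

```python
def count_stalls(n_chunks, chunk_s, request_overhead_s):
    """Count buffer underflows when chunks are fetched one request at a time.

    chunk_s:            media duration contained in one chunk
    request_overhead_s: assumed fixed cost per request (round trip, and
                        connection setup if keep-alive is not used)
    """
    buffer_s = chunk_s  # playback starts once the first chunk has arrived
    stalls = 0
    for _ in range(n_chunks - 1):
        # Playback keeps consuming media while the next request is in flight.
        buffer_s -= request_overhead_s
        if buffer_s < 0:
            stalls += 1      # buffer ran dry before the chunk arrived
            buffer_s = 0.0
        buffer_s += chunk_s  # chunk arrives and refills the buffer
    return stalls


# One frame per chunk at 30 fps with an assumed 50 ms overhead per request.
print(count_stalls(100, 1 / 30, 0.05))
# Half-second chunks with the same assumed overhead.
print(count_stalls(100, 0.5, 0.05))
```

In this model, playback stalls whenever the per-request overhead exceeds the chunk duration, which is why single-frame chunks are affected first and why larger fetch units avoid the stalls at the cost of higher latency.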
The same is true when player implementations without keep-alive support reach the limits of the receiver operating system, such as the maximum number of concurrent TCP connections. When this number is exceeded, playback may fail with hard-to-pinpoint symptoms such as smooth low-latency playback for multiple minutes, followed by a complete halt of playback until the system-specific timeout settings allow for new connections again.
Clearly, a single component in the streaming pipeline may be sufficient to prevent ultra-low latency streaming. While CMAF certainly enables playback of ultra-low latency streams for different existing manifest formats, even with backwards compatibility, it will still take a significant amount of time until all components of the pipeline are ready to actually allow for smooth ultra-low latency end-to-end streaming in practice.
About the author
Dr. Andreas Unterweger is a teaching professor for Media Technology at the Salzburg University of Applied Sciences, with a background in video coding, streaming and multimedia security. He is also the Lead Video Encoding Engineer at NativeWaves, a Salzburg-based company which provides synchronized multi-screen streaming for low-latency use cases.