During the design and development of Agile Content’s award-winning innovation project on cloud-based remote and distributed live media production, we asked ourselves: surely there must be a better way to transport audiovisual media streams within our distributed solution than the legacy protocol MPEG-2 TS?
Although MPEG-2 TS is omnipresent in legacy gear, it simply could not support the new features we were building, such as distributed synchronization of streams and transport-independent channel bonding. Nor did we find it appealing to deal with TS’s limitations and its now unnecessary complexities: high protocol overhead, timestamp-based system-clock recovery, sub-par handling of packet loss and reordering, and a complex buffer model. By building something of our own, we could create a lean, efficient protocol designed for today’s networks that addresses all these issues and more. By open-sourcing it under a permissive license, we could share it with the industry, build community engagement and work towards wide support. The result is our open-source media-stream multiplexing protocol, which we named Elastic Frame Protocol, or EFP for short.
What is a multiplexing protocol and why is MPEG-2 TS ripe for replacement?
The purpose of a stream multiplexing protocol is to aggregate a number of sub-streams, such as video, audio, and subtitle tracks, into a single stream, and to add metadata to this aggregated stream that identifies and describes the different sub-streams as well as how they relate to each other, e.g. regarding timing. MPEG-2 Transport Stream (TS) is today the most common such multiplexing protocol for streaming media, at least in the fields of production, contribution and primary distribution of broadcast television. However, having been designed back in 1995 around the objectives and constraints of that time, it is much less suited to modern conditions. When MPEG-2 TS was conceived, IP transport of media was in its infancy, and the focus was on one-way communication over error-prone channels with constant delay, such as virtual-circuit-switched Asynchronous Transfer Mode (ATM) networks. Since our target environment is modern IP-based networks, we wanted to take advantage of their duplex nature and variable MTU sizes, while being robust to the jitter incurred in packet-switched networks.
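To make the idea of multiplexing concrete, here is a toy sketch. It is not EFP's or MPEG-2 TS's wire format; the `Frame` fields, the stream IDs and the timestamp values are all made up for illustration. It merges two time-ordered sub-streams into one stream, tags each frame with its sub-stream identity, and splits them apart again on the receiving side.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    stream_id: int   # hypothetical sub-stream identifier (1 = video, 2 = audio)
    timestamp: int   # presentation time in ticks of a shared, made-up timescale
    payload: bytes


def multiplex(*substreams):
    """Merge already time-ordered sub-streams into one timestamp-ordered stream."""
    return list(heapq.merge(*substreams, key=lambda f: (f.timestamp, f.stream_id)))


def demultiplex(muxed):
    """Split a multiplexed stream back into per-sub-stream frame lists."""
    out = {}
    for frame in muxed:
        out.setdefault(frame.stream_id, []).append(frame)
    return out


video = [Frame(1, 0, b"v0"), Frame(1, 3003, b"v1")]
audio = [Frame(2, 0, b"a0"), Frame(2, 1920, b"a1"), Frame(2, 3840, b"a2")]

muxed = multiplex(video, audio)
recovered = demultiplex(muxed)
```

A real multiplexing protocol additionally has to describe what each sub-stream contains and split frames into network-sized packets, which is where the differences discussed below come in.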
Key areas of improvement
There were a number of areas where we needed better performance than MPEG-2 TS could offer:
1) Timestamps and stream synchronization: In a multiplexed stream, each frame (consisting of one or several samples of some media) in any of the sub-streams has an associated timestamp that tells the receiver when it shall be presented to the end user, relative to some local reference in the receiver. MPEG-2 TS has 33-bit timestamps at a frequency of 90 kHz, which therefore wrap around every 26.5 hours. There are no requirements on initialization and no mapping to any absolute time standard; the timestamps are only used to synchronize frames relative to the other frames in the same multiplexed stream. For our use case we wanted to carry a high-precision, strictly monotonic absolute timestamp, so that we can synchronize not only within one multiplexed stream but also between multiplexed streams from geographically distributed sources, and specify exactly when frames should be displayed based on the International Atomic Time (TAI) reference. Timestamps in EFP are 64 bits wide to allow TAI to be represented with suitably high precision. Furthermore, one can choose a timescale that is compatible with the video frame rate in order to avoid the rounding errors that MPEG-2 TS has for 59.94 fps video.
2) Error detection and recovery: When transporting data over lossy or error-prone networks, some of that data will occasionally be damaged. For IP networks, the relevant impairments are packet loss and packet reordering. MPEG-2 TS only has a 4-bit counter for each sub-stream to indicate packet order. This is usually enough to indicate that packets have been lost, but not how many, and it is generally not enough to allow reordered packets to be put back in order. EFP has a hierarchy of two 16-bit packet counters. This makes it possible to indicate how many packets were lost, and is used to correct any packet reordering that occurs on the link. In addition, the packet sequence numbers can be used by a thin extension layer to the protocol to support seamless switching, so-called 1+1 or 1+n protection, and different forms of bonding and load sharing among different physical network links.
3) Protocol overhead: All protocols add some form of overhead, i.e. transported data that is not part of the actual payload. MPEG-2 TS, due to the prerequisites at its inception, carries quite a lot of protocol overhead that is unnecessary when transported over IP networks. Examples include measures to identify framing boundaries and bit errors when transporting a bit stream, such as byte sequences indicating the start of different levels of sub-packets, and various checksums. Also, TS has fixed-size packets and thus requires padding at the end of the payload. There is also repetitive content metadata, which is discussed in more detail in the next item below. When transporting over IP, several 188-byte TS packets are usually grouped into a single IP packet, up to seven TS packets per IP packet. However, this still only amounts to 1316 bytes in each IP packet, so this scheme does not fully utilize the IP packet MTU and thus incurs higher relative overhead from the headers of the underlying protocol layers such as UDP, IP and Ethernet. Finally, TS is often sent at constant bitrate (CBR), which is achieved by padding the multiplexed stream with null packets, typically in the range of 5-10%. EFP was designed from the start for IP transport, and is thus able to use the full MTU. It can also rely on mechanisms in the underlying protocol layers for consistency checking. For a deeper discussion and practical examples of overhead, please check out our IEEE paper on the subject, linked at the bottom of this post.
4) Duplex content signaling: All multiplexing protocols need some way to communicate to the receiver which sub-streams are contained in the multiplexed stream, and what their properties are. MPEG-2 TS sends this content signaling metadata embedded within the multiplexed stream in the form of a frequently repeating table of contents. It needs to repeat often because the receiver has no way to request the signaling information; instead, it has to parse the multiplexed stream and wait until the table of contents occurs before it can start digesting the input stream. EFP is able to in-line the content signaling metadata in a similar way. More powerfully, it can utilize the duplex nature of IP networks to signal the content metadata over an out-of-band reliable channel such as a WebSocket, allowing the receiver to query the sender, and to be updated on changes, without the overhead of repetitive in-line signaling. Another possibility with duplex signaling in one-to-one connections is for the sender to declare the contents and the receiver to subscribe to specific parts of the sender’s stream.
5) Open-source reference implementation: While MPEG-2 TS is the de facto standard today due to its widespread support in legacy products, there is no open-source library that supports the full specification in software, due to the complexities therein. With EFP, the aim is a definitive reference implementation that is portable and easily integrated, with a simple and clean yet powerful C++ interface, and a friendly license type.
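The numbers quoted in items 1 and 3 can be checked with a few lines of arithmetic. The sketch below is only an illustration: the 1500-byte MTU, the IPv4 header without options and the 60000-tick timescale are assumptions chosen for the example, not values taken from the EFP specification.

```python
from fractions import Fraction

# Item 1: 33-bit timestamps on a 90 kHz clock.
TS_CLOCK_HZ = 90_000
wrap_hours = Fraction(2**33, TS_CLOCK_HZ) / 3600      # roughly 26.5 hours

# "59.94 fps" video is exactly 60000/1001 frames per second.
frame_rate = Fraction(60_000, 1001)
ticks_per_frame_90k = TS_CLOCK_HZ / frame_rate        # 1501.5 ticks: not an integer
ticks_per_frame_60k = 60_000 / frame_rate             # 1001 ticks: exact

# Item 3: seven 188-byte TS packets in one UDP datagram.
MTU = 1500            # assumed Ethernet-style MTU
IPV4_HEADER = 20      # assumed IPv4 header without options
UDP_HEADER = 8
ts_bytes = 7 * 188                                    # 1316 bytes of TS per datagram
max_udp_payload = MTU - IPV4_HEADER - UDP_HEADER      # 1472 bytes available
unused_bytes = max_udp_payload - ts_bytes             # 156 bytes left unused

# Share of each IP packet spent on IP and UDP headers.
header_share_ts = Fraction(IPV4_HEADER + UDP_HEADER, ts_bytes + IPV4_HEADER + UDP_HEADER)
header_share_full_mtu = Fraction(IPV4_HEADER + UDP_HEADER, MTU)

print(f"wrap-around after {float(wrap_hours):.2f} hours")
print(f"90 kHz ticks per frame: {float(ticks_per_frame_90k)}")
print(f"IP/UDP header share: {float(header_share_ts):.2%} vs {float(header_share_full_mtu):.2%}")
```

With a 90 kHz clock, every 59.94 fps frame lasts 1501.5 ticks, so integer timestamps have to alternate between rounding up and down, whereas a timescale that is a multiple of the frame rate's numerator gives a whole number of ticks per frame. The header comparison ignores the protocol's own headers, padding and null packets, so it understates the total difference.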
There are more improvements than these, as can be found in the supporting material listed at the end of this blog entry.
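As an illustration of item 2, the following sketch compares what a receiver can infer from a 4-bit and a 16-bit counter. It is a simplification under stated assumptions: it uses a single counter per stream rather than EFP's hierarchy of two counters, and the helper names `missing_between` and `reorder` are hypothetical, not part of any library.

```python
def missing_between(prev, curr, bits):
    """Packets apparently lost between two consecutively received counter values."""
    return (curr - prev - 1) % (1 << bits)


def reorder(packets):
    """Restore sending order of (sequence_number, payload) pairs.

    Assumes the batch does not span a counter wrap-around.
    """
    return sorted(packets, key=lambda p: p[0])


# The sender numbers packets 0, 1, 2, ...; the receiver gets packet 5 and then
# packet 22, so packets 6..21 (16 packets) were lost in between.
prev_sent, curr_sent = 5, 22

seen_with_4_bits = missing_between(prev_sent % 16, curr_sent % 16, bits=4)
seen_with_16_bits = missing_between(prev_sent % 65536, curr_sent % 65536, bits=16)

print(seen_with_4_bits)    # the 4-bit counter has wrapped exactly once
print(seen_with_16_bits)   # the 16-bit counter still shows the real gap

# Packets that arrive out of order can be sorted back by sequence number.
arrived = [(101, b"b"), (100, b"a"), (103, b"d"), (102, b"c")]
in_order = reorder(arrived)
```

A burst of 16 lost packets is invisible to a 4-bit counter because the counter returns to the expected value, while a 16-bit counter reports the gap correctly for bursts of up to 65535 packets.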
Avoid complexity, keep EFP lean and efficient
In addition to these new requirements, we also wanted to free ourselves of much of the complexity in MPEG-2 TS that originates from its initial intended use. For example, TS has a complex buffer model that originates from the fact that the receivers were often hardware-based decoders or set-top boxes with severely constrained amounts of memory available. Our target environments are modern software- and GPU-based encoders and decoders that do not need these narrow constraints.
As to time handling, MPEG-2 TS was designed to be sent over one-way connections, and synchronizing clocks was in many cases very difficult. The receivers instead recover the sender’s system clock using special high-resolution timestamps, named the Program Clock Reference (PCR), sent regularly in the stream. But this clock recovery process is complex and is adversely affected by the jitter that occurs in large packet-switched and routed networks. EFP relies on duplex-based time synchronization instead.
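To show what a duplex connection makes possible, here is a minimal sketch of the well-known NTP-style two-way exchange for estimating the offset between two clocks. This post does not specify which synchronization mechanism an EFP deployment uses, so treat this purely as an example of the general technique; the function name and the timestamp values are made up.

```python
def estimate_offset_and_delay(t0, t1, t2, t3):
    """NTP-style estimate from one request/response exchange.

    t0: request sent      (client clock)
    t1: request received  (server clock)
    t2: response sent     (server clock)
    t3: response received (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2   # how far the server clock is ahead
    delay = (t3 - t0) - (t2 - t1)          # round-trip time on the network
    return offset, delay


# Example in milliseconds: the server clock is 250 ms ahead of the client,
# each direction takes 20 ms and the server needs 1 ms to respond.
offset, delay = estimate_offset_and_delay(t0=1000, t1=1270, t2=1271, t3=1041)
print(offset, delay)
```

The estimate is exact only when the path delay is the same in both directions; asymmetric delay biases the offset by half the asymmetry. Unlike PCR-based recovery, the receiver does not have to infer the sender's clock from the arrival times of a one-way stream.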
Further reading
For more detailed information on EFP and MPEG-2 TS and how they stack up against each other, please check out our recent scientific paper evaluating EFP vs MPEG-2 TS that was presented at the 2022 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting. The paper can be found here.
The EFP open-source project is hosted on GitHub, and can be found here. We would love for you to check it out and let us know what you think. Community participation in the project is greatly encouraged!