Chewie, don’t touch TCP/IP!
November 16, 2020
However hard it may sometimes be to accept, change is part of progress and evolution. And it usually has implications for several other aspects surrounding, or interlinked with, the main item undergoing transformation. But the reality is that change does not come holding hands with everything it “touches”, and we typically rely on systems’ backwards-compatibility properties.
The Star Wars Universe, a blockbuster space-opera franchise which requires no introduction, is a great example of that. It depicts futuristic technologies that any kid—and even grown-ups—would wish for if only they existed. But, although in Star Wars they’ve mastered technologies such as carbonite freezing, lightsabres, interstellar-travel “hyperdrives”, advanced cybernetics, “hypermatter” fusion reactors, and artificial gravity, somehow in that universe—just as in our own—they too still rely on much older and sometimes “pre-Old-Republic looking” technologies which, perhaps due to the basic nature of the purpose they serve, haven’t yet evolved into more high-tech versions as others undoubtedly did.
“You can’t stop the change, any more than you can stop the suns from setting.” – Shmi Skywalker
Similarly, in the Telecommunications Universe, the good old TCP protocol, developed in 1974 by Robert Kahn and Vinton Cerf, is still today the most used Internet data transmission protocol. The Transmission Control Protocol (TCP) was introduced to provide a reliable connection-oriented communication channel between two end hosts, making sure that the data gets through to the other host. It was, and essentially remains, the very basis of the Internet as we know it. Since then, the TCP/IP protocol suite has evolved to meet the changing requirements of the Internet. Nevertheless, it was designed for an age where communications occurred mainly between computers and terminals in fixed locations, and in which the user interface was text rather than dynamic media such as video or audio.
It has been some years now since mobile operators first identified problems with the use of TCP/IP in core and access networks. Those include spectrum inefficiencies, high battery consumption, large latency, and degraded throughput performance. Lately, TCP/IP has even been deemed unsuitable—by an ETSI Industry Specification Group—for the more advanced services that 5G expects to offer.
It is not surprising that achieving low latency and high throughputs in mobile networks is more challenging than in wired networks. That is mainly because mobile networks have additional characteristics, such as variable channels, rapidly fluctuating capacities, large per-user buffers, and uplink/downlink scheduling delays. From this, one could rightfully expect that the most used data transmission protocol—TCP—should by now accommodate such variables. However, that is often not the case, and in fact the commonly used TCP variants perform quite poorly on mobile networks.
In mobile networks the link capacity changes dynamically and stochastic packet losses also occur. Thus, the originally loss-based TCP congestion control algorithms (CCA) are not suitable for mobile networks, regardless of the technology generation (e.g. 3G or 4G). The problem is aggravated even further when the user multitasks, i.e. when short web flows are mixed with longer-running background flows.
In loss-based CCAs the sender will slow down its sending rate when it sees a packet loss. However, due to mobile networks’ large buffers and link-layer retransmissions, packet losses on the radio interface are concealed from TCP senders. These two facts combined lead to a continuously increasing TCP sending rate on the server side, regardless of whether it has already exceeded the actual link capacity for a given user under the current radio conditions and radio-resource utilization ratio, while all the excess packets are simply absorbed by the base station buffers. This “bufferbloat” results in increased one-way delay and RTT.
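To make the dynamic concrete, here is a minimal, hypothetical simulation of an additive-increase/multiplicative-decrease (AIMD) loss-based sender—toy parameters and a toy queue model, not any real TCP implementation. When the deep buffer and link-layer retransmissions hide losses, the window grows far past the link capacity and the queue stays full; when losses are visible end-to-end, the window backs off and oscillates near capacity.

```python
def aimd_sender(rounds, link_capacity, buffer_limit, losses_visible):
    """Toy AIMD loss-based congestion control, one packet-unit per round.

    If losses are concealed (deep buffers plus link-layer retransmission),
    the sender never receives a loss signal, so cwnd grows without bound
    and the excess traffic piles up in the base station buffer.
    """
    cwnd, queue = 1.0, 0.0
    for _ in range(rounds):
        # Queue grows when cwnd exceeds link capacity, drains otherwise.
        queue = min(buffer_limit, max(0.0, queue + cwnd - link_capacity))
        loss_seen = losses_visible and queue >= buffer_limit
        cwnd = cwnd / 2 if loss_seen else cwnd + 1  # MD on loss, else AI
    return cwnd, queue

# Losses hidden by the radio link: cwnd overshoots, buffer stays full.
blind_cwnd, blind_queue = aimd_sender(100, link_capacity=10,
                                      buffer_limit=50, losses_visible=False)

# Losses visible end-to-end: cwnd backs off and hovers near capacity.
aware_cwnd, _ = aimd_sender(100, link_capacity=10,
                            buffer_limit=50, losses_visible=True)
```

In the “blind” run the window ends an order of magnitude above the link capacity with the buffer permanently full—exactly the bufferbloat effect described above—whereas the “aware” run keeps the window bounded.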
There are also additional issues to consider with the basic loss-based TCP variants. For example, the varying channel conditions of the radio link can cause compression of acknowledgements on the uplink, which then arrive in bursts, triggering a burst of outgoing data. Packets in that burst can further aggravate the issue, leading to congestion and/or delay spikes on the downlink.
The workaround in common use today
Obviously something had to be done to overcome the inadequacies of TCP/IP on mobile networks; several different TCP protocol adaptations have been developed since then, and a specific workaround on the UE side was put in place by mobile phone manufacturers. Trials run by the world’s major mobile operators have shown that the TCP congestion window grows to a static value and stays there until the end of the session. That means that the in-going rate of data arriving at the nodeB/eNodeB (from the server to the base station) is fixed and typically at a large value, which in several cases, coverage-wise, is far larger than it should be compared with the actual possible outgoing rate (i.e. from the base station to the UE). A fixed value is clearly far from ideal considering the variable nature of the radio channels. Moreover, it was also found that this static value varies between different OS brands, handset versions, and RAN technologies.
The congestion window (cwnd), which essentially bounds the amount of data the sender may transmit, is maintained by the server on its side of the connection. The receive window (rwnd), on the other hand, which reflects the buffer available at the receiving endpoint, is advertised by the UE to the server. The amount of in-transit, unacknowledged data within a TCP connection cannot exceed the minimum of cwnd and rwnd. Thus, the handsets’ static TCP receive windows work as a way of mitigating the ever-increasing TCP congestion window on the server side.
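The min(cwnd, rwnd) limit can be sketched in a few lines; the byte values below are purely illustrative, chosen only to show the mechanism, not taken from any real handset or trace.

```python
def effective_window(cwnd: int, rwnd: int) -> int:
    """Upper bound on unacknowledged, in-transit data for a TCP connection."""
    return min(cwnd, rwnd)

# The "blind" server keeps inflating its congestion window...
inflated_cwnd = 4_000_000   # bytes (illustrative value)

# ...but the handset advertises a fixed, static receive window:
static_rwnd = 262_144       # bytes (illustrative value)

# The static rwnd, not the bloated cwnd, ends up capping the flow.
in_flight_limit = effective_window(inflated_cwnd, static_rwnd)
```

However large the server’s cwnd grows, the flow can never have more than the UE’s advertised rwnd outstanding, which is precisely why the static receive window acts as a brake on the server side.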
Without the static receive window size the congestion window would in fact grow towards much greater values, because the radio link is not treated by the end-to-end TCP connection as a bottleneck link. The radio nodes’ large buffers mask the actual channel capacity and congestion, and the server is not able to lower the in-going rate in response to worse radio conditions or radio congestion. Common TCP protocol variants treat radio nodes essentially as relay nodes, not allowing the server to become aware of the current radio conditions and available capacity. As a result of this “server blindness”, the TCP transmission rate grows unaware of the radio conditions and resource utilization, leading to a bufferbloat effect, increased DL latency, and degraded DL throughput.
The hunt for next-generation protocols which overcome the TCP/IP limitations and inefficiencies for wireless communications continues. But the Universe is vast, filled with perils, and the force answers only to those who have a strong connection with it. Until then, operators need to be aware of and not overlook this transport layer issue, and to accommodate the required changes in their core networks or users’ handsets, so as to guarantee the best possible latency and throughput experiences to their customers. At Aspire we are more than happy to support our customers with the secrets of the force and its many telecommunications challenges.