Why TCP Over TCP Is A Bad Idea

Posted: 2016-05-23

Source: http://sites.inka.de/~W1011/devel/tcp-tcp.html

A frequently occurring idea for IP tunneling applications is to run a protocol like PPP, which encapsulates IP packets in a format suited for a stream transport (like a modem line), over a TCP-based connection. This would be an easy way to build encrypted tunnels by running PPP over SSH, for which several recommendations already exist (one in the Linux HOWTO base, one on my own website, and surely several others). It would also be an easy way to compress arbitrary IP traffic, whereas datagram-based compression has efficiency limits that are hard to overcome.

Unfortunately, it doesn't work well. Long delays and frequent connection aborts are to be expected. Here is why.

TCP's retransmission algorithm

TCP divides the data stream into segments which are sent as individual IP datagrams. Each segment carries a sequence number, which numbers the bytes in the stream, and an acknowledgment number, which tells the other side the next sequence number the receiver expects, that is, how much of the stream has arrived without gaps. [RFC793]

Since IP datagrams may be lost, duplicated or reordered, the sequence numbers are used to reassemble the stream. The acknowledgment number tells the sender, indirectly, whether a segment was lost: when an acknowledgment for a recently sent segment does not arrive within a certain amount of time, the sender assumes a lost packet and re-sends that segment.
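
The acknowledgment rule can be sketched in a few lines of Python. This is only an illustration: the function name `cumulative_ack` and the sample byte numbers are made up for this example, and 32-bit sequence number wraparound is ignored.

```python
def cumulative_ack(initial_seq, segments):
    """Return the next byte number the receiver expects.

    segments: iterable of (seq, length) pairs in arrival order. They may be
    duplicated or reordered, and some may be missing, as IP allows.
    """
    next_expected = initial_seq
    # Sorting by sequence number models reassembly from the receive buffer.
    for seq, length in sorted(segments):
        if seq > next_expected:
            break  # a gap: nothing beyond it can be acknowledged yet
        next_expected = max(next_expected, seq + length)
    return next_expected


# Four arrivals of 100 bytes each. The segment starting at byte 1200 was
# lost, and the one starting at byte 1100 arrived twice.
arrived = [(1100, 100), (1000, 100), (1300, 100), (1100, 100)]
ack = cumulative_ack(1000, arrived)
print(ack)  # 1200: bytes 1300-1399 are buffered but not acknowledged
```

The sender keeps seeing 1200 as the acknowledgment number until the missing segment arrives, which is what eventually makes its retransmission timer fire.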

Many other protocols using a similar approach, designed mostly for use over lines with relatively fixed bandwidth, have the "certain amount of time" fixed or configurable. On the Internet, however, parameters like bandwidth, delay and loss rate differ vastly from one connection to another and even change over time on a single connection. A fixed timeout in the seconds range would be inappropriate on a fast LAN and likewise inappropriate on a congested international link. In fact, it would increase the congestion and lead to an effect known as "meltdown".

For this reason, TCP uses adaptive timeouts for all timing-related parameters. They start at conservative estimates and change dynamically with every received segment. The actual algorithms used are described in [RFC2001]. The details are not important here, except for one critical property: when a segment times out, the following timeout is increased (exponentially, in fact, because that has been shown to avoid the meltdown effect).
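
A minimal sketch of such an adaptive timer with exponential backoff is shown below. It is not taken from the article: the class name `RetransmitTimer` is hypothetical, the gains of 1/8 and 1/4 and the one-second floor follow the smoothed round-trip estimator standardized in RFC 6298, and the 3-second initial value and 120-second ceiling are assumptions chosen for illustration.

```python
class RetransmitTimer:
    """Adaptive retransmission timeout (RTO) with exponential backoff."""

    ALPHA = 1 / 8    # gain for the smoothed round-trip time
    BETA = 1 / 4     # gain for the round-trip time variation
    MIN_RTO = 1.0    # seconds, floor as in RFC 6298
    MAX_RTO = 120.0  # seconds, assumed ceiling

    def __init__(self, initial_rto=3.0):
        self.srtt = None    # smoothed round-trip time, unknown at first
        self.rttvar = None  # round-trip time variation
        self.rto = initial_rto

    def on_rtt_sample(self, rtt):
        """Update the timeout from a measured round-trip time in seconds."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - rtt))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        rto = self.srtt + 4 * self.rttvar
        self.rto = min(self.MAX_RTO, max(self.MIN_RTO, rto))

    def on_timeout(self):
        """A segment timed out: double the timeout for the next attempt."""
        self.rto = min(self.MAX_RTO, self.rto * 2)


timer = RetransmitTimer()
timer.on_rtt_sample(0.5)  # rto becomes 0.5 + 4 * 0.25 = 1.5 seconds
backoff = []
for _ in range(4):
    timer.on_timeout()
    backoff.append(timer.rto)
print(backoff)  # [3.0, 6.0, 12.0, 24.0]
```

After only four consecutive losses the timer in this sketch has grown from 1.5 to 24 seconds, and it keeps that value until new round-trip samples arrive.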

Stacking TCPs

The TCP timeout policy works fine on the Internet over a vast range of different connection characteristics. Because TCP tries very hard not to break connections, the timeout can increase up to the range of several minutes. This is just what is sensible for unattended bulk data transfer. (For interactive applications, such slow connections are of course undesirable and the user will likely terminate them.)

This optimization for reliability breaks down when one TCP connection is stacked on top of another, a case the TCP designers never anticipated. Yet this is exactly what happens when running PPP over SSH or another TCP-based protocol, because the PPP-encapsulated IP datagrams likely carry a TCP-based payload, like this:

(TCP over IP over PPP over SSH over TCP over IP)

Note that the upper and the lower layer TCP have different timers. When an upper layer connection starts fast, its timers are fast too. Now it can happen that the lower connection has slower timers, perhaps as a leftover from a period with a slow or unreliable base connection.

Imagine what happens when, in this situation, the base connection starts losing packets. The lower layer TCP queues up a retransmission and increases its timeouts. Since the connection is blocked for this amount of time, the upper layer (i.e. payload) TCP won't get a timely ACK, and will also queue a retransmission. Because the upper layer timeout is still shorter than the lower layer timeout, the upper layer will queue up retransmissions faster than the lower layer can process them. This makes the upper layer connection stall very quickly, and every retransmission just adds to the problem: an internal meltdown effect.

TCP's reliability provisions backfire here. The upper layer retransmissions are completely unnecessary, since the carrier guarantees delivery, but the upper layer TCP can't know this, because TCP always assumes an unreliable carrier.
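
To get a feel for the numbers, the following sketch counts how often the upper layer timer fires for a single unacknowledged segment while the lower layer is stalled. It is a deliberately simplified model, not a TCP implementation: the function name, the 0.25-second upper layer timeout and the 8-second stall are all assumed values.

```python
def upper_layer_retransmissions(upper_rto, stall, max_rto=120.0):
    """Times (in seconds) at which the upper TCP retransmits during a stall.

    upper_rto: the upper layer's current retransmission timeout.
    stall: how long the lower layer blocks the tunnel while it waits for
           its own, longer, retransmission timer.
    """
    times = []
    now = 0.0
    rto = upper_rto
    while now + rto < stall:
        now += rto
        times.append(now)            # timer fires, one more copy is queued
        rto = min(max_rto, rto * 2)  # exponential backoff
    return times


fires = upper_layer_retransmissions(upper_rto=0.25, stall=8.0)
print(fires)       # [0.25, 0.75, 1.75, 3.75, 7.75]
print(len(fires))  # 5 redundant copies queued behind the original
```

Because the lower layer delivers everything it accepts, all five copies in this example are sent through the tunnel once it recovers, even though the original segment was never lost from the upper layer's point of view.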

Practical experience

The whole problem was the original incentive to start the CIPE project, because I used a PPP over SSH solution for some time and it proved to be fairly unusable. At that time it had to run over an optical link which suffered frequent packet loss, sometimes 10-20% over an extended period of time. With plain TCP, this was just bearable (because the link was not congested), but with the stacked protocols, connections would get really slow and then break very frequently.

This is the detailed reason why CIPE uses a datagram carrier. (UDP was chosen, rather than a separate IP-level protocol as IPsec uses, for several reasons: it allows tunnels to be distinguished by their port number, and it adds the ability to run over SOCKS.) The datagram carrier has exactly the same characteristics as plain IP, which TCP was designed to run over.
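
The idea of a datagram carrier can be illustrated with plain UDP sockets on the loopback interface. This is a sketch under stated assumptions, not CIPE's wire format: the helper names are hypothetical, there is no encryption, and the payload is only a stand-in for an encapsulated IP packet.

```python
import socket


def open_tunnel_endpoint(port=0):
    """Bind a UDP socket; its port number identifies the tunnel."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", port))  # port 0 lets the OS pick a free port
    sock.settimeout(2.0)            # do not block forever on a lost datagram
    return sock


def send_packet(sock, peer, ip_packet):
    """Carry one packet in exactly one datagram, with no retransmission."""
    sock.sendto(ip_packet, peer)


def receive_packet(sock, bufsize=65535):
    """Return one carried packet and the address it came from."""
    return sock.recvfrom(bufsize)


a = open_tunnel_endpoint()
b = open_tunnel_endpoint()
a_addr = a.getsockname()
payload = b"stand-in for an encapsulated IP packet"
send_packet(a, b.getsockname(), payload)
received, sender = receive_packet(b)
a.close()
b.close()
```

The carrier never retransmits and never blocks later packets behind a lost one. If a datagram is dropped, the loss is passed through to the TCP connection inside the tunnel, which handles it with its own timers as it would on plain IP.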

Olaf Titz
Last modified: Mon Apr 23 11:50:59 CEST 2001

3 . Woof

(qwe123, 2016-06-02 11:39)

4 . Meow

(qwe123, 2016-06-02 11:39)

5 . Meow-woo

(qwe123, 2016-06-02 11:39)

6 . Posting stuff I can't understand again

(love封尘, 2016-06-02 12:29)