Congestion Control Working Group                             H. Shi, Ed.
Internet-Draft                                                   T. Zhou
Intended status: Standards Track                                  Huawei
Expires: 11 January 2024                                    10 July 2023


               Advanced Explicit Congestion Notification
                     draft-shi-ccwg-advanced-ecn-00

Abstract

   This document proposes an Advanced Explicit Congestion Notification
   (AECN) mechanism that enables a host to obtain congestion
   information from the bottleneck of the path.  The sender sets a
   congestion information collection command in the packet header,
   instructing each network device to update the congestion
   information field hop by hop.  The receiver carries the updated
   congestion information back to the sender in the ACK.  The sender
   then leverages this rich congestion information to perform
   congestion control.

Discussion Venues

   This note is to be removed before publishing as an RFC.

   Discussion of this document takes place on the Congestion Control
   Working Group mailing list (ccwg@ietf.org), which is archived at
   https://mailarchive.ietf.org/arch/browse/ccwg/.

   Source for this draft and an issue tracker can be found at
   https://github.com/VMatrix1900/draft-ccwg-advanced-ecn.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 11 January 2024.

Shi & Zhou               Expires 11 January 2024                [Page 1]

Internet-Draft                    AECN                         July 2023

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.
   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Revised BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology
     1.2.  Requirements Language
   2.  Overview
   3.  AECN header format and encapsulation
   4.  Example: HPCC with AECN
   5.  Security Considerations
   6.  IANA Considerations
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Authors' Addresses

1.  Introduction

   Traditionally, congestion control has depended on implicit
   congestion detection, where hosts gauge congestion primarily
   through packet loss or variations in round-trip time.  Explicit
   Congestion Notification (ECN) is a substantial improvement, as it
   allows network devices to explicitly signal congestion to the
   endpoints before packet loss occurs.

   Low Latency, Low Loss, Scalable throughput (L4S) leverages ECN
   markings to keep queuing delay low and to avoid bufferbloat.
   However, ECN conveys only a single bit of congestion information.
   This limitation constrains the granularity of the congestion
   information that can be conveyed.  L4S's need for more detailed
   congestion signals calls for an enhanced use of ECN, which could
   employ additional bits to represent congestion levels more
   precisely and to give better control over delay and throughput in
   contemporary network environments.

   HPCC [I-D.draft-an-ccwg-hpcc] obtains richer congestion signals
   from the network by using in-band telemetry, which gathers detailed
   load information from each switch that a packet traverses.  This
   enables HPCC to make better-informed congestion control decisions
   and to converge quickly.  However, HPCC uses an append mode for
   in-band telemetry: as the packet traverses the network, it
   accumulates data from each switch, so the packet grows at every
   hop.  This growth can cause the packet to exceed the Maximum
   Transmission Unit (MTU), which makes the approach unsuitable for
   the Internet.  Another caveat is that each sender needs to repeat
   the computation to obtain the bottleneck information, even when
   several senders share the same path.

   This document defines Advanced ECN, which expands the 1-bit
   congestion notification to multiple bits and enables each network
   device to update the congestion information hop by hop.  When the
   packet arrives at the receiver, the congestion information field
   reflects the congestion status of the path.  Offloading the
   congestion information calculation to network devices reduces the
   computing burden on the endpoint.

1.1.  Terminology

   *  ECN: Explicit Congestion Notification

   *  AECN: Advanced Explicit Congestion Notification

   *  HPCC: High Precision Congestion Control
      [I-D.draft-an-ccwg-hpcc]

   *  DRE: Discounting Rate Estimator [CONGA]

1.2.
Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Overview

   Figure 1 shows the overall procedure of AECN.  First, the sender
   MUST mark the packet with an AECN Command and an initial Congestion
   Info (together called the AECN header, see Section 3).  The AECN
   Command specifies what kind of congestion information the endpoint
   intends to collect from network devices.  As the packet traverses
   the network, each router MUST update the Congestion Info field
   based on the AECN Command and the router's local load condition.
   When the packet reaches the receiver, the updated congestion
   information is extracted and communicated back to the sender,
   typically using the transport protocol's acknowledgment mechanism.
   The sender, now holding congestion information that reflects the
   path the packet took, uses it to adjust its sending rate.

              pkt+                     pkt+                     pkt+
         AECN Command+            AECN Command+            AECN Command+
+------+Congestion Info0+-------+Congestion Info1+-------+Congestion Info2+--------+
|Sender|===============>|Router1|===============>|Router2|===============>|Receiver|
+------+     Link-1     +-------+     Link-2     +-------+     Link-3     +--------+
   /|\                                                                        |
    |                                                                         |
    +-------------------------------------------------------------------------+
                                       ACKs

                   Figure 1: Overview of Advanced ECN

3.  AECN header format and encapsulation

   Figure 2 shows the format of the AECN header.  The AECN header
   SHOULD be encapsulated in an IPv6 extension header [RFC8200], such
   as the SRH or the Hop-by-Hop Options header.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Flags     |             Congestion Info Type              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Congestion Info Data                      |
   ~                             ....                              ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 2: AECN header format

   where:

   Flags:  An 8-bit field.  Bit 7 of Flags, when set, indicates that
      the Congestion Info is customized and used only in a limited
      domain such as a data center network.  If Bit 7 is 0, the
      Congestion Info Type is a bitmap.  Other bits are reserved.

   Congestion Info Type:  A 24-bit bitmap that specifies which
      Congestion Info Data items are present.  The supported
      Congestion Info Data items are listed in Table 1.  Note that
      multiple Congestion Info Data items can coexist in one packet.

   Congestion Info Data:  A variable-length field containing the
      congestion information data.  A router MUST update this field
      based on its local load status.

   +=====+=========================+========+===========+
   | Bit | Congestion Info Data    | Length | Operation |
   +=====+=========================+========+===========+
   | 0   | Inflight Ratio          | 8      | Max       |
   +-----+-------------------------+--------+-----------+
   | 1   | DRE                     | 8      | Max       |
   +-----+-------------------------+--------+-----------+
   | 2   | Queue Utilization Ratio | 8      | Max       |
   +-----+-------------------------+--------+-----------+
   | 3   | Queue Delay             | 8      | Add       |
   +-----+-------------------------+--------+-----------+
   | 4   | Congested Hops          | 8      | Add       |
   +-----+-------------------------+--------+-----------+

                     Table 1: Congestion Info Data

4.  Example: HPCC with AECN

   HPCC calculates the inflight ratio of each link (which represents
   the utilization of that link) from the raw load information
   collected via in-band telemetry (INT).  The maximum inflight ratio
   along the path is then identified and used to adjust the sending
   rate.
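
   As a non-normative illustration, the in-place aggregation that
   Table 1 assigns to each Congestion Info Data item (Max or Add) can
   be sketched in Python as follows.  The sketch assumes that each
   item is an 8-bit unsigned value that saturates at 255, that the
   sender initializes every requested item to 0, and that the packet
   is modeled as a dictionary rather than as the wire format of
   Figure 2.  None of these details are specified in this document,
   and all function and field names are hypothetical.

```python
from enum import Enum


class Op(Enum):
    MAX = "max"
    ADD = "add"


# Bit position in the Congestion Info Type bitmap -> (item name, operation),
# transcribed from Table 1.
TABLE_1 = {
    0: ("inflight_ratio", Op.MAX),
    1: ("dre", Op.MAX),
    2: ("queue_utilization_ratio", Op.MAX),
    3: ("queue_delay", Op.ADD),
    4: ("congested_hops", Op.ADD),
}

# Assumption: every item is an 8-bit unsigned value that saturates.
FIELD_MAX = 255


def update_congestion_info(info_type, data, local):
    """Return the Congestion Info Data after one router has processed it.

    info_type: set of Table 1 bit positions requested by the sender
    data:      dict of item name -> value currently carried in the packet
    local:     dict of item name -> value measured by this router
    """
    updated = dict(data)
    for bit in sorted(info_type):
        name, op = TABLE_1[bit]
        carried = data.get(name, 0)
        measured = local.get(name, 0)
        if op is Op.MAX:
            value = max(carried, measured)   # keep the worst hop seen so far
        else:
            value = carried + measured       # accumulate along the path
        updated[name] = min(value, FIELD_MAX)
    return updated


def traverse(info_type, hops):
    """Carry one packet across all hops and return what the receiver sees."""
    # Assumption: the sender initializes every requested item to 0.
    data = {TABLE_1[bit][0]: 0 for bit in sorted(info_type)}
    for local in hops:
        data = update_congestion_info(info_type, data, local)
    return data


# Three hops; the sender asked for Inflight Ratio (bit 0) and Queue Delay (bit 3).
hops = [
    {"inflight_ratio": 40, "queue_delay": 3},
    {"inflight_ratio": 90, "queue_delay": 10},
    {"inflight_ratio": 60, "queue_delay": 0},
]
print(traverse({0, 3}, hops))  # {'inflight_ratio': 90, 'queue_delay': 13}
```

   In this example the receiver sees the largest inflight ratio on the
   path and the sum of the per-hop queue delays, carried in the same
   two items that the sender emitted regardless of the number of hops.
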
   The formula to calculate the inflight ratio of each link is shown
   below:

      txRate = (txBytes_1 - txBytes_2) / (t_1 - t_2)

      inflight ratio = qlen / (B * T) + txRate / B

   where:

   txBytes_i:  total bytes transmitted on the link, as recorded at
      timestamp t_i

   qlen:  link queue length

   B:  link bandwidth

   T:  baseline RTT

   With AECN, the router participates in the calculation of the
   maximum inflight ratio.  Each router MUST calculate the inflight
   ratio of its downstream link, compare it with the value carried in
   the AECN header, and keep the larger of the two.  When the packet
   arrives at the endpoint, the Congestion Info field of the AECN
   header already contains the maximum inflight ratio of the path.
   The sending rate adjustment algorithm remains unchanged.  Because
   routers perform these calculations, the computing overhead at the
   endpoint is reduced.  Because the value is updated in place, the
   packet size remains unchanged regardless of the hop count.

5.  Security Considerations

   TBD.

6.  IANA Considerations

   TBD.

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174,
              DOI 10.17487/RFC8174, May 2017.

   [RFC8200]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", STD 86, RFC 8200,
              DOI 10.17487/RFC8200, July 2017.

7.2.  Informative References

   [CONGA]    Alizadeh, M., Edsall, T., Dharmapurikar, S.,
              Vaidyanathan, R., Chu, K., Fingerhut, A., Lam, V.,
              Matus, F., Pan, R., Yadav, N., and G. Varghese, "CONGA:
              distributed congestion-aware load balancing for
              datacenters", Proceedings of the 2014 ACM conference on
              SIGCOMM, DOI 10.1145/2619239, August 2014.

   [I-D.draft-an-ccwg-hpcc]
              An, Q., Gao, J., Anubolu, S., Pan, R., Lee, J., Gafni,
              B., Shpigelman, Y., Tantsura, J., and G. Caspary,
              "HPCC++: Enhanced High Precision Congestion Control",
              Work in Progress, Internet-Draft, draft-an-ccwg-hpcc-00,
              30 June 2023.

Authors' Addresses

   Hang Shi (editor)
   Huawei
   Beijing
   China
   Email: shihang9@huawei.com


   Tianran Zhou
   Huawei
   Email: zhoutianran@huawei.com