rtgwg                                                            Y. Wang
Internet-Draft                                             China Telecom
Intended status: Standards Track                                  C. Lin
Expires: 8 January 2024                             New H3C Technologies
                                                                 A. Wang
                                                           China Telecom
                                                             7 July 2023


                   IGP Prefix Independent Convergence
                      draft-wang-rtgwg-igp-pic-01

Abstract

   In many cases, a large number of routes can be reached through
   multiple next hops.  When a link fails, routes must be recalculated
   and a new reachable path must be found.  If all routes are
   recalculated and refreshed, the calculation time increases linearly
   with the number of routes, resulting in slow route convergence.

   This document describes an architecture in which convergence is
   independent of the number of prefixes.  With this architecture,
   routes are recalculated when paths change in a time that does not
   depend on the number of IGP routes.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 8 January 2024.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Wang, et al.             Expires 8 January 2024                 [Page 1]

Internet-Draft                    rtgwg                        July 2023

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Revised BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions used in this document
   3.  Terminology
   4.  Overview
     4.1.  Dependency
     4.2.  FRR Consideration
     4.3.  IGP-PIC Illustration
   5.  ISIS PIC
     5.1.  Maintenance of ISIS IGP-nodes
     5.2.  PIC Route Compute
   6.  OSPF PIC
     6.1.  Maintenance of OSPF IGP-nodes
     6.2.  PIC Route Compute
   7.  Example
     7.1.  ISIS PIC Route
     7.2.  OSPF PIC Route
   8.  Normative References
   Authors' Addresses

1.  Introduction

   In modern networks, it is not uncommon for a prefix to be reachable
   via multiple paths.  When the primary link fails, routes must
   reconverge as quickly as possible.  The OSPF route calculation
   process is described in [RFC2328] and can be summarized in two steps:

   1)  Calculate the shortest-path (SPF) tree from the root node to all
       routing nodes, based on the link state.

   2)  Calculate the cost of each prefix from the distance between the
       root node and the router node in the shortest-path tree.

   Because the second step is repeated for every prefix, route
   convergence slows down as the number of prefixes increases.
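   The two steps above can be sketched as follows.  This is a minimal
   illustration, not the procedure specified in [RFC2328]: the graph
   encoding, the function names spf and prefix_routes, the four-router
   topology, and the link costs of 10 are all assumptions made for this
   example, and link costs are assumed to be positive.  The point is
   only that step 2 iterates once per prefix.

```python
import heapq

def spf(graph, root):
    """Step 1: Dijkstra. Returns {node: (distance, first hops from root)}."""
    dist = {root: 0}
    first_hops = {root: frozenset()}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in graph[u].items():
            nd = d + cost
            # A neighbor of the root is its own first hop; otherwise inherit.
            hops = frozenset([v]) if u == root else first_hops[u]
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                first_hops[v] = hops
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                first_hops[v] = first_hops[v] | hops  # equal-cost path
    return {n: (dist[n], first_hops[n]) for n in dist}

def prefix_routes(tree, advertised):
    """Step 2: one iteration per prefix, so the work grows with len(advertised)."""
    routes = {}
    for prefix, (node, prefix_cost) in advertised.items():
        distance, hops = tree[node]
        routes[prefix] = (distance + prefix_cost, hops)
    return routes

# Hypothetical topology: R1 reaches R4 through R2 or R3, every link costs 10.
GRAPH = {
    "R1": {"R2": 10, "R3": 10},
    "R2": {"R1": 10, "R4": 10},
    "R3": {"R1": 10, "R4": 10},
    "R4": {"R2": 10, "R3": 10},
}
# Hypothetical prefixes, all advertised by R4 with prefix cost 0.
ADVERTISED = {f"Prefix-{i}": ("R4", 0) for i in range(1, 4)}

if __name__ == "__main__":
    for prefix, route in prefix_routes(spf(GRAPH, "R1"), ADVERTISED).items():
        print(prefix, route)
```

   In this sketch a link failure requires both spf and prefix_routes to
   run again, and the second call touches every entry in ADVERTISED.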

   This document proposes a hierarchical, shared forwarding chain
   organization that allows traffic to be restored in a time that is
   independent of the number of prefixes.  The technique relies on
   internal router behavior that is completely transparent to operators,
   and it can be deployed and enabled progressively without operator
   intervention.

2.  Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Terminology

   The following terms are defined in this draft:

   *  IGP prefix: A prefix P/m (of any AFI/SAFI) that an Interior
      Gateway Protocol, such as OSPF or ISIS, has learnt and has a path
      for.  The prefix may be learnt directly through the IGP or
      redistributed from other protocol(s).

   *  OSPF ABR Node: OSPF Area Boundary Router, an OSPF router that
      sits between multiple areas.

   *  OSPF ASBR Node: OSPF AS Boundary Router, an OSPF router that
      exchanges routing information with routers in other ASes.

   *  OSPF Node: A node that is associated with a real OSPF router or
      with the combination of multiple OSPF routers that advertise the
      same prefix.  Real OSPF routers include the OSPF ABR Node, the
      OSPF ASBR Node, and the ordinary OSPF Node.

   *  ISIS Node: A node that is associated with a real ISIS router or
      with the combination of multiple ISIS routers that advertise the
      same prefix.

   *  IGP Node: An OSPF Node or an ISIS Node.

4.  Overview

   The idea of IGP-PIC is based on two pillars:

   1)  A shared forwarding chain: Instead of keeping a separate list of
       next hops for each destination, all destinations that share the
       same list of next hops point to a single copy of that list.
       Convergence is fast because a change is made to a single shared
       list of next hops rather than to a possibly large number of
       destinations.

   2)  A forwarding plane that supports multiple levels of indirection:
       A forwarding chain that starts with a destination and ends with
       an outgoing interface is not a simple flat structure.  Instead,
       a forwarding entry is constructed through multiple levels of
       dependency.  Designing a forwarding plane that constructs
       multi-level forwarding chains with maximal sharing of forwarding
       objects allows a large number of destinations to be rerouted by
       modifying a small number of objects, thereby achieving
       convergence in a time frame that does not depend on the number
       of destinations.

   This is similar to the implementation of BGP-PIC; see Section 2 of
   [I-D.ietf-rtgwg-bgp-pic] for details.

4.1.  Dependency

   This section describes the functionality required in the forwarding
   and control planes to support the IGP-PIC described in this document.

   IGP-PIC requires hierarchical hardware FIB support: for each packet
   forwarded along an IGP route, the destination is looked up first,
   then the IGP Node, and then the Adjacency.

4.2.  FRR Consideration

   As per [RFC5286], rapid failure repair is achieved through the use
   of precalculated backup next hops that are loop-free and safe to use
   until the distributed network convergence process completes.  By
   backing up the next hop of the current route in advance, FRR can
   switch traffic away from a failed link quickly.

                     +-----+
                /----|  S  |----\
               /     +-----+     \
              / 5               8 \
             /                     \
         +-----+                 +-----+
         |  E  |                 | N_1 |
         +-----+                 +-----+
            \                       /
        \    \ 4                 3 /    /
         \|   \                   /   |/
          -+   \     +-----+     /   +-
                \----|  D  |----/
                     +-----+

                  Figure 1: Node Protection Topology

   As shown in Figure 1, the optimal next hop from the source S to the
   destination D is E.  Suppose N_1 is taken as the backup next hop for
   the path from S through E.  When there is a fault between S and E,
   packets destined for D are handed over to N_1, which can forward
   them to D normally, so N_1 qualifies as a backup next hop.
   But if the cost of the direct link from N_1 to D were greater than
   17, then until the route on N_1 converges again, the next hop from
   N_1 to D would be S instead of D, which forms a temporary loop.
   Therefore, as per [RFC5286], a neighbor N_1 can provide a loop-free
   alternate (LFA) if and only if

      Distance_opt(N_1, D) < Distance_opt(N_1, S) + Distance_opt(S, D)

      +-----+       +-----+
      |  S  |-------|  N  |
      +-+---+   6   +-----+
        |              |
        | 5          2 |
        |              |
        |    +-----+   |
        +----|  E  |---+
             +--+--+
                |
                | 3
                |
             +--+--+
             |  D  |
             +-----+

                  Figure 2: Link Protection Topology

   Another typical scenario is shown in Figure 2.  When both S and N
   have IP FRR enabled, each treats the other as the backup next hop
   for its primary path to D.  If the downstream node E then fails, S
   and N forward packets destined for D to each other, which results in
   a micro-loop.  Node protection is therefore given higher priority
   than link protection.

4.3.  IGP-PIC Illustration

                +---+
         +------|R2 |-------+
         |      +---+       |
         |                  |
       +---+              +---+ Prefix-1
       |R1 |              |R4 | Prefix-2
       +---+              +---+ ...
         |                  |   Prefix-n
         |      +---+       |
         +------|R3 |-------+
                +---+

              Figure 3: Single source PIC network diagram

   As shown in Figure 3, R4 advertises n prefixes, and R1 reaches R4
   over two paths: R1->R2->R4 and R1->R3->R4.  When the link between R1
   and R2 fails, route calculation is performed again.  The topology
   calculation runs first and changes the path to R4 from the original
   equal-cost paths to the single path R1->R3->R4.  The routes for
   Prefix-1 through Prefix-n are then recalculated, and the forwarding
   entries of all these routes are updated.  As n grows, the time
   needed for route calculation and forwarding table updates grows with
   it, which slows route convergence.

   Because Prefix-1 through Prefix-n are all advertised by R4, their
   paths after the switchover are the same.
   In route calculation, the change to the route toward R4 therefore
   needs to be calculated only once, and only the forwarding entry for
   R4 needs to be updated to the new forwarding path.  The routes for
   Prefix-1 through Prefix-n follow that update without being
   recalculated one by one.  This is prefix-independent convergence.

   Before PIC route calculation, each prefix needs to be associated
   with its IGP Node.  In the current example, the IGP Node is the real
   router R4.

        Prefix        IGP Node       NextHop
      +--------+    +----------+
      |Prefix-1|    |R4        | ---->R2
      |Prefix-2|--->|          | ---->R3
      |...     |    +----------+
      |Prefix-n|
      +--------+

                  Figure 4: Single source PIC Forward

   When path switching occurs, only the forwarding path of the IGP Node
   needs to be updated, from the ECMP path R2+R3 to R3, without
   recalculating and updating all prefixes.  This saves the time spent
   on route calculation and forwarding table updates, and it speeds up
   route convergence.  In other words, a PIC route calculation updates
   the next hop information of the corresponding IGP Node, regardless
   of the specific prefixes.

                +---+       +---+ Prefix-1
         +------|R2 |-------|R4 | Prefix-2
         |      +---+       +---+ ...
         |                        Prefix-n
       +---+
       |R1 |
       +---+
         |
         |      +---+       +---+ Prefix-1
         +------|R3 |-------|R5 | Prefix-2
                +---+       +---+ ...
                                  Prefix-n

              Figure 5: Multi-source PIC network diagram

   In the case of multiple sources, the destination nodes are combined
   into one combined IGP Node, and the path is calculated for this
   combined node.

        Prefix        IGP Node       NextHop
      +--------+    +----------+
      |Prefix-1|    |R4,R5     | ---->R2
      |Prefix-2|--->|          | ---->R3
      |...     |    +----------+
      |Prefix-n|
      +--------+

                  Figure 6: Multi-source PIC Forward

   When the path changes, route calculation is performed again for the
   combined node (R4,R5), and the forwarding path is updated from the
   original R2+R3 to R3, without calculating routes for all prefixes
   and without flushing the forwarding table.

5.  ISIS PIC

5.1.  Maintenance of ISIS IGP-nodes

   For a single-source prefix, when a received ISIS LSP carries the
   prefix TLV, an ISIS IGP Node is created and associated with the
   prefix.  The key of the ISIS IGP Node is (system-id, level, topo).
   If the prefix is advertised in the LSP of a pseudonode, the key of
   the ISIS IGP Node is (system-id, pseudonode ID, level, topo).

   For a multi-source prefix, where multiple ISIS routers advertise the
   same prefix in their LSPs, a combined ISIS IGP Node is created and
   associated with the prefix.  The key of the combined ISIS IGP Node
   is the set of (system-id, level, topo) keys of the advertising
   routers.

5.2.  PIC Route Compute

   Without PIC, the procedure for route calculation is as follows:

   (1)  Calculate the shortest-path tree for Level-1 and Level-2.

   (2)  Calculate each route for Level-1 and Level-2.

   When PIC Route Compute is supported, the procedure for route
   calculation is as follows:

   (1)  Calculate the shortest-path tree for Level-1 and Level-2.

   (2)  Instead of calculating routes for each prefix, update the next
        hop information of each IGP-node.

6.  OSPF PIC

6.1.  Maintenance of OSPF IGP-nodes

   The key of an OSPF IGP-node is (router-id, area, topo).

   When the prefix is advertised through a router-LSA, the OSPF
   IGP-node is created with the key (router-id, area, topo).

   When the prefix is advertised through a network-LSA, the key of the
   OSPF IGP-node is (router-id, DR IP-Address, area, topo).

   When the prefix is advertised through a Type-3 summary-LSA, the key
   of the OSPF IGP-node is (ABR router-id, area, topo).

   When the prefix is advertised through a Type-5 AS-external-LSA, the
   key of the OSPF IGP-node is (ASBR router-id, Forwarding Address,
   topo).

   For a multi-source prefix, where multiple OSPF routers advertise the
   same prefix in their LSAs, a combined OSPF IGP-node is created and
   associated with the prefix.  The key of the combined OSPF IGP-node
   is the set of (router-id, area, topo) keys of the advertising
   routers.
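   The key rules of this section, together with the shared next hop
   list of Section 4.3, can be sketched as follows.  This is a minimal
   illustration under stated assumptions, not a definitive
   implementation: the names ospf_node_key and PicTable, the string
   labels for the LSA types, the default topo value of 0, and the area
   value "0.0.0.0" are hypothetical and do not come from this document
   or from any OSPF implementation.

```python
def ospf_node_key(lsa_type, adv_router, area=None, topo=0,
                  dr_addr=None, fwd_addr=None):
    """Build the IGP-node key for one advertisement (rules of Section 6.1)."""
    if lsa_type == "router":
        return (adv_router, area, topo)
    if lsa_type == "network":
        return (adv_router, dr_addr, area, topo)
    if lsa_type == "summary":       # Type-3: adv_router is the ABR
        return (adv_router, area, topo)
    if lsa_type == "as-external":   # Type-5: adv_router is the ASBR
        return (adv_router, fwd_addr, topo)
    raise ValueError(f"unsupported LSA type: {lsa_type}")


class PicTable:
    """Prefixes point at an IGP-node; only the IGP-node holds next hops."""

    def __init__(self):
        self.node_of = {}    # prefix -> IGP-node (frozenset of keys)
        self.next_hops = {}  # IGP-node -> shared next hop list

    def advertise(self, prefix, key):
        # A prefix with several advertisers gets a combined node whose
        # identity is the set of the advertisers' keys.
        node = self.node_of.get(prefix, frozenset()) | {key}
        self.node_of[prefix] = node
        self.next_hops.setdefault(node, [])
        return node

    def set_next_hops(self, node, hops):
        # One write, however many prefixes reference this node.
        self.next_hops[node] = list(hops)

    def lookup(self, prefix):
        return self.next_hops[self.node_of[prefix]]


if __name__ == "__main__":
    table = PicTable()
    r4 = ospf_node_key("router", "44.44.44.44", area="0.0.0.0")
    for i in range(1, 11):
        node = table.advertise(f"192.0.0.{i}/32", r4)
    table.set_next_hops(node, ["R2", "R3"])  # ECMP before the failure
    table.set_next_hops(node, ["R3"])        # single update after it
    print(table.lookup("192.0.0.7/32"))
```

   Keeping the next hops on the node rather than on each prefix is what
   makes the update cost independent of the number of prefixes in this
   sketch; a real forwarding plane would hold the same indirection in
   its hierarchical FIB.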

6.2.  PIC Route Compute

   For OSPF route calculation, see Section 16 of [RFC2328],
   "Calculation of the routing table".  Without PIC, the procedure for
   route calculation is as follows:

   (1)  Calculate the shortest-path tree for an area, and then
        calculate the intra-area routes.

   (2)  Calculate the inter-area routes by examining summary-LSAs.

   (3)  Examine transit areas' summary-LSAs.

   (4)  Calculate AS external routes.

   When PIC Route Compute is supported, the procedure for route
   calculation is as follows:

   (1)  Calculate the shortest-path tree for an area, and then
        calculate the intra-area routes.  Instead of calculating
        intra-area routes for each prefix, update the next hop
        information of each IGP-node.

   (2)  Calculate the inter-area routes by examining summary-LSAs.  If
        the ABR IGP-node has been updated, the inter-area routes do not
        need to be recalculated.

   (3)  Examine transit areas' summary-LSAs.  Instead of calculating
        routes for each prefix, update the next hop information of the
        intra-area IGP-node and the ABR IGP-node.

   (4)  Calculate AS external routes.  If the ASBR IGP-node has been
        updated, the AS external routes do not need to be recalculated.

7.  Example

7.1.  ISIS PIC Route

   When the link to the IGP node changes, the topology is recalculated
   and the corresponding next hop list is updated, without updating the
   forwarding table for each prefix.

        0000.0000.0002
     12.1.1.2+---+
     +-------|R2 |-----------+
     |       +---+           |
     |if1,12.1.1.1           |
   +---+                   +---+                192.0.0.1/32
   |R1 | 0000.0000.0001    |R4 | 0000.0000.0004 192.0.0.2/32
   +---+                   +---+                ...
     |if2,13.1.1.1           |                  192.0.0.10/32
     |       +---+           |
     +-------|R3 |-----------+
     13.1.1.3+---+
        0000.0000.0003

           Figure 7: Single source ISIS PIC network diagram

       Prefix           IGP Node             NextHop
   +-------------+    +----------------+
   |192.0.0.1/32 |    |0000.0000.0004  | ---->R2(Via 12.1.1.2,if1)
   |192.0.0.2/32 |--->|                | ---->R3(Via 13.1.1.3,if2)
   |...          |    +----------------+
   |192.0.0.10/32|
   +-------------+

               Figure 8: Single source ISIS PIC Forward

       Prefix           IGP Node             NextHop
   +-------------+    +----------------+
   |192.0.0.1/32 |    |0000.0000.0004  |
   |192.0.0.2/32 |--->|                | ---->R3(Via 13.1.1.3,if2)
   |...          |    +----------------+
   |192.0.0.10/32|
   +-------------+

               Figure 9: Single source ISIS PIC Forward

   When the path through R2 fails, the route is recalculated and the
   next hop information of the IGP node associated with R4 is updated
   from the state in Figure 8 to the state in Figure 9.

7.2.  OSPF PIC Route

   When the link to the IGP node changes, the topology is recalculated
   and the corresponding next hop list is updated, without updating the
   forwarding table for each prefix.

        22.22.22.22
     12.1.1.2+---+
     +-------|R2 |-----------+
     |       +---+           |
     |if1,12.1.1.1           |
   +---+                   +---+                192.0.0.1/32
   |R1 | 11.11.11.11       |R4 | 44.44.44.44    192.0.0.2/32
   +---+                   +---+                ...
     |if2,13.1.1.1           |                  192.0.0.10/32
     |       +---+           |
     +-------|R3 |-----------+
     13.1.1.3+---+
        33.33.33.33

           Figure 10: Single source OSPF PIC network diagram

       Prefix           IGP Node             NextHop
   +-------------+    +----------------+
   |192.0.0.1/32 |    |44.44.44.44     | ---->R2(Via 12.1.1.2,if1)
   |192.0.0.2/32 |--->|                | ---->R3(Via 13.1.1.3,if2)
   |...          |    +----------------+
   |192.0.0.10/32|
   +-------------+

               Figure 11: Single source OSPF PIC Forward

       Prefix           IGP Node             NextHop
   +-------------+    +----------------+
   |192.0.0.1/32 |    |44.44.44.44     |
   |192.0.0.2/32 |--->|                | ---->R3(Via 13.1.1.3,if2)
   |...          |    +----------------+
   |192.0.0.10/32|
   +-------------+

               Figure 12: Single source OSPF PIC Forward

   When the path through R2 fails, the route is recalculated and the
   next hop information of the IGP node associated with R4 is updated
   from the state in Figure 11 to the state in Figure 12.

8.  Normative References

   [I-D.ietf-rtgwg-bgp-pic]
              Bashandy, A., Filsfils, C., and P. Mohapatra, "BGP Prefix
              Independent Convergence", Work in Progress, Internet-
              Draft, draft-ietf-rtgwg-bgp-pic-19, 1 April 2023.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2328]  Moy, J., "OSPF Version 2", STD 54, RFC 2328,
              DOI 10.17487/RFC2328, April 1998.

   [RFC5286]  Atlas, A., Ed. and A. Zinin, Ed., "Basic Specification for
              IP Fast Reroute: Loop-Free Alternates", RFC 5286,
              DOI 10.17487/RFC5286, September 2008.

Authors' Addresses

   Yue Wang
   China Telecom
   Beiqijia Town, Changping District
   Beijing
   Beijing, 102209
   China
   Email: wangy73@chinatelecom.cn


   Changwang Lin
   New H3C Technologies
   China
   Email: linchangwang.04414@h3c.com


   Aijun Wang
   China Telecom
   Beiqijia Town, Changping District
   Beijing
   Beijing, 102209
   China
   Email: wangaj3@chinatelecom.cn