Internet-Draft Exp. Implementing Packet Discard Class. August 2023
Evans & Pylypenko Expires 16 February 2024
Workgroup:
Independent Stream
Internet-Draft:
draft-evans-discardclass-03
Published:
Intended Status:
Informational
Expires:
Authors:
J. Evans
Amazon
O. Pylypenko
Amazon

Experience from implementing a new packet discard classification scheme

Abstract

Router-reported packet loss is the primary signal of when a network is not doing its job. Some packet loss is normal or intended in TCP/IP networks, however. To minimise network packet loss through automated network operations, we need clear and accurate signals of all packets that are dropped and why. This document describes our experience from implementing a packet loss classification scheme to provide these signals and enable automated mitigation of unintended packet loss.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 16 February 2024.

1. Introduction

The job of a network is to transport packets. Understanding both where and why packet loss occurs is essential for effective network operation. Router-reported packet loss is the most direct signal for network operations to identify customer impact from unintended packet loss. Accurate accounting of packet loss is not enough, however, as some level of packet loss is normal in TCP/IP networks. In automating network operations, there are only a relatively small number of automated actions that can be taken to mitigate customer-impacting packet loss. Precise classification of packet loss signals is important to ensure the right action is taken, as taking the wrong action can make problems worse.

The existing metrics for packet loss as defined in [RFC1213] - namely ifInDiscards, ifOutDiscards, ifInErrors, ifOutErrors - do not provide sufficient precision to be able to automatically identify the cause of the loss and mitigate the impact. From a network operator's perspective, ifInDiscards can represent both intended packet loss (i.e. packets discarded due to policy) and unintended packet loss (e.g. packets dropped in error). Further, these definitions are ambiguous, in that vendors can implement, and have implemented, them differently. In some implementations, ifInErrors accounts only for errored packets which are dropped, whilst in others it accounts for all errored packets whether they are dropped or not. Many vendors support more discard metrics than these; where they do, they are inconsistently implemented due to the absence of a clearly defined classification scheme and semantics for packet loss reporting.

This document describes our experience from implementing a packet loss classification scheme across multiple hardware platforms, which aims to address these issues and enable automated mitigation of unintended packet loss. Section 2 describes the problem. Section 3 defines the classification scheme and the accounting requirements with examples. Section 4 gives examples of discard signal-to-cause-to-auto-mitigation action mapping. Section 5 details our experience from implementing this scheme.

The terms 'packet drop' and 'discard' are considered equivalent and are used interchangeably.

2. Problem statement

Working backwards from the goal of auto-mitigation of unintended packet loss, there are only a relatively small number of potential auto-mitigation actions, e.g.:

  1. Take a device, link or set of devices and/or links out of service
  2. Return a device, link or set of devices and/or links back into service
  3. Move traffic to another device
  4. Roll-back a recent change to a device that might have caused the problem
  5. Escalate to a human (e.g. network operators) as a last resort

Precise signal of impact is important as taking the wrong action can be worse than taking no action. For example, taking a congested device out of service can make congestion worse by moving the traffic to other already congested links and/or devices.

Detecting whether router-reported packet loss is a problem, and determining what actions should be taken to mitigate the impact and remediate the cause, depends on four primary features of the packet loss signal:

  1. the cause of the loss
  2. the rate and/or degree of the loss
  3. the duration of the loss
  4. the location of the loss

Features 2, 3 and 4 are already addressed with passive monitoring statistics, e.g. obtained with SNMP [RFC1157] or NETCONF [RFC6241]. Feature 1, however, is dependent on the classification scheme used for packet loss reporting. In the next section we define a new classification scheme to address this problem.
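
As an illustration only, the four features can be carried together in one record, with rate and duration derived from periodic counter samples. The names LossSignal and summarise, the sample format and the baseline parameter are assumptions made for this sketch and are not defined by this document. Duration here is the total sampled time spent above the baseline, not necessarily one contiguous period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossSignal:
    cause: str         # feature 1: discard class, e.g. "discards/no_buffer"
    rate_pps: float    # feature 2: peak discard rate, packets per second
    duration_s: float  # feature 3: total time the rate was above baseline
    location: str      # feature 4: device and interface reporting the loss


def summarise(location, cause, samples, baseline_pps):
    """Summarise one discard counter.

    samples is a time-ordered list of (timestamp_seconds, cumulative_count)
    pairs with strictly increasing timestamps (hypothetical input format).
    """
    peak = 0.0
    duration = 0.0
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        interval = t1 - t0
        rate = (c1 - c0) / interval
        peak = max(peak, rate)
        if rate > baseline_pps:
            duration += interval
    return LossSignal(cause, peak, duration, location)
```

For example, summarise("router1:et-0/0/1", "discards/no_buffer", samples, 5.0) would return a record whose cause is simply the class name supplied by the caller, which is why the classification scheme in the next section matters.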

3. Traffic and Discard Classification Scheme

We define the classification scheme as a tree which follows the structure <component><direction><type><layer><sub-type><sub-sub-type><metric>, where:
a. component can be interface|device
b. direction can be ingress|egress
c. type can be traffic|discards, where traffic accounts for packets successfully received or transmitted, and discards account for packet drops
d. layer can be l2|l3

.
|-- interface/
|   |-- ingress/
|   |   |-- traffic/
|   |   |   |-- l2/
|   |   |   |   |-- frames
|   |   |   |   `-- bytes
|   |   |   |-- l3/
|   |   |   |   |-- v4/
|   |   |   |   |   |-- packets
|   |   |   |   |   |-- bytes
|   |   |   |   |   |-- unicast/
|   |   |   |   |   |   |-- packets
|   |   |   |   |   |   `-- bytes
|   |   |   |   |   `-- multicast/
|   |   |   |   |       |-- packets
|   |   |   |   |       `-- bytes
|   |   |   |   `-- v6/
|   |   |   |       |-- packets
|   |   |   |       |-- bytes
|   |   |   |       |-- unicast/
|   |   |   |       |   |-- packets
|   |   |   |       |   `-- bytes
|   |   |   |       `-- multicast/
|   |   |   |           |-- packets
|   |   |   |           `-- bytes
|   |   |   `-- qos/
|   |   |       |-- class_0/
|   |   |       |   |-- packets
|   |   |       |   `-- bytes
|   |   |       |-- ...
|   |   |       `-- class_n/
|   |   |           |-- packets
|   |   |           `-- bytes
|   |   `-- discards/
|   |       |-- l2/
|   |       |   |-- frames
|   |       |   `-- bytes
|   |       |-- l3/
|   |       |   |-- v4/
|   |       |   |   |-- packets
|   |       |   |   |-- bytes
|   |       |   |   |-- unicast/
|   |       |   |   |   |-- packets
|   |       |   |   |   `-- bytes
|   |       |   |   `-- multicast/
|   |       |   |       |-- packets
|   |       |   |       `-- bytes
|   |       |   `-- v6/
|   |       |       |-- packets
|   |       |       |-- bytes
|   |       |       |-- unicast/
|   |       |       |   |-- packets
|   |       |       |   `-- bytes
|   |       |       `-- multicast/
|   |       |           |-- packets
|   |       |           `-- bytes
|   |       |-- errors/
|   |       |   |-- l2/
|   |       |   |   `-- rx/
|   |       |   |       |-- frames
|   |       |   |       |-- crc_error/
|   |       |   |       |   `-- frames
|   |       |   |       |-- invalid_mac/
|   |       |   |       |   `-- frames
|   |       |   |       |-- invalid_vlan/
|   |       |   |       |   `-- frames
|   |       |   |       `-- invalid_frame/
|   |       |   |           `-- frames
|   |       |   |-- l3/
|   |       |   |   |-- rx/
|   |       |   |   |   |-- packets
|   |       |   |   |   |-- checksum_error/
|   |       |   |   |   |   `-- packets
|   |       |   |   |   |-- mtu_exceeded/
|   |       |   |   |   |   `-- packets
|   |       |   |   |   |-- invalid_packet/
|   |       |   |   |   |   `-- packets
|   |       |   |   |   `-- ttl_expired/
|   |       |   |   |       `-- packets
|   |       |   |   `-- no_route/
|   |       |   |       `-- packets
|   |       |   `-- local/
|   |       |       |-- packets
|   |       |       `-- hw/
|   |       |           |-- packets
|   |       |           `-- parity_error/
|   |       |               `-- packets
|   |       |-- policy/
|   |       |   `-- l3/
|   |       |       |-- packets
|   |       |       |-- acl/
|   |       |       |   `-- packets
|   |       |       |-- policer/
|   |       |       |   |-- packets
|   |       |       |   `-- bytes
|   |       |       |-- null_route/
|   |       |       |   `-- packets
|   |       |       `-- urpf/
|   |       |           `-- packets
|   |       `-- no_buffer/
|   |           |-- class_0/
|   |           |   |-- packets
|   |           |   `-- bytes
|   |           |-- ...
|   |           `-- class_n/
|   |               |-- packets
|   |               `-- bytes
|   `-- egress/
|       |-- traffic/
|       |   |-- l2/
|       |   |   |-- frames
|       |   |   `-- bytes
|       |   |-- l3/
|       |   |   |-- v4/
|       |   |   |   |-- packets
|       |   |   |   |-- bytes
|       |   |   |   |-- unicast/
|       |   |   |   |   |-- packets
|       |   |   |   |   `-- bytes
|       |   |   |   `-- multicast/
|       |   |   |       |-- packets
|       |   |   |       `-- bytes
|       |   |   `-- v6/
|       |   |       |-- packets
|       |   |       |-- bytes
|       |   |       |-- unicast/
|       |   |       |   |-- packets
|       |   |       |   `-- bytes
|       |   |       `-- multicast/
|       |   |           |-- packets
|       |   |           `-- bytes
|       |   `-- qos/
|       |       |-- class_0/
|       |       |   |-- packets
|       |       |   `-- bytes
|       |       |-- ...
|       |       `-- class_n/
|       |           |-- packets
|       |           `-- bytes
|       `-- discards/
|           |-- l2/
|           |   |-- frames
|           |   `-- bytes
|           |-- l3/
|           |   |-- v4/
|           |   |   |-- packets
|           |   |   |-- bytes
|           |   |   |-- unicast/
|           |   |   |   |-- packets
|           |   |   |   `-- bytes
|           |   |   `-- multicast/
|           |   |       |-- packets
|           |   |       `-- bytes
|           |   `-- v6/
|           |       |-- packets
|           |       |-- bytes
|           |       |-- unicast/
|           |       |   |-- packets
|           |       |   `-- bytes
|           |       `-- multicast/
|           |           |-- packets
|           |           `-- bytes
|           |-- errors/
|           |   |-- l2/
|           |   |   `-- tx/
|           |   |       `-- frames
|           |   `-- l3/
|           |       `-- tx/
|           |           `-- packets
|           |-- policy/
|           |   `-- l3/
|           |       |-- acl/
|           |       |   `-- packets
|           |       `-- policer/
|           |           |-- packets
|           |           `-- bytes
|           `-- no_buffer/
|               |-- class_0/
|               |   |-- packets
|               |   `-- bytes
|               |-- ...
|               `-- class_n/
|                   |-- packets
|                   `-- bytes
`-- control_plane/
    |-- packets
    |-- bytes
    `-- policy/
        |-- acl/
        |   `-- packets
        `-- policer/
            `-- packets

For additional context, Appendix A provides an example of where packets may be dropped in a device.
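
A minimal sketch of how a collector might split an interface counter name from the tree above into its parts is given below. The CounterPath type and the parse_interface_path function are hypothetical names used only for illustration; control_plane paths, which carry no direction, are deliberately not handled.

```python
from typing import NamedTuple, Tuple


class CounterPath(NamedTuple):
    component: str            # "interface"
    direction: str            # "ingress" or "egress"
    kind: str                 # "traffic" or "discards"
    classes: Tuple[str, ...]  # layer and sub-types, e.g. ("l3", "v4", "unicast")
    metric: str               # "packets", "bytes" or "frames"


def parse_interface_path(path):
    """Split an interface counter path into its tree components."""
    parts = path.strip("/").split("/")
    if len(parts) < 5 or parts[0] != "interface":
        raise ValueError("not an interface counter path: " + path)
    component, direction, kind = parts[:3]
    if direction not in ("ingress", "egress"):
        raise ValueError("unknown direction: " + direction)
    if kind not in ("traffic", "discards"):
        raise ValueError("unknown type: " + kind)
    metric = parts[-1]
    if metric not in ("packets", "bytes", "frames"):
        raise ValueError("unknown metric: " + metric)
    return CounterPath(component, direction, kind, tuple(parts[3:-1]), metric)
```

Keeping the intermediate classes as a tuple lets the same parser cover both the protocol branches (l2, l3) and the causal branches (errors, policy, no_buffer) without a separate case for each.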

3.1. Discard Class Descriptions

discards/errors/l2/rx/
Frames dropped due to errors in the received L2 frame, e.g. due to failing CRC, invalid header, invalid MAC address, invalid VLAN.

discards/errors/l3/rx/
These are drops due to errors in the received packet, i.e. which indicate an upstream problem rather than a problem with the device that is dropping the errored packets. There are multiple potential errors that can cause a packet to be dropped on receipt, e.g. header checksum errors, incorrect version, incorrect header length, bad options.

discards/errors/l3/rx/ttl_expired/
There can also be multiple causes for TTL-expired drops: i) traceroute; ii) TTL set too low by the end-system; iii) routing loops.

discards/errors/l3/no_route/
Discards due to a packet not matching any route.

discards/errors/local/
A device may drop packets within its switching pipeline due to internal errors, e.g. parity errors. Any discards not explicitly assigned to the above classes are accounted here.

discards/policy/
These are intended discards, i.e. packets dropped due to a configured policy. There are multiple sub-classes.

discards/policy/l3/acl/
Discards due to a packet matching an access control list (ACL).

discards/policy/l3/policer/
Discards due to a packet matching a configured policer.

discards/policy/l3/null_route/
Discards due to a packet matching a route with discard action.

discards/policy/l3/urpf/
Discards due to a packet failing a unicast reverse path forwarding (uRPF) check.

discards/no_buffer/
Discards due to no available buffer to enqueue the packet. These can be tail-drop discards or discards due to an active queue management algorithm, e.g. RED [RED93] or CoDel [RFC8289].

3.2. Discard Accounting Requirements

Requirements 1-10 apply to packets forwarded by the device, rather than packets sent to or from the device itself:

  1. All packet receipt, transmission and drops MUST be reported
  2. All packet receipt, transmission and drops SHOULD be attributed to the physical or logical interface where they occur.
  3. If a frame is discarded at L2, it MUST NOT be accounted for at L3
  4. An individual packet MUST NOT account against both the L2 traffic and L2 discard classes in a single direction, i.e. ingress or egress
  5. An individual packet MUST NOT account against both the L3 traffic and L3 discard classes in a single direction, i.e. ingress or egress
  6. The aggregate L2 and L3 traffic and discard classes MUST account for all underlying packets received, transmitted and dropped across all other classes
  7. The aggregate QOS traffic and discard (no buffer) classes MUST account for all underlying packets received, transmitted and dropped across all other classes
  8. In addition to the L2 and L3 aggregate classes, an individual dropped packet MUST only account against a single error, policy or no buffer discard sub class
  9. Where there may be multiple drop reasons for a packet, the ordering of discard class reporting MUST be defined
  10. If Diffserv [RFC2475] quality of service (QOS) is not used, no_buffer discards SHOULD be reported as class_0
  11. Traffic from the device control plane SHOULD be accounted for the same as other egress traffic
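
Requirements 4, 5 and 8 can be checked mechanically against the set of counters that a single packet caused to increment. The sketch below is one possible check, not part of the scheme itself: the function name check_increments and its input format (a list of interface counter paths for one packet, each with at least four components) are assumptions. A reason counter that is merely the parent of another listed reason (for example errors/l3/rx above errors/l3/rx/ttl_expired) is treated as an aggregate rather than a second reason.

```python
REASON_CLASSES = ("errors", "policy", "no_buffer")


def check_increments(paths):
    """Return a list of requirement violations for one packet's counters."""
    layer_kinds = set()
    reasons = set()
    for path in paths:
        parts = path.split("/")
        direction, kind, head = parts[1], parts[2], parts[3]
        if head in ("l2", "l3"):
            # Aggregate protocol counter, e.g. ingress/discards/l3/...
            layer_kinds.add((direction, head, kind))
        elif kind == "discards" and head in REASON_CLASSES:
            # Causal counter; drop the trailing metric (packets/bytes/frames).
            reasons.add("/".join(parts[3:-1]))

    violations = []
    # Requirements 4 and 5: not both traffic and discard at the same layer.
    for direction in ("ingress", "egress"):
        for layer in ("l2", "l3"):
            both = {(direction, layer, "traffic"), (direction, layer, "discards")}
            if both <= layer_kinds:
                violations.append(
                    f"{direction} {layer}: counted as both traffic and discard")
    # Requirement 8: at most one error, policy or no_buffer sub-class.
    leaves = [r for r in reasons
              if not any(o.startswith(r + "/") for o in reasons)]
    if len(leaves) > 1:
        violations.append("multiple discard reasons: " + ", ".join(sorted(leaves)))
    return violations
```

An empty list means no violation was found by these particular checks; it does not show that the remaining requirements are met.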

3.3. Examples

Assuming all the requirements are met, a good unicast IPv4 packet received would increment:
- interface/ingress/traffic/l3/v4/unicast/packets
- interface/ingress/traffic/l3/v4/unicast/bytes
- interface/ingress/traffic/qos/class_0/packets
- interface/ingress/traffic/qos/class_0/bytes

A received unicast IPv6 packet dropped due to TTL expiry would increment:
- interface/ingress/discards/l3/v6/unicast/packets
- interface/ingress/discards/l3/v6/unicast/bytes
- interface/ingress/discards/errors/l3/rx/ttl_expired/packets

An IPv4 packet dropped on egress due to no buffers would increment:
- interface/egress/discards/l3/v4/unicast/packets
- interface/egress/discards/l3/v4/unicast/bytes
- interface/egress/discards/no_buffer/class_0/packets
- interface/egress/discards/no_buffer/class_0/bytes
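
The three examples can also be expressed as a small function that lists the counters to increment for one packet. This is a sketch under stated assumptions: counters_for and its parameters are hypothetical, the reason argument is the causal path exactly as it appears in the tree in Section 3 (for example errors/l3/rx/ttl_expired), and for reasons other than no_buffer only the packets counter is emitted, as in the TTL example.

```python
def counters_for(direction, ip_version, cast, outcome, reason=None, qos_class=0):
    """List the counter paths one L3 packet would increment.

    direction: "ingress" or "egress"
    ip_version: 4 or 6
    cast: "unicast" or "multicast"
    outcome: "forwarded" or "dropped"
    reason: causal class path for a dropped packet, e.g. "no_buffer"
    """
    base = f"interface/{direction}"
    l3 = f"l3/v{ip_version}/{cast}"
    if outcome == "forwarded":
        qos = f"{base}/traffic/qos/class_{qos_class}"
        return [
            f"{base}/traffic/{l3}/packets",
            f"{base}/traffic/{l3}/bytes",
            f"{qos}/packets",
            f"{qos}/bytes",
        ]
    paths = [f"{base}/discards/{l3}/packets", f"{base}/discards/{l3}/bytes"]
    if reason == "no_buffer":
        # no_buffer discards are accounted per QoS class, in packets and bytes.
        nb = f"{base}/discards/no_buffer/class_{qos_class}"
        paths += [f"{nb}/packets", f"{nb}/bytes"]
    else:
        paths.append(f"{base}/discards/{reason}/packets")
    return paths
```

Calling counters_for("egress", 4, "unicast", "dropped", "no_buffer") returns the four egress paths listed in the last example above.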

4. A Possible Signal-Cause-Mitigation Mapping

Example discard signal-to-cause-to-mitigation mappings are shown in the table below:

+-------------------------------------------+---------------------+------------+----------+-------------+-----------------------+
| Discard class                             | Cause               | Discard    | Discard  | Unintended? | Possible actions      |
|                                           |                     | rate       | duration |             |                       |
+-------------------------------------------+---------------------+------------+----------+-------------+-----------------------+
| ingress/discards/errors/l2/rx             | Upstream device     | >Baseline  | O(1min)  | Y           | Take upstream link or |
|                                           | or link error       |            |          |             | device out-of-service |
| ingress/discards/errors/l3/rx/ttl_expired | Traceroute          | <=Baseline |          | N           | no action             |
| ingress/discards/errors/l3/rx/ttl_expired | Convergence         | >Baseline  | O(1s)    | Y           | no action             |
| ingress/discards/errors/l3/rx/ttl_expired | Routing loop        | >Baseline  | O(1min)  | Y           | Roll-back change      |
| .*/policy/.*                              | Policy              |            |          | N           | no action             |
| ingress/discards/errors/l3/no_route       | Convergence         | >Baseline  | O(1s)    | Y           | no action             |
| ingress/discards/errors/l3/no_route       | Config error        | >Baseline  | O(1min)  | Y           | Roll-back change      |
| ingress/discards/errors/l3/no_route       | Invalid destination | >Baseline  | O(10min) | N           | Escalate to operator  |
| ingress/discards/errors/local             | Device errors       | >Baseline  | O(1min)  | Y           | Take device           |
|                                           |                     |            |          |             | out-of-service        |
| egress/discards/no_buffer                 | Congestion          | <=Baseline |          | N           | no action             |
| egress/discards/no_buffer                 | Congestion          | >Baseline  | O(1min)  | Y           | Bring capacity back   |
|                                           |                     |            |          |             | into service or move  |
|                                           |                     |            |          |             | traffic               |
+-------------------------------------------+---------------------+------------+----------+-------------+-----------------------+

The 'Baseline' in the 'Discard Rate' column is network dependent.
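
One way to encode the table as data is sketched below. It is illustrative only: the rule list, the possible_action function, and the numeric thresholds (1, 60 and 600 seconds standing in for O(1s), O(1min) and O(10min)) are assumptions of this sketch. When several rows match the same class, the row with the longest duration threshold that has been reached is chosen. Classes at or below baseline, or above baseline for less than the shortest threshold, map to no action with no cause assigned.

```python
import re

# (discard class prefix, minimum duration in seconds, cause, possible action)
RULES = [
    ("ingress/discards/errors/l2/rx", 60, "Upstream device or link error",
     "Take upstream link or device out-of-service"),
    ("ingress/discards/errors/l3/rx/ttl_expired", 1, "Convergence", "no action"),
    ("ingress/discards/errors/l3/rx/ttl_expired", 60, "Routing loop",
     "Roll-back change"),
    ("ingress/discards/errors/l3/no_route", 1, "Convergence", "no action"),
    ("ingress/discards/errors/l3/no_route", 60, "Config error", "Roll-back change"),
    ("ingress/discards/errors/l3/no_route", 600, "Invalid destination",
     "Escalate to operator"),
    ("ingress/discards/errors/local", 60, "Device errors",
     "Take device out-of-service"),
    ("egress/discards/no_buffer", 60, "Congestion",
     "Bring capacity back into service or move traffic"),
]


def possible_action(discard_class, above_baseline, duration_s):
    """Return a (cause, action) pair for one discard signal."""
    if re.fullmatch(r".*/policy/.*", discard_class):
        return ("Policy", "no action")
    if not above_baseline:
        return (None, "no action")
    best = None
    for prefix, min_duration, cause, action in RULES:
        if discard_class.startswith(prefix) and duration_s >= min_duration:
            if best is None or min_duration > best[0]:
                best = (min_duration, cause, action)
    if best is None:
        return (None, "no action")
    return (best[1], best[2])
```

Keeping the mapping in a table of rules rather than in branching code makes it straightforward to adjust thresholds per network, since the baseline and durations are network dependent.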

5. Implementation Experience

  1. The number and granularity of classes described in Section 3 is a compromise between providing sufficient detail to take the appropriate automated actions and: a) avoiding excessive detail, which may demand deeper understanding without helping to surface the problem quickly; b) constraining the quantity of data produced, given that these metrics are produced per interface, to limit data volume and device CPU impact. While further granularity is possible, we found the scheme described to be generally sufficient.
  2. There are multiple ways that we could have defined the discard classification tree, e.g. we could have used a multi-rooted tree, rooted in each protocol. We opted instead to define a tree where protocol discards and causal discards are accounted orthogonally, as this reduces the number of classes and we found it sufficient to determine mitigation actions.
  3. no_buffer discards can be realised differently with different memory architectures. Whether a no_buffer discard is attributed to ingress or egress can differ accordingly. For successful auto-mitigation: where the discards are due to egress interface congestion, they should be reported on egress; where the discards are due to device-level congestion (exceeding the device forwarding rate), they should be reported on ingress.
  4. Most platforms account for the number of packets where the TTL has expired and the CPU has returned an ICMP Time Exceeded message. In practice, however, there is often a policer applied to limit the number of packets sent to the CPU. Implicitly, this limits the rate of TTL discards processed by the CPU and hence the number of discards reported. One method to account for all packet discards due to TTL expiry, even those dropped by a policer when being forwarded to the CPU, is to account for all ingress packets received with TTL=1.
  5. Where a no route discard is implemented with a default null route, separate accounting is needed for any explicit null routes configured, in order to differentiate between interface/ingress/discards/policy/l3/null_route/packets and interface/ingress/discards/errors/l3/no_route/packets.
  6. It is useful to account separately for transit packets dropped by transit ACLs or policers, and packets dropped by ACLs or policers which limit the number of packets to the device control plane.
  7. It is not possible to identify a configuration error - i.e. when intended discards are unintended - with device packet loss metrics alone. For example, to determine whether ACL drops are intended or due to a misconfigured ACL, some other method is needed, e.g. configuration validation before deployment, or detecting a significant change in ACL drops after a change compared to before.
  8. Where traffic byte counters need to be 64-bit, packet and discard counters which increase at a lower rate may be encoded in fewer bits, e.g. 48-bit.
  9. Where the reporting device is the source or destination of a tunnel, the ingress protocol for a packet may be different to the egress protocol, e.g. if IPv4 is tunnelled over IPv6. In this case, some implementations may attribute egress discards to the ingress protocol.
  10. There are multiple ways that these traffic and discard metrics can be exposed from the device, including SNMP [RFC1157], IPFIX [RFC5153], and NETCONF [RFC6241].
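
Relating to item 8, a collector that polls counters of differing widths needs to compute deltas modulo the counter width. The following is a minimal sketch, assuming at most one wrap between consecutive samples and no counter reset; the function names counter_delta and seconds_to_wrap are hypothetical.

```python
def counter_delta(previous, current, bits=64):
    """Increase between two samples of an unsigned counter of the given width.

    Assumes the counter wrapped at most once between the samples and was not
    reset (for example by a device restart).
    """
    return (current - previous) % (1 << bits)


def seconds_to_wrap(bits, rate_per_second):
    """Time for a counter of the given width to wrap at a constant rate."""
    return (1 << bits) / rate_per_second
```

For example, counter_delta(2**48 - 10, 5, bits=48) gives 15. The second function indicates how short the polling interval must be, relative to the expected peak rate, for the single-wrap assumption to hold.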

6. Security Considerations

There are no new security considerations introduced by this document.

7. IANA Considerations

There are no new IANA considerations introduced by this document.

8. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

9. Acknowledgments

The content of this draft has benefitted from feedback from JR Rivers, Ronan Waide, Chris DeBruin, Marcoz Sanz, Avinash Kadosh and Nadav Chachmon.

10. References

10.1. Normative References

[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/rfc/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/rfc/rfc8174>.

10.2. Informative References

[RED93]
Jacobson, V., "Random Early Detection gateways for Congestion Avoidance", n.d.
[RFC1157]
Case, J., Fedor, M., Schoffstall, M., and J. Davin, "Simple Network Management Protocol (SNMP)", RFC 1157, DOI 10.17487/RFC1157, <https://www.rfc-editor.org/rfc/rfc1157>.
[RFC1213]
McCloghrie, K. and M. Rose, "Management Information Base for Network Management of TCP/IP-based internets: MIB-II", STD 17, RFC 1213, DOI 10.17487/RFC1213, <https://www.rfc-editor.org/rfc/rfc1213>.
[RFC2475]
Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., and W. Weiss, "An Architecture for Differentiated Services", RFC 2475, DOI 10.17487/RFC2475, <https://www.rfc-editor.org/rfc/rfc2475>.
[RFC5153]
Boschi, E., Mark, L., Quittek, J., Stiemerling, M., and P. Aitken, "IP Flow Information Export (IPFIX) Implementation Guidelines", RFC 5153, DOI 10.17487/RFC5153, <https://www.rfc-editor.org/rfc/rfc5153>.
[RFC6241]
Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed., and A. Bierman, Ed., "Network Configuration Protocol (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, <https://www.rfc-editor.org/rfc/rfc6241>.
[RFC8289]
Nichols, K., Jacobson, V., McGregor, A., Ed., and J. Iyengar, Ed., "Controlled Delay Active Queue Management", RFC 8289, DOI 10.17487/RFC8289, <https://www.rfc-editor.org/rfc/rfc8289>.

Appendix A. Where do packets get dropped?

The diagram below is an example of where and why packets may be dropped in a typical single-ASIC, shared-buffer device, where packets ingress on the left and egress on the right.

                                                      +----------+
                                                      |          |
                                                      |  CPU     |
                                                      |          |
                                                      +--+---^---+
                                                from_cpu |   | to_cpu
                                                         |   |
                          +------------------------------v---+-------------------------------+
                          |                                                                  |

            +----------+  +----------+  +----------+  +----------+  +----------+  +----------+  +----------+
            |          |  |          |  |          |  |          |  |          |  |          |  |          |
 Packet rx ->  Phy     +-->  Mac     +--> Ingress  +--> Buffers  +--> Egress   +-->  Mac     +-->  Phy     |>  Packet tx
            |          |  |          |  |  Pipeline|  |          |  |  Pipeline|  |          |  |          |
            +----------+  +----------+  +----------+  +----------+  +----------+  +----------+  +----------+

  Intended                               policy/acl                  policy/acl
  Discards:                              policy/policer              policy/policer
                                         policy/urpf
                                         null_route

Unintended                 error/rx/l2   error/rx/l3   no_buffer     error/tx/l3
  Discards:                              error/local
                                         no_route
                                         ttl

Authors' Addresses

John Evans
Amazon
Oleksandr Pylypenko
Amazon