This post tries to explain load balancing on the MX. It does not cover every case. It summarizes the default behaviour for each type of card (ICHIP or TRIO based), lists which fields can be added to or removed from the hash computation, and finally provides some troubleshooting commands.
This series is split into 3 parts:
- Part 1: Junos Load balancing – Introduction
- Part 2: Junos Load Balancing – Configuration
- Part 3: Junos Load Balancing – Troubleshooting
This set of posts covers these kinds of traffic:
- mpls (for IP or PW)
Multiservices traffic (like VPLS) is out of the scope of these posts.
Note: the Junos release used for the tests was 11.4.
These posts are based on my experience, on lab tests with an IXIA tester, and on some public Juniper technical documentation. I will sometimes use PFE commands. I have never seen them cause a crash, but remember that PFE commands are not supported by JTAC.
In all cases, we assume that the per-flow load-balancing feature is configured as follows:
set policy-options policy-statement LBpolicy then load-balance per-packet
set routing-options forwarding-table export LBpolicy
commit sync and-quit
N.B.: Remember that per-packet here actually means per-flow. By default, only one path is selected per prefix and installed in the FIB. The RE works in per-prefix mode and displays routes that way (show route).
On Junos, the load balancing algorithm is the same for ECMP and LAG interfaces. Junos uses the terms “unilist” and “aggregate” Next Hops to refer to ECMP and LAG Next Hops respectively. A forwarding Next Hop is called “unicast”.
You will also often see the terms indirect, composite or mcast Next Hop (all of them are indirect). Moreover, Next Hop types are usually combined to form a Next Hop chain. For example, a BGP route first refers to an indirect Next Hop, which can point to a unilist Next Hop composed of several unicast Next Hops. The members of this list can be plain unicast links or LAG links. Take this example:
sponge@bob> show route 126.96.36.199/24
188.8.131.52/24 *[BGP/170] 5d 20:27:27, MED 0, localpref 70000, from 10.1.1.1
AS path: 65000 65001 I
> to 10.253.184.42 via ae0.0
to 10.253.184.50 via ae1.0
Here we have a BGP route reachable via 2 equal-cost paths (ae0 and ae1). In our example, the ECMP is composed of 2 LAGs. Note: the RE selects one forwarding Next Hop in per-prefix mode (that is what the > means).
Now, let's move on to the forwarding table to verify that the 2 NHs are installed at the FIB level:
sponge@bob> show route forwarding-table destination 184.108.40.206/24
Routing table: default.inet
Destination Type RtRef Next hop Type Index NhRef Netif
220.127.116.11/24 user 0 indr 1049510 399575
ulst 1048574 4
10.253.184.42 ucst 731 2 ae0.0
10.253.184.50 ucst 732 3 ae1.0
What does it mean?
The BGP route has a protocol Next Hop, which is the indirect Next Hop with ID 1049510. It points to a list of unicast Next Hops, called a unilist Next Hop, with ID 1048574. This unilist is composed of 2 unicast Next Hops, 731 and 732, which are actually 2 aggregate Next Hops. The previous CLI command does not resolve these 2 aggregate NHs, even with the extensive option. To get a detailed view of the Next Hop chain, I prefer the following command, which gives the NH chain at the PFE level:
sponge@bob> show pfe route ip prefix 18.104.22.168/24 detail
IPv4 Route Table 0, default.0, 0x0:
Destination NH IP Addr Type NH ID Interface
------------ --------------- -------- ----- ---------
1.0.0/24 Indirect 1049510 RT-ifl 0 ae0.0 ifl 323
Next Hop details:
1049510(Indirect, IPv4, ifl:323:ae0.0, pfe-id:0, i-ifl:0:-)
1048574(Unilist, IPv4, ifl:0:-, pfe-id:0)
731(Aggreg., IPv4, ifl:323:ae0.0, pfe-id:0)
733(Unicast, IPv4, ifl:418:xe-2/0/0.0, pfe-id:8)
734(Unicast, IPv4, ifl:419:xe-2/0/1.0, pfe-id:8)
735(Unicast, IPv4, ifl:426:xe-3/0/0.0, pfe-id:12)
736(Unicast, IPv4, ifl:427:xe-3/0/1.0, pfe-id:12)
737(Unicast, IPv4, ifl:434:xe-4/0/0.0, pfe-id:16)
738(Unicast, IPv4, ifl:435:xe-4/0/1.0, pfe-id:16)
739(Unicast, IPv4, ifl:442:xe-5/0/0.0, pfe-id:20)
740(Unicast, IPv4, ifl:443:xe-5/0/1.0, pfe-id:20)
727(Unicast, IPv4, ifl:466:xe-10/0/0.0, pfe-id:40)
741(Unicast, IPv4, ifl:467:xe-10/0/1.0, pfe-id:40)
732(Aggreg., IPv4, ifl:324:ae1.0, pfe-id:0)
744(Unicast, IPv4, ifl:420:xe-2/1/0.0, pfe-id:9)
745(Unicast, IPv4, ifl:421:xe-2/1/1.0, pfe-id:9)
746(Unicast, IPv4, ifl:428:xe-3/1/0.0, pfe-id:13)
747(Unicast, IPv4, ifl:429:xe-3/1/1.0, pfe-id:13)
748(Unicast, IPv4, ifl:436:xe-4/1/0.0, pfe-id:17)
749(Unicast, IPv4, ifl:437:xe-4/1/1.0, pfe-id:17)
750(Unicast, IPv4, ifl:444:xe-5/1/0.0, pfe-id:21)
751(Unicast, IPv4, ifl:445:xe-5/1/1.0, pfe-id:21)
742(Unicast, IPv4, ifl:468:xe-10/1/0.0, pfe-id:41)
743(Unicast, IPv4, ifl:469:xe-10/1/1.0, pfe-id:41)
752(Unicast, IPv4, ifl:470:xe-10/2/0.0, pfe-id:42)
Here we have the complete view: the unilist Next Hop is actually a list of 2 aggregate NHs, 731 and 732. Each of them is composed of several “real” unicast NHs, which are the forwarding NHs.
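To make the chain easier to picture, here is a minimal Python sketch of it as a small tree. The NextHop class and the forwarding_next_hops function are hypothetical names invented for this illustration and are not Junos structures. The IDs and interfaces are copied from the output above, shortened to two member links per LAG.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NextHop:
    """One node of a next-hop chain (hypothetical model, not a Junos structure)."""
    nh_id: int
    nh_type: str            # "indirect", "unilist", "aggregate" or "unicast"
    interface: str = "-"
    children: List["NextHop"] = field(default_factory=list)


def forwarding_next_hops(nh: NextHop) -> List[NextHop]:
    """Walk the chain depth-first and return the leaf (unicast) next hops."""
    if not nh.children:
        return [nh]
    leaves: List[NextHop] = []
    for child in nh.children:
        leaves.extend(forwarding_next_hops(child))
    return leaves


# Shortened version of the chain shown above: only two member links per LAG.
ae0 = NextHop(731, "aggregate", "ae0.0", [
    NextHop(733, "unicast", "xe-2/0/0.0"),
    NextHop(734, "unicast", "xe-2/0/1.0"),
])
ae1 = NextHop(732, "aggregate", "ae1.0", [
    NextHop(744, "unicast", "xe-2/1/0.0"),
    NextHop(745, "unicast", "xe-2/1/1.0"),
])
unilist = NextHop(1048574, "unilist", children=[ae0, ae1])
indirect = NextHop(1049510, "indirect", "ae0.0", [unilist])

# Print every forwarding next hop reachable from the indirect next hop.
for leaf in forwarding_next_hops(indirect):
    print(leaf.nh_id, leaf.interface)
```

The point of the sketch is only that the route resolves to the leaves of the tree: whatever the depth of the chain, the candidates for load balancing are the unicast NHs at the bottom.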
When a packet is received, the ingress PFE performs a lookup and finds several forwarding NHs. Therefore, the ingress PFE has to load balance flows over all these forwarding NHs. To do that, the ingress PFE extracts some keys from the incoming packet and optionally adds some internal keys (like the interface index). These keys are used to compute a hash, and finally the algorithm selects one forwarding Next Hop from the list. The packet can then be forwarded, through the fabric, to the right PFE (the PFE that hosts the selected forwarding NH). Packets of the same flow (with the same key values) are always forwarded to the same forwarding NH.
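The idea of "extract keys, hash, pick one Next Hop" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: CRC32 and the modulo step are stand-ins chosen for the sketch, the key fields and the select_next_hop name are hypothetical, and the hash really used by the PFE is not described in this post.

```python
import zlib
from typing import Sequence, Tuple

# Assumed key layout for the sketch: src IP, dst IP, protocol, src port, dst port.
FlowKey = Tuple[str, str, int, int, int]


def select_next_hop(key: FlowKey, next_hops: Sequence[str], seed: int = 0) -> str:
    """Hash the flow key and map the result onto one forwarding next hop.

    CRC32 and the modulo are stand-ins for this sketch, not the PFE algorithm.
    """
    data = "|".join(str(part) for part in key).encode()
    digest = zlib.crc32(data, seed)
    return next_hops[digest % len(next_hops)]


links = ["xe-2/0/0.0", "xe-2/0/1.0", "xe-2/1/0.0", "xe-2/1/1.0"]
flow = ("192.0.2.1", "198.51.100.7", 6, 49152, 80)

# Every packet of the same flow carries the same key values, so the
# selection is repeatable and the flow always uses the same link.
first = select_next_hop(flow, links)
assert all(select_next_hop(flow, links) == first for _ in range(1000))
print(first)
```

Because the selection depends only on the key values, packets of one flow stay in order on one link, while different flows are spread over the list.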
By default, ICHIP based cards use the following keys to compute the hash:
On ICHIP based cards, I have not found a command, either in the CLI or on the PFE, to check the default or configured hash-key (unlike TRIO, see below). I am still looking for one (I am currently writing a Perl script to extract all the PFE commands available for a given card).
By default, TRIO based cards use the following keys to compute the hash:
As you can see, for inet traffic TRIO uses layer 4 keys by default to compute the hash, so be careful with some kinds of traffic: fragmented traffic and raw IP traffic (I mean anything other than UDP or TCP). Moreover, for mpls traffic TRIO includes the mpls payload, either IP or PW.
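Why fragmented traffic deserves attention with layer 4 keys can be shown with a small Python sketch. It relies on one general fact: only the first fragment of an IP datagram carries the UDP or TCP header. Everything else (the Packet type, the hash_keys function, the choice of fields) is hypothetical and says nothing about how TRIO itself treats fragments.

```python
from typing import NamedTuple, Optional, Tuple


class Packet(NamedTuple):
    src: str
    dst: str
    proto: int
    frag_offset: int               # 0 for an unfragmented packet or a first fragment
    sport: Optional[int] = None    # only known when the L4 header is present
    dport: Optional[int] = None


def hash_keys(pkt: Packet, use_l4: bool) -> Tuple:
    """Return the fields that would feed the hash (illustrative only)."""
    keys: Tuple = (pkt.src, pkt.dst, pkt.proto)
    if use_l4:
        # Non-first fragments have no L4 header, so their ports are None here.
        keys += (pkt.sport, pkt.dport)
    return keys


first_frag = Packet("192.0.2.1", "198.51.100.7", 17, 0, 5000, 53)
next_frag = Packet("192.0.2.1", "198.51.100.7", 17, 185)

# With layer 4 keys, two fragments of one datagram produce different keys,
# so a hash could place them on different links.
print(hash_keys(first_frag, use_l4=True) == hash_keys(next_frag, use_l4=True))
# With layer 3 keys only, both fragments produce the same key.
print(hash_keys(first_frag, use_l4=False) == hash_keys(next_frag, use_l4=False))
```

The same reasoning applies to IP protocols without ports: there are simply no layer 4 fields to add, so the hash falls back on fewer keys and flows are spread less evenly.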
Note: the “incoming interface index” key has been removed for all types of traffic.
TRIO provides a PFE command to show the current load balancing configuration for every type of traffic. The following output shows the default load balancing used on TRIO cards:
sponge@bob> start shell pfe network fpc3
NPC platform (1067Mhz MPC 8548 processor, 2048MB memory, 512KB flash)
NPC3(bob vty)# show jnh lb
Unilist Seed Configured 0x919ae752 System Mac address 00:21:59:a2:e8:00
Hash Key Configuration: 0x0000000000e00000 0xffffffffffffffff
MAC ADDRESSES: Yes
802.1P OUTER: No
Services Hash Key Configuration:
Part 2 will present the load-balancing configuration for ICHIP and TRIO cards.