24 May 2012, 17:48

This post tries to explain load balancing on MX. It does not cover every case; it summarizes the default behaviour for each type of card (ICHIP-based or TRIO-based), lists the fields that can be added to or removed from the hash computation, and provides some troubleshooting commands.


This new post is split into 3 parts:

-          Part 1: Junos Load balancing – Introduction

-          Part 2: Junos Load Balancing – Configuration

-          Part 3: Junos Load Balancing – Troubleshooting 


This set of posts covers these kinds of traffic:

-          inet

-          inet6

-          mpls (for IP or PW)


Multiservices traffic (like VPLS) is out of the scope of these posts.

 

Note: the release used for the tests was 11.4.

 

These posts are based on my experience, lab tests with an IXIA tester, and some public Juniper technical documentation. I will sometimes use PFE-related commands. I have never had a crash with them, but remember that PFE commands are not supported by JTAC.

 

In all cases, we assume that the per-flow load-balancing feature has been configured, like this:



edit exclusive

  set policy-options policy-statement LBpolicy then load-balance per-packet

  set routing-options forwarding-table export LBpolicy

commit sync and-quit


 

N.B.: Remember that per-packet, here, means per-flow. By default, only one path is selected per prefix and installed in the FIB. The RE works and displays routes (show route) in a per-prefix mode.

 

On Junos, the load balancing algorithm is the same for ECMP and LAG interfaces. Junos uses the terms “unilist” and “aggregate” Next Hop to refer to ECMP and LAG Next Hops respectively. A forwarding Next Hop is called “unicast”.

 

We also often use the terms indirect, composite or mcast Next Hop (all of them are indirect). Moreover, Next Hop types are usually combined to form a Next Hop chain. For example, a BGP route refers first to an indirect Next Hop, which can point to a unilist Next Hop composed of several unicast Next Hops. The members of this list can be plain unicast links or LAG links. Take this example:

 


sponge@bob> show route 1.0.0.0/24

 

1.0.0.0/24         *[BGP/170] 5d 20:27:27, MED 0, localpref 70000, from 10.1.1.1

                      AS path: 65000 65001 I

                    > to 10.253.184.42 via ae0.0

                      to 10.253.184.50 via ae1.0


 

Here we have a BGP route reachable via 2 equal-cost paths (ae0 and ae1). In our example, the ECMP is composed of 2 LAGs. Note: the RE selects one forwarding Next Hop in a per-prefix mode (this is what the > means).

 

Now let's move on to the forwarding table to verify that 2 NHs are installed at the FIB level:



sponge@bob> show route forwarding-table destination 1.0.0.0/24

Routing table: default.inet

Internet:

Destination        Type RtRef Next hop           Type Index NhRef Netif

1.0.0.0/24         user     0                    indr 1049510 399575

                                                 ulst 1048574     4

                              10.253.184.42      ucst   731     2 ae0.0

                              10.253.184.50      ucst   732     3 ae1.0 


What does it mean?

 

The BGP route has a protocol Next Hop, which is the indirect Next Hop with ID 1049510. It points to a list of unicast Next Hops, called a unilist Next Hop, with ID 1048574. This unilist is composed of 2 unicast Next Hops, 731 and 732, which are actually 2 aggregate Next Hops. The previous CLI command does not resolve these 2 aggregate NHs, even if you use the extensive option. To get a detailed view of the Next Hop chain, I prefer this command, which gives the NH chain at the PFE level:

 


sponge@bob> show pfe route ip prefix 1.0.0.0/24 detail

 

Slot 1

 

IPv4 Route Table 0, default.0, 0x0:

Destination   NH IP Addr      Type     NH ID Interface

------------  --------------- -------- ----- ---------

1.0.0/24                      Indirect 1049510 RT-ifl 0 ae0.0 ifl 323

 

Next Hop details:

1049510(Indirect, IPv4, ifl:323:ae0.0, pfe-id:0, i-ifl:0:-)

    1048574(Unilist, IPv4, ifl:0:-, pfe-id:0)

        731(Aggreg., IPv4, ifl:323:ae0.0, pfe-id:0)

            733(Unicast, IPv4, ifl:418:xe-2/0/0.0, pfe-id:8)

            734(Unicast, IPv4, ifl:419:xe-2/0/1.0, pfe-id:8)

            735(Unicast, IPv4, ifl:426:xe-3/0/0.0, pfe-id:12)

            736(Unicast, IPv4, ifl:427:xe-3/0/1.0, pfe-id:12)

            737(Unicast, IPv4, ifl:434:xe-4/0/0.0, pfe-id:16)

            738(Unicast, IPv4, ifl:435:xe-4/0/1.0, pfe-id:16)

            739(Unicast, IPv4, ifl:442:xe-5/0/0.0, pfe-id:20)

            740(Unicast, IPv4, ifl:443:xe-5/0/1.0, pfe-id:20)

            727(Unicast, IPv4, ifl:466:xe-10/0/0.0, pfe-id:40)

            741(Unicast, IPv4, ifl:467:xe-10/0/1.0, pfe-id:40)

        732(Aggreg., IPv4, ifl:324:ae1.0, pfe-id:0)

            744(Unicast, IPv4, ifl:420:xe-2/1/0.0, pfe-id:9)

            745(Unicast, IPv4, ifl:421:xe-2/1/1.0, pfe-id:9)

            746(Unicast, IPv4, ifl:428:xe-3/1/0.0, pfe-id:13)

            747(Unicast, IPv4, ifl:429:xe-3/1/1.0, pfe-id:13)

            748(Unicast, IPv4, ifl:436:xe-4/1/0.0, pfe-id:17)

            749(Unicast, IPv4, ifl:437:xe-4/1/1.0, pfe-id:17)

            750(Unicast, IPv4, ifl:444:xe-5/1/0.0, pfe-id:21)

            751(Unicast, IPv4, ifl:445:xe-5/1/1.0, pfe-id:21)

            742(Unicast, IPv4, ifl:468:xe-10/1/0.0, pfe-id:41)

            743(Unicast, IPv4, ifl:469:xe-10/1/1.0, pfe-id:41)

            752(Unicast, IPv4, ifl:470:xe-10/2/0.0, pfe-id:42)



Here we have the complete view: the unilist Next Hop is actually a list of 2 aggregate NHs, 731 and 732. Each of them is composed of several “real” unicast NHs, which are the forwarding NHs.
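To make the chain easier to picture, here is a minimal Python sketch that models it as a tree and walks down to the forwarding NHs. The NextHop class and the forwarding_next_hops helper are hypothetical names used only for illustration (they are not Junos structures), and only the first two members of each LAG from the output above are copied in.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NextHop:
    """Hypothetical model of one entry in a Next Hop chain."""
    nh_id: int
    kind: str                      # "indirect", "unilist", "aggregate" or "unicast"
    interface: str = "-"
    children: List["NextHop"] = field(default_factory=list)


def forwarding_next_hops(nh: NextHop) -> List[NextHop]:
    """Walk the chain and return the leaf NHs (the forwarding NHs)."""
    if not nh.children:
        return [nh]
    leaves: List[NextHop] = []
    for child in nh.children:
        leaves.extend(forwarding_next_hops(child))
    return leaves


# Abbreviated copy of the chain shown above (first two members of each LAG only).
ae0 = NextHop(731, "aggregate", "ae0.0", [
    NextHop(733, "unicast", "xe-2/0/0.0"),
    NextHop(734, "unicast", "xe-2/0/1.0"),
])
ae1 = NextHop(732, "aggregate", "ae1.0", [
    NextHop(744, "unicast", "xe-2/1/0.0"),
    NextHop(745, "unicast", "xe-2/1/1.0"),
])
chain = NextHop(1049510, "indirect", "ae0.0", [
    NextHop(1048574, "unilist", "-", [ae0, ae1]),
])

for leaf in forwarding_next_hops(chain):
    print(leaf.nh_id, leaf.interface)
```

The walk returns only the unicast leaves, which is the set of links the load balancing has to choose from.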

 

When a packet is received, the ingress PFE performs a lookup and finds several forwarding NHs. The ingress PFE therefore has to load balance flows over all these forwarding NHs. To do that, it extracts some keys from the incoming packet, optionally adds some internal keys (like the interface index), and uses those keys to compute a hash; the algorithm then selects one forwarding Next Hop from the list. The packet can then be forwarded, through the fabric, to the right PFE (the PFE that hosts the selected forwarding NH). A given flow (with the same key values) will always be forwarded to the same forwarding NH.
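The principle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function select_next_hop and its parameters are hypothetical, CRC32 is a stand-in for the real PFE hash (which is not described here), and the selection is done in one step over a flat list instead of through the unilist and aggregate levels.

```python
import zlib
from typing import Optional, Sequence


def select_next_hop(
    next_hops: Sequence[str],
    src_ip: str,
    dst_ip: str,
    protocol: int,
    src_port: Optional[int] = None,
    dst_port: Optional[int] = None,
    incoming_ifl: Optional[int] = None,
) -> str:
    """Pick one forwarding NH for a flow: the same keys always give the same NH."""
    # Keys extracted from the packet, plus an optional internal key.
    keys = (src_ip, dst_ip, protocol, src_port, dst_port, incoming_ifl)
    # Assumption: CRC32 only stands in for the real (undocumented here) hash.
    digest = zlib.crc32(repr(keys).encode("ascii"))
    return next_hops[digest % len(next_hops)]


links = ["xe-2/0/0.0", "xe-2/0/1.0", "xe-3/0/0.0", "xe-3/0/1.0"]
first = select_next_hop(links, "10.0.0.1", "1.0.0.10", 6, 40000, 80)
again = select_next_hop(links, "10.0.0.1", "1.0.0.10", 6, 40000, 80)
print(first, again)  # both calls return the same link
```

Because the result depends only on the key values, every packet of a flow follows the same link, which is why per-flow load balancing does not reorder packets within a flow.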

 

By default, ICHIP-based cards use the following keys to compute the hash:


[Figure hash1: default hash keys on ICHIP-based cards]

 

On ICHIP-based cards, I did not find a command, either in the CLI or on the PFE, to check the default or configured hash key (unlike TRIO, see below). I am still looking for one (I am currently developing a Perl script to extract all PFE commands for a given card).

 

By default, TRIO-based cards use the following keys to compute the hash:

 

As you can see, for inet traffic TRIO uses layer 4 keys by default to compute the hash, so be careful with some kinds of traffic: fragmented traffic and raw IP traffic (I mean anything other than UDP or TCP). Moreover, for mpls traffic, TRIO includes the mpls payload, either IP or PW.

 

Note: the “incoming interface index” key has been removed for all types of traffic.


[Figure hash2: default hash keys on TRIO-based cards]

 

TRIO provides a PFE command to show the current load balancing configuration for every type of traffic. The following output shows the default load balancing used on TRIO cards:

 


sponge@bob> start shell pfe network fpc3

  

NPC platform (1067Mhz MPC 8548 processor, 2048MB memory, 512KB flash)

 

NPC3(bob vty)# show jnh lb

Unilist Seed Configured 0x919ae752 System Mac address 00:21:59:a2:e8:00

Hash Key Configuration: 0x0000000000e00000 0xffffffffffffffff

           IIF-V4: No

         SPORT-V4: Yes

         DPORT-V4: Yes

              TOS: No

 

           IIF-V6: No

         SPORT-V6: Yes

         DPORT-V6: Yes

    TRAFFIC_CLASS: No

 

         IIF-MPLS: No

     MPLS_PAYLOAD: Yes

         MPLS_EXP: No

 

      IIF-BRIDGED: No

    MAC ADDRESSES: Yes

    ETHER_PAYLOAD: Yes

     802.1P OUTER: No

 

Services Hash Key Configuration:

         SADDR-V4: No

           IIF-V4: No
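If you want to compare this output between cards or releases, a small script can turn it into a dictionary. The following Python sketch is only a suggestion: the function parse_hash_keys and the regular expression are my own assumptions about the layout shown above, and the parser stops at the "Services Hash Key Configuration" section because field names such as IIF-V4 appear again there.

```python
import re
from typing import Dict

# Shortened copy of the output shown above.
SAMPLE = """\
Hash Key Configuration: 0x0000000000e00000 0xffffffffffffffff
           IIF-V4: No
         SPORT-V4: Yes
         DPORT-V4: Yes
              TOS: No
         IIF-MPLS: No
     MPLS_PAYLOAD: Yes
    MAC ADDRESSES: Yes
     802.1P OUTER: No
Services Hash Key Configuration:
         SADDR-V4: No
           IIF-V4: No
"""

# A field line is "<name>: Yes" or "<name>: No"; the name may contain spaces and dots.
FIELD_RE = re.compile(r"^\s*(?P<name>[A-Za-z0-9_.\- ]+?):\s*(?P<value>Yes|No)\s*$")


def parse_hash_keys(output: str) -> Dict[str, bool]:
    """Return {field name: enabled} for the main hash key section only."""
    keys: Dict[str, bool] = {}
    for line in output.splitlines():
        if line.strip().startswith("Services Hash Key Configuration"):
            break  # the services section reuses names such as IIF-V4
        match = FIELD_RE.match(line)
        if match:
            keys[match.group("name")] = match.group("value") == "Yes"
    return keys


enabled = [name for name, on in parse_hash_keys(SAMPLE).items() if on]
print(enabled)
```

With the default output above, the enabled list contains the layer 4 ports, the mpls payload and the MAC addresses, which matches the default TRIO keys described earlier.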


 

PART 2 will present the load-balancing configuration for ICHIP and TRIO cards.

 

David.

Published by junosandme in Posts