Recently, we received 2 new MPC cards in the lab:
- The MPC4e Combo card: 2x100GE + 8x10GE ports (MPC4E 3D 2CGE+8XGE)
- The MPC4e 32x10GE ports (MPC4E 3D 32XGE)
Both cards require Junos 12.3 or later and can be used on both dense chassis, the MX960 and the MX2000. Here I only cover the MPC4e on the MX960 chassis.
I played with these 2 new cards to better understand how packets are handled at the PFE level. Here is my analysis:
Introduction – MPC4e on MX960:
MPC4e cards need at least SCB-E fabric cards to work. Remember that an MX960 can host 3 SCB-E cards, each of which has 2 fabric planes. In the normal configuration, the SCBs work in 2+1 mode (a.k.a. redundancy mode). In this mode, each PFE has active high-speed links towards 4 planes, plus 2 other paths towards the 2 standby planes hosted by the standby SCB-E.
In 2+1 mode the SCB-E is a real bottleneck, so a new feature has been introduced to increase the fabric bandwidth by switching the fabric to 3+0 mode (no more redundancy). To get fabric redundancy back, we have to wait for the third generation of SCB, named SCB-E2.
However, fabric components are more "basic" than PFE ASICs and are therefore very stable. I’ve rarely seen an SCB crash. It can still happen, and good use of CoS can keep critical traffic from being dropped when the chassis encounters a fabric failure (playing with loss priorities is recommended).
To enable the 3+0 SCB-E mode, apply this configuration:
set chassis fabric redundancy-mode increased-bandwidth
After that, the 6 planes of the chassis become active (Online):
sponge@bob> show chassis fabric summary
Plane State Uptime
0 Online 13 hours, 23 minutes, 21 seconds
1 Online 13 hours, 23 minutes, 16 seconds
2 Online 13 hours, 23 minutes, 11 seconds
3 Online 13 hours, 23 minutes, 6 seconds
4 Online 13 hours, 23 minutes, 1 second
5 Online 13 hours, 22 minutes, 56 seconds
sponge@bob> show chassis fabric redundancy-mode
Fabric redundancy mode: Increased Bandwidth
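If you want to check this after each mode change without reading the output by eye, here is a minimal Python sketch. It assumes the output format shown above and works on pasted text rather than talking to the router; the function names plane_states and all_planes_online are mine, not part of any Juniper tool.

```python
import re

# Sample text copied from the "show chassis fabric summary" capture above.
SUMMARY = """\
Plane State Uptime
0 Online 13 hours, 23 minutes, 21 seconds
1 Online 13 hours, 23 minutes, 16 seconds
2 Online 13 hours, 23 minutes, 11 seconds
3 Online 13 hours, 23 minutes, 6 seconds
4 Online 13 hours, 23 minutes, 1 second
5 Online 13 hours, 22 minutes, 56 seconds
"""


def plane_states(text):
    """Return {plane_number: state} parsed from the summary text."""
    states = {}
    for line in text.splitlines():
        # A plane line starts with the plane number followed by its state;
        # the header line does not start with a digit and is skipped.
        match = re.match(r"\s*(\d+)\s+(\S+)", line)
        if match:
            states[int(match.group(1))] = match.group(2)
    return states


def all_planes_online(text, expected_planes=6):
    """True when exactly expected_planes planes are listed and all are Online."""
    states = plane_states(text)
    return (len(states) == expected_planes
            and all(state == "Online" for state in states.values()))


print(all_planes_online(SUMMARY))
```

Any plane in a state other than Online, or a missing plane, makes the check fail, which is the situation you would expect when the chassis is not in 3+0 mode.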
The MPC4e is a Trio-based card that uses an enhanced version of the Trio ASIC. An MPC4e card is made of 2 PFEs. Each PFE is a set of ASICs:
- 1 XMq chip (an enhanced version of the Mq chip used by the MPC1/2 and the MPC 16x10GE)
- 2 LU chips
Each PFE has a line rate of around 130Gbits/s, depending on the packet size, so each MPC4e card can deliver around 260Gbits/s. But, as I said previously, with the current MX960 fabric cards (SCB-E) each PFE gets either 80Gbits/s to and from the fabric in 2+1 mode or 120Gbits/s in 3+0 mode. This is why I recommend the 3+0 mode with SCB-E cards: each MPC4e card then has around 240Gbits/s to and from the fabric. Remember, though, that intra-PFE traffic doesn’t use the fabric links, so MPC4e performance really is around 2x130Gbits/s. The full capacity of the MPC4e ASICs should be available with SCB-E2 cards.
One important thing to notice:
The PFE has a 130Gb/s total bandwidth capacity, but this bandwidth is divided between two virtual WAN groups of 65Gb/s.
The 32x10GE card is logically divided into 4 PICs of 8 ports. PIC 0 and PIC 1 are connected to PFE 0, and PIC 2 and PIC 3 belong to PFE 1. For a given PFE, for instance PFE 0, the first 65Gbits/s group (group 0, named WAN0) is associated with PIC 0 and the other 65Gbits/s group, WAN1, is associated with PIC 1. Each PIC has 80Gbits/s of port bandwidth (8x10GE), but the unused bandwidth of one group can be re-allocated to the other. So the PFE capacity of 130Gbits/s is shared fairly by all sixteen ports, and if you oversubscribe the PFE (more than 130Gbits/s), you will see loss spread equally across all sixteen ports.
The case of the combo card is different. A given PFE is again associated with 2 virtual PICs: the 4x10GE ports belong to PIC 0 and the 100GE port to PIC 1. Each PIC is associated with a WAN group with a 65Gbits/s “transmit-rate”. Without oversubscription you can use the 100GE port at line rate, because the unused part of WAN0's 65Gbits/s can be consumed by WAN1 if needed. Nevertheless, if you oversubscribe the 130Gbits/s PFE bandwidth, you will not see proportional loss between the 100GE port and the 4x10GE ports, because the 4x10GE ports belonging to WAN0 will never exceed 40Gbits/s (always below WAN0's 65Gbits/s “transmit-rate”). One important point: high-priority traffic will not be affected, whatever the input interface.
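To make the sharing between the two WAN groups concrete, here is a rough Python model of it. This is only my simplified reading of the behaviour described above, not the actual XMq scheduler: it ignores packet size and priorities, and the names share_pfe, PFE_CAPACITY and WAN_RATE are hypothetical.

```python
PFE_CAPACITY = 130.0  # Gbit/s, approximate PFE capacity quoted above
WAN_RATE = 65.0       # Gbit/s "transmit-rate" of each WAN group


def share_pfe(offered_wan0, offered_wan1):
    """Return the bandwidth served to [WAN0, WAN1] for the offered loads (Gbit/s)."""
    offered = [offered_wan0, offered_wan1]
    # Step 1: each group is served up to its own 65 Gbit/s transmit-rate.
    served = [min(load, WAN_RATE) for load in offered]
    # Step 2: whatever one group leaves unused is lent to the other group.
    spare = PFE_CAPACITY - sum(served)
    for i, load in enumerate(offered):
        extra = min(load - served[i], spare)
        served[i] += extra
        spare -= extra
    return served


cases = {
    "combo, 4x10GE + 100GE": (40, 100),
    "combo, 3x10GE + 100GE": (30, 100),
    "32x10GE, 8 + 8 ports": (80, 80),
}
for name, (wan0, wan1) in cases.items():
    served0, served1 = share_pfe(wan0, wan1)
    print(f"{name}: WAN0 loss {wan0 - served0:.0f}G, WAN1 loss {wan1 - served1:.0f}G")
```

In this model the fully loaded combo PFE drops only on WAN1 (the 100GE port), the same PFE with 3x10GE drops nothing, and the 32x10GE PFE drops the same amount on both groups, which is consistent with the behaviour described above.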
The next 2 diagrams depict an internal view of the MPC4e combo card and the MPC4e 32x10GE card (this is what I deduced by playing with PFE commands).
The MPC4e combo card internal representation
We carried out many stress tests on both cards in 3+0 SCB-E mode. Both cards have an ASIC capacity of around 260Gbits/s, depending on the packet size: small packets (around 64B) stress the ASIC more. This is nothing new; it was already the case for previous generations of Juniper ASICs, and it is also the case for other vendors.
In case of oversubscription on the combo card, you will see the 100GE port lose traffic while the 4x10GE ports lose almost none. This is due to the design of the MPC4e PFE, which is split into 2 virtual WAN groups of 65Gbits/s. But remember that with a 100GE port and only 3x10GE ports in use there is no drop. You only experience low-priority traffic loss on the 100GE port when you add the last 10GE port. This is not the case for the 32x10GE card, where traffic loss is shared equally among the 10GE ports.
The second important thing to notice is the fabric bottleneck with the current SCB-E cards:
- 2+1 SCB-E mode : fabric BW 80Gbits/s per PFE or 160Gbits/s per MPC4e
- 3+0 SCB-E mode : fabric BW 120Gbits/s per PFE or 240Gbits/s per MPC4e
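The two figures above follow directly from the number of active planes. Here is a small Python sketch of the arithmetic; the 20Gbits/s per plane and per PFE is my own inference from the numbers above (80/4 = 120/6 = 20), not a value taken from a datasheet, and the names are hypothetical.

```python
PLANE_BW_PER_PFE = 20  # Gbit/s per active plane and per PFE (inferred, see above)
PLANES_PER_SCB = 2     # each SCB-E hosts 2 fabric planes
PFES_PER_MPC4E = 2     # an MPC4e card is made of 2 PFEs


def fabric_bw(active_scbs):
    """Return (Gbit/s per PFE, Gbit/s per MPC4e) for a number of active SCB-Es."""
    active_planes = active_scbs * PLANES_PER_SCB
    per_pfe = active_planes * PLANE_BW_PER_PFE
    return per_pfe, per_pfe * PFES_PER_MPC4E


# 2+1 mode has 2 active SCB-Es, 3+0 mode has 3.
for mode, active_scbs in (("2+1", 2), ("3+0", 3)):
    per_pfe, per_mpc = fabric_bw(active_scbs)
    print(f"{mode}: {per_pfe}Gbits/s per PFE, {per_mpc}Gbits/s per MPC4e")
```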
In practice, in 3+0 mode we reached a throughput to and from the fabric of around 230Gbits/s per MPC4e for packets of 320 bytes and larger (out of the 240Gbits/s (2x120) theoretically available).
In conclusion, the performance of both MPC4e cards is close to what can be read in the datasheet.
What else? Troubleshooting:
During the tests in the lab I also played with the PFE CLI of the MPC4e. I've drawn this picture with some interesting commands (remember: not supported by JTAC :-) )