Netronome was privileged to present a half-day tutorial on host data plane acceleration at ACM SIGCOMM in Budapest on August 20, 2018. The tutorial introduced attendees to models for host data plane acceleration and provided an in-depth understanding of SmartNIC deployment models at hyperscale cloud vendors and telecom service providers. Throughout the tutorial, emphasis was placed on open source resources available for research and product development.
To add some extra excitement to the event, an air parade was conducted simultaneously over the Danube, directly behind the speakers.
SmartNIC Deployment Models
The first section was presented by me, Simon Horman. It began by framing the discussion in terms of the history of switching: first in hardware in black boxes, then in software on hosts to provide switching to VMs, then in hardware on hosts using SR-IOV, and then back in software using SDN. A recurring theme was the contrast between the performance of hardware and the flexibility of software, which comes at a cost in host CPU. This motivated host data plane acceleration: a solution that offers both good performance and high flexibility.
The flexibility of different SDN models was then discussed, contrasting applications such as Open vSwitch (OVS) and Tungsten Fabric/vRouter, packet movement infrastructure such as eBPF, and tools such as P4. The tradeoff here is between the work required to deploy a system and the flexibility of the resulting solution.
The discussion then moved to offloading existing data planes: OVS, Tungsten Fabric/vRouter and eBPF. It highlighted that drop-in solutions require relatively little development effort, but that the flexibility of the resulting system is limited to the features the existing system defines. Progressively replacing more and more of the system with data planes written in P4 or running eBPF was then discussed. The most flexible solution, which also requires the greatest development effort, is to create entirely new data planes to run in both the SmartNIC and kernel.
The section closed with a brief look at the performance benefits of host data plane acceleration, in this case the throughput improvement and host CPU savings of using Agilio OVS, and by highlighting upstream open source activities in the area of host data plane acceleration: Linux kernel and DPDK drivers, offload support in OVS and integration into OpenStack.
XDP/BPF Introduction and Classifier Lab
This section of the tutorial was presented by Netronome’s Jakub Kicinski and David Beckett. It opened with an overview of eBPF, a kernel-based virtual machine that enables low-level packet processing and provides maps to share data with user space, and of XDP, which allows eBPF programs to reflect, filter and redirect packets without traversing the network stack. This property leads to significant performance benefits in applications such as load balancing, DDoS mitigation and distributed firewalls. This was followed by some discussion of how the kernel uses a verifier to ensure the security and stability of eBPF code accepted from user space.
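To make the filter-and-count idea concrete, here is a minimal user-space sketch in Python. It is not eBPF code: real XDP programs are written in restricted C and compiled to eBPF bytecode. The verdict values mirror the kernel's `enum xdp_action`, but the policy (drop UDP to one port), the `classify` and `udp_frame` helpers, and the `Counter` standing in for an eBPF map are all assumptions made for illustration.

```python
import struct
from collections import Counter

# Verdict values mirror the kernel's enum xdp_action.
XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT = range(5)

ETH_HLEN = 14        # Ethernet header length in bytes
ETH_P_IP = 0x0800    # EtherType for IPv4
IPPROTO_UDP = 17

# Stand-in for an eBPF map: verdict -> packet count, readable by "user space".
verdict_counts = Counter()


def _verdict(frame: bytes, blocked_udp_port: int) -> int:
    # Bounds checks precede every access, as the in-kernel verifier would require.
    if len(frame) < ETH_HLEN + 20:
        return XDP_PASS
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP:
        return XDP_PASS
    ihl = (frame[ETH_HLEN] & 0x0F) * 4      # IPv4 header length in bytes
    if ihl < 20:
        return XDP_PASS
    protocol = frame[ETH_HLEN + 9]
    l4 = ETH_HLEN + ihl                      # offset of the UDP header
    if protocol != IPPROTO_UDP or len(frame) < l4 + 8:
        return XDP_PASS
    (dport,) = struct.unpack_from("!H", frame, l4 + 2)
    return XDP_DROP if dport == blocked_udp_port else XDP_PASS


def classify(frame: bytes, blocked_udp_port: int = 9999) -> int:
    """Return a verdict for one frame and record it in the map stand-in."""
    verdict = _verdict(frame, blocked_udp_port)
    verdict_counts[verdict] += 1
    return verdict


def udp_frame(dport: int) -> bytes:
    """Build a minimal Ethernet/IPv4/UDP frame for testing (checksums left zero)."""
    eth = bytes(6) + bytes(6) + struct.pack("!H", ETH_P_IP)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, IPPROTO_UDP, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    udp = struct.pack("!HHHH", 12345, dport, 8, 0)
    return eth + ip + udp
```

Calling `classify(udp_frame(9999))` yields `XDP_DROP`, while any other frame is passed up the stack. In a real deployment, a user-space tool would read the counters from the map.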
Attention then turned to XDP offload, which extends the performance benefits of XDP by pushing processing from the host to a SmartNIC. In this programming model, the kernel compiles the eBPF programs loaded into it into Network Flow Processor (NFP) instructions and loads the compiled program onto the SmartNIC. There was some discussion of optimizations performed by this compiler: replacing eBPF assembly code sequences with fewer or faster NFP instructions; batching atomic operations; optimizing out helpers; intelligently mapping 64-bit eBPF registers to 32-bit NFP registers; and creating read-only maps on the SmartNIC to allow use of memory units closer to the processors on the SmartNIC.
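To illustrate why mapping 64-bit registers onto 32-bit ones deserves care, the sketch below performs a 64-bit addition using only 32-bit halves and an explicit carry. It is a Python model of the arithmetic, not the NFP compiler's actual code generation, and the helper names are hypothetical. One plausible saving, which the tutorial summary does not confirm in detail, is skipping the high-half work when a value is known to fit in 32 bits.

```python
MASK32 = 0xFFFFFFFF


def split64(value: int) -> tuple:
    """Split a 64-bit value into (low, high) 32-bit halves."""
    return value & MASK32, (value >> 32) & MASK32


def join64(low: int, high: int) -> int:
    """Recombine two 32-bit halves into one 64-bit value."""
    return (high << 32) | low


def add64_on_32bit(a: int, b: int) -> int:
    """Add two 64-bit values using only 32-bit operations plus a carry bit."""
    a_lo, a_hi = split64(a)
    b_lo, b_hi = split64(b)
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32                     # 1 if the low halves overflowed
    lo = lo_sum & MASK32
    hi = (a_hi + b_hi + carry) & MASK32      # wraps at 64 bits, like a register
    return join64(lo, hi)
```

Each 64-bit operation costs two 32-bit operations here, so any half that can be proven unnecessary is work the SmartNIC does not have to do.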
The remainder of the presentation featured a number of live demonstrations in which simple eBPF programs were written to show features of XDP with real-world applications. The demonstrations included a load balancer and programmable RSS. Links to their source are provided in the slides.
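The load balancer and programmable RSS demonstrations share one core idea: hash a packet's flow identity and use the result to pick a target, so that every packet of a flow lands in the same place. The Python sketch below shows that idea only. It is not the demo source linked from the slides; the use of CRC32 as the hash and the names `flow_key` and `select` are assumptions, and hardware RSS typically uses a different hash function.

```python
import ipaddress
import struct
import zlib


def flow_key(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
             proto: int) -> bytes:
    """Serialize a 5-tuple into bytes so it can be hashed."""
    return (ipaddress.ip_address(src_ip).packed
            + ipaddress.ip_address(dst_ip).packed
            + struct.pack("!HHB", src_port, dst_port, proto))


def select(targets: list, flow: tuple):
    """Pick a target (receive queue or backend) for a flow by hashing it."""
    return targets[zlib.crc32(flow_key(*flow)) % len(targets)]
```

With `targets` set to a list of queue numbers this behaves like RSS; with a list of backend addresses it behaves like a simple stateless load balancer. Making the selection programmable means the hash inputs and the target list can be changed without touching the hardware.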
P4-16 Introduction
The final section of the tutorial was presented by Netronome’s Jaco Joubert. It began with an introduction to P4, starting with the four Ps of P4: Protocol Independent Packet Processing Programming, where the programmer defines packet formats and processing algorithms. P4 provides a target-independent, high-productivity language for data plane implementation. It provides a consistent control plane interface, as control plane APIs are automatically generated by the compiler. It also allows the development of reconfigurable data planes, as the data plane program can be changed in the field. And perhaps most importantly, P4 itself is a community-driven design.
The presentation then moved on to two development models. With the Netronome P4 SDK, P4 is first compiled into an intermediate representation and then, using a target-specific compiler, into firmware to run on an Agilio SmartNIC. With P4C-XDP, the intermediate representation is instead compiled into XDP bytecode, which is loaded onto the SmartNIC using standard Linux kernel APIs and tooling.
Some use case examples of P4 were then discussed, including in-band network telemetry, stateful firewalling and load balancing. The conversation then moved on to the core P4 constructs of parsing, matching and actions, which led to an example of a simple switch implemented in P4.
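The parse, match and action structure can be modeled in a few lines of Python. This is a sketch of the concepts, not P4 code and not the switch example shown in the tutorial. The `ExactMatchTable` class, the `forward` and `drop` actions and the choice of dropping on a table miss are all assumptions for illustration.

```python
def parse_ethernet(frame: bytes) -> dict:
    """Parser stage: extract the Ethernet header fields from raw bytes."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    return {
        "dst": frame[0:6],
        "src": frame[6:12],
        "ethertype": int.from_bytes(frame[12:14], "big"),
    }


def forward(port: int) -> tuple:
    """Action: send the packet out of the given port."""
    return ("forward", port)


def drop() -> tuple:
    """Action: discard the packet."""
    return ("drop",)


class ExactMatchTable:
    """Match stage: map an exact key to an action and its parameters."""

    def __init__(self, default_action, *default_params):
        self.entries = {}
        self.default = (default_action, default_params)

    def add_entry(self, key: bytes, action, *params) -> None:
        # In P4 the control plane, not the data plane, installs entries.
        self.entries[key] = (action, params)

    def apply(self, key: bytes) -> tuple:
        action, params = self.entries.get(key, self.default)
        return action(*params)


def simple_switch(frame: bytes, mac_table: ExactMatchTable) -> tuple:
    """Pipeline: parse the frame, match on destination MAC, run the action."""
    headers = parse_ethernet(frame)
    return mac_table.apply(headers["dst"])
```

The split mirrors the P4 model: the program fixes the parser, the table shape and the set of available actions, while the table entries are populated at run time through the generated control plane API.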
A discussion of the architecture of a stateful firewall implemented in P4 was then provided along with performance measurements showing near line-rate performance in the presence of up to 1 million flows. This section of the tutorial closed with a demonstration of P4 used for telemetry, illustrating how data on the latency of different hops a packet traverses may be collected.
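The essence of a stateful firewall is a flow table: traffic leaving the protected side creates state, and traffic arriving from outside is admitted only if it matches the reverse of a recorded flow. The Python sketch below illustrates that principle under stated assumptions. It does not reproduce the architecture presented in the tutorial; the class, its method names and the `max_flows` capacity parameter are hypothetical.

```python
def reverse(flow: tuple) -> tuple:
    """Swap source and destination of a (src_ip, src_port, dst_ip, dst_port, proto) flow."""
    src_ip, src_port, dst_ip, dst_port, proto = flow
    return (dst_ip, dst_port, src_ip, src_port, proto)


class StatefulFirewall:
    """Allow inbound packets only for flows that were initiated from inside."""

    def __init__(self, max_flows: int = 1_000_000):
        # Hypothetical capacity limit; a hardware flow table is finite too.
        self.max_flows = max_flows
        self.flows = set()

    def from_inside(self, flow: tuple) -> bool:
        """Record an outbound flow; refuse new flows once the table is full."""
        if flow not in self.flows:
            if len(self.flows) >= self.max_flows:
                return False
            self.flows.add(flow)
        return True

    def from_outside(self, flow: tuple) -> bool:
        """Admit an inbound packet only if it answers a recorded flow."""
        return reverse(flow) in self.flows
```

A production design would also expire idle entries and track protocol state such as TCP flags, both of which are omitted here for brevity.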