Kernel 4.18 and eBPF Sample Apps
This month marks the release of both kernel 4.18 and our eBPF sample repository. Kernel 4.18 introduces AF_XDP, which allows user space applications to bypass the networking stack and communicate directly with the XDP driver. The core bpfilter functionality has also been introduced, laying the foundations for future eBPF firewalling solutions. eBPF offload capabilities have improved as well, with libbpf support, perf event map support and programmable RSS. Interesting times for offloading eBPF programs!
We have also introduced an extended Berkeley Packet Filter (eBPF) sample repository, which currently contains two XDP demo apps: a Layer 4 load balancer and a programmable RSS demo, with more to come.
Layer 4 Load Balancer
Our load balancer application demonstrates how an eBPF program can be used to distribute incoming traffic across up to 512 servers. Netronome is certainly not the first to showcase a load balancer app: Facebook recently published its fully operational Katran eBPF load balancer. Katran is highly sophisticated, so installing and configuring it can involve a steep learning curve, especially for those starting out with eBPF. With our demo app, we demonstrate a simpler load balancer, which can be controlled using Python scripts and bpftool, a tool that we open sourced and that is now part of the Linux kernel.
Our load balancer receives incoming packets, extracts the IP addresses and port numbers, and calculates a hash based on these values. The hash value is then used to look up the packet's network destination in an eBPF map. Because the hash stays constant for the duration of each flow, all packets from the same flow are sent to the same destination.
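To make the lookup concrete, here is a minimal Python model of the idea, not the eBPF program from the repository. CRC32 stands in for whatever hash the demo actually uses, and the `servers` list, the addresses and the function names are hypothetical placeholders for the eBPF map and its contents.

```python
import ipaddress
import struct
import zlib

MAX_SERVERS = 512  # upper bound quoted in the text


def flow_hash(src_ip, dst_ip, src_port, dst_port):
    """Hash the flow's addresses and ports. CRC32 is only a stand-in hash."""
    key = (
        ipaddress.IPv4Address(src_ip).packed
        + ipaddress.IPv4Address(dst_ip).packed
        + struct.pack("!HH", src_port, dst_port)
    )
    return zlib.crc32(key)


def pick_server(servers, src_ip, dst_ip, src_port, dst_port):
    """Use the hash as an index into the table, as a map lookup would."""
    if not 0 < len(servers) <= MAX_SERVERS:
        raise ValueError("expected between 1 and 512 servers")
    slot = flow_hash(src_ip, dst_ip, src_port, dst_port) % len(servers)
    return servers[slot]


# Hypothetical backend table; in the demo this role is played by an eBPF map.
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
first = pick_server(servers, "192.0.2.10", "198.51.100.1", 40000, 80)
again = pick_server(servers, "192.0.2.10", "198.51.100.1", 40000, 80)
print(first, first == again)  # same flow, same destination
```

Because the hash depends only on header fields that do not change within a flow, every packet of that flow lands on the same table slot.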
An outer IP header addressed to that destination is inserted at the start of the packet, and the packet is transmitted back out of the interface toward the target. Load balancer statistics can be monitored with the included Python stats script.
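The encapsulation step can be sketched in Python as follows. This assumes IPv4-in-IPv4 (protocol number 4) and ignores the Ethernet header, so the "packet" here starts at the original IP header; the addresses and function names are hypothetical, and the demo's actual header layout may differ.

```python
import ipaddress
import struct

IPPROTO_IPIP = 4  # IANA protocol number for IPv4-in-IPv4


def ipv4_checksum(header):
    """One's-complement sum over 16-bit words (header length must be even)."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def encapsulate(inner_packet, lb_ip, server_ip, ttl=64):
    """Prepend a 20-byte outer IPv4 header addressed to the chosen server."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                      # version 4, header length 5 words
        0,                         # DSCP/ECN
        20 + len(inner_packet),    # total length
        0,                         # identification
        0,                         # flags and fragment offset
        ttl,
        IPPROTO_IPIP,
        0,                         # checksum placeholder
        ipaddress.IPv4Address(lb_ip).packed,
        ipaddress.IPv4Address(server_ip).packed,
    )
    checksum = struct.pack("!H", ipv4_checksum(header))
    return header[:10] + checksum + header[12:] + inner_packet


# Placeholder payload standing in for the original IP packet.
outer = encapsulate(b"original-ip-packet", "203.0.113.1", "10.0.0.2")
print(len(outer), outer[9])
```

The original packet is left untouched behind the new header, so the receiving server can strip the outer header and process the inner packet as usual.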
Programmable RSS
A world first for us is eBPF programmable RSS. What is RSS? It stands for "Receive Side Scaling," a method used by modern network cards to decide which host CPU each packet should be sent to, so that packet processing is distributed across cores. A hash function is applied to the packet headers (commonly the four-tuple of source and destination IP address and port) to generate a hash for the flow, which determines the RSS queue, and therefore the CPU, that receives the packet.
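As an illustration of conventional, fixed-function RSS, the sketch below implements a Toeplitz-style hash, a scheme widely used by network cards, over the four-tuple. It is a Python model only: the key is an arbitrary value chosen for this example, and the final modulo is a simplification of the indirection table that real cards typically use.

```python
import ipaddress
import struct

# Arbitrary 40-byte key for illustration only; real cards are configured
# with their own key.
RSS_KEY = bytes(range(1, 41))


def toeplitz_hash(key, data):
    """XOR together the 32-bit key window at each set bit of the input."""
    key_bits = len(key) * 8
    if len(data) * 8 + 31 > key_bits:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):
                shift = key_bits - 32 - (i * 8 + bit)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def rss_queue(src_ip, dst_ip, src_port, dst_port, num_queues):
    """Map a flow's four-tuple to a receive queue."""
    data = (
        ipaddress.IPv4Address(src_ip).packed
        + ipaddress.IPv4Address(dst_ip).packed
        + struct.pack("!HH", src_port, dst_port)
    )
    return toeplitz_hash(RSS_KEY, data) % num_queues


print(rss_queue("192.0.2.10", "198.51.100.1", 40000, 443, 8))
```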
The RSS algorithms available on network cards are commonly closed source or fixed in hardware, and therefore cannot be adapted to non-standard use cases. With kernel 4.18, my colleague Jakub Kicinski has addressed this issue by implementing an exciting new feature: programmable RSS using eBPF. This feature is only available with offload, i.e. when the eBPF program runs directly on the network card.
Our demo app illustrates how packets can be distributed to:
● a single queue, chosen by the user through a userspace utility,
● distributed queues using a hash algorithm,
● distributed queues using a Symmetric RSS hash algorithm,
● distributed queues using a hash algorithm against the IPinIP inner headers.
RSS distribution is a major concern for certain applications, such as the Suricata or Snort intrusion detection systems (IDS). These applications need all packets of a flow to be processed on the same CPU, so that per-flow state stays local to that CPU, which minimizes cache bouncing and improves performance. Symmetric RSS is a mode which ensures that packets sent in both directions of a flow map to the same queue. For example, in a connection between hosts A and B, packets sent by A and packets sent by B are placed on the same CPU queue.
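One simple way to obtain a symmetric hash, sketched here in Python, is to put the two endpoints into a canonical order before hashing so that both directions produce the same key. This is an assumption chosen for illustration and not necessarily the approach taken in our demo; CRC32 is again a stand-in hash, and the addresses are hypothetical.

```python
import ipaddress
import struct
import zlib


def symmetric_queue(ip_a, port_a, ip_b, port_b, num_queues):
    """Pick a queue that is identical for A->B and B->A packets."""
    # Sorting the endpoints removes the notion of "source" and "destination".
    endpoints = sorted(
        [
            (ipaddress.IPv4Address(ip_a).packed, port_a),
            (ipaddress.IPv4Address(ip_b).packed, port_b),
        ]
    )
    key = b"".join(addr + struct.pack("!H", port) for addr, port in endpoints)
    return zlib.crc32(key) % num_queues


a_to_b = symmetric_queue("192.0.2.10", 40000, "198.51.100.1", 443, 8)
b_to_a = symmetric_queue("198.51.100.1", 443, "192.0.2.10", 40000, 8)
print(a_to_b == b_to_a)
```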
Custom header RSS (or Encapsulated RSS) allows users to benefit from RSS even if they are using custom headers or uncommon encapsulations. Default RSS implementations are only able to parse common protocols. With programmable RSS, users can include any header field in the RSS calculation and can parse any encapsulation protocol they have in their networks. This is beneficial to overlay and trunk networks, where the outer IP header is relatively static, resulting in a poorly distributed RSS. The inner IP header addresses usually vary more, and hashing on them gives a better RSS distribution.
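The difference between hashing on outer and inner headers can be sketched in Python. The model below assumes plain IPv4-in-IPv4 with no Ethernet header, builds two hypothetical tunnelled packets that share the same outer addresses, and shows that the fields selected for hashing differ only once the inner header is parsed. All names and addresses are illustrative.

```python
import ipaddress
import struct

IPPROTO_TCP = 6
IPPROTO_IPIP = 4


def ipv4_header(src, dst, proto, payload_len):
    """Build a minimal 20-byte IPv4 header (checksum left at zero for brevity)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + payload_len, 0, 0, 64, proto, 0,
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )


def rss_addresses(packet):
    """Return the (src, dst) pair to hash: inner header for IPinIP, else outer."""
    if len(packet) < 20:
        raise ValueError("truncated IPv4 header")
    header_len = (packet[0] & 0x0F) * 4
    if packet[9] == IPPROTO_IPIP:
        packet = packet[header_len:]  # skip the outer header
        if len(packet) < 20:
            raise ValueError("truncated inner IPv4 header")
    src = ipaddress.IPv4Address(packet[12:16])
    dst = ipaddress.IPv4Address(packet[16:20])
    return str(src), str(dst)


def tunnel(inner_src, inner_dst):
    """Wrap an inner header in a fixed outer header, as a tunnel endpoint would."""
    inner = ipv4_header(inner_src, inner_dst, IPPROTO_TCP, 0)
    return ipv4_header("172.16.0.1", "172.16.0.2", IPPROTO_IPIP, len(inner)) + inner


flow_1 = tunnel("10.1.0.1", "10.2.0.1")
flow_2 = tunnel("10.1.0.2", "10.2.0.1")
print(flow_1[12:20] == flow_2[12:20])                  # outer addresses identical
print(rss_addresses(flow_1), rss_addresses(flow_2))    # inner addresses differ
```

Hashing only the outer addresses would place both flows on the same queue; hashing the inner addresses gives the hash something to distinguish them by.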
Our demo showcases several RSS modes. However, as this functionality is fully programmable, an RSS queue could be selected based on any header data, such as UDP encapsulations (VXLAN, Geneve, FOU, GUE, etc.), NSH, QUIC or any other protocol.
Netronome is also hosting a three-part Fall eBPF Webinar Series.