
Avoid Kernel Bypass in Your Network Infrastructure

By Sujal Das | Jan 10, 2017

Kernel means many things – such as the softer, usually edible part of a nut, or the seed and hard husk of a cereal, or the central and most important part of something. In the world of Linux, the kernel certainly means the central and most important part of the operating system. It means all of the following synonyms as well: essence, core, heart, essential, quintessence, fundamental, basic, and substance. So, what is it that causes some of us in the data center networking industry to gravitate toward bypassing the kernel?

We have seen many kernel bypass solutions in the past, most notably RDMA (Remote Direct Memory Access), TOE (TCP Offload Engine) and OpenOnload. More recently, DPDK (Data Plane Development Kit) has been used in some applications to bypass the kernel, and there are newer initiatives such as FD.io (Fast data - Input/Output), which is based on VPP (Vector Packet Processing). More will likely emerge in the future. These technologies and their related Linux networking software stacks bypass the Linux kernel or, more specifically, the kernel networking stack that sits in the central and most important part of the operating system. The simple picture below depicts this.


[Figure: Kernel Stack]


In human anatomy, the brain and the heart are the central and most important parts. Bypassing them is a last resort, taken only when a person's life is at stake. A cerebral bypass connects a blood vessel outside the brain to a vessel inside the brain to reroute blood flow around a damaged or blocked artery. Similarly, coronary artery bypass surgery reroutes blood around damaged arteries in the heart. Below is a picture of a heart bypass surgery that shows parallel bypass arteries, analogous to the parallel kernel bypass stacks in the picture above.

[Figure: Heart bypass surgery]


So then what could be so wrong with the networking packet arteries in the Linux kernel that motivates some of us to bypass them?

There are two main reasons:

1. The "kernel is too slow"
2. Bypass allows anyone to "plug in the technology without the need to change core/kernel code"

For those two reasons, and with the added advantage of those kernel bypass technologies being open sourced and/or specified by standards bodies, the proponents push data center operators to adopt them.

It is true that the kernel networking stack is slow (in other words, the kernel networking arteries are clogged in many ways), and the problem is getting worse with the adoption of higher-speed networking in servers and switches: 10, 25, and 40GbE today, and 50 or 100GbE in the near future. Technologies like RDMA and TOE create a parallel stack in the kernel and solve the first problem (the "kernel is too slow"), while OpenOnload, DPDK and FD.io (based on VPP) run in Linux user space with the aim of solving both problems. Building a technology in user space avoids changes to the kernel, which eliminates the extra effort of convincing the Linux kernel community of its usefulness and getting it adopted upstream.
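To put the speed concern in numbers, here is a rough back-of-the-envelope sketch of how little time a single core has to handle each packet at line rate. It assumes the worst case of minimum-size 64-byte Ethernet frames plus 20 bytes of on-wire overhead (preamble, start-of-frame delimiter and inter-frame gap). The function names are illustrative only, and real traffic mixes will be less demanding than this.

```python
# Per-packet time budget at line rate for minimum-size Ethernet frames.
# Assumption: every frame is 64 bytes, the worst case for packet rate.
MIN_FRAME_BYTES = 64
# On the wire each frame also costs 20 bytes:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(link_gbps, frame_bytes=MIN_FRAME_BYTES):
    """Maximum packet rate on a link of the given speed (in Gb/s)."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_on_wire


def ns_per_packet(link_gbps, frame_bytes=MIN_FRAME_BYTES):
    """Time available to process one packet, in nanoseconds."""
    return 1e9 / packets_per_second(link_gbps, frame_bytes)


if __name__ == "__main__":
    for gbps in (10, 25, 40, 50, 100):
        mpps = packets_per_second(gbps) / 1e6
        print(f"{gbps:>3} GbE: {mpps:7.2f} Mpps, {ns_per_packet(gbps):6.2f} ns per packet")
```

Under these assumptions, 10GbE allows about 14.88 million packets per second, or roughly 67 ns per packet, and 100GbE shrinks that budget to under 7 ns. This is why any per-packet software overhead becomes more visible as link speeds rise.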

The challenges of adopting parallel stacks outside the kernel networking stack are obvious to the leading data center operators, especially those who must scale their infrastructure to a very large number of servers and an assortment of applications while maintaining the highest levels of operational efficiency. Parallel networking stacks bring a long list of issues: security, manageability, robustness, hardware vendor lock-in, and protocol compatibility, to name a few.

Nonetheless, some data center operators who manage fewer servers (for example, a few hundred) and run a single application, such as those in the HPC or HFT markets, may find it practical to use such parallel kernel bypass stacks. The same applies to dedicated storage clusters.

Can the clogging of the kernel networking stack be fixed without resorting to parallel bypass stacks? Fortunately, it can, unlike the situation with clogged arteries in the brain and heart. The right way to solve the two problems above would be to find ways to accelerate networking performance of the kernel networking stack transparently, using smart networking hardware, and without any vendor lock-in.

At Netronome, we are at the forefront, working with data center operators to solve these problems without bypassing the kernel. We believe that bypassing the central and most important part of the operating system does not make sense as operators build data centers that will have to deal with a tenfold increase in data from mobile and IoT devices in the near future. Linux kernel stack technologies such as the extended Berkeley Packet Filter (eBPF) and Traffic Control (TC) hold the promise of allowing SmartNIC vendors like Netronome to stick to the Linux kernel networking stack and of allowing data center operators to scale efficiently. The resounding recommendation from the Linux community has always been to avoid kernel bypass. Like all fundamental and simple ideas, this one has held solid ground in the past, holds true today and will do so in the future. If you want to learn more about Netronome’s novel approach in this area, please email me at sujal.das@netronome.com.