There is nothing like a trip to Paris to get an overview of the latest trends in fashion. This is true for clothing, but does it also apply to networking technologies? Usually not. Paris certainly hosts fewer big tech companies than Silicon Valley, for example. However, the 30th edition of the FRnOG meeting was held last week in the heart of the capital, a stone’s throw from the Opéra Garnier.
The acronym FRnOG stands for French Network Operators Group. It gathers many professionals working in networking, from core network administrators to (occasionally) people working for regulatory bodies, not forgetting hardware vendors (like us!), and provides a brief overview of the trends in the sector.
I went to the conference not only to attend, but also to give the very first talk of the day, about bpfilter. This technology is a proposal for a new eBPF-based back-end for the iptables firewall in Linux. As of this writing, it is at a very early stage: it was submitted as an RFC (Request for Comments) on the Linux netdev mailing list around mid-February 2018 by David Miller (maintainer of the networking subsystem), Alexei Starovoitov and Daniel Borkmann (maintainers of the BPF parts in the kernel). So, keep in mind that all the details that follow could change, or might never reach the kernel at all!
Technically, the iptables binary used to configure the firewall would be left untouched, while the xtables part in the kernel could be transparently replaced by a new set of commands that would rely on the BPF sub-system to translate the firewalling rules into an eBPF program. This program could then be attached to one of the network-related kernel hooks, such as the traffic control interface (TC) or the driver level (XDP). Rule translation would occur in a new kind of kernel module, something between a traditional module and a normal ELF user space binary. Running in a special thread with full privileges but no direct access to the kernel, and thus exposing a smaller attack surface, this special kind of module would be able to communicate directly with the BPF sub-system (mostly through system calls). At the same time, it would remain very easy to use standard user space tools to develop, debug or even fuzz it!
Besides this new module object, the benefits of the bpfilter approach could be numerous. Increased security is expected, thanks to the eBPF verifier. Reusing the BPF sub-system could make this component easier to maintain than the legacy xtables, and could later allow integration with other kernel components that also rely on BPF. And of course, leveraging just-in-time (JIT) compilation, or possibly hardware offload of the program, would enable a drastic improvement in performance!
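To make the translation step more concrete, here is a toy model of the idea, not the bpfilter code itself. It turns a list of simplified DROP rules into a single filter function that returns a verdict for each packet, whereas bpfilter would emit eBPF bytecode for the kernel. The `Packet` and `Rule` classes and the `translate` function are hypothetical names made up for this sketch; only the verdict names are borrowed from XDP.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, List, Optional

# Verdict names borrowed from XDP; here they are only labels.
XDP_DROP = "XDP_DROP"
XDP_PASS = "XDP_PASS"


@dataclass(frozen=True)
class Packet:
    """Hypothetical, heavily simplified view of a parsed packet."""
    src_ip: str
    protocol: str
    dst_port: int


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for one iptables DROP rule; None means 'any'."""
    src_net: Optional[str] = None
    protocol: Optional[str] = None
    dst_port: Optional[int] = None


def translate(rules: List[Rule]) -> Callable[[Packet], str]:
    """Turn a rule list into one filter function; the first match wins."""
    # Parse the networks once, at translation time, not once per packet.
    compiled = [
        (ipaddress.ip_network(r.src_net) if r.src_net else None,
         r.protocol, r.dst_port)
        for r in rules
    ]

    def program(pkt: Packet) -> str:
        src = ipaddress.ip_address(pkt.src_ip)
        for net, proto, port in compiled:
            if net is not None and src not in net:
                continue
            if proto is not None and pkt.protocol != proto:
                continue
            if port is not None and pkt.dst_port != port:
                continue
            return XDP_DROP  # every criterion of this rule matched
        return XDP_PASS  # assumed default policy: accept

    return program


# Roughly: iptables -A INPUT -s 10.0.0.0/8 -p udp --dport 53 -j DROP
program = translate([Rule(src_net="10.0.0.0/8", protocol="udp", dst_port=53)])
print(program(Packet("10.1.2.3", "udp", 53)))
print(program(Packet("192.168.0.1", "udp", 53)))
```

The point of the model is that the rules are processed once, when they are installed, and what runs for each packet is a single self-contained program. That is the property that would let the result be JIT-compiled or offloaded.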
Netronome has no part in the development of bpfilter. However, we are excited by the new possibilities that it could offer for firewall appliances, in particular when the eBPF programs are run on our Agilio SmartNICs. But bpfilter is not there yet. It has to be merged into the kernel tree before we can really use it. The initial RFC submission triggered long debates over the need for this new mechanism, and over the approach itself. In particular, some argued that the new module type could present security issues, or that nftables (designed as the successor of iptables) already performs well enough, so why create a third framework? However, the new module type remains more secure than classic modules, and the performance improvements from nftables are nowhere close to those from native XDP (×1.14 against ×3.55 improvement respectively, according to a brief test of our own).
Results from a quick comparison for simple drop with several firewall mechanisms
One of the main parts of the discussion, however, was about the decision to base bpfilter on iptables, which many would like to phase out, instead of the more recent (and better designed) nftables model. The motivation for this, the developers said, was that iptables is far more widespread today, and will remain widely used for a decade at the very least. So the community should make its best effort to improve the performance and reduce the maintenance costs of this model. However, nftables could be made compatible with bpfilter as a follow-up (another RFC was sent by a different developer a few weeks later to propose just that). So, let’s sum up where we are at this point: there is no clear consensus about the proposal yet. However, many arguments in favor of bpfilter were advanced, and considering that the developers who submitted the RFC are rather influential in the community, it seems very likely to be merged within the next few months.
I have focused on bpfilter so far in this article: assuming it becomes part of Linux some day, it could be one way to improve networks at large, but it is not the only one that was presented at FRnOG. Let’s take a step back and take a look at the rest of the technical presentations on the agenda. Networks would also benefit from faster and more resilient routing protocols, wouldn’t they? There was a presentation about the Babel protocol which, unlike many other distance-vector protocols, is protected against routing loops and link starvation through very simple mechanisms. Even before a routing protocol is set up, network deployment itself needs fast and flexible solutions, and people from Cloudflare explained how they used Salt Stack and Nitrogen to get efficient orchestration. The nodes on the network can be improved, too.
OpenSwitch was presented as an open source operating system for switches built on commodity hardware, combining standards like ONIE (Open Network Install Environment, a way to use Grub and busybox to easily install a new system on a switch) and SAI (Switch Abstraction Interface, a normalized API to configure the switch ASIC). OpenSwitch offers a number of pre-defined networking functions that are easy to set up through the API. Missing ones can simply be developed and added by the user; it is an operating system, after all. Systems like OpenSwitch demonstrate that, over the last few years, network operators have wanted to step away from vendor-centric appliances and to use commodity hardware, often with a layer of virtualization, to get more flexible solutions. Without sacrificing performance, of course!
And performance topics were well represented at FRnOG. Besides bpfilter, there were two distinct talks, respectively about “100Gb/s optical transmission” and “Ethernet at 400Gb/s.” What can I say? The future is here, the future is now! Thanks to better silicon and to better optics, Ethernet speeds may be able to reach 400Gb/s on a single port in 2018, and are expected to double again by 2020. Two standards are competing to succeed QSFPs: OSFPs (Octal SFPs) and QSFP-DDs (QSFP Double Density). The former are faster and bigger, which makes them able to dissipate more heat, to work at 15W without an additional heat sink, and to cover a longer transmission distance, but they do not keep the same cage size as QSFPs, as the latter do. With new PCIe standards coming out, this can only mean one thing: the servers, or even their network cards, will have to work harder to be able to process even higher bit rates!
There were a couple more presentations, somewhat less technical and not related to fast networking, that I did not list here (you may find them on the agenda of the conference). But as I see it, the essential trend was about speed and performance. The pipes are growing, and the nodes must prepare to receive and process an ever-increasing number of packets. How should we do this? When the system cannot cope, offloading to the NIC is one solution. At Netronome, we really feel we are at the heart of those changes.
Let’s think again about bpfilter. Sure, we did not work on it, but there is something that I omitted to mention. As it turns out, we get out-of-the-box compatibility between bpfilter and the eBPF offloading features supported by our Agilio SmartNICs. As we made sure to cooperate with the kernel community when building these features, there is absolutely nothing to change to make it work, to the point that hardware offload is even used in the proof of concept submitted with the RFC a few weeks ago. When attaching a filtering program, bpfilter first attempts to offload it, and falls back to the host if that attempt fails. Regarding packet processing, is it time for the host to become no more than a backup solution? We may not be there yet, but still, hardware offload was given as one of the main advantages of the bpfilter approach, and Netronome was cited as an example of this. We feel very proud about it. It strengthens our confidence in our choice, and after all, if bpfilter were to be merged into the kernel, would that not mean that, through the relevance of our design, we are (somewhat) influencing major evolutions in networking on Linux? For my part, I like to think so.