10 Myths about SDN, NFV and Data Center Switches: Debunked: Part Two

By Sujal Das | Aug 23, 2016

In this blog, I continue debunking myths about the role of data center switches in SDN and NFV deployments.

Myth #2: There was a significant market need for OpenFlow-based SDN features, but market-leading switch vendors who wanted to maintain the status quo stalled much-needed progress.

I believe many will agree that the OpenFlow specification and early initiatives by the Open Networking Foundation (ONF) went against the grain of a few established switch silicon vendors that owned almost 100% of the data center top-of-rack (TOR) switch market, but the market has adapted since then, largely by focusing on scaling bandwidth while reducing costs, power and latency. Let me explain:

There are two main facets of OpenFlow:

  1. Disaggregating the control plane software from the hardware, and
  2. Supporting a large number of flows (ACL rules) or an ONF-specified forwarding pipeline that enabled that requirement.

The first facet hurt switch OEMs that had invested heavily in creating value by closely coupling control plane software with forwarding plane hardware. This facet of OpenFlow did not perish; rather, it evolved in a different and more successful way: as disaggregation of the networking switch and adoption of white box switches. Innovations like the Open Network Install Environment (ONIE) and the Switch Abstraction Interface (SAI), driven under the auspices of the Open Compute Project (OCP), have resulted in the disaggregation of the switch networking OS from the underlying switch hardware, in the same way as has been done in servers.

The second facet put pressure on silicon vendors to look at alternate designs while not compromising cost, power and bandwidth. It was that constraint on cost, power and bandwidth that eventually killed the initial, fervent interest in OpenFlow and in support for a large number of flows in TOR switches. Here’s why:

Well-established and proven Layer 2 switching, Layer 3 routing and QoS features in existing switch silicon pipelines were much more important than OpenFlow for the bulk of rapidly evolving and simple leaf-and-spine switch architectures. Support for a large number of rules requires significantly more on-chip table memory, which is difficult to implement, especially when bandwidth and latency are top considerations. Support for a large number of wildcard flow rules requires the use of large TCAMs integrated in the switch silicon. TCAMs have always been expensive, power-hungry and hard to scale to high bandwidths. Integrating them in switch silicon meant higher cost and power at lower bandwidths.
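To make the wildcard requirement concrete, here is a minimal Python sketch of the ternary (value/mask) matching that a TCAM performs. It is an illustration only, not any vendor's pipeline: the names `TernaryRule`, `lookup`, `ip_to_int` and the two example rules are hypothetical. Software has to walk the rules one by one, whereas a TCAM compares the key against every entry in parallel on each lookup, which is a large part of why large TCAMs cost so much silicon area and power.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TernaryRule:
    value: int      # bit pattern to compare against
    mask: int       # 1 bits are "care", 0 bits are wildcards
    priority: int   # higher number wins when several rules match
    action: str

    def matches(self, key: int) -> bool:
        # Only the bits selected by the mask take part in the comparison.
        return (key & self.mask) == (self.value & self.mask)


def lookup(rules: List[TernaryRule], key: int) -> Optional[str]:
    """Return the action of the highest-priority matching rule, or None."""
    best = None
    for rule in rules:  # a TCAM evaluates all entries at once; software loops
        if rule.matches(key) and (best is None or rule.priority > best.priority):
            best = rule
    return best.action if best else None


def ip_to_int(addr: str) -> int:
    return int(ipaddress.IPv4Address(addr))


# Hypothetical rules keyed on a 32-bit destination address.
RULES = [
    TernaryRule(ip_to_int("10.0.0.0"), 0xFF000000, priority=10, action="drop"),
    TernaryRule(ip_to_int("10.1.2.0"), 0xFFFFFF00, priority=20, action="allow"),
]

print(lookup(RULES, ip_to_int("10.1.2.5")))     # matches both rules; priority 20 wins
print(lookup(RULES, ip_to_int("10.9.9.9")))     # matches only the /8 rule
print(lookup(RULES, ip_to_int("192.168.1.1")))  # matches nothing
```

Real flow rules match on many header fields at once, so the keys are far wider than 32 bits. That width, multiplied by the number of entries, is what drives the table memory requirement discussed above.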

There was only a small market for applications that justified the use of large TCAMs, so switch vendors stayed the course: they invested in scaling the bandwidth of their switch silicon designs while reducing latency, cost and power. They added features that were important to the largest cross-sections of the market, such as better network analytics and memory management techniques. Some may call this a lack of conviction to do something different, but in the world of business, one does what sells, and the incumbent switch silicon vendors enjoyed significant sales increases by staying the course. Most importantly, the largest data center operators helped them stay the course and rewarded them with significant revenue growth.

The need for more flows, however, did not go away. The increased importance of match-action processing in data center networks, as highlighted by early OpenFlow implementations and deployments, is likely the greatest contribution of the OpenFlow specification and related initiatives. With growing security requirements, especially distributed security where each VM and application must be verified in what is called zero-trust defense inside the data center, the need for both stateless and stateful policy rules implemented using flows has intensified. Modern data centers, including the largest ones implementing cloud IaaS, SDN and NFV, use an increasing number of flow-based rules to implement tight security policies. They don’t do this in OpenFlow-enabled TOR switches. Instead, they do it in the servers using software-based solutions such as iptables and Open vSwitch (OVS). OVS has evolved rapidly in the last few years to enable stateless and stateful security rules. More recently, with large-scale adoption of 10GbE and higher-speed networking and an increasing number of VMs and VNFs (virtual network functions) per server, the CPU tax for such network processing has become unacceptable. Netronome’s Agilio server networking platform alleviates such server efficiency challenges by enabling large numbers of flows at high performance for both stateless and stateful security policies, while freeing up valuable CPU cores for revenue-generating VMs and applications.
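For readers less familiar with the distinction between stateless and stateful policy rules, the following Python sketch may help. It is a simplified illustration under stated assumptions, not how OVS, iptables or Agilio implement connection tracking: `FiveTuple`, `stateless_allow` and `StatefulPolicy` are hypothetical names, and the "trusted prefix" check is a plain string comparison chosen for brevity.

```python
from typing import NamedTuple, Set


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str

    def reply_key(self) -> "FiveTuple":
        # The same flow as seen in the opposite direction.
        return FiveTuple(self.dst_ip, self.src_ip,
                         self.dst_port, self.src_port, self.proto)


def stateless_allow(pkt: FiveTuple) -> bool:
    """Stateless rule: the decision depends only on this packet's headers."""
    return pkt.proto == "tcp" and pkt.dst_port == 443


class StatefulPolicy:
    """Stateful rule: the decision also depends on flows seen earlier."""

    def __init__(self, trusted_prefix: str) -> None:
        self.trusted_prefix = trusted_prefix
        self.established: Set[FiveTuple] = set()

    def allow(self, pkt: FiveTuple) -> bool:
        if pkt.src_ip.startswith(self.trusted_prefix):
            # Traffic initiated from inside is allowed and remembered.
            self.established.add(pkt)
            return True
        # Outside traffic is allowed only as a reply to a remembered flow.
        return pkt.reply_key() in self.established


policy = StatefulPolicy(trusted_prefix="10.0.")
outbound = FiveTuple("10.0.0.5", "8.8.8.8", 40000, 53, "udp")
inbound = outbound.reply_key()

print(policy.allow(inbound))   # no matching outbound flow has been seen yet
print(policy.allow(outbound))  # initiated from the trusted side
print(policy.allow(inbound))   # now recognized as a reply
```

The stateful case needs one table entry per active connection, so the flow count grows with the number of VMs and applications on a server. That growth is the scaling pressure described above.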

Myth #3: Nicira was one of the early implementers of OpenFlow and this is a key reason for their successful exit – touted as the first big one in the world of SDN.

Stay tuned for part 3 in this series.
