It’s Time for Disaggregated Silicon!

By Sujal Das | Oct 31, 2018
We have seen the value of disaggregation in many areas, most recently in networking, where it has been driven by the OCP community. It has given operators significant choice in how they procure networking equipment and has eliminated vertical, inflexible solutions and vendor lock-in. Is it now time for disaggregation at the silicon level?

Seven leading silicon companies think so. Netronome, Achronix, GLOBALFOUNDRIES, Kandou Bus, NXP, Sarcina and SiFive are collaborating to announce an open, composable architecture for chiplets and domain-specific accelerators this week at the Linley Fall Processor Conference. They all provide unique technologies to help build advanced SoCs: Arm and RISC-V CPUs, SerDes, Networking and Security, FPGA, silicon development, foundry and packaging services.  At Netronome, we are proud to be one of them and lead the charter.

The need arises from multiple powerful trends:
  • New demanding server workloads and the demise of Moore’s Law
  • The significant performance/watt benefits of domain-specific accelerators
  • The exponential costs of silicon development, especially at lower process nodes
  • The economies of building chiplets instead of monolithic chips
  • Availability of best-of-breed components as chiplets at optimum process nodes
In this blog, I will cover each of these trends.

General-purpose CPUs cannot sustain the demands of new server workloads. With bandwidths increasing, server productivity (measured as the number of CPU cores available for revenue-generating applications) is heading toward zero. The demise of Moore’s Law (the number of transistors bought per dollar is no longer growing as it once did) only exacerbates the situation.
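To make the productivity argument concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it (cycles per packet, packet size, clock rate, core count) is a hypothetical assumption chosen for illustration, not a measurement, and the model ignores wire overhead and batching. It only shows how the cores left for applications shrink as link speed grows when packet processing stays on the CPU.

```python
def cores_for_packet_processing(link_gbps: float, packet_bytes: int,
                                cycles_per_packet: float, core_ghz: float) -> float:
    """Cores needed to process every packet on a fully loaded link."""
    packets_per_sec = link_gbps * 1e9 / (packet_bytes * 8)
    return packets_per_sec * cycles_per_packet / (core_ghz * 1e9)


def cores_left_for_apps(total_cores: int, link_gbps: float, packet_bytes: int = 512,
                        cycles_per_packet: float = 1000.0, core_ghz: float = 2.5) -> float:
    """Cores remaining for revenue-generating applications (never below zero)."""
    used = cores_for_packet_processing(link_gbps, packet_bytes, cycles_per_packet, core_ghz)
    return max(0.0, total_cores - used)


if __name__ == "__main__":
    # Hypothetical 24-core server at several link speeds.
    for gbps in (10, 25, 50, 100, 200):
        print(f"{gbps:>4} Gb/s: {cores_left_for_apps(24, gbps):.1f} cores left for applications")
```

Under these assumed parameters the remaining core count falls steadily with link speed, which is the trend the paragraph above describes.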

Figure 1: Domain-specific silicon delivers higher performance/watt


Silicon based on domain-specific architectures has come to the rescue. It is popular today in networking and security coprocessors, as used in SmartNICs, and in machine learning and inferencing coprocessors, as used in PCIe adapters or appliances.

Domain-specific architectures, as the name implies, are tailored to specific domains. The devices are programmable, not hardwired as in traditional ASICs. They feature integrated application and deployment-aware development of devices, firmware, systems and software and they support domain-specific languages for ease of use. Key attributes of a domain-specific architecture are parallelized data processing, function-specific logic, application-aware data management and control. As shown in figure 1, the result is significantly better performance/watt. Two well-known, domain-specific silicon-based examples are shown, namely Google’s Tensor Processing Unit (TPU) for machine learning and AI, and Netronome’s Network Flow Processor (NFP) for networking and security. 

While domain-specific silicon can eliminate the server productivity challenge, developing it is not easy. It requires domain-specific know-how and typically applies to smaller market segments than general-purpose CPUs do. As can be seen in figure 2, the cost of silicon development is skyrocketing. The revenue required to justify silicon development at smaller, more advanced process nodes is astounding.

Figure 2: Skyrocketing costs of silicon development

Source: Keith Flamm, Nov ‘17 (Measuring Moore’s Law; Evidence from Price, Cost & Quality Indices) Global Foundries, semiengineering.com (“How much will that chip cost?”)

Does this mean only the largest companies serving the largest markets can afford to build new silicon? This situation can certainly stymie domain-specific innovation and limit the choice of components, both specialized and commodity/generic. It is, therefore, time to disaggregate and democratize. Chiplets come to the rescue.

The economics of building silicon using chiplets with smaller die sizes has been established by multiple companies and products. AMD, with its highly successful EPYC CPUs, has shown us how use of chiplets with CPU cores can reduce silicon development and manufacturing costs of a 32-core CPU by up to 40%.  In addition, the strategy enables rapid go-to-market of other smaller versions of CPUs.  Silicon built using chiplets can bring significant benefits in multiple areas:

  • More Choice: Source silicon die from multiple chiplet vendors 
  • Best-of-breed: Source from suppliers with domain expertise (e.g., networking, security, AI)
  • Leverage economies of scale: Source commodity/generic components (MAC, SerDes, Memory) from large suppliers 
  • Cheaper: Smaller chiplet die with better yield curves, varied process nodes (for example, smaller nodes where high density is needed, such as with SerDes, and larger nodes for other chiplets), and a focus on core competency enable lower development costs and faster time-to-market
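The yield argument in the last bullet can be illustrated with the simple Poisson yield model, in which the fraction of good die is exp(-area × defect density). The Python sketch below is only an illustration: the total die area, the defect density and the four-way split are hypothetical assumptions, and the model leaves out packaging, test and die-to-die interface costs, so it does not attempt to reproduce any vendor's published savings.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of good die under the simple Poisson yield model."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


def silicon_per_good_product(total_area_mm2: float, n_dies: int,
                             defects_per_mm2: float) -> float:
    """Wafer area consumed per working product when the design is split
    into n equal dies. Assumes dies are tested before assembly, so only
    known-good dies are packaged."""
    die_area = total_area_mm2 / n_dies
    return n_dies * die_area / poisson_yield(die_area, defects_per_mm2)


# Hypothetical numbers, for illustration only.
TOTAL_AREA_MM2 = 600.0
DEFECTS_PER_MM2 = 0.002

monolithic = silicon_per_good_product(TOTAL_AREA_MM2, 1, DEFECTS_PER_MM2)
chiplets = silicon_per_good_product(TOTAL_AREA_MM2, 4, DEFECTS_PER_MM2)
saving = 1 - chiplets / monolithic

if __name__ == "__main__":
    print(f"monolithic: {monolithic:.0f} mm^2 of silicon per good product")
    print(f"4 chiplets: {chiplets:.0f} mm^2 of silicon per good product")
    print(f"silicon saved: {saving:.0%}")
```

Because yield falls exponentially with die area in this model, splitting the same logic across smaller dies wastes less silicon on defects; how much of that saving survives packaging and interconnect overhead depends on the real design.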
There is a challenge the industry needs to solve to democratize development and manufacturing of chiplet-based, domain-specific accelerator silicon. The connectivity between chiplets needs to be open and standardized so all chiplet suppliers adhering to the open standard can plug together like LEGO blocks on a more economic organic substrate. This is where Netronome, with its proven NFP architecture, brings significant value.

The NFP devices are built as LEGO blocks. The blocks are called Logic Blocks or Islands. Each Logic Block is implemented as a complete chip, and a multi-terabit switch fabric interconnects the Logic Blocks. This is shown in figure 3. The right-hand side of the figure shows the elements of the switch fabric, which include memory management, link and PHY layers. Using this architecture, Netronome has been able to produce many devices with different configurations quickly (four devices between 2014 and 2018) while minimizing engineering resources.

Figure 3: Netronome NFP LEGO block architecture using Logic Blocks

Netronome is pleased to bring its Logic Blocks-related interconnect expertise and intellectual property (IP) to the domain of chiplets and advanced SoCs that combine best-of-breed components from leading silicon companies. Together with Achronix, GLOBALFOUNDRIES, Kandou Bus, NXP, Sarcina and SiFive, Netronome has formed the Open Domain-Specific Accelerator (ODSA) Workgroup, which is developing open specifications and contributions related to a complete stack comprising an application layer, a memory management layer, a link layer, multiple PHY layer interfaces and the substrate layer. Existing standards are being leveraged where applicable, while new open IP and specifications are being developed by the Workgroup. This concept is depicted in figure 4.


Figure 4: Open Architecture for Chiplets-Based Advanced SoC Designs for Domain-Specific Accelerators
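
As a purely illustrative sketch of the LEGO-block idea, the Python snippet below models the stack layers named above and checks whether chiplets from different suppliers share a PHY interface. The class names, the chiplet entries and the interface labels "phy-x" and "phy-y" are hypothetical placeholders; they are not taken from any ODSA specification.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import combinations
from typing import FrozenSet, List, Tuple


class StackLayer(Enum):
    """Layers of the stack as listed in the text, top to bottom."""
    APPLICATION = "application"
    MEMORY_MANAGEMENT = "memory management"
    LINK = "link"
    PHY = "PHY"
    SUBSTRATE = "substrate"


@dataclass(frozen=True)
class Chiplet:
    name: str
    function: str
    phy_interfaces: FrozenSet[str]  # hypothetical interface labels


def can_connect(a: Chiplet, b: Chiplet) -> bool:
    """Two chiplets can plug together if they share at least one PHY interface."""
    return bool(a.phy_interfaces & b.phy_interfaces)


def incompatible_pairs(chiplets: List[Chiplet]) -> List[Tuple[str, str]]:
    """Name every pair of chiplets that has no PHY interface in common."""
    return [(a.name, b.name)
            for a, b in combinations(chiplets, 2)
            if not can_connect(a, b)]


# Hypothetical design mixing chiplets from different suppliers.
design = [
    Chiplet("cpu", "RISC-V CPU", frozenset({"phy-x"})),
    Chiplet("nic", "networking and security", frozenset({"phy-x", "phy-y"})),
    Chiplet("fpga", "FPGA", frozenset({"phy-y"})),
]

if __name__ == "__main__":
    print([layer.value for layer in StackLayer])
    print(incompatible_pairs(design))
```

The point of an open standard is that this list of incompatible pairs stays empty: if every supplier implements the same open interfaces, any chiplet can sit next to any other.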

A technical white paper with details on the above will be released this quarter. The ODSA Workgroup is open to all companies wishing to participate. The goal of the workgroup is to enable any vendor’s silicon die as a building block that can be utilized in a chiplet-based SoC design. We are starting in earnest with the workgroup participants bringing critical components – Arm and RISC-V processors, network, security and FPGA accelerators, and SerDes I/O peripherals using optimal process nodes – to the party. We need more, such as machine learning chiplets and memory solutions. Please join us in the chiplet revolution.  If you are interested, please send an email.

My personal vision for the innovative, ground-breaking initiative being conducted by the ODSA Workgroup is that this approach will replace single-vendor solutions like the Xilinx Everest chip. Quoting The Linley Group, “Instead of marketing them as FPGAs with embedded CPU cores, the company (Xilinx) is pitching them as full-fledged SoCs augmented with programmable logic. They upend the traditional orientation of FPGAs by surrounding the programmable gates with more of everything: processing cores, hard logic, fast interconnects, and I/O interfaces.” In the world of disaggregated silicon, the programmable gates and the “more of everything” will be sourced from multiple best-of-breed vendors, and the chip will cost significantly less.