
Arm Cloud-Native Ecosystem Recap, Fall/Winter 2021

For many years now, miniNodes has published a semi-annual blog post (or more!) with an update on all things Arm Servers. But the Arm DevSummit that just concluded put a large emphasis on automotive, 5G, and Cloud-Native development, which served as a good reminder that the Arm compute ecosystem has grown well beyond the traditional datacenter-centric Arm Servers. Certainly, the big Arm Servers have gained a lot of traction amongst the hyperscalers, with AWS producing the Graviton processor and now deploying multiple products on it, and Oracle recently launching cloud servers based on the Ampere Altra platform. More broadly, however, Arm compute is expanding to cover more use-cases than before, thanks to a shifting mindset about how and where applications should be deployed. While the past decade was all about placing workloads in the cloud, the next decade is going to be all about bringing workloads back from the cloud and running them closer to the user, at the Edge of the network. These endpoints might be smaller servers, or they might be cars, aircraft, windmills, factories, 5G basestations, autonomous bots, and more.

Cloud-native design practices, such as using containers for deployment, allow developers to build applications and then push them anywhere, independent of the underlying hardware. This enables a Cloud-to-Edge-to-IoT infrastructure that is seamless and tightly integrated.
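As a rough illustration of that hardware independence, here is a minimal Python sketch of how a deployment script might pick the right container image variant for whatever machine it lands on. The function names and the `manifest` dictionary are hypothetical, made up for this example; real container runtimes do this selection for you when an image is published for multiple architectures.

```python
import platform

# Kernel-reported machine names mapped to the architecture names
# commonly used in container platform strings.
_ARCH_NAMES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def container_platform(machine=None, system="linux"):
    """Return a platform string such as 'linux/arm64' for a machine type.

    If no machine type is given, the current host's is used.
    """
    machine = (machine or platform.machine()).lower()
    if machine not in _ARCH_NAMES:
        raise ValueError(f"unrecognized machine type: {machine!r}")
    return f"{system}/{_ARCH_NAMES[machine]}"


def select_image(manifest, machine=None):
    """Pick the image reference for this machine from a hypothetical
    {platform string: image reference} dictionary."""
    return manifest[container_platform(machine)]


if __name__ == "__main__":
    # Hypothetical image references, for illustration only.
    manifest = {
        "linux/amd64": "example/app:1.0-amd64",
        "linux/arm64": "example/app:1.0-arm64",
    }
    print(select_image(manifest, "aarch64"))
```

The same application code ships everywhere; only the image variant changes between an x86 cloud server, an Arm Server, and an Arm-based Edge device.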

With Arm compute in all of these devices, developers can build applications and frameworks that place big compute on the Arm Servers, smaller, “just what’s needed” compute on the Edge servers, and specialized workloads on the “Things” such as self-driving vehicles, smart cameras doing computer vision and machine learning tasks, cellular gateways, etc.

These specialized segments are now beginning to create their own hardware and reference designs targeting their individual use-cases; one example is SOAFEE, in the case of self-driving vehicles. The first reference platform to be released is from ADLINK, with their AVA system built around an Ampere Altra SoC. The AVA runs the Autoware software stack, with input and assistance from Arm as well as Tier IV. In the 5G Gateway ecosystem, Magma is an open-source software stack devoted to deploying gateways and managing connectivity both upstream and downstream (to connected phones).

The AI, Computer Vision, and Machine Learning ecosystem is a bit more fragmented, however, with a multitude of platform providers, specialized hardware, and software frameworks to choose from. The major players in this segment are Nvidia and Google. Nvidia offers the Jetson Nano, Xavier NX, and Xavier AGX systems containing Arm cores, along with associated software such as CUDA on the devices, Transfer Learning Toolkit for retraining models, and Fleet Command for running workloads on edge servers. Google offers its Coral hardware devices, built on Arm cores, and the TensorFlow software stack. OctoML, Roboflow, Plainsight, and similar AI platforms also exist, targeting simplified model creation and deployment to devices with easy-to-use web-based tools. Because the target problems are so varied and the AI/ML models being generated are inherently unique (each model is tailored to its use-case), this ecosystem will likely remain fragmented and without a reference design that everyone agrees upon.

One last note to consider is that for all of these specialized use-cases to gain traction, reach scale, and allow cloud-native best practices to take hold, these devices need to boot up and function in a standard, well-understood way. Developers will not adopt these platforms if the hardware is difficult to work with, doesn’t allow them to quickly install their OS of choice and languages / tooling, or can’t integrate easily with other existing infrastructure. In that case, developers will simply return to what they already know works (x86), and give up the advantages they would have gained by making the switch to Arm. Thus, the SystemReady standards that address device boot processes and hardware descriptions are critical to the success of these new specialty niche ecosystems. SystemReady IR for IoT devices, ES for Embedded Servers, and SR for Servers are important standards that need to be adopted by the SoC manufacturers and hardware vendors, to ensure developers end up with a device that behaves like the systems they already know and use on a daily basis.

ADLINK SOAFEE AVA Workstation