Running AI Workloads on Arm Servers

Arm’s Role in Processing AI Workloads

The past several years have seen enormous gains in Artificial Intelligence, Machine Learning, Deep Learning, Autonomous Decision Making, and more. The availability of powerful GPUs and FPGAs, both on-premises and in the cloud, has certainly helped, but more and more of this AI processing is actually being done at the Edge, in small devices. The popularity of Amazon Alexa, Google Home, and AI-enabled smartphone features such as Apple’s Siri has skyrocketed over the past few years. Frameworks such as TensorFlow, PyTorch, and Caffe have matured, and newer, lightweight options such as TinyML and TensorFlow Lite have come along, designed to allow machine learning on the smallest devices possible. Some of the most common applications are local audio processing that detects specific sounds via wavelength pattern matching, object recognition in a camera’s frame, monitoring of motions and gestures, and vehicle safety systems that detect and respond immediately to changing conditions with no human intervention.

The work that it takes to develop these AI models is very specialized, but ultimately algorithms are created, a large sample of training data is fed into the system, and a model is developed that has a confidence factor and accuracy value. Once the model is deployed, real-time stream processing occurs, and actions can be taken based upon the results of data flowing through the application. In a computer vision application, for example, identifying certain objects can result in alerts (hospital staff notified), corrective actions (apply the brakes immediately), or data stored for later use.
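
As a rough illustration of that last step, here is a minimal Python sketch of turning model detections into actions. Everything in it is an assumption made for the example: the `Detection` class, the labels, the action names, and the 0.8 confidence threshold are hypothetical and do not come from any particular framework or from the miniNodes platform.

```python
from dataclasses import dataclass

# Hypothetical label-to-action table; labels and action names are illustrative only.
ACTIONS = {
    "patient_fall": "alert_staff",
    "obstacle": "apply_brakes",
}


@dataclass
class Detection:
    label: str
    confidence: float  # model's confidence score, assumed to be in [0, 1]


def choose_action(detection: Detection, threshold: float = 0.8) -> str:
    """Return the action for one detection, or 'store' when nothing urgent applies."""
    if detection.confidence < threshold:
        return "store"  # low confidence: just keep the data for later use
    # Unknown labels also fall back to storing the data.
    return ACTIONS.get(detection.label, "store")


# A stand-in for results flowing out of a deployed model.
stream = [
    Detection("obstacle", 0.97),
    Detection("obstacle", 0.41),
    Detection("bird", 0.93),
]
results = [choose_action(d) for d in stream]
print(results)
```

In a real deployment the stream would come from the model's inference output rather than a hard-coded list, and the threshold would be tuned against the model's measured accuracy.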

As mentioned, more and more AI/ML is actually being processed at the Edge, on small form factor devices. Small form factor devices tend to be powered by Arm SoCs, as opposed to the more power-hungry x86 designs commonplace in laptops and desktops. Home devices like Alexa and Google Home, and nearly all smartphones, are based on Arm SoCs. Thus, AI models need to be created, tested, and verified for compatibility on Arm-powered devices. Even if an algorithm is developed and trained on a big GPU or FPGA, the resulting model should still be tested on Arm SoCs to ensure proper functionality. To help speed the testing process, miniNodes now offers hosted Arm microservers with dedicated AI accelerators that can offload AI tasks from the CPU and offer excellent machine learning performance. Workloads that can be quickly and easily processed include testing of self-driving vehicle object detection, navigation and guidance, and action / behavior models; image classification and object recognition from cameras and video streams; convolutional neural networks and matrix multiplication workloads; robotics; weather simulation; and many types of deep learning activities.
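
One simple way to approach the "test it on Arm" step is to record a model's outputs on the training machine and compare them with the outputs produced on the Arm device. The Python sketch below shows the idea using only the standard library. The function names, the tolerance values, and the sample scores are assumptions chosen for illustration; in practice the two lists would come from running the same input through the model on each platform.

```python
import math
import platform


def is_arm_host() -> bool:
    """Report whether the current machine type looks like an Arm CPU."""
    return platform.machine().lower() in {"aarch64", "arm64", "armv7l", "armv8l"}


def outputs_match(reference, candidate, rel_tol=1e-3, abs_tol=1e-3) -> bool:
    """Compare two lists of model scores element by element within a tolerance."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


# Placeholder values: scores saved from the training machine, and scores
# that would be produced by the same model and input on the Arm device.
reference_scores = [0.91, 0.07, 0.02]
arm_scores = [0.9101, 0.0699, 0.0200]

print("Running on Arm:", is_arm_host())
print("Outputs match:", outputs_match(reference_scores, arm_scores))
```

Small numerical differences between platforms are normal, especially when a model has been quantized for an accelerator, so the tolerances should be set according to what the application can accept.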

Arm Lowers the Cost of AI Processing

AI training and inference in the cloud running on Arm microservers at miniNodes also offers a distinct cost advantage over Amazon AWS, Microsoft Azure, or Google GCE. Those services can very quickly cost thousands or tens of thousands of dollars per month, but many AI workloads can get by just fine with more modest hardware when paired with a dedicated AI accelerator like a Google Coral TPU, Intel Movidius VPU, or Gyrfalcon Matrix Processing Engine. AWS, Azure, and GCE provide great AI performance, sure, but you also pay heavily for the processor, memory, storage, and other components of the overall system. If you are ready to make use of those immense resources, wonderful. But if you are just starting out, are just learning AI/ML, are only beginning to test your AI modeling on Arm, or just have a lightweight use case, then going with a smaller underlying platform while retaining the dedicated AI processing capability can make more sense.

miniNodes is still in the process of building out the full product lineup, but in the meantime Gyrfalcon 2801 and 2803 nodes are online and ready, with up to 16.8 TOPS of processing for ResNet, MobileNet, or VGG models. They are an easy, cost-effective way to get started with AI processing on Arm!

Check them out here:  https://www.mininodes.com/product/raspberrypi-4-ai-server/