
The Future of AI Servers

Following up on the recent announcement of our new Raspberry Pi 4 AI Servers, AI servers running on Arm processors are gaining more and more traction in the market thanks to their natural fit at the IoT and Edge layers of infrastructure.  Let’s take a quick look at some of the unique properties that make AI servers running on Arm a great strategy for AI/ML and AIoT deployments, and why this matters for the future.

Power – Many IoT deployments do not have luxuries that “regular” servers enjoy, such as reliable power and connectivity, or even ample power for that matter.  While Intel has spent decades making excellent, though power-hungry, processors, Arm has focused on efficiency and battery life, which helps explain why they dominate the tablet and smartphone markets.  That same efficiency carries over to IoT devices running AI workloads, so Edge devices responsible for computer vision, image classification, object detection, deep learning, or other workloads can operate with a much lower thermal footprint than a comparable x86 device.

Size – For the same underlying reasons behind their power efficiency, Arm AI Servers can be made physically smaller than the majority of x86 designs.  AI accelerators such as the Gyrfalcon 2801 or 2803 can be attached via USB to boards as small as 2 inches square (such as the NanoPi Neo2), and adding a Google Coral TPU via the mini-PCIe slot on a NanoPi T4 brings an enormous amount of inferencing capability to AI Servers in tiny form factors.
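
As a rough illustration, and not tied to any particular miniNodes configuration, the sketch below shows how little host-side Python such a setup needs to run inference on a Coral Edge TPU using the tflite_runtime package; the model filename is a placeholder and assumes a model already compiled for the Edge TPU.

```python
# Minimal sketch: running a classification model on a Coral Edge TPU
# attached to a small Arm board, using the tflite_runtime package.
# "mobilenet_v2_edgetpu.tflite" is a placeholder model filename.
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

# Load an Edge TPU-compiled model and attach the Edge TPU delegate.
interpreter = Interpreter(
    model_path="mobilenet_v2_edgetpu.tflite",
    experimental_delegates=[load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Feed a dummy tensor of the right shape; a real deployment would feed camera frames.
dummy_input = np.zeros(input_details["shape"], dtype=input_details["dtype"])
interpreter.set_tensor(input_details["index"], dummy_input)
interpreter.invoke()

scores = interpreter.get_tensor(output_details["index"])[0]
print("Top class index:", int(np.argmax(scores)))
```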

Cost – Here again, Arm SoCs and Single Board Computers typically have a rather large cost advantage over x86 embedded designs.

Scalability – This is a critical factor in why Arm will play a massive role in the future of AI Servers, and why miniNodes has begun to offer our Raspberry Pi 4 AI Server.  As mentioned, low-power, inexpensive devices make great endpoints, but there is also a role for “medium” sized AI servers handling larger workloads, and Arm partners are just now starting to bring these products to market.  An example is the SolidRun Janux AI Server, which makes use of the same Gyrfalcon AI Accelerators used by our nodes.  So, you can get started training your models, testing your deployment pipeline, learning the various AI frameworks and algorithms, and getting comfortable with the tools, then easily scale up as your needs expand.  Of course, once you reach massive amounts of deep learning and AI/ML processing, enterprise Arm server options exist for that as well.

Flexibility – Taking scalability one step further, Arm AI servers also allow a great deal of flexibility in the accelerator hardware (Gyrfalcon, Google Coral, Intel Movidius), the frameworks used (Caffe, PyTorch, TensorFlow, TinyML), and the models employed (ResNet, MobileNet, ShuffleNet).
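
As a loose illustration of that framework flexibility, the sketch below builds the same MobileNetV2 architecture from both torchvision and Keras; weights are deliberately not downloaded so the snippet runs offline, and neither call is tied to any particular accelerator.

```python
# Illustrative sketch: the same MobileNet architecture pulled from two
# different frameworks, showing how portable the model choice can be.
import torch
import torchvision.models as models
import tensorflow as tf

# MobileNetV2 via PyTorch / torchvision (architecture only, no weight download)
pt_model = models.mobilenet_v2()
pt_model.eval()

# MobileNetV2 via TensorFlow / Keras applications (architecture only)
tf_model = tf.keras.applications.MobileNetV2(weights=None)

print("PyTorch parameter count:   ", sum(p.numel() for p in pt_model.parameters()))
print("TensorFlow parameter count:", tf_model.count_params())
```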

Ubiquity – A final piece of the overall AI Server ecosystem is the ease of access to this type of hardware, and low barriers to entry.  The Raspberry Pi and similar types of boards are distributed globally, and readily available in nearly all markets.

As you can see, our view is that the future of AI servers is based on Arm SoCs, and now is the time to start exploring what that means for your workload.


Running AI Workloads on Arm Servers

Arm’s Role in Processing AI Workloads

The past several years have seen enormous gains in Artificial Intelligence, Machine Learning, Deep Learning, Autonomous Decision Making, and more.  The availability of powerful GPUs and FPGAs, both on-premises and in the cloud, has certainly helped, but more and more of this AI processing is actually being done at the Edge, in small devices.  The popularity of Amazon Alexa, Google Home, and AI-enabled features in smartphones such as Apple’s Siri has skyrocketed over the past few years.  Frameworks and models such as TensorFlow, PyTorch, Caffe, and others have matured, and newer, lightweight versions have come along, such as TinyML, TensorFlow Lite, and other libraries designed to allow machine learning on the smallest devices possible.  Common applications include local audio processing that detects specific sounds via waveform pattern matching, object recognition in a camera’s frame, monitoring of motions and gestures, and vehicle safety systems that detect and respond immediately to changing conditions with no human intervention.

The work that it takes to develop these AI models is very specialized, but ultimately algorithms are created, a large sample of training data is fed into the system, and a model is developed that has a confidence factor and accuracy value.  Once the model is deployed, real-time stream processing occurs, and actions can be taken based upon the results of data flowing through the application.  In the case of a computer vision application, for example, identifying certain objects can result in alerts (hospital staff notified), corrective actions (apply the brakes immediately), or data stored for later use.
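
A minimal, purely hypothetical sketch of that deploy-and-act step might look like the following; the classify function, the thresholds, and the action functions stand in for a real deployed model and policy.

```python
# Hypothetical sketch: each result from a deployed model carries a confidence
# value, and simple thresholds decide whether to alert, act, or store it.
# classify(), notify_staff(), and store_for_review() are stand-ins, not a real API.

def classify(frame):
    """Stand-in for a deployed model; returns (label, confidence)."""
    return "person_detected", 0.93

def notify_staff(label, confidence):
    print(f"ALERT: {label} ({confidence:.2f}) - notifying staff")

def store_for_review(frame, label, confidence):
    print(f"Stored {label} ({confidence:.2f}) for later review")

ALERT_THRESHOLD = 0.90   # high-confidence results trigger an immediate action
LOG_THRESHOLD = 0.50     # mid-confidence results are kept for later analysis

for frame in ["frame_001", "frame_002"]:   # stand-in for a real video stream
    label, confidence = classify(frame)
    if confidence >= ALERT_THRESHOLD:
        notify_staff(label, confidence)
    elif confidence >= LOG_THRESHOLD:
        store_for_review(frame, label, confidence)
    # below the lower threshold the frame is simply ignored
```
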
As mentioned, more and more AI/ML is actually being processed at the Edge, on small form factor devices.  And small form factor devices tend to be powered by Arm SoCs, as opposed to the more power-hungry x86 designs commonplace in laptops and desktops.  Home devices like Alexa, Google Home, and nearly all smartphones are based on Arm SoCs.  Thus, AI models need to be created, tested, and verified for compatibility with Arm-powered devices.  Even if an algorithm is developed and trained on a big GPU or FPGA, the resulting model should still be tested on Arm SoCs to ensure proper functionality.  To help speed the testing process, miniNodes now offers hosted Arm microservers with dedicated AI accelerators that offload AI tasks from the CPU and offer excellent machine learning performance.  Workloads such as self-driving vehicle object detection, navigation, guidance, and behavior models; image classification and object recognition from cameras and video streams; convolutional neural networks and matrix multiplication; robotics; weather simulation; and many other types of deep learning can be quickly and easily processed.
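
As one possible way to carry out that testing step, the sketch below assumes a model trained with Keras on a GPU workstation and converts it to a TensorFlow Lite file that can then be copied to an Arm board or hosted Arm microserver; the MobileNetV2 stand-in and the output filename are placeholders.

```python
# Minimal sketch, assuming a Keras-trained model: convert it to TensorFlow Lite
# so the resulting .tflite file can be tested on an Arm device.
import tensorflow as tf

# Stand-in for a trained model; in practice this would be loaded from disk.
model = tf.keras.applications.MobileNetV2(weights=None)

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional size/latency optimization
tflite_model = converter.convert()

# Write the converted model out for deployment to the Arm target.
with open("model_for_arm.tflite", "wb") as f:
    f.write(tflite_model)
```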

Arm Lowers the Cost of AI Processing

AI training and inference in the cloud, running on Arm microservers at miniNodes, also offers a distinct cost advantage over Amazon AWS, Microsoft Azure, or Google GCE.  Those services can very quickly cost thousands or tens of thousands of dollars per month, but many AI workloads can get by just fine with more modest hardware when paired with a dedicated AI accelerator like a Google Coral TPU, Intel Movidius NPU, or Gyrfalcon Matrix Processing Engine.  AWS, Azure, and GCE provide great AI performance, sure, but you also pay heavily for the processor, memory, storage, and other components of the overall system.  If you are ready to make use of those immense resources, wonderful.  But if you are just starting out, are just learning AI/ML, are only beginning to test your AI modeling on Arm, or just have a lightweight use case, then going with a smaller underlying platform while retaining dedicated AI processing capability can make more sense.

miniNodes is still in the process of building out the full product lineup, but in the meantime Gyrfalcon 2801 and 2803 nodes are online and ready, with up to 16.8 TOPS of processing for ResNet, MobileNet, or VGG models.  They are an easy, cost-effective way to get started with AI processing on Arm!

Check them out here:  https://www.mininodes.com/product/raspberrypi-4-ai-server/