Posted on Leave a comment

Installing AlmaLinux 8 on Arm, using a Raspberry Pi 4

A few weeks ago, we took a look at how to install the new Rocky Linux 8 on Arm, using a Raspberry Pi, as a replacement for CentOS.  That replacement is needed because Red Hat altered the release strategy for CentOS, moving from a stable release model to an earlier, more rapidly changing development stream.  However, there is also a second community build aiming to fill the gap left by Red Hat, so today we will look at the process of installing the new AlmaLinux 8, again on the Raspberry Pi 4 with community-built UEFI firmware.

Like Rocky Linux, AlmaLinux is a Linux distribution put together by the community, in order to preserve the stable, predictable manner in which CentOS packages were updated.  AlmaLinux comes in both x86 and aarch64 builds, and we’ll of course be using the aarch64 build for our Pi.

We’re going to replicate the previous how-to for the most part, so let’s recap the hardware we’ll use:

  • Raspberry Pi 4B
  • SD Card
  • USB stick for install media
  • USB Stick or USB-to-SSD adapter for destination (permanent storage) media

To get started, we are going to download the community-built UEFI firmware for the Raspberry Pi and copy it to an SD Card.  This UEFI implementation is closer in nature to a “normal” PC UEFI BIOS, and lets the Pi boot in a more standard way than it would with the Raspberry Pi OS method.  The UEFI firmware is placed directly on an SD Card, and when the Pi is powered on it will read the UEFI firmware and can then boot from a USB stick or over the network.  To install the UEFI firmware, download the latest release .zip file (RPi4_UEFI_Firmware_v1.28.zip at the time of this writing) from https://github.com/pftf/RPi4/releases

Next, unzip this .zip file you just downloaded, and copy the contents to an SD Card.  The card needs to be formatted as FAT32, so if you are re-purposing an SD Card that had Linux on it previously you might need to delete partitions and re-create a FAT32 partition on the SD Card.  Once the files are copied to the SD Card, it will look like this:

With the SD Card complete, we can now proceed to download AlmaLinux.  Browse to https://almalinux.org and click on Download.  You will have the option of x86 or aarch64 downloads.  We want the Arm64 version, so click on that link, and then choose a mirror close to you.  Once you are taken to the mirror’s repository, you’ll see you have Boot, Minimal, or DVD .iso files to choose from.  For this tutorial, we’ll go with Minimal, so click on that one and your download will begin.  Once the download is complete, flash the file to a USB stick using Rufus, Etcher, Win32 Disk Imager, or any other method you prefer.

Now that we have our SD Card for booting and USB stick for installing, we just need to determine what to use for destination storage.  As the Pi doesn’t have any onboard eMMC, and the SD Card slot is occupied by our firmware, we could use another, separate USB drive or network attached storage, but for this tutorial we’ll go with a USB-to-SSD adapter, which will allow us to hook up a 2.5 inch SATA SSD as our permanent storage.

Plug the SSD into the adapter, and then connect the USB plug into one of the USB 3.0 (blue) ports on the Pi.  Attach a keyboard, mouse, and monitor, insert the SD Card, and the USB Stick with AlmaLinux on it, then plug in power.  After a moment you will see a Raspberry Pi logo, and the Pi will boot from the USB stick.  The AlmaLinux installation process will begin, and if you are familiar with the CentOS installation process you will notice it’s nearly identical, since the upstream sources are the same.

The Raspberry Pi is not as fast as a PC or a large Arm Server, so you’ll need to be patient while the installation wizard loads, and navigating the menus can be a bit slow.  However, you will be able to set up a user account, choose your timezone, and select the destination drive to install to (the SSD).  Once satisfied, you can begin the installation, and again you’ll need to be patient while the files are copied to the SSD.  Make some coffee or tea.

Once the process does complete, you can reboot the Pi, remove the USB stick so you don’t start the whole process over, and eventually boot into your new AlmaLinux 8.4 for Arm distro!

AI Server And Hosted Jetson Use Cases

Let’s continue once again with our look at Edge AI Servers and why it makes sense to use Nvidia Jetson Nanos and Xaviers as small, cost-effective machine learning endpoints.

As many startups and developers already know, AI Servers from the major cloud vendors are very expensive.  Here is a quick sample of prices, at the time of this writing (July 2021), for a few different EC2 platforms:

Type            Hourly Price    CPU    RAM         Storage
p3.2xlarge      $3.06           8      61 GiB      EBS Only
g4dn.xlarge     $0.53           4      16 GiB      125 GB NVMe SSD
g4ad.8xlarge    $1.73           32     128 GiB     1200 GB NVMe SSD
p3.8xlarge      $12.24          32     244 GiB     EBS Only
g3s.xlarge      $0.75           4      30.5 GiB    EBS Only

P3 Instances include V100 Tensor Core GPUs, G4dn include T4 GPUs, and G4ad include Radeon Pro V520 GPUs.

Over on the Azure site, here are just a pair of the current pricing and options:

Size                    Hourly Price    vCPU    Memory (GiB)    Temp storage, SSD (GiB)    GPU    GPU memory (GiB)
Standard_NC4as_T4_v3    $0.63           4       28              180                        1      16
Standard_NC6s_v3        $3.98           6       112             736                        1      16

You can save some money if you commit to a 1-year or 3-year term, but the regular “On-Demand” prices start at $374.40 per month on AWS or $460 per month on Azure for the smallest server, and go up dramatically from there.  The p3.8xlarge for example, is currently nearly $9,000 per month.  Many others are in the $3,000 to $5,000 per month range.  
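To make the monthly arithmetic explicit, here is a minimal sketch that converts the hourly prices in the tables above into approximate monthly costs.  It assumes a 30-day month of continuous use (720 hours), which is our assumption rather than anything the cloud vendors publish, and it uses the rounded hourly prices from the tables, so the results only approximate the monthly figures quoted above.

```python
# Hourly on-demand prices copied from the tables above (July 2021).
HOURLY_PRICES = {
    "p3.2xlarge": 3.06,
    "g4dn.xlarge": 0.53,
    "g4ad.8xlarge": 1.73,
    "p3.8xlarge": 12.24,
    "g3s.xlarge": 0.75,
    "Standard_NC4as_T4_v3": 0.63,
    "Standard_NC6s_v3": 3.98,
}

# Assumption: a 30-day month of continuous use (24 * 30 = 720 hours).
HOURS_PER_MONTH = 24 * 30


def monthly_cost(hourly_price, hours=HOURS_PER_MONTH):
    """Return the cost of running one instance for `hours` hours, rounded to cents."""
    return round(hourly_price * hours, 2)


if __name__ == "__main__":
    # Print the instances from cheapest to most expensive.
    for name, price in sorted(HOURLY_PRICES.items(), key=lambda kv: kv[1]):
        print(f"{name:<22} ${price:>6.2f}/hr  ~${monthly_cost(price):>9,.2f}/month")
```

Changing `HOURS_PER_MONTH` to match your own billing assumptions (for example, a server that only runs during working hours) shows how quickly the picture changes for part-time workloads.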

If your use case involves training a model or building an algorithm and then deploying it to Edge AI devices, the large amount of power provided by one of those machines could certainly make sense as the first step in the Edge AI deployment process.  Using the Nvidia Transfer Learning Toolkit as one example, the workflow consists of leveraging existing publicly available models, adding your own custom classes and dataset, training it on big hardware such as V100 and T4 GPUs, and then deploying the resulting model onto smaller devices such as Jetsons.  Testing your TLT output on a hosted Jetson, prior to deploying to all the devices out in the field, would be wise.  Thus, a hosted Nano or Xavier that is built into your overall CI/CD pipeline, to QA and ensure functionality prior to deployment to the whole fleet, is a great use case.

A second very important use case for hosted Jetson devices is for learning, experimentation, and development purposes.  As we can see above, the cheapest EC2 instance is about $375 per month.  If you are just getting started learning how to build and develop AI applications, exploring the DeepStream or CUDA ecosystem, or looking to work through various online learning courses on the topic, $375 per month probably doesn’t make much sense, but a smaller AI Server could be just right for the task.  Hosted Jetson Nanos can absolutely be used to learn the fundamentals of computer vision, deep learning, neural networks, and other machine learning topics.  They are also good for learning about containers and pulling down images from the Nvidia NGC Catalog, such as PyTorch, TensorFlow, Rapids, and more.  Those types of smaller tasks, Hello AI World projects, and basic Getting Started projects are far more cost effective to run on Jetsons than on the big GPU servers costing 10x (at a minimum).

AI Server Workflow, Nvidia Transfer Learning Toolkit Example

Continuing our series on Edge AI Servers and the rapid transition underway as developers migrate their workloads from traditional datacenters / clouds to smaller, distributed workloads running closer to users, let’s investigate a specific Nvidia AI Server workflow where a hosted Jetson Nano or Jetson Xavier NX could make sense.

At GTC in May 2021, Nvidia launched the Transfer Learning Toolkit (TLT) 3.0, which is designed to help users build customized AI models quickly and easily.  The process is rather straightforward:  Instead of creating and training a model from scratch, which is very time consuming, you instead take a pre-trained model such as PeopleNet, FaceDetectIR, ResNet, MobileNet, etc, add in your custom object (image for vision applications, sound for audio or language models), re-train with the added content, and then you can leverage the resulting output model for inferencing on smaller, edge devices.

The Transfer Learning Toolkit is part of the larger TAO (Train, Adapt, Optimize) platform, and is intended to be run on big hardware.  Their requirements state:

  • 32 GB system RAM
  • 32 GB of GPU RAM
  • 8 core CPU
  • 1 NVIDIA GPU
  • 100 GB of SSD space
  • TLT is supported on A100, V100 and RTX 30×0 GPUs. 

However, the end result is a model that can run on a much smaller device like a Jetson Nano, TX2, or Xavier NX.

Looking closer at the TLT Quick Start documentation, installation begins with setting up your workstation, or launching a cloud VM like an AWS EC2 P3 or G4 instance, which have Volta or Turing GPUs.  There are a series of prerequisites to install, a TLT container that is downloaded from the Nvidia Container Registry, and once your Python environment is set up, you launch a Jupyter notebook that will help guide you through the rest.

There is also a sample Computer Vision Jupyter notebook that can get you up and running quickly, located here:  https://ngc.nvidia.com/catalog/resources/nvidia:tlt_cv_samples

Once you have the Transfer Learning Toolkit workflow established, you can begin testing the resulting models on Jetson devices.  This is where a hosted Jetson Nano functioning as an AI Server might make sense, as you could simply build some automation or a CI/CD workflow for training with TLT and testing the results on the Jetson.  Then, if everything passes, the results are highly accurate, and detection, segmentation, etc. are all performing well, you could deploy to your production devices in the field.
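As a rough illustration of that last "if everything passes" step, here is a minimal sketch of a deployment gate that a CI/CD job could run after testing a model on the hosted Jetson.  The metric names and threshold values are hypothetical placeholders chosen for this example; they are not part of TLT, and you would substitute whatever your own test run reports.

```python
# Hypothetical minimum scores a re-trained model must reach on the hosted Jetson.
# Both the metric names and the values are illustrative assumptions.
THRESHOLDS = {
    "detection_map": 0.80,
    "segmentation_iou": 0.70,
}


def failed_checks(metrics, thresholds=THRESHOLDS):
    """Return the names of metrics that are missing or below their threshold."""
    return [
        name
        for name, minimum in thresholds.items()
        if metrics.get(name, float("-inf")) < minimum
    ]


def should_deploy(metrics, thresholds=THRESHOLDS):
    """Deploy to the fleet only when no check has failed."""
    return not failed_checks(metrics, thresholds)


if __name__ == "__main__":
    # Example numbers, as they might be reported by a test run on the Jetson.
    jetson_results = {"detection_map": 0.86, "segmentation_iou": 0.74}
    if should_deploy(jetson_results):
        print("All checks passed, model can go out to the field devices.")
    else:
        print("Holding deployment, failed checks:", failed_checks(jetson_results))
```

Treating a missing metric as a failure is a deliberate choice here: a test run that silently skips a check should not be able to push a model to the whole fleet.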

This is of course only one example of the value of hosted AI servers, and we’ll continue looking at more use cases in the near future! 

How to Install Rocky Linux 8 on Arm, Using a Raspberry Pi 4

As you have probably read, Red Hat is changing the way that CentOS builds are delivered, moving the project to a Stream release, which brings in updates and patches faster, but may have less stability and more potential for bugs to be introduced.  For some users, the more timely updates are a good thing, but for others, enterprise stability and long release cycles are better.  For those users who desire less frequent builds and have stricter testing and integration cycles, several projects sprang up to fill the hole left by CentOS.  One such new distro is Rocky Linux, which just had its first official release, known as Rocky Linux 8.4.

Let’s take a look at how to install Rocky Linux on Arm, as they produce a native aarch64 build typically geared towards Arm Servers, though in this case we will use a Raspberry Pi 4 just to demonstrate it works.

First and foremost, let’s cover what we’ll need for this project:

  • Raspberry Pi 4B
  • SD Card
  • USB stick for install media
  • USB Stick or USB-to-SSD adapter for destination (permanent storage) media

To begin our process, we need to download the community built UEFI firmware for the Raspberry Pi, which we will use to boot up the Pi, as opposed to the normal u-boot and device tree methodology used by Raspberry Pi OS.  The UEFI firmware will go onto an SD Card just like normal, and when the Pi is powered up it will read the firmware from the SD Card and can then proceed to boot from USB or over the network.  To install the UEFI firmware, simply go to https://github.com/pftf/RPi4/releases and download the latest release .zip file (RPi4_UEFI_Firmware_v1.28.zip at the time of this writing). 

Next, grab an SD Card and make sure it has a FAT32 partition.  Extract the contents of the .zip file you downloaded, and copy those contents to the FAT32 partition on the SD Card.  Once the files are in place, it should look like this:

At this point the SD Card is ready, and the Pi should be able to boot from it, but let’s set it aside for a moment as there are a few more steps to tackle.

Next, it’s time to grab a copy of Rocky Linux.  Head to https://rockylinux.org and click on the Download button.  You’ll see both the x86 and the Arm64 (aarch64) versions, and we of course want the Arm distribution.  Choose the version you want from Minimal, DVD, or Boot, but make note, they are large files.  Even the “Minimal” installation .iso, which is what we’ll use in this tutorial, is 1.6 GB.  So, go ahead and download the “Minimal” aarch64 .iso, and wait for the download to complete.

Once downloaded, we’ll need to flash that .iso installation file to a USB stick.  You can use Etcher, Win32 Disk Imager, Rufus, or just `dd` it to your USB stick, whatever you prefer.
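For readers curious what a tool like `dd` is doing under the hood, here is a minimal sketch of the same idea: copying the image onto the target in fixed-size blocks and flushing it to the device at the end.  The file name and device path in the usage note are hypothetical placeholders, and this is only an illustration, not a replacement for the tools listed above.  Writing to the wrong device will destroy its contents, so double-check the target before trying anything like this.

```python
import os


def write_image(image_path, target_path, block_size=4 * 1024 * 1024):
    """Copy image_path onto target_path in fixed-size blocks; return bytes written."""
    written = 0
    with open(image_path, "rb") as src, open(target_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:  # an empty read means end of the image
                break
            dst.write(block)
            written += len(block)
        # Push buffered data out to the device before the caller unplugs it.
        dst.flush()
        os.fsync(dst.fileno())
    return written
```

On Linux this would be called with root privileges as something like `write_image("rocky-minimal-aarch64.iso", "/dev/sdX")`, where both names are placeholders for your actual download and your actual USB stick.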

At this point, we’re almost ready to power up the Pi and start installing, but still have one more task to accomplish.  We need to figure out what storage media we are going to eventually install Rocky Linux to.  The SD Card slot is taken for the firmware, so we could perhaps use a second USB stick and let that be our long term storage medium where the OS lives, or perhaps a better idea is to purchase a USB-to-SATA adapter and use an SSD, just like a normal PC or laptop would use for storage.  This might prove to be more reliable long term, so let’s go for it.

Attach the SSD to the adapter, plug it into a USB3 port, hook up a keyboard and monitor, insert the UEFI firmware SD Card, and finally plug in the USB stick with Rocky Linux installer.  Now power up, and you should see a Pi logo, and after a moment the Pi will attempt to boot from the USB stick, which contains the Rocky Linux installation process (it’s nearly identical to the Fedora, CentOS, and Red Hat installers, if you have used those before).

In the installation wizard, you can add a user account, set your timezone, set up networking, and select the drive to install to.  Once you have everything configured, the installation will begin, and files are copied from the installer USB stick to the SSD.  This will take a while since we are using a Raspberry Pi in this demo, but could potentially be much faster on big Arm Servers like an Ampere Altra, ThunderX, or even a SolidRun Honeycomb.

At the end of the installation, you’ll be prompted to reboot.  You can remove the installation USB stick, reboot, and when the Pi boots back up it will automatically boot directly from the SSD.  At this point, you’ve successfully installed Rocky Linux on your Raspberry Pi!

Good luck and have fun with Rocky Linux 8 on your Raspberry Pi!

The Edge AI Server Revolution (Driven by Arm, Of Course)

The past 2 years have seen rapid growth in experimentation and ultimately deployment and adoption of AI/ML at the Edge. This has been fueled by dramatic increases in on-device AI processing capability, and equally dramatic reduction in size and power requirements of devices. The Nvidia Jetson Nano with its GPU and CUDA cores, Google Coral Dev Board containing a TPU for Tensorflow acceleration, and even Microcontrollers running TinyML have quickly gained widespread adoption among developers. These devices are cheap, accurate, and readily available, allowing developers to deploy AI/ML workloads to places that were not practical just a short time ago.

This allows developers to re-think their applications, and begin to migrate AI workloads out of the datacenter, which was previously the only place to run their AI/ML tasks, potentially saving money or improving performance by moving processing closer to where it is needed. This also allows for net-new capability, adding computer vision, object detection, pose estimation, etc., in places where it previously was not possible.

In order to help prepare developers and allow them to experiment and build their skills, miniNodes is making available some Edge AI inspired Arm Servers, starting with the Nvidia Jetson Nano. These nodes are intended to be used by engineers and teams just getting started on their Edge AI journey, who are testing their applications and deep learning algorithms. Another use for an Edge AI Arm Server is for light-duty AI processing, where it doesn’t make financial sense to rent big AI servers from the likes of AWS or Azure, and instead a smaller device will work just fine. Finally, developers and teams that do AI training that is not time-sensitive, or relatively small, can achieve significant savings by using a hosted Jetson Nano for their model training, instead of local GPUs or AWS resources.

Whether you are just getting started and beginning to explore Edge AI, or have been following the trend and already have Edge AI projects underway, a miniNodes hosted Jetson Nano is a great way to gain hosted AI processing capability or reduce AWS and Azure cloud AI costs.

Arm Server Update, Spring/Summer 2021

As usual, we are overdue for an update on all things Arm Servers! Today’s announcement of the Arm v9 specification is a great time to review the state of Arm Servers, and what has changed since our last update.

First, let’s review our last update. Marvell canceled the ThunderX3 product, Ampere had announced the Altra but it wasn’t shipping, AWS Graviton was available, and Nuvia was designing a processor.

Fast forward to today, and the Ampere Altras are now becoming available, with limited stock via the Works on Arm program at Equinix Metal, and some designs shown off by Avantek, a channel supplier. Mt. Snow and Mt. Jade, as they are known, are also formally designated as “ServerReady” parts, passing standards compliance tests.

Nuvia, the startup that was designing a new Arm Server SoC from the ground up, was purchased by Qualcomm, in an apparent re-entry into the Arm Server market (or for use in Windows on Arm laptops?). Don’t forget, they previously had an Arm Server part, the Centriq, though they scrapped it a few years ago. So, it now remains to be seen if Nuvia will launch a server-grade SoC, or pivot to a smaller target-device.

The other emerging trend to cover is the role of Arm in the Edge Server ecosystem, where the practice of pushing small servers out of the datacenter and closer to customers and users is rapidly gaining momentum. In this scenario, non-traditional, smaller devices take on the role of a server, and the energy efficiency, small form-factor, and varied capabilities of Arm-powered single board computers let them take on workloads previously handled by typical 1U and 2U rackmount servers in a datacenter. Small devices like the Nvidia Jetson AGX, Raspberry Pi Compute Module 4, and NXP Freeway boxes are able to perform Edge AI, data caching, or local workloads, and only send what is necessary up to the cloud. This trend has been accelerating over the past 12 – 18 months, so we may see some more niche devices or SoCs start to fill this market.

Arm Server Update, Fall 2020

The announcement yesterday of the cancelation of Marvell’s ThunderX3 Arm Server processor was a reminder that we were overdue for an Arm Server update!  So, continuing on in our regular series, here is the latest news in the Arm Server ecosystem.

As mentioned, unfortunately it appears at this time that Marvell has canceled the ThunderX3 Arm Server processor that was shown earlier this year, and would have been the successor to the ThunderX and ThunderX2 parts released previously.  The current rumors indicate that perhaps some specialized version of the SoC may survive and be used for an exclusive contract with a hyperscaler, but that means “regular” customers will not be able to acquire the part.  And with no general purpose, general availability part, the ThunderX3 will effectively be unavailable. 

That leaves AWS providing the Graviton processor in the EC2 cloud server option, or Ampere with their current generation eMag Arm Server, and forthcoming Ampere Altra SoC as the only server-class Arm processors left (for now).  The Ampere Altra is brand new, and available from our friends at Packet in an Early Access Program, but no specific General Availability date has been mentioned quite yet.  This processor offers 80-cores or 128-cores, and is based on Arm Neoverse N1 cores. 

There is another processor on the horizon though from Nuvia, a startup formed late last year that is designing an Arm-based server class SoC.  Nuvia has said it will take several years to bring their processor to market, which is a typical timeframe for an all-new custom processor design.  So in the meantime, only Amazon and Ampere are left in the market.

The NXP desktop-class LX2160A as found in the SolidRun Honeycomb could also be considered for some workloads, though it is a smaller 16-core part based on A72 cores.

There is one other Arm Server that exists, but unfortunately it’s not able to be acquired outside of China:  the Huawei TaiShan 2280 based on the HiSilicon Kunpeng 920.  This is a datacenter part that is likely used by the large cloud providers in China, but seems difficult (or impossible) to obtain otherwise.  It is a dual processor server, with 64-cores in each processor, thus totaling 128 cores per server.

As usual, the Arm Server ecosystem moves quickly, and we look forward to seeing what’s new and exciting in our next update!

 

How to Run Rosetta@Home on Arm-Powered Devices

This week, after an amazing Arm community effort, the Rosetta@Home project released support for sending work units to 64-bit Arm devices, such as the Raspberry Pi 4, Nvidia Jetson Nano, Rockchip RK3399-based single board computers, and other SBCs that have 2 GB of memory or more.

Sahaj Sarup from Linaro, the Neocortix team, Arm, and the Baker Lab at the University of Washington all played a role in helping us port the Rosetta software to aarch64, get it tested in their Ralph (Rosetta ALPHa) staging environment, validate the scientific results, and eventually push it to Rosetta@Home.

Now, anyone with spare compute capacity on their Arm-powered SBCs running a 64-bit OS can help contribute to the project by running BOINC, crunching data and performing protein folding calculations that help doctors target the COVID-19 spike proteins (among other medicine and scientific workloads).

Here is a quick tutorial on how to get started, using a native operating system for your devices.  This methodology is not the only way to run Rosetta@Home, but, is intended for the technical users who want to run their own OS and manage the system themselves.

Raspberry Pi 4

To fight COVID-19 using a Raspberry Pi 4, you need a Raspberry Pi 4 with 2 GB or 4 GB of RAM.  The Rosetta work units are large scientific calculations, and they require 1.9 GB of memory to run.  You will need to use a 64-bit OS for this, so Raspbian will not work, as it is a 32-bit OS.  Instead, you will need to download and flash Ubuntu Server from their official sources, located here:  https://ubuntu.com/download/raspberry-pi.  Once the SD Card is written, and your Pi 4 has booted up, connect an ethernet cable, and be sure to run ‘sudo apt-get update && sudo apt-get upgrade’ to make sure the system is up to date.  At this point a reboot may be necessary, and once the system comes back up, we can start to install BOINC and Rosetta.  Run ‘sudo apt-get install boinc-client boinctui’ to bring in the BOINC packages.  If you are using a 2 GB RAM version of the Pi 4, we need to override one setting to cross that 1.9 GB threshold mentioned earlier.  If you have a 4 GB RAM version of the Pi 4, you can skip this next item.  But, 2 GB users, you will need to type ‘sudo nano /var/lib/boinc-client/global_prefs_override.xml’ and enter the following to increase the default memory available to Rosetta to the maximum amount of memory on the board:

<global_preferences>
   <ram_max_used_busy_pct>100.000000</ram_max_used_busy_pct>
   <ram_max_used_idle_pct>100.000000</ram_max_used_idle_pct>
   <cpu_usage_limit>100.000000</cpu_usage_limit>
</global_preferences>

 Press “Control-o” on the keyboard to save the file, and then press Enter to keep the file name the same.  Next, press “Control-x” to quit nano.
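For anyone who would rather script this step than type it into nano, here is a minimal sketch that generates the same override file.  The path and the three settings come from the instructions above; the function names are our own, and the script would need to be run with root privileges (for example via sudo) because the file lives under /var/lib.

```python
# Path taken from the nano command above.
OVERRIDE_PATH = "/var/lib/boinc-client/global_prefs_override.xml"


def build_override(ram_busy_pct=100.0, ram_idle_pct=100.0, cpu_limit_pct=100.0):
    """Return the override XML as text, one element per line as in the listing above."""
    settings = (
        ("ram_max_used_busy_pct", ram_busy_pct),
        ("ram_max_used_idle_pct", ram_idle_pct),
        ("cpu_usage_limit", cpu_limit_pct),
    )
    lines = ["<global_preferences>"]
    for tag, value in settings:
        lines.append(f"   <{tag}>{value:.6f}</{tag}>")
    lines.append("</global_preferences>")
    return "\n".join(lines) + "\n"


def write_override(path=OVERRIDE_PATH):
    """Write the override file; requires permission to write to `path`."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(build_override())


if __name__ == "__main__":
    # Print the file contents rather than writing them, so a dry run is harmless.
    print(build_override(), end="")
```

Running the script as-is only prints the XML; calling `write_override()` is the step that actually replaces the file.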

Next, using your desktop or laptop PC, head to http://boinc.bakerlab.org and create an account, and while there, be sure to join the “crunch-on-arm” team!  

Back on the Raspberry Pi, we can now run ‘boinctui’ from the command prompt, and a terminal GUI will load.  Press F9 on the keyboard, to bring down the menu choices.  Navigate to the right, to Projects.  Make sure Add Project is highlighted, and press Enter.  You will see the list of available projects to choose from, choose Rosetta, select “Existing User” and enter the credentials you created on the website a moment ago.  

It will take a moment, but, Rosetta will begin downloading the necessary files and then download some work units, and begin crunching data on your Raspberry Pi 4!

You can press ‘Q’ to quit boinctui and it will continue crunching in the background.

 

Nvidia Jetson

If you have an Nvidia Jetson Nano, you can actually follow the same directions outlined above directly on the Nvidia-provided version of Ubuntu.  To recap, these are the steps:

  • Open a Terminal, and run ‘sudo apt-get update && sudo apt-get upgrade’.  After that is complete, reboot.
  • Using your desktop or laptop PC, head to http://boinc.bakerlab.org and create an account, and join the “crunch-on-arm” team
  • Back on the Jetson Nano, run ‘sudo apt-get install boinc-client boinctui’
  • Run ‘boinctui’, press F9, navigate to Projects, Add Project, and choose Rosetta@Home.  Choose an Existing Account, enter your credentials, and wait for some work units to arrive!

 

Other Boards

If you have other single board computers that are 64-bit, have 2 GB of RAM, and run Armbian, the process is the same for those devices as well!  Examples of boards that could work include Rockchip RK3399 boards like the NanoPi M4 or T4, OrangePi 4, or RockPro64, Allwinner H5 boards like the Libre Computer Tritium H5 or NanoPi K1 Plus, or Amlogic boards like the Odroid C2, Odroid N2, or Libre Computer Le Potato.  Additionally, 96Boards offers high performance boards such as the HiKey960 and HiKey970, Qualcomm RB3, or Rock960 that all have excellent 64-bit Debian-based operating systems available.

For any of those, simply install the ‘boinc-client’ and ‘boinctui’ packages, and add the Rosetta project!

Of course, if you just so happen to have a spare Ampere eMAG, Marvell ThunderX or ThunderX2 lying around, those would work quite nicely as well.
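If you are not sure whether a particular board meets the two requirements mentioned above (a 64-bit OS and roughly 1.9 GB of memory for a work unit), here is a minimal sketch of a check you could run on the board itself.  It assumes a Linux system that exposes /proc/meminfo, and the function names are our own.  The memory reported by the OS is usually a little less than the RAM printed on the box, so treat a near miss on a 2 GB board as a reminder to apply the memory override described earlier rather than as a hard failure.

```python
import platform

# Memory a Rosetta work unit needs, per the Raspberry Pi section above.
MIN_MEMORY_GB = 1.9


def total_memory_gb(meminfo_text):
    """Parse the MemTotal line of /proc/meminfo (in kB, 1024-byte units) into GB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) / (1024 * 1024)
    raise ValueError("MemTotal not found")


def can_run_rosetta(machine, memory_gb):
    """True when the OS reports 64-bit Arm and enough memory for a work unit."""
    return machine == "aarch64" and memory_gb >= MIN_MEMORY_GB


if __name__ == "__main__":
    with open("/proc/meminfo", encoding="ascii") as handle:
        memory = total_memory_gb(handle.read())
    machine = platform.machine()  # "aarch64" on a 64-bit Arm Linux OS
    print(f"Architecture: {machine}, memory: {memory:.2f} GB")
    print("Looks eligible" if can_run_rosetta(machine, memory) else "May not be eligible")
```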

The Future of AI Servers

Following up on the recent announcement of our new Raspberry Pi 4 AI Servers, it seems that AI servers running on Arm processors are gaining more and more traction in the market due to their natural fit at the IoT and Edge layers of infrastructure.  Let’s take a quick look at some of the unique properties that make AI Servers running on Arm a great strategy for AI/ML and AIoT deployments, to help understand why this is so important for the future.

Power – Many IoT deployments do not have luxuries that “regular” servers enjoy such as reliable power and connectivity, or even ample power for that matter.  While Intel has spent decades making excellent, though power-hungry, processors, Arm has focused on efficiency and battery life, helping to explain why they dominate the market in tablets and smartphones.  This same efficiency is then leveraged by IoT devices running AI workloads, so Edge devices responsible for computer vision, image classification, object detection, deep learning, or other workloads can operate with a much lower thermal footprint than a comparable x86 device.

Size – Similar to the underlying reasons behind power efficiency, the physical size and dimensions of Arm AI Servers can be made smaller than the majority of x86 designs.  Attaching AI Accelerators such as the Gyrfalcon 2801 or 2803 via USB to boards as small as 2 inches square (such as the NanoPi Neo2) is possible, and the addition of a Google Coral TPU via the mini-PCIe slot on a NanoPi T4 brings an enormous amount of inferencing capability to AI Servers in tiny form factors.

Cost – Here again, Arm SoC’s and Single Board Computers typically have a rather large cost advantage versus x86 embedded designs.  

Scalability – This is a critical factor in why Arm will play a massive role in the future of AI Servers, and why miniNodes has begun to offer our Raspberry Pi 4 AI Server.  As mentioned, low power, cheap devices make great endpoints, but, there is also a role for “medium” sized AI servers handling larger workloads, and Arm partners are just now starting to bring these products to market.  An example is the SolidRun Janux AI Server, which also makes use of the same Gyrfalcon AI Accelerators used by our nodes.  So, you can get started training your models, testing out your deployment pipeline, understanding the various AI frameworks and algorithms, and getting comfortable with the tools, and very easily scale up as your needs expand.  Of course, once you reach massive amounts of deep learning and AI/ML processing, enterprise Arm server options exist for that as well.

Flexibility – Taking Scalability one step further, the Arm AI servers also allow for a great amount of flexibility in the specific accelerator hardware (Gyrfalcon, Google Coral, Intel Movidius), the frameworks used (Caffe, PyTorch, TensorFlow, TinyML), and the models (ResNet, MobileNet, ShuffleNet) employed.  

Ubiquity – A final piece of the overall AI Server ecosystem is the ease of access to this type of hardware, and low barriers to entry.  The Raspberry Pi and similar types of boards are distributed globally, and readily available in nearly all markets.

As you can see, our view is that the future of AI servers is based on Arm SoC’s, and now is the time to start exploring what that means for your workload.

Where to Buy an Arm Server

Being Arm enthusiasts and deeply embedded in the Arm Server ecosystem, one of the questions we get asked often is,

“Where can I buy an Arm Server?”

In the past, it was difficult to actually find Arm Server hardware available to individual end-users. Not long ago, the only way to gain access to Arm Servers was to have NDAs with major OEMs or the right connections to get engineering-sample hardware. However, over the course of the past 2 to 3 years, more providers have entered the market and hardware is now readily available to end users and customers. Here are some of the easiest ways to buy an Arm Server, although this list is not exhaustive. These servers all have great performance and are well supported thanks to standards compliance and UEFI.

First up is the Marvell ThunderX, and newer ThunderX2. These chips are sold in servers from several vendors, which come in various shapes and sizes. Some of the examples we’ve found include the Avantek R-series in both 1U and 2U sizes, and the Gigabyte Arm offerings that closely match Avantek’s specs. There are High Density designs, single processor and dual processor options, and 10 GbE as well as SFP options available.  ThunderX2s have been more popular in HPC environments, but even a first-generation ThunderX is a great choice, and still a very powerful machine.  They can be purchased with up to 48 cores, or in dual-processor configurations containing up to 96 cores.

Another option is the Ampere eMag Arm Server from Ampere Computing, a company that formed a few years ago.  They ship a turnkey Arm Server that is sold by Lenovo, the HR330A or the HR350A.  Their first-generation platform has 32 Arm cores running at 3.0 GHz, 42 lanes of PCIe bandwidth, and 1 TB of memory capacity, and their second-generation product, the Ampere Altra, has up to 80 Arm Neoverse N1 cores.  Current models are available for purchase from their website, or through Lenovo.

Finally, although it is marketed as a workstation, the SolidRun Honeycomb LX2 motherboard can quite easily be repurposed as a proper server.  With 16x A72 cores, support for 64 GB of RAM, up to 40 Gb Ethernet, and PCIe expansion, it can definitely handle medium sized workloads.  It is standards-compliant, making it easy to install your OS of choice, and affordable, so it’s a great option for getting started on Arm.

And of course, if buying physical servers and hosting them yourself, or placing them in a datacenter, is not feasible or cost effective in your situation, then our hosted Arm servers are a great option as well!  Our miniNodes Arm servers are certainly more modest in comparison to those mentioned above, but, they are a great way to get started with Arm development, testing existing code for compatibility, or lighter workloads that don’t require quite so much compute capability.

Be sure to check back often for all things Arm Server related!