
AI Server And Hosted Jetson Use Cases

Let’s continue our look at Edge AI Servers and why it makes sense to use Nvidia Jetson Nanos and Xaviers as small, cost-effective machine learning endpoints.

As many startups and developers already know, AI Servers from the major cloud vendors are very expensive. Here is a quick sample of prices, at the time of this writing (July 2021), for a few different EC2 instance types:

Type           Hourly Price   CPU   RAM        Storage
p3.2xlarge     $3.06          8     61 GiB     EBS Only
g4dn.xlarge    $0.53          4     16 GiB     125 GB NVMe SSD
g4ad.8xlarge   $1.73          32    128 GiB    1200 GB NVMe SSD
p3.8xlarge     $12.24         32    244 GiB    EBS Only
g3s.xlarge     $0.75          4     30.5 GiB   EBS Only

P3 instances include V100 Tensor Core GPUs, G4dn instances include T4 GPUs, and G4ad instances include Radeon Pro V520 GPUs.

Over on the Azure site, here is a sample of the current pricing and options:

Size   Hourly Price   vCPU   Memory (GiB)   Temp storage (SSD, GiB)   GPU   GPU memory (GiB)

You can save some money if you commit to a 1-year or 3-year term, but the regular “On-Demand” prices start at $374.40 per month on AWS or $460 per month on Azure for the smallest server, and go up dramatically from there. The p3.8xlarge, for example, is currently nearly $9,000 per month. Many others are in the $3,000 to $5,000 per month range.

If your use case involves training a model or building an algorithm and then deploying it to Edge AI devices, the large amount of power provided by one of those machines could certainly make sense as the first step in the Edge AI deployment process. Using the Nvidia Transfer Learning Toolkit as one example, the workflow consists of leveraging existing publicly available models, adding your own custom classes and dataset, retraining on big hardware such as V100 and T4 GPUs, and then deploying the resulting model onto smaller devices such as Jetsons. Testing your TLT output on a hosted Jetson, prior to deploying to all the devices out in the field, would be wise. Thus, a hosted Nano or Xavier that is built into your overall CI/CD pipeline, to QA and ensure functionality prior to deployment to the whole fleet, is a great use case.
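As a rough sketch of what that pipeline stage could look like (the hostname, model path, and on-device test script below are all hypothetical, not part of any real tooling), the QA step just needs to copy the trained model to the hosted Jetson and run a smoke test whose exit code gates the deployment:

```python
import subprocess

def build_qa_commands(host, model):
    """Build the copy-and-test commands for the hosted-Jetson QA stage."""
    return [
        # Copy the freshly exported model onto the hosted Jetson.
        ["scp", model, f"{host}:/opt/models/"],
        # Run a smoke-test script on the device; a non-zero exit code
        # fails this pipeline stage and blocks the fleet deployment.
        ["ssh", host, "python3", "/opt/qa/run_inference_test.py"],
    ]

def run_qa(host, model, dry_run=False):
    """Execute the QA stage; returns True only if every step succeeds."""
    for cmd in build_qa_commands(host, model):
        if dry_run:
            print("would run:", " ".join(cmd))
            continue
        if subprocess.run(cmd).returncode != 0:
            return False
    return True
```

Any CI system that can run a script (Jenkins, GitLab CI, GitHub Actions, etc.) can call a stage like this and halt the rollout on failure.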

A second very important use case for hosted Jetson devices is learning, experimentation, and development. As we can see above, the cheapest EC2 instance is about $375 per month. If you are just getting started learning how to build and develop AI applications, exploring the DeepStream or CUDA ecosystem, or looking to work through various online learning courses on the topic, $375 probably doesn’t make much sense, but a smaller AI Server could be just right for the task. Hosted Jetson Nanos can absolutely be used to learn the fundamentals of computer vision, deep learning, neural networks, and other machine learning topics. They are also good for learning about containers and pulling down images from the Nvidia NGC Catalog, such as PyTorch, TensorFlow, RAPIDS, and more. Those types of smaller tasks, Hello AI World projects, and basic Getting Started projects are far more cost effective to run on Jetsons than on big GPU servers costing at least 10x as much.
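To give a sense of scale, the kind of fundamentals exercise a hosted Nano handles easily is something like the single logistic neuron below, trained with gradient descent in plain Python (the AND-gate dataset and hyperparameters are purely illustrative):

```python
import math

def train_neuron(samples, labels, lr=0.5, epochs=2000):
    """Train a single logistic neuron with stochastic gradient descent."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # sigmoid activation
            err = p - y                     # gradient of the log-loss w.r.t. z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0

# Learn the AND function: it is linearly separable, so one neuron suffices.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [0, 0, 0, 1]
w, b = train_neuron(X, Y)
```

Workloads like this are exactly where paying for a V100-class instance is overkill.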


AI Server Workflow, Nvidia Transfer Learning Toolkit Example

Continuing our series on Edge AI Servers and the rapid transition underway as developers migrate their workloads from traditional datacenters and clouds to smaller, distributed workloads running closer to users, let’s investigate a specific Nvidia AI Server workflow where a hosted Jetson Nano or Jetson Xavier NX could make sense.

At GTC in May 2021, Nvidia launched the Transfer Learning Toolkit (TLT) 3.0, which is designed to help users build customized AI models quickly and easily. The process is rather straightforward: instead of creating and training a model from scratch, which is very time consuming, you take a pre-trained model such as PeopleNet, FaceDetectIR, ResNet, or MobileNet, add in your custom object (an image for vision applications, a sound for audio or language models), re-train with the added content, and then leverage the resulting model for inferencing on smaller edge devices.
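The actual TLT workflow is driven by its CLI and spec files, but the core idea of transfer learning can be sketched in plain Python: keep a pretrained feature extractor frozen and retrain only a small head on your custom classes. The “backbone” below is a stand-in random projection, not a real network such as ResNet or MobileNet:

```python
import random

random.seed(0)

# Stand-in for a pretrained backbone: a frozen projection that maps raw
# inputs to a feature vector. In real transfer learning this would be the
# convolutional layers of a network like ResNet or MobileNet.
FEATURE_DIM = 4
BACKBONE = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(FEATURE_DIM)]

def extract_features(x):
    """Frozen backbone: its weights are never updated during re-training."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in BACKBONE]

def train_head(samples, labels, lr=0.1, epochs=500):
    """Retrain only the small linear head on the custom classes."""
    w = [0.0] * FEATURE_DIM
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            f = extract_features(x)
            score = sum(wi * fi for wi, fi in zip(w, f)) + b
            err = (1 if score > 0 else 0) - y  # perceptron-style update
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b

def classify(w, b, x):
    f = extract_features(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0
```

Because only the tiny head is trained, this adapts to new classes far faster than training the whole model from scratch, which is the same economy TLT exploits at real scale.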

The Transfer Learning Toolkit is part of the larger TAO (Train, Adapt, Optimize) platform, and is intended to be run on big hardware.  Their requirements state:

  • 32 GB system RAM
  • 32 GB of GPU RAM
  • 8 core CPU
  • 100 GB of SSD space
  • TLT is supported on A100, V100, and RTX 30x0 GPUs.

However, the end result is a model that can run on a much smaller device like a Jetson Nano, TX2, or Xavier NX.

Looking closer at the TLT Quick Start documentation, installation begins with setting up your workstation, or launching a cloud VM such as an AWS EC2 P3 or G4 instance, which carry Volta or Turing GPUs. There are a series of prerequisites to install and a TLT container that is downloaded from the Nvidia Container Registry; once your Python environment is set up, you launch a Jupyter notebook that will guide you through the rest.

There is also a sample Computer Vision Jupyter notebook that can get you up and running quickly, located here:

Once you have the Transfer Learning Toolkit workflow established, you can begin testing the resulting models on Jetson devices. This is where a hosted Jetson Nano functioning as an AI Server might make sense, as you could simply build some automation or a CI/CD workflow that trains with TLT and tests the results on the Jetson. Then, if everything passes (results are highly accurate, and detection, segmentation, etc. are all performing well), you could deploy to your production devices in the field.
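That “if everything passes” decision can be as simple as comparing the metrics reported by the Jetson test run against minimum thresholds before the pipeline allows a fleet-wide rollout. The metric names and threshold values here are illustrative, not actual TLT output fields:

```python
# Minimum acceptable metrics from the hosted-Jetson test run; the names
# and floors below are hypothetical examples, chosen for illustration.
THRESHOLDS = {
    "detection_map": 0.80,     # mean average precision for detection
    "segmentation_iou": 0.70,  # intersection-over-union for segmentation
    "fps": 15.0,               # must sustain near-real-time inference
}

def ready_to_deploy(metrics):
    """Gate a fleet rollout: every metric must meet or beat its floor."""
    return all(metrics.get(name, 0.0) >= floor for name, floor in THRESHOLDS.items())
```

A CI job would feed the Jetson’s test results into this gate and only trigger the production deployment when it returns True.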

This is of course only one example of the value of hosted AI servers, and we’ll continue looking at more use cases in the near future! 


The Edge AI Server Revolution (Driven by Arm, Of Course)

The past two years have seen rapid growth in experimentation, and ultimately deployment and adoption, of AI/ML at the Edge. This has been fueled by dramatic increases in on-device AI processing capability, and equally dramatic reductions in the size and power requirements of devices. The Nvidia Jetson Nano with its GPU and CUDA cores, the Google Coral Dev Board containing a TPU for TensorFlow acceleration, and even microcontrollers running TinyML have quickly gained widespread adoption among developers. These devices are cheap, accurate, and readily available, allowing developers to deploy AI/ML workloads to places that were not practical just a short time ago.

This allows developers to re-think their applications and begin to migrate AI workloads out of the datacenter, which was previously the only place to run AI/ML tasks, potentially saving money or improving performance by moving processing closer to where it is needed. It also enables net-new capability, adding computer vision, object detection, pose estimation, etc. in places where they previously were not possible.

In order to help prepare developers and allow them to experiment and build their skills, miniNodes is making available some Edge AI inspired Arm Servers, starting with the Nvidia Jetson Nano. These nodes are intended to be used by engineers and teams just getting started on their Edge AI journey, who are testing their applications and deep learning algorithms. Another use for an Edge AI Arm Server is light-duty AI processing, where it doesn’t make financial sense to rent big AI servers from the likes of AWS or Azure, and a smaller device will work just fine. Finally, developers and teams whose AI training is not time-sensitive, or is relatively small, can achieve significant savings by using a hosted Jetson Nano for their model training, instead of local GPUs or AWS resources.

Whether you are just getting started and beginning to explore Edge AI, or have been following the trend and already have Edge AI projects underway, a miniNodes hosted Jetson Nano is a great way to gain hosted AI processing capability or reduce AWS and Azure cloud AI costs.