Deploy Machine Learning Models on IoT devices using ONNX

One runtime to rule them all

To follow along, it is expected that you have basic knowledge of Python applied to ML, access to Google Colab or an equivalent workspace, and a Raspberry Pi.

The classic flow for deploying ML models uses the same framework for both training the model and making the predictions. In the beginning this makes perfect sense, but it can get out of hand quickly, especially if any of the following conditions are met:

There are a few ways to approach these challenges, but one that has been gaining traction in recent years is to decouple the training and inference environments. This is because ML frameworks are jacks of all trades: we use them for everything from training and optimization to deploying models, but they are not the best tool for every one of those tasks.

That’s where a tool designed specifically for inference comes in handy. It can help us get the most out of our models, optimize the process, and reduce the dependencies needed to execute them. With that in mind, the Open Neural Network Exchange (ONNX) is one such tool. Among its features we get:

With those functionalities, it is possible to convert models from their original framework to the ONNX format or any other compatible framework.
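In PyTorch, the conversion itself takes only a few lines. The article’s repo has its own export code; the sketch below is just an illustration, and the ResNet-18 model, the output file name, and the opset version are assumptions, not necessarily what the repo uses:

```python
# Minimal sketch: exporting a PyTorch model to the ONNX format.
# Model choice, file name, and opset version are illustrative assumptions.
import torch
import torchvision.models as models

# Requires a recent torchvision for the weights= API.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# A dummy input with the shape the model expects (batch, channels, height, width).
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,                  # the trained PyTorch model
    dummy_input,            # example input used to trace the graph
    "model.onnx",           # output file in the ONNX format
    input_names=["input"],
    output_names=["output"],
    opset_version=13,
)
```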

The second piece is the ONNX Runtime, which executes ML models on a variety of target devices without requiring changes to the model file. This functionality completes the ecosystem and enables interoperability of models across different target devices.

The standard inference pipeline for an image classification model starts by reading the input image. A pre-processing stage then prepares the input in the format required by the model; this may include resizing, cropping, or normalizing the image. With the input ready, the model inference takes place, and, depending on the output format, a post-processing stage may be required to turn the output into a human-readable result.

Let’s start by cloning the repo:

Now that we have all the files needed to complete the task, let’s open the Main.ipynb notebook and run the first cell. This will do a few things: first, it downloads our starter model in PyTorch, then executes the inference pipeline, and finally saves the model. If everything works properly, you will get the dog’s breed, “Samoyed”, as the result, along with the path to the saved model on the following line:
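The notebook’s exact code isn’t reproduced here, but the PyTorch side of the pipeline looks roughly like this; the image path, the ResNet-18 model, and the standard ImageNet pre-processing settings are assumptions for illustration:

```python
# Rough sketch of the PyTorch inference pipeline.
# Image path, model choice, and label handling are illustrative assumptions.
import torch
from torchvision import models, transforms
from PIL import Image

# Standard ImageNet-style pre-processing: resize, center-crop, normalize.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

image = Image.open("dog.jpg")          # hypothetical input image
x = preprocess(image).unsqueeze(0)     # add the batch dimension

with torch.no_grad():
    logits = model(x)

class_id = logits.argmax(dim=1).item()
print(class_id)  # with the ImageNet label list, this index maps to "Samoyed"
```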

Now we are ready to run the pipeline again, but with a key change: this time we use the ONNX Runtime instead of PyTorch. To load the ONNX model we need to create an inference session, which requires two parameters: the path to the model and the execution provider (EP). The EP lets the runtime optimize model inference for the target platform. For now, we will use the ‘CPUExecutionProvider’.
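In code, creating the session and running the model looks roughly like this. The model path is assumed to be the file saved in the previous step, and a random placeholder stands in for the pre-processed image that the notebook would actually pass:

```python
# Loading the exported model with ONNX Runtime on the CPU execution provider.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Placeholder input; in the notebook this is the pre-processed image,
# still produced by the PyTorch transforms at this stage.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

input_name = session.get_inputs()[0].name
outputs = session.run(None, {input_name: x})
class_id = int(np.argmax(outputs[0]))
```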

With this, we validated that our model works properly, but there is still a problem: in the pre-processing stage we are using the transform function from PyTorch, which means that at this point we still need to install PyTorch in order to run the pipeline.

To eliminate the PyTorch dependency, we must rewrite the pre-processing function; for that we will use NumPy:
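The repo contains its own implementation; the sketch below is one possible NumPy/Pillow version, assuming the usual ImageNet resize, crop, and normalization settings:

```python
# One possible NumPy/Pillow re-implementation of the pre-processing step.
# The sizes and normalization constants assume the standard ImageNet recipe.
import numpy as np
from PIL import Image

def preprocess(image_path):
    image = Image.open(image_path).convert("RGB")

    # Resize the shorter side to 256 pixels, then center-crop to 224x224.
    width, height = image.size
    scale = 256 / min(width, height)
    image = image.resize((round(width * scale), round(height * scale)))
    left = (image.width - 224) // 2
    top = (image.height - 224) // 2
    image = image.crop((left, top, left + 224, top + 224))

    # Scale to [0, 1], normalize with ImageNet mean/std, and reorder to NCHW.
    x = np.asarray(image, dtype=np.float32) / 255.0
    x = (x - np.array([0.485, 0.456, 0.406])) / np.array([0.229, 0.224, 0.225])
    x = x.transpose(2, 0, 1)[np.newaxis, :].astype(np.float32)
    return x
```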

And now we are done: at this point, we can finally execute our model with no PyTorch dependency.

Let’s run the script:
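The script is essentially the pieces above glued together. A condensed version, reusing the preprocess function sketched earlier and using placeholder file names, would look like this:

```python
# PyTorch-free inference: ONNX Runtime plus the NumPy pre-processing above.
# File names are placeholders for illustration.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

x = preprocess("dog.jpg")                     # NumPy/Pillow pre-processing from above
input_name = session.get_inputs()[0].name
logits = session.run(None, {input_name: x})[0]

class_id = int(np.argmax(logits))
print(class_id)  # expected to map to "Samoyed" with the ImageNet labels
```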

As expected, we got the same prediction in all three cases: PyTorch, ONNX + PyTorch, and ONNX only all return “Samoyed”, which is the correct breed for the input image. So are there any differences between the models?

In terms of performance, the result is more interesting: the inference time is significantly lower with ONNX, 41 ms versus 129 ms for PyTorch when running on the CPU. To replicate the inference on a GPU, we first need to change the Colab runtime to GPU:

For the PyTorch code, no changes are required; for the ONNX code, the only change needed is to add the CUDA EP:
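In other words, the only difference from the CPU version is the provider list passed to the session (this assumes the GPU build of ONNX Runtime, onnxruntime-gpu, is installed; the CPU provider is kept as a fallback):

```python
# Same session as before, now preferring the CUDA execution provider and
# falling back to the CPU provider if a GPU is not available.
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
```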

Once the code is executed for both frameworks, we get the following inference times: 10 ms for PyTorch and 6 ms for ONNX. While the difference is smaller in this case, ONNX is still almost twice as fast as PyTorch.
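The article’s benchmarking code isn’t shown; numbers like these can be reproduced with a simple wall-clock measurement along the lines of the sketch below, which reuses the session, input, and model from the earlier snippets (accurate GPU timings would additionally require synchronization before reading the clock):

```python
# Simple wall-clock timing of repeated runs (not the article's exact benchmark).
import time
import torch

def time_ms(run, n=50):
    run()                                  # warm-up so one-time setup isn't counted
    start = time.perf_counter()
    for _ in range(n):
        run()
    return (time.perf_counter() - start) / n * 1000  # milliseconds per run

onnx_ms = time_ms(lambda: session.run(None, {input_name: x}))
with torch.no_grad():
    torch_ms = time_ms(lambda: model(torch_x))   # torch_x: the PyTorch-preprocessed tensor
print(f"ONNX: {onnx_ms:.1f} ms  |  PyTorch: {torch_ms:.1f} ms")
```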

So far, we have been using ONNX on Google Colab to prepare and test the pipeline, and now we are ready for the last step. What differences can we expect when deploying to an IoT device?

In contrast with the Colab setup, where most of the libraries were available from the start, in this case we need to install all the dependencies on our Raspberry Pi. Fortunately, since it is an inference-only environment, there are not many.

To keep things clean, you can clone the repo on your device and look for the file requirements.txt, which lists all the libraries we need to execute the model. As you can see, there are only four.

To install the libraries on your Raspberry Pi, just run:

And that’s it: we have an inference environment with our model up and running on an IoT-class device. This is one of the main advantages of using a standard runtime: porting models across platforms is much easier this way. The trade-off is some extra work at the beginning, since we have to export the model and, in some cases, modify parts of the pipeline, as we did with the pre-processing stage.

So far we have covered one use case of the ONNX ecosystem, converting a PyTorch model to ONNX and deploying it on a Raspberry Pi, but that’s only the tip of the iceberg. What happens if we start from a different framework or want to use another target device?

In this article, we described the challenges of deploying ML models when several frameworks are involved or when we face performance constraints, and presented the ONNX ecosystem as one way to approach these challenges. Then we took a hands-on approach with a computer vision model, where we:

And finally, we looked into some recommendations for further work, e.g. deploying models on other devices or using different EPs to optimize the inference time.
