Azure Stream Analytics is a high-volume, simple-to-use stream processing service. It’s fully stand-alone and does not require any additional software or programming environment to run. It can be configured entirely in the Azure Management Portal.
One of the touted use cases for this service is IoT message processing. Obviously, real life is much more complicated, and here we will examine the pros and cons of using Azure Stream Analytics in potential IoT applications.
Azure Stream Analytics lets you connect to an event hub, transform data as it comes in, and save it to a data store. The transformations are written in a SQL-like language (good for filtering, group by, etc.) called Stream Analytics Query Language (SAQL).
Beyond standard SQL constructs, SAQL adds support for time-window aggregations, which let you group the stream into 5-minute chunks, for example. This is most helpful in IoT applications, where readings may arrive every few seconds but you only care about a filtered or aggregated subset of that data.
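To make the idea of a time-window aggregation concrete, here is a minimal sketch in plain Python (not SAQL) of what grouping a stream into fixed 5-minute chunks amounts to. The function names and the `(device_id, timestamp, value)` tuple layout are assumptions made for illustration only; they are not part of Azure Stream Analytics.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 5 * 60  # five-minute, non-overlapping windows


def window_start(ts: datetime, size: int = WINDOW_SECONDS) -> datetime:
    """Return the start of the fixed window that contains ts."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % size, tz=timezone.utc)


def tumbling_average(readings, size: int = WINDOW_SECONDS):
    """Average values per (device, window).

    readings: iterable of (device_id, timestamp, value) tuples
    (a hypothetical layout chosen for this example).
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for device_id, ts, value in readings:
        key = (device_id, window_start(ts, size))
        sums[key][0] += value
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

Each reading lands in exactly one window, so many readings per device collapse into one output row per device per 5 minutes. That reduction is what makes windowing attractive for chatty IoT sources.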
Let’s start with the positives:
- Easy to use: It takes mere minutes to set up an Azure Stream Analytics job, and about a minute to start or stop it. Writing the SAQL query can take some time, but the query is very short and intuitive compared to writing actual code. You are only responsible for the business logic, which is what you want.
- Easy to scale: Microsoft uses a vague unit to measure the power of a Stream Analytics job. It is known as a Streaming Unit, which is a “blended measure of CPU, memory, throughput”. However, they translate that to about 1 MB of data per second. Adding more Streaming Units to a job is as simple as changing the number from 1 to 5 to 10 or even higher. The only caveat is that it cannot auto-scale.
- Good integration within Azure: Azure Stream Analytics can take Azure Event Hub or IoT Hub as its input, and can output to a variety of Azure services (Azure SQL, Azure DocumentDB, Azure Event Hub, Azure Service Bus, etc.). If these services are in the same Azure account, they can be chosen easily from a drop-down menu.
- Cheap: The cost of running Azure Stream Analytics is 3 cents per hour per Streaming Unit, which is roughly 22 dollars a month for one unit running continuously. Microsoft claims that 1 Streaming Unit can handle 1 MB of data per second. Having a VM or cluster that runs a different stream processing solution would be much more costly.
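The scaling and pricing figures above can be combined into a rough back-of-the-envelope estimate. The sketch below uses only the numbers quoted in this article (3 cents per Streaming Unit per hour, about 1 MB per second per unit). The 30-day month and the rounding up to whole units are assumptions, and the function names are made up for the example.

```python
import math

PRICE_PER_SU_HOUR = 0.03      # USD, figure quoted in this article
MB_PER_SECOND_PER_SU = 1.0    # throughput figure quoted in this article
HOURS_PER_MONTH = 24 * 30     # assumption: a 30-day month, running non-stop


def streaming_units_needed(mb_per_second: float) -> int:
    """Round the required throughput up to whole Streaming Units (at least 1)."""
    return max(1, math.ceil(mb_per_second / MB_PER_SECOND_PER_SU))


def monthly_cost(units: int) -> float:
    """Estimated monthly cost in USD for a job that runs continuously."""
    return units * PRICE_PER_SU_HOUR * HOURS_PER_MONTH
```

For a single unit this works out to 0.03 × 24 × 30 = 21.60 dollars, which is where the "roughly 22 dollars a month" figure comes from. Because the job cannot auto-scale, an estimate like this has to be sized for peak load rather than average load.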
Unfortunately, it’s not all positive, and there are some limitations and problems with using Azure Stream Analytics.
Azure Stream Analytics is strictly a stream solution. However, when you compare what it can do with solutions like Spark Streaming or Apache Storm, you can see that it is much more limited.
- Unable to join dynamic data: Azure Stream Analytics gives you an option to join the data against a file in blob storage, which Microsoft calls Reference Data. Theoretically, this is their solution for joining extra data. For example, you can receive just the readings in the event hub and then join them against a list of devices to add the device name or any other device data. However, this file is loaded once and kept for the lifetime of the job. In an IoT solution, you will be adding and removing devices dynamically, and those changes will not be picked up by the running Azure Stream Analytics job. This is a big problem when you need to add data from external providers.
- Unable to save state: Azure Stream Analytics gives you the option to aggregate your data into windows based on time. However, sometimes you want to keep state regardless of how much time has passed, for example to detect spikes or to track the all-time maximum value. Azure Stream Analytics does not have a place to store this kind of state.
- Limits of SQL: SAQL is based on SQL, which is well known and simple to use. However, it is also a limited language. It is good for querying but does not have the openness of a full programming language.
- Coupled to Azure and Microsoft: Azure Stream Analytics is a pure Microsoft service, so you will only use it if you are already using other Azure products. You cannot take this code out of Azure and reuse it somewhere else, because it is a proprietary language and solution. If you had instead built a cluster on VMs, you could move it to a different cloud provider with some work.
- Will crash on invalid data: One of the biggest quirks with Azure Stream Analytics is that malformed data, or a type mismatch, can cause the entire job to crash. In IoT applications you may be receiving data from sources you cannot control, and those sources are connected directly to the event hub. You will therefore need to either sanitize the data with a separate Azure Stream Analytics job, or hope you can do it in the same job and that it doesn’t crash unexpectedly. This lack of recovery is a huge problem for relying on Azure Stream Analytics.
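As an illustration of what sanitizing means for the last point, here is a minimal Python sketch that rejects malformed messages and type mismatches before they reach an aggregation step. Note that it is ordinary code running outside Azure Stream Analytics, not a SAQL job. Where such a check would run, and the `deviceId` and `value` field names, are assumptions made for the example.

```python
import json
import math
from typing import Optional


def sanitize(raw: bytes) -> Optional[dict]:
    """Return a cleaned message, or None if the payload should be dropped.

    The expected fields ("deviceId", "value") are hypothetical.
    """
    try:
        msg = json.loads(raw)
    except ValueError:
        # Covers invalid JSON as well as bytes that cannot be decoded.
        return None
    if not isinstance(msg, dict):
        return None

    device_id = msg.get("deviceId")
    if not isinstance(device_id, str) or not device_id:
        return None

    raw_value = msg.get("value")
    if isinstance(raw_value, bool):
        return None  # reject true/false instead of coercing to 1.0/0.0
    try:
        value = float(raw_value)
    except (TypeError, ValueError, OverflowError):
        return None  # missing value, wrong type, or unparseable string
    if not math.isfinite(value):
        return None  # reject NaN and infinities

    return {"deviceId": device_id, "value": value}
```

The design choice here is to drop bad messages and keep going, so that a single misbehaving device cannot stop processing for every other device.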
In summary, Azure Stream Analytics is an easy to use but very limited tool. It does the job it sets out to do well, but most real world IoT applications need much stronger capabilities than what Azure Stream Analytics currently provides.
Given the additional issues involved in managing it (crashes, logs, source control), we find that it is only useful for a narrow set of very specific use cases. For example, a great use would be simply saving all the data from the event hub to blob storage. For analytics and more advanced business logic, it will probably not suit your needs.