Tolga Talks Tech is a weekly video series in which Onica’s CTO Tolga Tarhan tackles technical topics related to AWS and cloud computing. This week, Tolga discusses real time and batch data processing for IoT Data with Onica’s VP of Engineering, Amir Kashani. For more videos in this series, click here.
What are some considerations when processing IoT data and analytics?
Since IoT is focused on collecting data, the different scenarios that arise revolve around how to react to that data. You can do this as the data is coming in, or you can save the data and look at it with deeper analysis later. These scenarios are known as real time analytics and batch analytics.
What are some examples of real time analytics and batch analytics?
Real time use cases are pretty simple. Imagine you have tanks deployed around a facility, and these tanks are depleted on a day-to-day basis. Real time analytics would be able to notify you once a tank gets to a certain level that you need to send someone out to fill it. With batch analytics, the scenario may be that you want to understand how these tanks are utilized over time so you can make staffing decisions around how many people you need to fill the tanks, or even whether you need additional or bigger tanks.
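The real time half of that scenario boils down to a threshold check on each reading. A minimal sketch, where the function name and the 20% threshold are illustrative assumptions rather than anything from the interview:

```python
# Hypothetical sketch: the real-time decision is just a threshold test
# applied to each tank reading as it arrives. The 20% default is assumed.

def needs_refill(level_pct: float, threshold_pct: float = 20.0) -> bool:
    """Flag a tank once its level drops below the alert threshold."""
    return level_pct < threshold_pct
```

The batch side would instead aggregate many such readings over days or weeks before making any decision.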
Could those be either algorithmic or AI based decisions?
Yes. You may be looking at simple trends and making decisions through simple algorithms, or there might be a lot of factors and other variables to consider that you may need to feed into an ML model to offer further insights.
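A "simple trend" decision like the one described can be as small as fitting a line to recent readings. A sketch under assumed names, using a least-squares slope; once many interacting variables matter, this kind of hand-written rule is what you would replace with an ML model:

```python
# Hypothetical sketch of a simple algorithmic decision: estimate how fast a
# tank is draining by fitting a straight line to recent level readings.
# Function and variable names are illustrative assumptions.

def depletion_rate(readings: list[float]) -> float:
    """Least-squares slope of level vs. sample index (negative = draining)."""
    n = len(readings)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(readings) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, readings))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Levels sampled hourly, draining 5 units per hour:
rate = depletion_rate([100.0, 95.0, 90.0, 85.0])  # -> -5.0
```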
What tools in AWS can be used for real time use cases?
All IoT analytics start with AWS IoT Core, which is your message broker. From there, data is routed to Amazon Kinesis, which provides a buffer and either invokes AWS Lambda functions or feeds other data consumers that react. So you're essentially looking at the data as it comes in and making decisions based on it.
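The Lambda end of that pipeline might look like the sketch below. The event structure follows the standard shape Lambda receives from a Kinesis stream (base64-encoded record payloads); the payload fields (`tank_id`, `level_pct`) and the threshold are illustrative assumptions:

```python
# Hypothetical sketch of the real-time path: a Lambda handler consuming
# tank telemetry that flowed from AWS IoT Core into a Kinesis stream.
# Payload field names and the 20% threshold are assumptions.
import base64
import json

LOW_LEVEL_PCT = 20.0  # assumed alert threshold

def handler(event, context):
    """React to readings as they stream in; collect tanks needing a refill."""
    alerts = []
    for record in event["Records"]:
        # Kinesis delivers each payload base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        if payload["level_pct"] < LOW_LEVEL_PCT:
            alerts.append(payload["tank_id"])  # e.g. notify a dispatcher
    return {"alerts": alerts}
```

Locally you can exercise it by building a fake event with base64-encoded JSON records, which is also how you would unit-test the handler before wiring it to the stream.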
Do batch workloads use different tools?
With batch workloads, data lakes come into play. You again start with AWS IoT Core, but you'll route to Amazon Kinesis Data Firehose instead, which writes the data to Amazon S3. Then you'll have something like a Spark job on Amazon EMR, or Amazon SageMaker, look at the data in batch and build models and analytics out of it.
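Firehose typically lands records in S3 as newline-delimited JSON, and the batch job then aggregates over them. As a small stand-in for what a Spark or EMR job would do at scale, here is a plain-Python sketch computing average daily consumption per tank; the field names (`tank_id`, `day`, `used`) are illustrative assumptions:

```python
# Hypothetical sketch of the batch path: aggregate newline-delimited JSON
# records (as Firehose would write them to S3) into per-tank daily usage.
# Record field names are assumptions; a real job would run on Spark/EMR.
import json
from collections import defaultdict

def daily_usage(jsonl: str) -> dict[str, float]:
    """Average units consumed per day for each tank, from raw object text."""
    totals = defaultdict(float)
    days = defaultdict(set)
    for line in jsonl.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        totals[rec["tank_id"]] += rec["used"]
        days[rec["tank_id"]].add(rec["day"])
    return {tank: totals[tank] / len(days[tank]) for tank in totals}
```

An aggregate like this is exactly the kind of output that informs the staffing and capacity decisions mentioned earlier.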
Do you see these types of data used together often or do they stay separate?
These data types are almost always used together. It's pretty typical for the batch data to feed into the real time path. A common pattern is that you'll collect batch data, run analysis on it and build some models, and then as real time data arrives, you'll invoke those batch-built models to do inference and decide how to react.
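That interplay can be sketched in a few lines: a "model" built offline from batch data (here, just a per-tank average depletion rate) is applied to a live reading on the real time side. All names and the toy model are illustrative assumptions:

```python
# Hypothetical sketch of the two pipelines working together: train a trivial
# model on batch history, then apply it to live readings for inference.
# Names and the averaging "model" are assumptions for illustration.

def train_on_batch(history: dict[str, list[float]]) -> dict[str, float]:
    """Batch step: learn each tank's average hourly depletion from history."""
    return {tank: sum(d) / len(d) for tank, d in history.items()}

def hours_until_empty(model: dict[str, float],
                      tank_id: str, level: float) -> float:
    """Real time step: apply the batch-built model to a live reading."""
    return level / model[tank_id]

model = train_on_batch({"t1": [4.0, 6.0]})   # batch: avg drain 5.0 units/hour
eta = hours_until_empty(model, "t1", 40.0)   # real time: 8.0 hours left
```

In practice the batch side would be an EMR or SageMaker job and the real time side a Lambda consumer, but the division of labor is the same.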