What is Big Data?

Amazon Web Services helps you quickly and easily build and deploy big data analytics applications, giving you access to flexible, low-cost IT resources so you can rapidly scale any big data application, including data warehousing, clickstream analytics, fraud detection, recommendation engines, event-driven ETL, serverless computing, and internet-of-things processing. Big data is the practice of analyzing large amounts of data that may come at you very fast, in different formats, from different sources. We’ve actually had Big Data for a long time, in the form of data warehouses. In today’s modern architecture, you have many tools in your arsenal, such as Hadoop and Spark, that let you analyze data arriving in gigabytes and terabytes per day, in different formats, from different data sources. Watch our video below to learn more about big data on AWS.
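To make one of those patterns concrete, event-driven ETL on AWS is often built as an S3-triggered Lambda function. The sketch below is a minimal illustration, not a production pipeline; the destination bucket name and the trivial uppercase transform are hypothetical placeholders.

```python
# Minimal event-driven ETL sketch: a Lambda function that fires when a new
# object lands in a source S3 bucket, applies a trivial transform, and writes
# the result to a curated bucket. Bucket name and transform are placeholders.
import boto3

s3 = boto3.client("s3")
CURATED_BUCKET = "my-curated-bucket"  # hypothetical destination bucket

def handler(event, context):
    # S3 put notifications deliver one or more records with bucket/key details
    # (note: in real events the key may be URL-encoded)
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Read the raw object and transform it (here: just uppercase the text)
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        transformed = body.upper()  # stand-in for real ETL logic

        # Write the transformed copy to the curated layer of the data lake
        s3.put_object(Bucket=CURATED_BUCKET, Key=f"curated/{key}",
                      Body=transformed.encode("utf-8"))
```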

How does a Big Data Architect interface with and provide value to data scientists?
A data scientist needs data available to them in real-time dashboards, BI tools, notebooks, and so forth. Big Data architects design the notebooks, workspaces, and pipelines that deliver that data as fast as the data scientists need it, in a way that’s reliable and useful for them.
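As one illustration of that handoff, a data scientist working in a notebook might query the data lake directly with Amazon Athena. This is a minimal sketch under assumed names: the database, table, and results bucket below are hypothetical.

```python
# Minimal sketch: querying a data lake table from a notebook with Amazon Athena.
# The database, table, and output location are hypothetical placeholders.
import time
import boto3

athena = boto3.client("athena")

# Kick off a SQL query against a table cataloged over S3 data
query = athena.start_query_execution(
    QueryString="SELECT channel, COUNT(*) AS events FROM clickstream GROUP BY channel",
    QueryExecutionContext={"Database": "analytics"},                     # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},   # hypothetical bucket
)
query_id = query["QueryExecutionId"]

# Poll until the query finishes, then print the result rows
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```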

Why is architecting for perfection with Big Data important?
Architecting for perfection is important because these days you have dozens, maybe even hundreds, of different data sources that keep growing with your business: third-party market research, relational databases, surveys, even manually entered data. All of it needs to land in a central Data Lake so data scientists can access it and start extracting the business insights that make their organizations more data-driven.
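As a concrete sketch of that landing step, files from very different sources can all be written into one S3 data lake under per-source prefixes. The bucket, prefixes, and file names below are illustrative assumptions, not a prescribed layout.

```python
# Minimal sketch of landing files from different sources into one S3 data lake,
# partitioned by source and date. Bucket and file names are hypothetical.
from datetime import date
import boto3

s3 = boto3.client("s3")
DATA_LAKE_BUCKET = "my-company-data-lake"  # hypothetical bucket

def land_file(local_path: str, source: str) -> str:
    """Upload one local file under a raw/<source>/<date>/ prefix."""
    key = f"raw/{source}/{date.today().isoformat()}/{local_path.rsplit('/', 1)[-1]}"
    s3.upload_file(local_path, DATA_LAKE_BUCKET, key)
    return key

# Files from very different sources all land in the same central lake
land_file("exports/crm_accounts.csv", source="relational-db")
land_file("downloads/market_research.xlsx", source="third-party")
land_file("forms/survey_responses.json", source="surveys")
```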

How has cloud computing affected Big Data processing, especially for small companies that now have access to tools that would have cost millions of dollars even just five years ago?
For small to medium enterprises, the cloud brings a lot of capabilities that were not accessible to them before. For example, with Amazon S3 you have virtually unlimited storage with very high durability, which makes it easy to create a Data Lake. Previously, processing gigabytes and terabytes of data meant spending thousands of dollars to acquire and set up hardware; now you can have a Data Lake at your fingertips in a matter of seconds or minutes. The same is true of Big Data processing with Hadoop and Spark: on-premises you would need to buy and manage commodity hardware, while in the cloud you can spin up a Hadoop or Spark cluster in minutes.
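To make that last point concrete, here is a minimal sketch of launching a Spark cluster on Amazon EMR with a single API call. The cluster name, instance types, and counts are illustrative assumptions, and the default EMR IAM roles referenced below must already exist in the account.

```python
# Minimal sketch: spinning up a Spark cluster on Amazon EMR in one API call.
# Instance types, counts, and the cluster name are illustrative assumptions;
# the default EMR service roles must already exist in your account.
import boto3

emr = boto3.client("emr")

response = emr.run_job_flow(
    Name="demo-spark-cluster",                 # hypothetical cluster name
    ReleaseLabel="emr-6.15.0",                 # an EMR release that bundles Spark
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"Name": "Primary", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,   # keep the cluster up for interactive use
    },
    JobFlowRole="EMR_EC2_DefaultRole",         # default EMR instance profile
    ServiceRole="EMR_DefaultRole",             # default EMR service role
)
print("Cluster starting:", response["JobFlowId"])
```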

Want to learn more about Big Data on AWS?
Contact us to learn more about enabling your business with AWS big data services, or watch the follow-up to this video blog, Architecting for Big Data Processing on AWS.
