IBM Systems

Stream processing

Turn your growing data volume into actionable information and business insight
Big data solutions have tremendous momentum, and they are rapidly gaining more. Gartner says big data spending will more than double worldwide between 2012 and 2016, from $27 billion to $55 billion. The reason is clear: with big data comes the opportunity to create positive change. We already generate an estimated 2.5 quintillion bytes of new information every day, a tremendous resource just waiting to be tapped. The challenge, however, is that getting the highest value from such huge data volumes means analyzing them quickly. Ideally, that analysis happens in real time, as soon as the data becomes available, while still following best practices for data quality and auditability. This is what stream processing is all about: aggregating and analyzing huge data volumes very quickly, so that the business insights they yield can inform actual strategies, and those strategies can be executed for the greatest competitive advantage.

Stream processing can empower organizations to get insights from data far more quickly, and in far more ways, than ever before — even in cases where insights are needed in real time, or very close to it.

Nagui Halim, IBM Fellow, Director and Chief Architect of Big Data

What is stream processing?

To automate the use of streaming data and incorporate it into decision making, you can use a new programming paradigm called stream processing. Stream processing helps you harness the potential of data in motion. In traditional computing, you query relatively static, stored information to answer evolving and dynamic analytic questions. With stream processing, you deploy an application that continuously applies analysis to an ever-changing stream of data, before that data ever lands on disk.
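The contrast between querying stored data and analyzing data in motion can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the InfoSphere Streams programming model: the function names are hypothetical, and a plain Python iterator stands in for a live data source.

```python
from typing import Iterable, Iterator, List


def batch_average(stored: List[float]) -> float:
    # Traditional approach: the whole data set must already be stored.
    return sum(stored) / len(stored)


def running_average(stream: Iterable[float]) -> Iterator[float]:
    # Streaming approach: update the result as each reading arrives and
    # emit it immediately, without keeping the readings themselves.
    count = 0
    total = 0.0
    for reading in stream:
        count += 1
        total += reading
        yield total / count


readings = [10.0, 20.0, 30.0, 40.0]
print(batch_average(readings))                # → 25.0
print(list(running_average(iter(readings))))  # → [10.0, 15.0, 20.0, 25.0]
```

The batch version produces one answer only after all the data has been collected; the streaming version produces an up-to-date answer after every record, which is what allows action to be taken while the data is still arriving.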

The best stream processing systems are built on a data-centric model that handles traditional structured data as well as unstructured data such as video, images, and digital signals. Stream processing is especially suitable for applications with three characteristics: compute intensity (a high ratio of operations to I/O), data parallelism (records can be processed independently and in parallel), and data pipelining (data is continuously fed from producers to downstream consumers). The number of intelligent devices that gather and generate data has risen rapidly, and the personal devices introduced over the last decade add still more data of their own; as a result, the volume of data sent to servers at high speed has exploded. At the same time, organizations need to make decisions faster than ever before. We want to analyze data as it arrives from monitors and equipment (measurements and events), along with text, voice transmissions, and video feeds, and this is what drives the need for stream processing.
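Two of the three characteristics above, data pipelining and data parallelism, can be illustrated with chained Python generators. This is a hypothetical sketch, not how InfoSphere Streams is programmed: the `"sensor_id,value"` record format, the stage names, and the choice of a CRC-32 hash for routing are all assumptions made for illustration.

```python
import zlib
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, float]  # (sensor_id, value): a hypothetical record shape


def produce() -> Iterator[str]:
    # Producer: stands in for a device feed emitting "sensor_id,value" lines.
    yield from ["s1,3.5", "s2,9.1", "s1,7.2", "s3,1.0", "s2,8.4"]


def parse(lines: Iterable[str]) -> Iterator[Record]:
    # Stage 1: convert each raw line into a typed record as soon as it arrives.
    for line in lines:
        sensor_id, value = line.split(",")
        yield sensor_id, float(value)


def keep_high(records: Iterable[Record], threshold: float) -> Iterator[Record]:
    # Stage 2: forward only readings above the threshold.
    for sensor_id, value in records:
        if value > threshold:
            yield sensor_id, value


def route(records: Iterable[Record], n_workers: int) -> Iterator[Tuple[int, Record]]:
    # Data parallelism: assign each record to a worker using a stable hash of
    # its key, so all readings from one sensor go to the same worker.
    for record in records:
        worker = zlib.crc32(record[0].encode()) % n_workers
        yield worker, record


# Generators are lazy, so each record flows through every stage before the
# producer emits the next one: a pipeline rather than a sequence of batches.
pipeline = route(keep_high(parse(produce()), threshold=5.0), n_workers=2)
for worker, record in pipeline:
    print(worker, record)
```

A stable hash such as `zlib.crc32` is used instead of Python's built-in `hash()`, whose string hashing is randomized per process, so the same sensor is routed to the same worker on every run.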

You can get Nagui Halim's expert opinion on stream processing by reading this post on the Smarter Computing Blog.

Why is stream processing important?

The volume of data available to organizations is expected to grow tremendously, and the opportunity for smart stream processing solutions to create new value will grow in proportion. The faster this data can be analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken and the greater the value for organizations. Making that concept a reality, however, requires stream processing solutions that are not just efficient and accurate but also versatile. Consider how many forms of data streams exist today, from social media feeds to stock market reports to call data records to video streamed by airport cameras and assessed for security purposes. Such diverse data often cannot be stored easily because of its volume, and analyzing it accurately is no simple matter. Even where the data can be stored, by the time it is analyzed, any insights it yields describe the past and are often discovered too late to be effective. With stream processing, organizations can react to events as they happen, which lets them store less, analyze more, and make better decisions faster. Stream processing technology lets you continuously analyze massive volumes of data in memory and take action in real time.
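The idea of storing less while analyzing more can be made concrete with a sliding-window aggregate that keeps only a fixed number of recent values in memory. This is a minimal Python sketch under assumed names (`windowed_average`, a count-based window of `size` readings); real stream processing platforms typically also offer time-based windows, which this example does not cover.

```python
from collections import deque
from typing import Iterable, Iterator


def windowed_average(stream: Iterable[float], size: int) -> Iterator[float]:
    # Emit the average of the most recent `size` readings after each arrival.
    # Only `size` values are held in memory at any time; older readings are
    # discarded rather than stored.
    if size <= 0:
        raise ValueError("size must be positive")
    window = deque(maxlen=size)  # appending to a full deque drops the oldest item
    total = 0.0
    for value in stream:
        if len(window) == size:
            total -= window[0]   # the oldest value is about to be evicted
        window.append(value)
        total += value
        yield total / len(window)


print(list(windowed_average([1.0, 2.0, 3.0, 4.0, 5.0], size=3)))
# → [1.0, 1.5, 2.0, 3.0, 4.0]
```

Keeping a running total makes each update constant-time regardless of window size; memory use is bounded by `size`, no matter how long the stream runs.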

IBM’s stream processing capability can benefit any industry wrestling with the challenge of processing the daily flood of data: healthcare, telecommunications, utility companies, municipal transit, law enforcement, and more. For example, a hospital in Toronto, Canada, uses the IBM InfoSphere Streams solution to monitor premature babies in a neonatal intensive care unit and help predict the onset of illness. Remote telemetry from a US hospital has been operational for a year using the same analytic routines, and more hospitals in China and Australia have recently begun implementing this solution.

How can IBM help you benefit from stream processing?

As a pioneer and world leader in big data analytics solutions, IBM is exceptionally well positioned to help organizations take full advantage of stream processing to improve business insight and decision making. IBM’s robust stream processing portfolio enables organizations to create a tailored big data architecture that yields better insights, faster and with less effort: essentially, a formula for enhanced business agility. One of the leading elements of IBM’s stream processing portfolio is InfoSphere Streams, a versatile, high-performance, and cost-effective solution that manages and analyzes the massive volume, variety, and velocity of data that consumers and businesses create every day. This exceptionally scalable offering can analyze up to petabytes of data in a single day, enabling an incredibly fast, sub-millisecond response in environments where millions of decisions can be made every second. And because the solution is so versatile, businesses can perform in-motion analytics on a wide variety of structured and unstructured data types at unprecedented volumes and speeds, enabling real-time analytic processing.

IBM InfoSphere Streams takes a fundamentally different approach to big data analytics and differentiates itself with its distributed runtime platform, programming model, and tools for developing and debugging analytic applications that handle a high volume and variety of data types. In-memory techniques and record-by-record analysis enable high velocity, and built-in analytic toolkits for geospatial, time series, and complex event analysis enable rapid application construction.
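Record-by-record, in-memory analysis of the kind described above can be illustrated with a tiny complex-event rule. This is a hypothetical Python sketch, not the InfoSphere Streams API or its complex event analysis toolkit: the function name, the `(timestamp, value)` record shape, and the "k consecutive readings above a threshold" rule are all assumptions chosen for illustration.

```python
from typing import Iterable, Iterator, Tuple


def consecutive_exceedances(
    stream: Iterable[Tuple[int, float]], threshold: float, k: int
) -> Iterator[int]:
    # Yield the timestamp at which `k` consecutive readings first exceed
    # `threshold`. Each record is examined once, as it arrives, and only a
    # single counter is kept in memory.
    run = 0
    for timestamp, value in stream:
        run = run + 1 if value > threshold else 0
        if run == k:
            yield timestamp  # fire once per run, when it first reaches length k


readings = [(1, 4.0), (2, 6.1), (3, 6.5), (4, 7.2), (5, 3.9), (6, 8.0)]
print(list(consecutive_exceedances(readings, threshold=5.0, k=3)))  # → [4]
```

Because the rule fires on the record that completes the pattern, the event is detected the moment the triggering data arrives rather than in a later batch job.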

A related breakthrough from IBM is InfoSphere BigInsights, which brings the power of Hadoop to the enterprise. BigInsights is an enterprise-class big data solution designed to augment the IT infrastructure and data repositories that enterprise-class organizations already have in place. As a pre-processing hub, it helps them explore incoming data, determine which data is most valuable, and quickly put that data to its best use. It also complements existing data warehouses as an easily queried archive, supporting rapid analysis that won't strain the existing architecture, as well as ad hoc analysis of all data in all repositories.

IBM PowerLinux supports InfoSphere Streams and InfoSphere BigInsights, which is based on Apache Hadoop, to provide next-generation analytics capabilities for the increasingly popular Linux platform. These offerings build on proven POWER7+ processing technology to scale workloads both efficiently and cost-effectively. They support analysis of both structured information (databases) and unstructured information (typically text documents, log files, and similar items), making it easier for organizations to capture, manage, and analyze information to drive better decision making. Because these solutions integrate with both InfoSphere Streams and InfoSphere BigInsights, they can be used to analyze data in motion as well as data at rest.

IBM PureData System for Hadoop offers a variety of capabilities to help users address their big data requirements with built-in analytics and enterprise functionality, on top of Hadoop technology.

To understand in depth how IBM Smarter Computing can help you meet your specific needs, explore the resources for stream processing on this page.

Meet the stream processing expert

Nagui Halim

IBM Fellow & Director of InfoSphere Streams

About Nagui

Nagui Halim is an IBM Fellow and currently director of InfoSphere Streams in Software Group's Information Management organization. Nagui has spent most of his 30-year career with IBM in the Research Division in a series of positions, from software engineer, research staff member, and manager to senior manager, department group manager, and director. His technical areas of expertise include systems software, operating systems, transaction processing, fault-tolerant computing, distributed systems, programming languages, computer communications, and computer architecture.

On the Smarter Computing Blog: Stream processing

Stream processing: analyze data in real time for a more agile response

Resources for stream processing

Attend: SC Virtual Event Center