Apache Hadoop vs. IBM Platform Symphony and InfoSphere BigInsights: see our breakthrough Hadoop performance
IBM has completed several significant big data benchmarks using IBM Platform Symphony and various Hadoop distributions, including IBM InfoSphere BigInsights. Platform Symphony is a distributed computing and big data analytics product widely used in large-scale grid computing environments. IBM InfoSphere BigInsights brings the power of Hadoop to the enterprise. Organizations using the two products together get the benefit of a multi-tenant, heterogeneous application cluster with higher utilization and performance. Using InfoSphere BigInsights, you can gain new insights from a combination of data sources and overcome the high costs of converting unstructured data sources to a structured format.
Benchmarks that IBM has run include the Terasort, Contrail-bio genome sequencing, SWIM, and Hadoop “sleep” benchmarks.
Get the Platform Symphony Advantage
A major benefit of IBM Platform Symphony is its ability to support diverse applications in a multi-tenant environment while ensuring service levels. These performance tests show that it also helps provide dramatically better performance and efficiency, as well as superior management and monitoring.
Clients using this technology in conjunction with InfoSphere BigInsights can get a fully supported, high-performance Hadoop stack with ease of use, management tools, and higher productivity through built-in accelerators.
These results are important not only because they demonstrate faster MapReduce job execution times, but because they show that organizations running Hadoop workloads can save a significant amount of money on computing infrastructure by using IBM Platform Symphony.
Running IBM InfoSphere BigInsights on a private cloud environment managed by IBM Platform Symphony in August 2012, IBM demonstrated a 100 TB Terasort result on a cluster comprising 1,000 virtual machines, 200 physical nodes and 2,400 processing cores. Running the industry-standard Terasort benchmark in this private cloud, IBM beat a prior world record [1] using 17 times fewer servers and 12 times fewer total processing cores. This result showed not only that it is straightforward to build a large-scale Hadoop environment using IBM's cloud-based solutions, but also that big data workloads with IBM BigInsights can be run more economically using IBM Platform Symphony, providing dramatic savings related to infrastructure, power and facilities.
Contrail-bio Genome Sequencing Benchmark
Contrail is an open-source software effort that leverages Hadoop MapReduce to accelerate de novo genome assembly. In March 2013, IBM conducted a series of tests to understand the performance advantage that Symphony could offer on a reference Hadoop cluster running a 10K read sample of an E. coli bacterium included as part of the Contrail software suite. In an eight-node Hadoop cluster with 108 cores dedicated to Map and Reduce tasks, Platform Symphony was found to compute results 3.4 times faster than Hadoop alone, reducing the job run time from 873 seconds to 258 seconds on the same cluster and dataset.
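The 3.4 times figure follows directly from the two run times quoted above. As a minimal sketch (the function names are illustrative and are not part of Contrail, Hadoop or Platform Symphony), the arithmetic can be checked like this:

```python
def speedup(baseline_seconds: float, improved_seconds: float) -> float:
    """Ratio of the baseline run time to the improved run time."""
    return baseline_seconds / improved_seconds


def runtime_reduction_pct(baseline_seconds: float, improved_seconds: float) -> float:
    """Percentage of the baseline run time that was eliminated."""
    return 100.0 * (baseline_seconds - improved_seconds) / baseline_seconds


# Run times reported for the Contrail test on the same cluster and dataset.
HADOOP_ALONE_SECONDS = 873.0
WITH_SYMPHONY_SECONDS = 258.0

if __name__ == "__main__":
    s = speedup(HADOOP_ALONE_SECONDS, WITH_SYMPHONY_SECONDS)
    r = runtime_reduction_pct(HADOOP_ALONE_SECONDS, WITH_SYMPHONY_SECONDS)
    print(f"speedup: {s:.1f}x")           # speedup: 3.4x
    print(f"run time reduced by {r:.0f}%")  # run time reduced by 70%
```

Put another way, the same job finished in roughly 30 percent of the original run time.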
Equally compelling are results obtained using social media workloads. The SWIM benchmark, developed at the University of California, Berkeley in cooperation with Facebook, measures real-world MapReduce workloads by simulating traces of application activity captured at Facebook in 2009 and 2010. The benchmark authors view it as a rigorous predictor of MapReduce performance [2].
Using this benchmark, IBM demonstrated, in results audited by an independent testing organization, that augmenting a Hadoop cluster with BigInsights powered by Platform Symphony technology made the simulated Facebook workloads run a mean of 3.8 times faster than on Apache Hadoop alone. As a corollary, given the nature of the SWIM benchmark, this result demonstrated that equivalent performance with Symphony could have been obtained with dramatically less hardware and lower infrastructure cost.
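To make the hardware corollary concrete, here is a rough sketch under one loud assumption: throughput scales linearly with node count, which real clusters only approximate. The 100-node baseline is a hypothetical figure chosen for illustration and does not come from the audited SWIM results.

```python
import math


def nodes_for_equal_throughput(baseline_nodes: int, speedup: float) -> int:
    """Nodes needed to match baseline throughput, ASSUMING linear scaling.

    This is a back-of-the-envelope model, not a sizing tool: it ignores
    storage capacity, data locality, network limits and failure headroom.
    """
    if baseline_nodes <= 0 or speedup <= 0:
        raise ValueError("baseline_nodes and speedup must be positive")
    return math.ceil(baseline_nodes / speedup)


if __name__ == "__main__":
    MEAN_SWIM_SPEEDUP = 3.8          # mean speedup reported for SWIM
    HYPOTHETICAL_BASELINE_NODES = 100  # illustrative only, not a measured cluster

    needed = nodes_for_equal_throughput(HYPOTHETICAL_BASELINE_NODES, MEAN_SWIM_SPEEDUP)
    print(f"{needed} nodes instead of {HYPOTHETICAL_BASELINE_NODES}")
```

The estimate is only as good as the linear-scaling assumption, so treat it as an upper bound on the potential saving rather than a prediction.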
The Hadoop “sleep” benchmark shared at Hadoop World in 2011 [4] was run to demonstrate the scheduling efficiency of IBM Platform Symphony relative to competing Hadoop distributions. This was an audited result published by a third party. Running this standard test, which is promoted as a measure of scheduling efficiency, IBM InfoSphere BigInsights powered by Platform Symphony technology achieved a speedup of just under 11 times for a Hadoop 1.1.2 sleep test consisting of 5,000 map tasks of 1 ms each.
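A simple model helps show why a sleep test isolates scheduling efficiency. The 5,000 tasks together contain only 5 seconds of sleep time, so almost all of the elapsed time is scheduler and task-launch overhead. In the sketch below, the slot count and the per-task overhead values are hypothetical parameters chosen for illustration; they are not measurements from the audited test, and the model itself is an assumption that ignores job setup and reduce phases.

```python
import math


def total_sleep_seconds(num_tasks: int, sleep_ms: float) -> float:
    """Aggregate time the tasks spend sleeping, summed over all tasks."""
    return num_tasks * sleep_ms / 1000.0


def modeled_makespan_seconds(num_tasks: int, sleep_ms: float,
                             overhead_ms: float, slots: int) -> float:
    """Elapsed time if tasks run in full waves across `slots` parallel slots,
    each task costing its sleep time plus a fixed scheduling overhead."""
    waves = math.ceil(num_tasks / slots)
    return waves * (sleep_ms + overhead_ms) / 1000.0


if __name__ == "__main__":
    TASKS, SLEEP_MS = 5000, 1.0  # from the sleep test description
    SLOTS = 100                  # hypothetical slot count
    print(f"aggregate sleep time: {total_sleep_seconds(TASKS, SLEEP_MS)} s")

    # Hypothetical per-task overheads, in milliseconds, to show the trend only.
    for overhead_ms in (100.0, 10.0):
        t = modeled_makespan_seconds(TASKS, SLEEP_MS, overhead_ms, SLOTS)
        print(f"overhead {overhead_ms} ms per task -> modeled run time {t:.2f} s")
```

In this model the ratio of run times approaches the ratio of per-task overheads as the sleep time shrinks, which is why short sleep tasks are used to compare schedulers.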
IBM Platform Symphony brings many advantages to distributed computing environments including multi-tenancy, guaranteed service levels, superior management tools, and support for diverse, heterogeneous workloads. These benchmark results demonstrate that IBM Platform Symphony can provide dramatic performance advantages and financial savings to customers deploying big data environments. For IBM InfoSphere BigInsights users, or those considering open-source or derivative Hadoop environments, IBM Platform Symphony can help accelerate Hadoop workloads while reducing cost and improving workload reliability.