
IBM Server Sets World Record for Sorting Data



Poughkeepsie, N.Y. - 21 Jul 1999: In just 17 minutes, the time it takes most people to balance their checkbooks, researchers from IBM have set a world record by sorting one trillion bytes of data.

In business terms, this would be the rough equivalent of listing in numerical order the tracking numbers for 10 billion overnight packages. Working at the rate of 15 numbers a minute, a person with pencil and paper would have to write nonstop for 13 centuries just to make a list of 10 billion tracking numbers, let alone put them in order.
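The comparison above can be checked with a few lines of arithmetic. The sketch below assumes a 365.25-day year and treats each tracking number as an equal share of the trillion bytes; both assumptions are ours, not stated in the release.

```python
# Back-of-envelope check of the figures quoted above.
TOTAL_BYTES = 10**12        # one trillion bytes sorted
TRACKING_NUMBERS = 10**10   # 10 billion tracking numbers
NUMBERS_PER_MINUTE = 15     # pencil-and-paper writing rate

# Assumption: the data divides evenly among the tracking numbers.
bytes_per_number = TOTAL_BYTES // TRACKING_NUMBERS

# Assumption: a year is 365.25 days of nonstop writing.
MINUTES_PER_YEAR = 60 * 24 * 365.25
years_by_hand = TRACKING_NUMBERS / NUMBERS_PER_MINUTE / MINUTES_PER_YEAR
centuries_by_hand = years_by_hand / 100

print(bytes_per_number)             # 100
print(round(centuries_by_hand, 1))  # 12.7
```

Under these assumptions each tracking number corresponds to a 100-byte record, and the handwriting estimate comes to roughly 12.7 centuries, which rounds to the 13 centuries quoted.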

"The breakthrough techniques IBM used to set this record have already been put to work solving many types of deep computing problems in both science and business," said David V. Gelardi, director of benchmarking and applications performance. "In any problem involving large amounts of data, rapid data sorting is essential to extracting critical information."

Today's announcement is the latest example of IBM's commitment to providing its e-business customers in a variety of industries the ability to speed processing and management of valuable data.

IBM's world record was set using the parallel processing capabilities of an IBM RS/6000 SP, the same computer server behind the "Deep Blue" chess match with Garry Kasparov in 1997. The 17-minute result is about one-third the time of the previous record of 50 minutes, set November 10, 1998, by scientists at Sandia National Laboratories.

The techniques developed to set the record are already being used to boost the performance of business intelligence software that extracts valuable information from large stores of data. Although not required by the benchmark ground rules, the IBM sort also left the sorted data in a very useful format for typical customer applications.

The sorting benchmark tests the efficiency of a computer's input-output (I/O) management and internode communication rather than pure calculating power.

The key to the RS/6000 SP system's performance is the use of IBM's General Parallel File System (GPFS), which permits any of the SP processors to have high-speed direct access to data on any of the attached disks. One of the ideas behind developing a parallel file system for the SP is a concept that drives all parallel implementations: spread workload across many nodes in order to scale up the amount of work you can do, while delivering excellent performance.
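The idea of spreading a sort across many nodes can be illustrated with a small partitioned sort. This is a generic sketch of the concept only, not the technique IBM used for the record, which this release does not describe. The function name `partitioned_sort` and its parameters are hypothetical, and threads stand in for SP nodes purely to show the structure; they do not provide a real speedup in Python.

```python
import bisect
import random
from concurrent.futures import ThreadPoolExecutor


def partitioned_sort(records, workers=4, sample_size=64, seed=0):
    """Sort a list by splitting it into key ranges, one range per worker.

    Hypothetical illustration: each worker stands in for one node.
    """
    if workers < 2 or len(records) < 2:
        return sorted(records)

    rng = random.Random(seed)
    # Pick splitter keys from a random sample so ranges are roughly balanced.
    sample = sorted(rng.sample(records, min(sample_size, len(records))))
    step = len(sample) / workers
    splitters = [sample[int(i * step)] for i in range(1, workers)]

    # Route each record to the bucket that owns its key range.
    buckets = [[] for _ in range(workers)]
    for rec in records:
        buckets[bisect.bisect_right(splitters, rec)].append(rec)

    # Each worker sorts its own bucket independently of the others.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_buckets = list(pool.map(sorted, buckets))

    # Buckets cover ascending key ranges, so concatenating them is sorted.
    return [rec for bucket in sorted_buckets for rec in bucket]


data = [random.randrange(10**6) for _ in range(10_000)]
assert partitioned_sort(data) == sorted(data)
```

Because every bucket owns a disjoint key range, no merge step is needed at the end. In a real cluster the routing step is where internode communication and shared access to disks would matter most.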

The record was set on a commercially available 488-node RS/6000 SP with six trillion bytes of attached disk storage at the IBM RS/6000 SP Teraplex Integration Center in Poughkeepsie, N.Y. The system used 332 MHz 604e processors and was configured with 432 SP nodes dedicated to processing and 56 nodes for I/O. The Teraplex Integration Center allows existing IBM customers, potential new customers, business partners, and IBM hardware and software developers to stress test hardware and software. Scalability and functionality issues can be identified and resolved in a controlled environment before business intelligence systems go into operation on site.
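The figures in this release imply a rough aggregate data rate. The sketch below assumes the run took exactly 17 minutes and counts each byte once; an actual sort must both read and write the data, so real disk traffic would be at least double. The variable names are illustrative only.

```python
# Rough throughput implied by the figures in this release.
TOTAL_BYTES = 10**12            # one trillion bytes sorted
ELAPSED_SECONDS = 17 * 60       # assumption: exactly 17 minutes
PREVIOUS_RECORD_MINUTES = 50    # Sandia record, November 1998
COMPUTE_NODES = 432
IO_NODES = 56

# Megabytes here are decimal (10**6 bytes), and each byte is counted once.
aggregate_mb_per_s = TOTAL_BYTES / ELAPSED_SECONDS / 1e6
per_io_node_mb_per_s = aggregate_mb_per_s / IO_NODES
time_ratio = 17 / PREVIOUS_RECORD_MINUTES

print(COMPUTE_NODES + IO_NODES)        # 488
print(round(aggregate_mb_per_s))       # 980
print(round(per_io_node_mb_per_s, 1))  # 17.5
print(time_ratio)                      # 0.34
```

On these assumptions the system sorted about 980 MB of data per second overall, or about 17.5 MB per second for each of the 56 I/O nodes, in 34 percent of the previous record time.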

As of this month, IBM has shipped more than 6,100 RS/6000 SP systems in the six years since they were introduced.

# # #

Additional information: IBM RS/6000: http://www.ibm.com/servers or http://www.rs6000.ibm.com

IBM, SP and RS/6000 are registered trademarks or trademarks of the IBM Corporation in the United States, other countries, or both. UNIX is a registered trademark in the United States and/or other countries licensed exclusively through X/Open Company Limited. Other company, product and service names, which may be denoted by a double asterisk (**), may be trademarks or service marks of others.

Contact(s) information

Jeff Gluck
IBM
914-766-3839
jgluck@us.ibm.com