The base reliability of a computing system depends, at its most fundamental level, on the intrinsic failure rates of its components. Put simply, highly reliable servers are built with highly reliable components. On Power Systems, this basic premise is augmented with a clear “design for reliability” architecture and methodology. The POWER6 reliability strategy evolves from, and improves upon, the reliability design points developed throughout the IBM POWER™ program.
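The link between component failure rates and base system reliability can be illustrated with a short sketch. It assumes a simple series model (any component failure counts as a system failure) and constant failure rates expressed in FIT (failures per 10^9 device-hours). The function names and all FIT values are hypothetical, chosen only for illustration; they are not IBM data and do not describe how POWER6 reliability is actually calculated.

```python
import math

HOURS_PER_BILLION = 1e9  # FIT is defined as failures per 1e9 device-hours


def series_failure_rate(component_fits):
    """Total FIT of a series system: the sum of its component FIT rates."""
    return sum(component_fits)


def mtbf_hours(total_fit):
    """Mean time between failures, in hours, for a constant failure rate."""
    return HOURS_PER_BILLION / total_fit


def reliability(total_fit, hours):
    """Probability of surviving `hours` with no failure (exponential model)."""
    return math.exp(-total_fit * hours / HOURS_PER_BILLION)


# Hypothetical component FIT rates (illustrative values only).
baseline = [100.0, 50.0, 200.0, 150.0]
# Assumption: every component is replaced by one with half the failure rate.
improved = [fit / 2 for fit in baseline]

if __name__ == "__main__":
    for label, parts in (("baseline", baseline), ("improved", improved)):
        total = series_failure_rate(parts)
        print(label, total, mtbf_hours(total), reliability(total, 8760))
```

Under these assumptions, halving every component failure rate halves the system failure rate and doubles the mean time between failures, which is the sense in which more reliable components yield a more reliable server.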
At IBM, trained RAS engineers use a concentrated, systematic, architecture-based approach with the aim of improving overall server reliability with each successive generation of system offerings. At the core of this effort is an intensive focus on sensible, well-managed server design strategies. These strategies stress high instruction execution performance while requiring logic circuit implementations that operate consistently and reliably despite potentially wide variations in manufacturing processes and operating environments. Intensive critical circuit path modeling and simulation are used to identify critical system timing dependencies, so that time-dependent system operations complete successfully across a wide range of process tolerances.
This white paper provides an overview of the design points that contribute to a reliable POWER6 processor-based system.