
The IBM Power servers implement reliability, availability and serviceability (RAS) features inspired by more than 40 years of mainframe experience in running mission-critical applications. The three terms are defined as follows:
- Reliability in hardware is a measure of how often a hardware fault requires a system to be serviced: the less frequent the failures, the greater the reliability.
- Availability is how infrequently such a failure impacts the operation of the system or application. A highly available system design ensures that most hardware failures will not result in an application outage.
- Serviceability concerns itself with identifying what fails and ensuring efficient repair.
POWER7® processor-based servers support these RAS features:
- First Failure Data Capture
- Processor Instruction Retry
- Alternate Processor Recovery
- Bit Steering
- I/O EEH
- FW isolated partitions
- Live Partition Mobility and Application Mobility
No competitive UNIX® or Linux® server has a majority of these features. For details, please see the POWER7 System RAS white paper.
More on processor RAS features
If a processor failure occurs, the POWER-based servers are designed to recover in many instances. Processor Instruction Retry is designed to retry the instruction in case of a soft failure. If the failure persists, Alternate Processor Recovery is designed to retry the instruction on a different processor. If the system cannot recover, or if a threshold of soft failures on a processor has been reached, the system is designed to take that processor out of service using Dynamic Processor Deallocation. If there are spare processors in the system, either because they are not allocated to a partition or because they are inactive CoD processors, the system is designed to automatically replace the deallocated processor with a spare using Dynamic Processor Sparing.
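The escalation order described above can be illustrated with a small model. This is only a sketch that follows the wording of the paragraph, not the actual firmware logic: the `Processor` class, the `handle_fault` function, the step names and the threshold value of 3 are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    name: str
    soft_errors: int = 0
    in_service: bool = True


def handle_fault(cpu, spares, retry_ok, alternate_ok=True, threshold=3):
    """Return the recovery steps taken for one fault on `cpu`.

    retry_ok     -- True if retrying the instruction on the same processor works
    alternate_ok -- True if retrying on a different processor works
    threshold    -- hypothetical soft-failure limit before deallocation
    """
    steps = ["instruction_retry"]
    recovered = retry_ok
    if retry_ok:
        # A soft failure: recovered in place, but counted against the processor.
        cpu.soft_errors += 1
    else:
        # The failure persists: retry the instruction on another processor.
        steps.append("alternate_processor_recovery")
        recovered = alternate_ok

    # Deallocate when recovery failed or too many soft failures accumulated.
    if not recovered or cpu.soft_errors >= threshold:
        cpu.in_service = False
        steps.append("dynamic_processor_deallocation")
        if spares:
            # An unallocated or inactive CoD processor takes over.
            spare = spares.pop(0)
            spare.in_service = True
            steps.append("dynamic_processor_sparing")
    return steps
```

For example, `handle_fault(Processor("cpu0"), [], retry_ok=True)` returns only `["instruction_retry"]`, while a fault that neither retry can clear also ends with a deallocation step and, if a spare exists, a sparing step.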
SPARC64-based servers from Sun have instruction retry, but none have Alternate Processor Recovery. Integrity and Xeon have neither of those features. Both the SPARC and Integrity support dynamic sparing, but because of the granularity of the hardware isolation, the system board or the cell board with the spare processor has to be in the partition.
More on memory RAS features
POWER servers use Redundant Bit Steering, which uses spare memory chips to replace failing bits. These spare chips are part of the standard memory system design. The Integrity replaces failing pages with spare memory, so clients must purchase adequate spare memory for this feature to work. SPARC uses optional mirroring, so customers must purchase twice the memory they need for production to use this feature.
For the highest level of application availability, failing applications must be recovered quickly. This is typically accomplished through clustering solutions that provide complete redundancy in hardware systems and often software stacks, together with fault monitoring so that applications can be restarted automatically on system failure.
When the availability afforded by full redundancy is required, IBM and third-party software vendors provide a number of high-availability clustering solutions, such as IBM PowerHA™ SystemMirror™, which provides many options for partition failover, and IBM DB2® pureScale®, which enables high availability for the IBM DB2 database. These products and additional features such as PowerVM Live Partition Mobility and AIX Live Application Mobility are designed to help eliminate downtime, both planned and unplanned. If you need to take a system down for reconfiguration, firmware updates or another reason, you will have the option of moving your applications to a different server without any impact to production operation. No reboots, no restarts, no service interruption: just continued outstanding service to your users.
More on Application/Partition RAS features
Live Application Mobility is a feature of AIX® that allows applications running in Workload Partitions (WPARs) to be moved from one WPAR to another without interrupting production operation. The target WPAR can be on a different server. Live Partition Mobility is a feature of PowerVM™ Enterprise Edition that allows entire partitions, including all of the applications running in the operating environment of the partition, to be moved from one server to another without interrupting the production environment. Both of these features allow you to avoid planned outages when upgrades or maintenance are required.
Partition Availability Priority is a feature of PowerVM that automatically shifts processor resources from low-priority partitions to high-priority partitions in cases where the total processor resource has dropped below the minimum required for all partitions because of a dynamic processor deallocation. This shift is based on priorities set during partition definition.
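The effect of such a priority-based shift can be sketched as follows. This is a simplified illustration under stated assumptions, not the PowerVM algorithm: the `Partition` class, the `allocate_after_loss` function and the convention that a larger `priority` number means a more important partition are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    priority: int  # assumed convention: larger value = more important
    minimum: int   # minimum processors the partition needs to run


def allocate_after_loss(partitions, available):
    """Grant minimums in descending priority order until capacity runs out.

    Lower-priority partitions absorb the shortfall when `available`
    is less than the sum of all minimums.
    """
    allocation = {}
    remaining = available
    for part in sorted(partitions, key=lambda p: p.priority, reverse=True):
        granted = min(part.minimum, remaining)
        allocation[part.name] = granted
        remaining -= granted
    return allocation


partitions = [
    Partition("dev", priority=10, minimum=1),
    Partition("prod", priority=200, minimum=2),
    Partition("test", priority=50, minimum=2),
]
# Five processors are needed in total; a deallocation leaves only three.
print(allocate_after_loss(partitions, available=3))
# {'prod': 2, 'test': 1, 'dev': 0}
```

In this sketch the production partition keeps its full minimum, while the test and development partitions give up capacity in order of their priority.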
The POWER-based servers use firmware to provide isolation between partitions. The firmware is designed to prevent a failure in an action or in a partition from impacting the operation of other partitions. The SPARC and Integrity both support hardware-isolated partitions on some models. However, these partitions are limited to a granularity of a system board in the case of the SPARC and a cell in the case of the Integrity. In either case this is a minimum granularity of four processors.
All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals and objectives only. Any reliance on these Statements of General Direction is at the relying party's sole risk and will not create liability or obligation for IBM.
