Power Architecture

A High-Performance Architecture with a History

  


The Slimming of POWER Architecture 
In addition to excluding some POWER instructions, the PowerPC Architecture also removed or relaxed some noncritical requirements. PowerPC also gained new instructions and features to complete the architecture and to achieve the goals set by the architecture group.

The architecture group identified the features of the POWER Architecture that were too restrictive, too burdensome, or not cost-effective. This resulted in the exclusion of 32 problem-state instructions for reasons described in the remainder of this section.

The POWER Architecture contained an MQ register which had two characteristics that could complicate a superscalar design: the use of the register was implicit, and it was a single resource. The MQ register would have required the instruction dispatching function in an aggressive design to identify all instructions that use the MQ and to implement a special renaming mechanism for it. The functions provided by the MQ did not justify the added complexity. The exclusion of the MQ resulted in the exclusion of 15 extended-precision shift instructions, two divide instructions, and one extended-precision multiply instruction.

Several instructions from the POWER Architecture would have made future processor designs significantly more complex. Three of the POWER fixed-point rotate-insert instructions required three source operands. The need for an extra source operand could have required an additional register file port and would have drastically increased the complexity of the renaming mechanism in a superscalar design. Two instructions that computed absolute values, and two others that performed a subtract with a lower bound of zero, required selection logic after the arithmetic computation and before the writeback to the register file. The need for selection logic after the arithmetic computation could have resulted in a longer minimum cycle time for some implementations.
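
The "compute, then select" shape of the excluded absolute-value and bounded-subtract operations can be illustrated with a small software model. This is only a sketch: the function names `abs32` and `sub_floor_zero`, the 32-bit word size, and the operand order are assumptions made for illustration, not definitions taken from the POWER Architecture.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def abs32(value):
    """Absolute value of a 32-bit pattern (hypothetical helper)."""
    negated = (-value) & MASK32  # arithmetic step
    # Selection step: pick the negated or the original value by sign.
    return negated if to_signed32(value) < 0 else value & MASK32


def sub_floor_zero(a, b):
    """Signed a - b with a lower bound of zero (hypothetical helper)."""
    difference = to_signed32(a) - to_signed32(b)  # arithmetic step
    # Selection step: pick the difference or zero.
    return difference & MASK32 if difference > 0 else 0
```

In both functions the result cannot be chosen until the arithmetic has finished, which is the extra stage between computation and writeback that the paragraph above describes.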

PowerPC also excluded five other seldom-used problem-state instructions. The most complex instruction in the POWER Architecture, Load String and Compare Byte, was one of these.

The specification of the PowerPC memory management mechanism and other related changes resulted in the exclusion of the POWER cache management instructions. PowerPC excluded the database locking and tracking mechanism, another feature of the POWER memory model, because it would have made memory management significantly more complex, and it did not provide the granularity needed by most applications.

In some areas, PowerPC requirements are more flexible than POWER requirements. In the POWER Architecture, an update form load could specify the same register for both target (RT) and base address (RA). If this occurred, the processor would perform the load but suppress the update. The PowerPC Architecture relaxed this requirement, recognizing that there is no reason to use the update form instruction in this case, because the processor would effectively discard the updated address. In the PowerPC Architecture, this use of the update form is not a valid operation and the result is undefined. A similar problem exists with update form load and store instructions for which RA is GPR 0. If RA is GPR 0, the effective address computation uses the value 0 rather than the content of GPR 0. Saving the updated effective address is senseless when the next use also ignores the value in RA (GPR 0); this operation is not valid in the PowerPC Architecture and the result is undefined.
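
The two rules above can be sketched as a small model. The function names, the register-file representation (a list of 32 integers), and the 32-bit address width are assumptions chosen for illustration; the model only classifies a form as valid or invalid, because the architecture leaves the result of an invalid form undefined.

```python
MASK32 = 0xFFFFFFFF


def effective_address(gprs, ra, displacement):
    """Base-plus-displacement address; RA = 0 means the value 0, not GPR 0."""
    base = 0 if ra == 0 else gprs[ra]
    return (base + displacement) & MASK32


def is_valid_update_form(ra, rt=None):
    """Check an update form access; pass rt for loads, omit it for stores."""
    if ra == 0:
        return False  # the updated address could never be used as a base
    if rt is not None and rt == ra:
        return False  # the loaded value and the updated address would collide
    return True
```

A decoder or assembler built along these lines could reject the invalid forms before execution, which is the kind of software-side check the relaxed requirement relies on.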

Another example of relaxed requirements deals with fields in instruction formats which are not necessary for a particular instruction. For example, many POWER instruction formats have an "Rc" bit, which controls whether the condition code records the status information describing the result of the operation. In the POWER Architecture, if the Rc bit is a 1 in an instruction for which this status information is irrelevant (for example, loads and stores), the instruction executes correctly, but the status information consists of undefined values. The PowerPC Architecture considers such instructions "invalid forms." If the processor executes an invalid form instruction, the result is undefined, except that the requirements of memory protection and privilege are not violated.

The timing facility defined in the POWER Architecture would have been very difficult to implement in an environment of power management and of processors clocked at rates that might be noninteger multiples of the system clock. The timing facility specified in the PowerPC Architecture allows more flexibility, permitting more cost-effective designs. A consequence of this approach is that PowerPC applications will normally have to use a system service to obtain the correct time, because direct use of the timing facility will not be possible without knowledge of environment variables, available only to the system.

Because of the rapid progress of software and hardware technology, the cost-to-value relationship of some POWER Architecture features has changed. PowerPC relaxed some of the requirements of the POWER Architecture, eliminating aspects of the POWER Architecture that added significant complexity, that limited the speed of implementations, and that restricted the number of execution units that a superscalar design could implement.

These exclusions created deficiencies in the resultant architecture. Furthermore, the POWER Architecture lacked other features necessary to compete in the fast-moving marketplace. For example, low-cost processors with good single-precision floating-point performance, and processors for use in symmetric multiprocessor systems, were not possible using the POWER Architecture.


Completing PowerPC Architecture 
The PowerPC Architecture added features correcting both sets of deficiencies. In defining the additions, the architects avoided adding hardware complexity to handle software errors that software could easily avoid. An example is requiring software to put 0's in the fields of certain instructions.

The architects added a full set of single-precision floating-point arithmetic instructions, as well as two instructions to convert floating-point values to integers. Because most of these instructions provide the single-precision version of existing double-precision instructions, the complexity added by the new instructions is minimal.

The architects also added fixed-point divide and multiply instructions to provide both normal and extended-precision computation without the complexities associated with the MQ register. Two other new instructions (subtract without carry and extend sign byte) completed the fixed-point computation set.
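
As a rough illustration of extended-precision multiplication without an implicit register, the sketch below returns both halves of a 64-bit signed product as explicit results. The function name `multiply_words` and the choice of signed 32-bit operands are assumptions for this example; it shows only the data flow, not the encoding or exact semantics of any particular instruction.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def multiply_words(a, b):
    """Return (high_word, low_word) of the signed 64-bit product of a and b."""
    product = to_signed32(a) * to_signed32(b)
    # Both halves are named outputs; nothing is left in a hidden register.
    return (product >> 32) & MASK32, product & MASK32
```

Because every source and target is explicit, a renaming mechanism can treat the two halves like any other register results.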

Apart from the changes to the instruction set, the most significant changes were in the memory model and the memory management definition. In the POWER Architecture, the processor did not maintain data memory consistent with either I/O accesses or instruction fetches. Software had to manage memory consistency for both these areas. Before copying an area of memory to disk, software had to ensure that any modified copies of the memory area that were in the data cache had been written to main memory. Before starting a read from disk, software had to ensure that the data cache did not contain a copy of any part of the memory area, and software had to invalidate any copy of the memory area in the instruction cache before restarting the program that requested the operation.

POWER processors always accessed main memory through the caches. The only memory accesses that did not use the caches were accesses to a separate address space referred to as "T=1 space." Programmed I/O (PIO) operations access this address space.

The PowerPC memory model provides greater flexibility. Attributes associated with each page of virtual memory allow software to control how the system performs accesses. The model allows speculative access to any page unless it has an attribute indicating that it contains I/O or it exhibits other volatile characteristics. Other attributes control whether a page may be cached, must not be cached, or may be cached with stores being written through to main memory. This definition makes it possible to map I/O into the main memory space. The definition also included processor-enforced data memory consistency, relieving software of the responsibility for the consistency of memory with respect to I/O operations. As in the POWER memory model, the PowerPC memory model requires software to maintain instruction memory consistent with data memory. Programs that modify or generate instructions must ensure that cached copies of a memory area containing the new instructions are consistent with the main memory before attempting to execute those instructions.
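
The per-page attributes described above can be pictured with a small model. The class and field names here are descriptive placeholders invented for this sketch, not the architecture's own names, and the store behavior is simplified to "which levels receive the data."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageAttributes:
    cacheable: bool = True         # False: accesses must not be cached
    write_through: bool = False    # True: stores are also written to memory
    volatile_or_io: bool = False   # True: page holds I/O or volatile data


def may_access_speculatively(page):
    """Speculative access is allowed unless the page is I/O or volatile."""
    return not page.volatile_or_io


def store_destinations(page):
    """Simplified view of where a store to this page is written."""
    if not page.cacheable:
        return ("memory",)
    if page.write_through:
        return ("cache", "memory")
    return ("cache",)
```

For example, a memory-mapped device page would typically be modeled as `PageAttributes(cacheable=False, volatile_or_io=True)`, so that stores go straight to the device and no access is made speculatively.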

Because software must manage the consistency of instruction memory and the PowerPC Architecture excluded the POWER cache management instructions, the architects defined five new cache management instructions.

The architects defined the PowerPC memory model to be weakly consistent, allowing memory accesses to complete out of order. The Enforce In-order Execution of I/O (eieio) instruction definition gives software an efficient means of controlling the order in which accesses to I/O devices complete.
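
A toy model can make the effect of an ordering instruction concrete. In the sketch below, accesses between barriers may complete in any order, while a barrier forces everything before it to complete before anything after it. This is a deliberately simplified assumption: it treats all accesses alike and ignores the storage classes that the real eieio definition applies to. The function name `completion_orders` is hypothetical.

```python
from itertools import permutations, product

BARRIER = "eieio"


def completion_orders(program):
    """Return the set of completion orders this toy model permits."""
    groups, current = [], []
    for op in program:
        if op == BARRIER:
            groups.append(current)  # close the group of accesses so far
            current = []
        else:
            current.append(op)
    groups.append(current)
    # Any order within a group; groups themselves stay in program order.
    per_group = [list(permutations(group)) for group in groups]
    return {sum(choice, ()) for choice in product(*per_group)}
```

With no barrier, `completion_orders(["A", "B"])` contains both `("A", "B")` and `("B", "A")`; inserting the barrier between the two accesses leaves only `("A", "B")`.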

Because POWER Architecture lacked an atomic memory operation, two new instructions provide a means of performing a memory operation that appears to read and then modify a memory location as one atomic operation. These instructions are similar to those proposed in a report from the Lawrence Livermore National Laboratory [6]. The Load Word and Reserve (lwarx) instruction copies the content of the target memory location, which can be regarded as a lock variable, into a register and then creates a reservation on that location. The program could use the loaded value to compute the new value to be stored in the lock variable. When the processor executes a Store Word Conditional (stwcx) instruction, the processor attempts to store the new value. If the lock variable has not been modified since the lwarx completed, the processor performs the store; otherwise it does not perform the store. In both cases, the processor records the status to indicate the result. If the stwcx fails to perform the store, the programmer can repeat the sequence.

Because these are new instructions that do not affect the binary compatibility of POWER applications, the architecture defined them to access only word-aligned memory locations. If these instructions attempt to access a location that is not word-aligned, the architecture does not define the result of the access. This definition permits simpler implementations, avoiding the need for hardware checking for a programming error that is easily avoided.
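
The reserve, compute, conditionally store, and retry pattern can be sketched with a single-processor software model. Everything here is an assumption made for illustration: the class `ReservationMemory`, its method names, and the single reservation slot are hypothetical, and the model raises an error on a misaligned address only because the architecture leaves that case undefined and some behavior had to be chosen.

```python
class ReservationMemory:
    """Toy word-addressed memory with one reservation slot."""

    def __init__(self):
        self.words = {}
        self.reserved_address = None

    @staticmethod
    def _check_aligned(address):
        # Undefined in the architecture; this model chooses to reject it.
        if address % 4 != 0:
            raise ValueError("address must be word-aligned")

    def lwarx(self, address):
        """Load a word and create a reservation on its address."""
        self._check_aligned(address)
        self.reserved_address = address
        return self.words.get(address, 0)

    def stwcx(self, address, value):
        """Store only if the reservation still holds; report success."""
        self._check_aligned(address)
        success = self.reserved_address == address
        if success:
            self.words[address] = value & 0xFFFFFFFF
        self.reserved_address = None
        return success

    def store(self, address, value):
        """Ordinary store by any agent; cancels a matching reservation."""
        self.words[address] = value & 0xFFFFFFFF
        if self.reserved_address == address:
            self.reserved_address = None


def atomic_increment(memory, address):
    """Retry the reserve/store pair until the store succeeds."""
    while True:
        old = memory.lwarx(address)
        if memory.stwcx(address, old + 1):
            return old
```

If another agent calls `store` on the same word between the `lwarx` and the `stwcx`, the conditional store fails and leaves memory untouched, so the caller simply repeats the sequence, as in `atomic_increment`.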

The POWER memory model was big-endian. The final addition to the PowerPC memory model was to support both big-endian and little-endian memory models. A mode, controlled by software, specifies whether the current memory semantics are big-endian or little-endian. This permits the use of PowerPC processors in systems designed to run big-endian applications, little-endian applications, or both.
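
The difference between the two modes comes down to how the same bytes in memory are assembled into a word. The sketch below shows this with a hypothetical `load_word` helper; the `little_endian` flag stands in for the software-controlled mode and is not an architectural name.

```python
def load_word(memory, address, little_endian=False):
    """Read four bytes at address and assemble them into a 32-bit value."""
    raw = bytes(memory[address:address + 4])
    return int.from_bytes(raw, "little" if little_endian else "big")
```

Given the bytes `12 34 56 78` at address 0, the big-endian interpretation is `0x12345678` and the little-endian interpretation is `0x78563412`.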


Summary 
In early 1991, a group of technical leaders from Apple, Motorola, and IBM began work to develop a sleeker and cleaner architecture with expanded function, using the POWER Architecture as a base. The PowerPC Architecture, the result of this effort, permits a range of implementations from low-cost controllers through high-performance processors. It allows the implementation of processors targeted for desktop and notebook systems, yet it contains features to support the efficient implementation of processors for use in a range of multiprocessor systems.


  

References

  1. IBM RS/6000 Technology, SA23-2619, IBM Corporation, 1990.
  2. IBM Journal of Research and Development, Volume 34, 1990.
  3. Moore, Balser, Muhich, and East, "IBM Single Chip RISC Processor (RSC)," Proceedings of the International Conference on Computer Design, 1992.
  4. Steven W. White and Sudhir Dhawan, "POWER2: Next Generation of the RS/6000 Family," PowerPC and POWER2: Technical Aspects of the New IBM RS/6000, IBM Corporation, SA23-2737, pp. 8-18.
  5. Paap and Silha, "PowerPC: A Performance Architecture," Proceedings of the IEEE Computer Society International Computer Conference, 1993.
  6. "A New Approach to Exclusive Data Access in Shared Memory Multiprocessors," Lawrence Livermore National Laboratory Report UCRL-97663, November 1987.
