
PowerPC 601 Microprocessor

  
   
Introduction
Block diagram
Pipeline Description
Processor Functional Units
Chip and Packaging Technology
Multiprocessor Features
Performance
Conclusion
 

Introduction 
The PowerPC 601 microprocessor* is a highly integrated single-chip processor that combines a powerful RISC architecture, a superscalar machine organization, and a versatile high-performance bus interface. The processor contains a 32KB unified cache and is capable of dispatching, executing, and completing up to three instructions per cycle. The bus interface configurations provide a wide range of system bus interfaces, including pipelined, non-pipelined, and split transactions. The result is a cost-effective, general-purpose microprocessor solution that offers very competitive performance.

The 601 microprocessor is the first member of a family of devices which IBM and Motorola jointly developed in Austin, Texas. IBM and Motorola have committed to providing a wide range of processor designs that are compliant with the PowerPC Architecture [1].

The 601 project had several key goals. First, it was important that the 601 establish the PowerPC Architecture in the marketplace as early as possible. Second, the processor had to offer competitive performance at a low cost. Finally, the 601 had to be suitable for a wide range of system design points, including those requiring support for symmetric multiprocessing.

To achieve these goals, the 601 design group accepted an aggressive 12-month development schedule. An evolutionary path exploited technology that existed at both IBM and Motorola. From IBM, the RISC Single Chip (RSC) microprocessor [2] became the base design for 601. The 601 enhanced the superscalar machine organization to achieve greater performance and applied additional custom circuit design to reduce the die size and to allow higher frequency operation. The Motorola 88110 microprocessor bus interface formed the basis of the development of the 601 bus interface [3].

Figure 1

Block diagram 
As shown in the high-level block diagram (Figure 1), the 601 is a superscalar design with three pipelined execution units. The processor can dispatch up to three 32-bit instructions each cycle, one each to the Fixed-Point Unit (FXU), the Floating-Point Unit (FPU), and the branch processing unit (BPU). The common on-chip processor (COP) is the master control logic for the built-in self-test (BIST), the debug, and the test features of the 601 chip. The 32KB unified cache provides a 32-bit interface to the FXU, a 64-bit interface to the FPU, and a 256-bit interface to both the instruction queue and the memory queue. The chip I/Os include a 32-bit address bus and a 64-bit data bus. In addition, the chip supports a special asynchronous serial port, the COP Bus, which provides advanced debug and test features.



Pipeline Description 
The designers optimized the 601 pipeline structure for high performance and concurrent instruction processing in each of the execution units (see Figure 2). In general, the FXU serves as the master pipeline, managing the synchronization control required to achieve precise exceptions. The fixed-point pipeline performs all integer arithmetic logic unit (ALU) operations and all processor load and store instructions, including floating-point loads and stores. In some cases, floating-point loads and stores also progress through the floating-point pipeline. Hardware handles all hazards and dependencies. Forwarding logic allows the integer ALU operations to operate in a fully pipelined manner. There is a one-cycle penalty for a dependent operation that immediately follows a load instruction.
Figure 2

The branch instruction pipeline has only two stages. The first stage can dispatch, decode, evaluate, and, if necessary, predict the direction of a branch instruction in one cycle. On the next cycle, the resulting fetch can be accessing new instructions from the cache. This allows the processor to react quickly to branches detected in the instruction stream and to reduce the latency of subsequent instructions.

The floating-point instruction pipeline contains six stages and has been optimized for fully pipelined execution of single-precision operations. Although hardware supports all double-precision operations, most of which are fully pipelined, those that involve multiplication are double pumped through the Execute1 and Execute2 stages of the pipe.

The 601 microprocessor comprises several functional units, as shown in the block diagram in Figure 1. The following sections describe each of these functional units.

The 601 contains an eight-entry instruction queue for holding prefetched instructions. An eight-word bus from the cache feeds the queue. During each cycle, the dispatch logic considers the bottom four entries of the instruction queue and dispatches up to three instructions (Figure 3). There are no instruction alignment restrictions, and the queue supports all possible shift amounts.

The processor can dispatch branch instructions and most floating-point instructions from any of the bottom four entries of the instruction queue. The 601 microprocessor dispatches fixed-point instructions only from the bottom queue entry. The dispatch logic dispatches floating-point stores to both the FXU (for address generation) and the FPU (for data sourcing). The processor dispatches floating-point loads only to the FXU, but the FPU is made aware of any dependencies, and the cache forwards data directly to the FPU.
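The dispatch rules described above can be illustrated with a small model. This is a simplified sketch rather than the 601's actual dispatch logic: the function name `dispatch` and the unit tags are hypothetical, interlocks and the dual dispatch of floating-point stores are not modelled, and picking the lowest eligible entry for each unit is an assumption.

```python
from typing import List, Tuple

WINDOW = 4  # the dispatch logic considers the bottom four queue entries


def dispatch(queue: List[str]) -> Tuple[List[Tuple[int, str]], List[str]]:
    """Pick at most one instruction per unit from the bottom of the queue.

    queue[0] is the bottom entry; each entry is a unit tag ("FXU", "FPU", "BPU").
    Returns the dispatched (position, tag) pairs and the remaining queue.
    """
    chosen = {}
    for pos, unit in enumerate(queue[:WINDOW]):
        if unit in chosen:
            continue  # at most one instruction per unit each cycle
        if unit == "FXU" and pos != 0:
            continue  # fixed-point dispatches only from the bottom entry
        chosen[unit] = pos
    taken = set(chosen.values())
    dispatched = sorted((pos, unit) for unit, pos in chosen.items())
    remaining = [u for i, u in enumerate(queue) if i not in taken]
    return dispatched, remaining
```

In this model, a queue of `["FPU", "FXU", "BPU", "FPU"]` dispatches the floating-point and branch instructions in one cycle, and the fixed-point instruction moves to the bottom entry for the next cycle.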



Processor Functional Units 
The 601 microprocessor consists of the following units:

  • Instruction queue and dispatch unit
  • Instruction fetch unit
  • Branch processing unit
  • Fixed-point execution unit
  • Floating-point execution unit
  • Memory management unit
  • Cache
  • Memory queue
  • Bus interface unit
  • Sequencer unit
  • Common on-chip processor unit

Instruction queue and dispatch unit
The 601 supports out-of-order dispatch; the processor can dispatch branch instructions and most floating-point instructions (and remove them from the instruction queue) even if an interlock exists for a preceding fixed-point instruction.

Figure 3

These folded instructions execute concurrently and do not occupy a position in the fixed-point pipeline. A unique tagging and counting mechanism preserves the program order completion of these instructions. Although out-of-order dispatch is more complex to implement, it also allows the 601 to expose subsequent branches earlier, reducing the potential dispatch stalls that may otherwise result.

Instruction Fetch Unit
The instruction fetch unit coordinates instruction fetching from the cache. Several different sources generate instruction fetch addresses. The branch processor provides the address that results from branch instructions. The sequencer unit provides addresses associated with interrupts and other synchronizing events. The instruction fetcher itself generates the next sequential address if no branch or interrupt has occurred. During each cycle, the processor selects, translates, and forwards the appropriate address to the cache arbitration logic for consideration to access the cache. Instruction queue and dispatch logic accepts and processes instructions fetched from the cache.
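The selection among fetch address sources can be sketched as a simple priority choice. The function and parameter names are hypothetical, and ranking the sequencer above the branch processor is an assumption; the text states only that the sequential address is used when no branch or interrupt has occurred.

```python
from typing import Optional


def select_fetch_address(
    sequencer_addr: Optional[int],
    branch_addr: Optional[int],
    next_sequential: int,
) -> int:
    """Choose the address to present to the cache arbitration logic this cycle."""
    if sequencer_addr is not None:
        return sequencer_addr   # interrupt or other synchronizing event (assumed highest)
    if branch_addr is not None:
        return branch_addr      # target supplied by the branch processor
    return next_sequential      # default: keep fetching sequentially
```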

The instruction fetcher also provides means for address translation of instruction fetch addresses. The translation shadow array (TSA) automatically keeps track of the four most recently used instruction address translations and provides an associative comparison in parallel with the address generation of any instruction fetch. The TSA provides support for both page-oriented and block-oriented address translations. In the event of a miss in the TSA, the instruction fetcher arbitrates for access to the 601's primary memory management unit for translation, and then updates the TSA based on a least recently used replacement algorithm.
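The TSA's behavior can be approximated by a four-entry, fully associative cache with least recently used replacement. The sketch below is illustrative only: the class name, the `mmu_translate` callback, and the use of page numbers as keys are assumptions, and block-oriented translations are not modelled.

```python
from collections import OrderedDict
from typing import Callable


class TranslationShadowArray:
    """Four-entry translation cache with LRU replacement (simplified model)."""

    def __init__(self, mmu_translate: Callable[[int], int], entries: int = 4):
        self._mmu_translate = mmu_translate  # stands in for the primary MMU
        self._entries = entries
        self._map: "OrderedDict[int, int]" = OrderedDict()  # oldest entry first
        self.misses = 0

    def translate(self, effective_page: int) -> int:
        if effective_page in self._map:
            self._map.move_to_end(effective_page)  # mark as most recently used
            return self._map[effective_page]
        # Miss: ask the primary MMU, then update the array.
        self.misses += 1
        real_page = self._mmu_translate(effective_page)
        if len(self._map) >= self._entries:
            self._map.popitem(last=False)  # evict the least recently used entry
        self._map[effective_page] = real_page
        return real_page
```

A loop that touches at most four pages of instructions would, under this model, miss once per page and then hit on every later fetch.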

Branch Processing Unit
The branch processing unit (BPU) executes all branch instructions. In the PowerPC Architecture, branch instructions can be conditional on a bit in the Condition Register, conditional on the state of the Count Register (useful in iterating loops), conditional on both registers, or simply unconditional. In addition, the branch target address can be absolute, program counter relative, or indirect from either the Link Register or the Count Register. Also, the execution of certain branch instructions requires that the Link Register automatically save the next sequential address. (This is useful for subroutine linkages.) The branch processor optimizations target the various interlocks and provide fast responses to all types of branch instructions.
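The four kinds of branch condition can be captured in a short model. This is a sketch with hypothetical names (`BranchCondition`, `evaluate`), not the instruction encoding itself; it assumes a 32-bit Count Register that is decremented before it is tested, which is how count-based branches are understood to behave in the PowerPC Architecture.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BranchCondition:
    test_cr: bool = False             # depends on a Condition Register bit
    cr_expected: bool = True          # value that bit must have for the branch
    test_ctr: bool = False            # depends on the Count Register
    branch_if_ctr_zero: bool = False  # otherwise branch while CTR != 0


def evaluate(cond: BranchCondition, cr_bit: bool, ctr: int) -> Tuple[bool, int]:
    """Return (taken, new_ctr) for one execution of the branch."""
    if cond.test_ctr:
        ctr = (ctr - 1) & 0xFFFFFFFF  # assumed: decrement first, then test
        ctr_ok = (ctr == 0) == cond.branch_if_ctr_zero
    else:
        ctr_ok = True
    cr_ok = (cr_bit == cond.cr_expected) if cond.test_cr else True
    return ctr_ok and cr_ok, ctr
```

With `BranchCondition(test_ctr=True)` and an initial count of 3, the branch is taken twice and falls through on the third execution, which is the loop-closing pattern the text mentions.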

The branch processor can execute branches out of order with respect to the fixed-point unit. As a result, it is possible for the BPU to alter the Link and Count Registers before all preceding fixed-point instructions complete. To guarantee correct program operation when a preceding fixed-point instruction is dependent upon either the Link or Count Registers, the 601 uses a technique called register renaming to keep the architected values of these registers synchronized with the fixed-point unit. This technique also allows restoration of the correct value of the Link and Count Registers if a preceding instruction takes an exception.

The BPU dispatches and completely executes unconditional branches (or branches that are conditional only on the Count Register) in a single cycle. As a result, these types of branch instructions generally behave as zero-cycle branches; they present no delay in the instruction dispatch stream to the fixed-point or the floating-point units.

Conditional branches that depend on the Condition Register are either resolved or unresolved at the time of dispatch. Each of the eight fields of the Condition Register has a set of associated interlocks, which are activated by instructions that will eventually update that field. If a conditional branch depends on a noninterlocked Condition Register field, then the BPU can resolve the branch immediately. In this case, the BPU evaluates the condition based on the contents of the Condition Register, and the branch exhibits performance characteristics similar to the unconditional branches described previously.
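The resolved/unresolved distinction can be modelled with one interlock per Condition Register field. The class below is a hypothetical sketch: it assumes eight 4-bit fields and represents each interlock as a count of outstanding updates, which may differ from the hardware's actual mechanism.

```python
class ConditionRegisterInterlocks:
    """Tracks which Condition Register fields still await an update."""

    FIELDS = 8
    BITS_PER_FIELD = 4  # assumed: 32-bit CR split into eight 4-bit fields

    def __init__(self) -> None:
        # Number of in-flight instructions that will still write each field.
        self._pending = [0] * self.FIELDS

    def dispatch_updater(self, field: int) -> None:
        """An instruction that will update `field` has been dispatched."""
        self._pending[field] += 1

    def complete_updater(self, field: int) -> None:
        """That instruction has written its result to `field`."""
        if self._pending[field] == 0:
            raise ValueError("no outstanding update for this field")
        self._pending[field] -= 1

    def is_resolved(self, cr_bit: int) -> bool:
        """True if a branch testing `cr_bit` can be resolved immediately."""
        return self._pending[cr_bit // self.BITS_PER_FIELD] == 0
```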

On the other hand, unresolved branches are branches that are conditional on an interlocked Condition Register field. In these cases, the 601 employs a static branch prediction algorithm to predict the direction of the unresolved branch. The 601 algorithm essentially predicts that the branch will be taken if the displacement of the target address is negative, and predicts not taken if it is positive. As an aid for compiler-directed predictions, a bit in the opcode of the branch instruction allows this prediction scheme to be reversed. The processor can conditionally dispatch instructions fetched on behalf of a predicted conditional branch, but it will never execute them until it validates the prediction as correct.
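The static prediction rule reduces to a sign test combined with the compiler's reversal bit. The following is a minimal sketch with hypothetical names; treating a zero displacement as not taken is an assumption, since the text covers only negative and positive displacements.

```python
def predict_taken(displacement: int, reverse_hint: bool = False) -> bool:
    """Static prediction for an unresolved conditional branch.

    Backward branches (negative displacement) are predicted taken and
    forward branches not taken; the hint bit inverts the default.
    """
    backward = displacement < 0
    return backward != reverse_hint
```

A loop-closing branch with displacement -16 is predicted taken by default, and a compiler that knows the loop rarely iterates can set the hint to predict it not taken.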

Eventually, the processor releases the Condition Register interlocks and checks the validity of the prediction. If the prediction was correct, then the branch is complete and the processor marks all prefetched instructions as valid and eligible for execution. If the prediction was incorrect, then the processor discards any prefetched instructions, and fetching resumes down the correct path. To reduce the cost of incorrectly predicted conditional branches, the 601 employs several features to speed up recovery. First, a fast alternate address restore mechanism saves the alternate path address and quickly redirects the instruction fetcher. Second, the FXU produces and forwards the results of compare instructions directly to the branch processor as well as to the Condition Register. This permits earlier resolution of unresolved conditions. Finally, the instruction queue implements a delayed purge mechanism, which retains any prefetched sequential instructions beyond the conditional branch until the prefetched predicted instructions are made available for loading into the instruction queue. If the 601 resolves the condition before the predicted instructions are available (the likelihood of which is increased by the optimizations for the compare instructions in the fixed-point unit), and the branch was incorrectly predicted as taken, then dispatch simply continues down the sequential stream using the instructions that remain in the instruction queue.

