Introduction
The PowerPC 603 microprocessor* is the second member of the PowerPC microprocessor family [1, 2]. This powerful low-cost superscalar implementation of the PowerPC Architecture [3, 4] is intended for use in notebook computers and low-end desktop computers.
The 603 design offers high performance at a low power level. The 7.4 mm by 11.5 mm CMOS chip features on-chip 8KB instruction and data caches coupled to a high-performance 32/64-bit system bus. A peak rate of 3 instructions per cycle at 80 MHz, at power levels below 3 watts at 3.3 volts, offers unparalleled notebook and portable computer performance.
Microarchitecture
The 603 represents a new microarchitecture organization for the PowerPC Architecture family. The goals for the microarchitecture are to increase fixed-point performance, to simplify exception recovery, and to provide specialized units for dynamic power management.
Fixed-point performance is improved by refining the previous POWER implementation of the fixed-point unit [5]. In contrast to POWER's single fixed-point execution unit, which handles all instructions except floating-point instructions and branches, the 603 incorporates three separate units that run in parallel for superscalar fixed-point operation.
The load/store unit handles all data movement between the data cache and the register files. The fixed-point unit handles all register-to-register operations. The third unit, the system unit, handles all system register operations.
Additional development effort was focused on a generalized dispatch/rename scheme which utilized simple rename buses and autonomous functional units. This approach allowed the design teams to work on their respective functional units independently within the structure of the microarchitecture.
Exception recovery is simplified by having rename buses for both fixed-point and floating-point registers. Combined with a completion register queue and a centralized in-order tracking mechanism, this robust structure supports superscalar dispatch, out-of-order execution, and speculative execution.
To conserve power, a new dynamic power-management system controls the processor clocks. With this system, functional unit clocks run only when specific instructions are dispatched to the corresponding unit.
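The clock-gating idea above can be illustrated with a minimal sketch. The unit names come from the five functional units listed in this article; the trace format, the function names, and the simplification that a unit is clocked only in the cycle in which it receives an instruction (multi-cycle execution is ignored) are assumptions made for illustration, not a description of the actual 603 logic.

```python
# Sketch of dispatch-driven clock gating (illustrative model only).
UNITS = ("branch", "fixed", "float", "load/store", "system")


def active_clocks(dispatch_trace):
    """Return, per cycle, the set of units whose clocks run.

    dispatch_trace is a hypothetical format: one list per cycle naming
    the units that receive an instruction in that cycle.
    """
    schedule = []
    for dispatched in dispatch_trace:
        unknown = set(dispatched) - set(UNITS)
        if unknown:
            raise ValueError(f"unknown unit(s): {sorted(unknown)}")
        # Assumption: a unit is clocked only in the cycle it is dispatched to.
        schedule.append(set(dispatched))
    return schedule


def clocked_unit_cycles(dispatch_trace):
    """Count unit-cycles in which a clock runs under the gating model."""
    return sum(len(cycle) for cycle in active_clocks(dispatch_trace))


trace = [["fixed", "load/store"], ["float"], [], ["fixed"]]
gated = clocked_unit_cycles(trace)
ungated = len(UNITS) * len(trace)  # every unit clocked in every cycle
```

For this made-up four-cycle trace, the gated model clocks 4 unit-cycles where an ungated design would clock 20.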
Modeling
Extensive modeling determined the final design point for the 603. The proper balance between power, performance, size, and schedule was crucial for the success of the design project.
The project goals were for the 603 to offer performance comparable to contemporary workstation processors while operating at notebook power levels (below 3 watts) and consuming less die area than other high-volume personal computer processors.
Tradeoffs
A significant trade-off for high-performance, low-power operation occurs between frequency and complexity (size). Figure 1 depicts the performance for two power levels, 2 watts and 3 watts, as well as two levels of complexity. The performance for the first configuration (single instruction issue) is lower even though the clock frequencies are higher for a given power level of 2 or 3 watts. Based on this and other analyses, the 603 was targeted to be a dual-issue design with five major functional units: branch, fixed, float, load/store, and system.
Overview
The block diagram of the 603 processor is shown in Figure 2. The instruction fetch unit prefetches instructions from the instruction cache into the instruction buffers. The branch unit executes any branch in the prefetch buffer and redirects the prefetch unit accordingly.
The dispatcher decodes two instructions at a time from the instruction buffer and dispatches them if possible to available execution units.
The 603 fixed-point unit (FXU) executes most integer instructions in a single cycle. The floating-point unit (FPU) is a pipelined unit that executes both single-precision and double-precision instructions. The FXU contains the general-purpose registers (GPRs); the FPU contains the floating-point registers (FPRs). The load/store unit (LSU) handles loads and stores to the register files through its data cache interface. The system unit executes condition-register and other system-register instructions. The execution units write the result of a finished instruction to the proper rename bus and to the associated rename register.
The completion logic retires instructions in order and allows the exception logic to post interrupts at the proper point in the instruction flow. As instructions are retired, architectural registers are updated from the rename registers and the proper program state is maintained. Up to 2 instructions per cycle can be retired by the completion unit. Including the branch unit execution, the 603 has a peak instruction throughput rate of 3 instructions per cycle.
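In-order retirement with a two-instruction limit can be sketched as a queue whose head must be finished before anything behind it may retire. The queue layout, the `finished` flag, and the function name are hypothetical; this is a simplified model of the behavior described above, not the 603's completion logic.

```python
from collections import deque

RETIRE_WIDTH = 2  # up to 2 instructions retired per cycle, per the text


def retire(queue):
    """Retire finished instructions from the head of the queue, in program order.

    Stops at the first unfinished instruction, so a younger instruction
    that finished out of order waits for older ones.
    """
    retired = []
    while queue and len(retired) < RETIRE_WIDTH and queue[0]["finished"]:
        retired.append(queue.popleft()["name"])
    return retired


# Hypothetical queue contents, oldest first.
queue = deque([
    {"name": "add", "finished": True},
    {"name": "lwz", "finished": False},   # load still outstanding
    {"name": "mulli", "finished": True},  # finished early, must wait
])

first_cycle = retire(queue)    # only "add" can retire
queue[0]["finished"] = True    # the load completes
second_cycle = retire(queue)   # "lwz" and "mulli" retire together
```

The younger `mulli` finishes before the load but is held in the queue until the load retires, which is what keeps the architectural state precise at an exception.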
The split 8KB data and instruction caches have associated memory management units (MMUs) which implement the PowerPC Virtual Environment Architecture.
Pipeline
The 603 pipeline is shown in Figure 3 for several types of instructions. The four major pipeline stages are: fetch, dispatch, execute, and writeback. Note that with the use of the rename buses and associated registers, the actual writeback stage does not always occur immediately following the execute stage.
Functional Units
The major 603 partitions include split instruction and data caches; fetch, dispatch, and completion units; fixed-point, floating-point, load/store, branch, and system units; and the bus interface and common on-chip processor (COP) units.
Caches
The 8KB instruction and data caches in the 603 are two-way set associative with physically addressed cache and tag arrays. The 32-byte cache lines are selected for replacement based on a least recently used (LRU) algorithm.
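The cache geometry follows from the figures above: 8KB divided by two ways of 32-byte lines gives 128 sets. The sketch below splits an address into tag, set index, and byte offset, and models two-way LRU with one bit per set. The 32-bit address width, the class name, and the single-bit LRU encoding are assumptions for illustration.

```python
CACHE_BYTES = 8 * 1024
WAYS = 2
LINE_BYTES = 32
SETS = CACHE_BYTES // (WAYS * LINE_BYTES)   # 128 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 5 bits of byte offset
INDEX_BITS = SETS.bit_length() - 1          # 7 bits of set index


def split_address(addr):
    """Split a physical address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class TwoWaySet:
    """One cache set with two ways and a single LRU bit (assumed encoding)."""

    def __init__(self):
        self.tags = [None, None]
        self.lru = 0  # way that will be replaced on the next miss

    def access(self, tag):
        """Return True on a hit; on a miss, replace the LRU way."""
        if tag in self.tags:
            way = self.tags.index(tag)
            hit = True
        else:
            way = self.lru
            self.tags[way] = tag
            hit = False
        self.lru = 1 - way  # the other way is now least recently used
        return hit


tag, index, offset = split_address(0x12345678)
```

With a 32-bit address, the split leaves a 20-bit tag: `0x12345678` decomposes into tag `0x12345`, set 51, and byte offset 24.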
A three-state coherency protocol is supported by the 603 with the Invalid, Exclusive Unmodified, and Exclusive Modified states. This protocol is a compatible subset of the MESI (modified, exclusive, shared, invalid) four-state protocol and can operate coherently in systems using the MESI protocol. However, since the 603 does not broadcast cache operation instructions, symmetric multiprocessing is not supported in hardware.
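The subset relationship to MESI can be made concrete with a small sketch. The three state names come from the text; the event names and the transitions (in particular, that a snoop hit sends the line to Invalid because no Shared state exists, with a write-back first if the line is modified) are assumptions chosen for illustration and are not taken from this article.

```python
from enum import Enum


class MESI(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


# The three states the text lists for the 603, mapped onto MESI names.
STATES_603 = {
    "Invalid": MESI.INVALID,
    "Exclusive Unmodified": MESI.EXCLUSIVE,
    "Exclusive Modified": MESI.MODIFIED,
}


def next_state(state, event):
    """Hypothetical transition function; returns (new_state, write_back).

    Events (assumed names): "fill" after a miss, "write" by the local
    processor, "snoop_hit" when another bus master touches the line.
    """
    if event == "fill":
        return MESI.EXCLUSIVE, False
    if event == "write":
        return MESI.MODIFIED, False
    if event == "snoop_hit":
        # With no Shared state available, give the line up entirely;
        # modified data must be written back first.
        return MESI.INVALID, state is MESI.MODIFIED
    raise ValueError(f"unknown event: {event}")
```

Because every state used here is also a MESI state and `MESI.SHARED` is never entered, a cache following these rules would look to other bus masters like a MESI cache that simply never shares a line.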
Cache loads use four 8-byte transfers. While a cache line is being loaded, access by the functional units is blocked until the reload completes. The critical double-word is simultaneously written to the cache and forwarded to the requesting unit, thus minimizing stalls due to load delays.
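The reload arithmetic can be sketched as follows. A 32-byte line arrives as four 8-byte double-words, and the critical double-word is the one containing the requested address. The wrap-around, critical-first beat order shown here is an assumption for illustration; the text states only that the critical double-word is forwarded to the requesting unit as it is written to the cache.

```python
LINE_BYTES = 32
BEAT_BYTES = 8
BEATS = LINE_BYTES // BEAT_BYTES  # four 8-byte transfers per line


def critical_doubleword(addr):
    """Index (0..3) of the double-word within the line that holds addr."""
    return (addr % LINE_BYTES) // BEAT_BYTES


def reload_order(addr):
    """Assumed beat order: critical double-word first, then wrap around."""
    first = critical_doubleword(addr)
    return [(first + i) % BEATS for i in range(BEATS)]


order = reload_order(0x1014)  # byte 0x14 of the line lies in double-word 2
```

Under this assumed ordering, a miss on address `0x1014` would deliver double-words 2, 3, 0, 1, so the requesting unit could receive its data on the first beat.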
The 603 implements the PowerPC 32-bit virtual memory management scheme. The major features of the system include:
- 64-entry, two-way set-associative data and instruction translation look-aside buffers (TLBs)
- TLB updates handled in software and supported by a fast-trap mechanism
- entry, fully-associative data and instruction Block Address Translation registers (BATs)
- 16 Segment Registers
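The field sizes implied by this list can be sketched for a 32-bit effective address. Sixteen segment registers imply a 4-bit selector, taken here from the top of the address, and a 64-entry two-way TLB has 32 sets. The 4KB page size, the use of the low-order page-index bits to pick the TLB set, and all names below are assumptions for illustration.

```python
PAGE_OFFSET_BITS = 12   # assumes 4KB pages
PAGE_INDEX_BITS = 16    # remaining bits below the 4-bit segment selector
TLB_ENTRIES = 64
TLB_WAYS = 2
TLB_SETS = TLB_ENTRIES // TLB_WAYS  # 32 sets


def split_effective_address(ea):
    """Split a 32-bit effective address into translation fields."""
    segment = (ea >> (PAGE_OFFSET_BITS + PAGE_INDEX_BITS)) & 0xF
    page_index = (ea >> PAGE_OFFSET_BITS) & ((1 << PAGE_INDEX_BITS) - 1)
    byte_offset = ea & ((1 << PAGE_OFFSET_BITS) - 1)
    # Assumption: the low-order page-index bits select the TLB set.
    tlb_set = page_index % TLB_SETS
    return {
        "segment": segment,
        "page_index": page_index,
        "byte_offset": byte_offset,
        "tlb_set": tlb_set,
    }


fields = split_effective_address(0xC0012ABC)
```

For `0xC0012ABC`, this sketch selects segment register 12, page index `0x0012`, byte offset `0xABC`, and TLB set 18.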