Enhancing throughput with CCA and IBM PCIe Cryptographic Coprocessor
The following describes how to enhance throughput of the IBM PCIe Cryptographic Coprocessor when using the IBM Common Cryptographic Architecture (CCA) application programming interface.
When you use the CCA API, characteristics of your host application program can affect the performance and throughput of the coprocessor. Before you design your CCA applications, you should understand two areas: (1) multi-threading and multi-processing and (2) caching of AES, DES, and PKA keys. Understanding these areas also helps you evaluate coprocessor performance properly.
Multi-threading and multi-processing
The coprocessor can process multiple CCA requests simultaneously. This is because the coprocessor has several independent hardware elements that enable it to multi-process. These elements include the RSA engine, DES engine, CPU, random number generator, and PCIe communications interface. All of these elements work independently, and each can process a part of a different CCA verb at the same time. By being able to work on several CCA API calls at the same time, the coprocessor can keep some or all of its hardware elements busy. Keeping all of the hardware elements busy maximizes overall system throughput.
To take advantage of the multi-processing capability of the coprocessor, your host system must send multiple CCA requests at the same time, without waiting for each one to finish before sending the next. The best way to accomplish this is to design multi-threaded CCA host application programs in which each thread independently sends CCA requests to the coprocessor. For example, a Web server application might start a new thread for each request that it receives over the network, and each of those threads would send its own requests to the coprocessor. This results in enhanced utilization of the coprocessor and maximum system throughput. Another option is to have several independent host application programs all using the coprocessor at the same time.
It is not possible to overload the coprocessor, because it manages incoming requests automatically.
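The multi-threaded design described above can be sketched roughly as follows. This is an illustration only, written in Python for brevity: send_cca_request is a hypothetical stand-in for whatever blocking CCA verb call your application makes, and the worker count of 8 is an arbitrary assumption, not an IBM recommendation.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def send_cca_request(payload):
    """Hypothetical stand-in for a blocking CCA verb call.

    A real application would call into the CCA library here; the sleep
    only simulates the time spent waiting for the coprocessor.
    """
    time.sleep(0.01)
    return payload[::-1]  # placeholder result, not real cryptography


def process_all(payloads, workers=8):
    """Send many requests concurrently instead of one after another.

    Each worker thread blocks on its own request, so several requests
    are outstanding at the coprocessor at the same time.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(send_cca_request, payloads))
```

With one thread, the host would wait for each reply before issuing the next request. With several threads, the waits overlap, which is what allows the independent hardware elements of the coprocessor to stay busy.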
Caching AES, DES, and PKA keys
To increase system performance, the coprocessor can keep copies of recently used AES, DES, and PKA keys in memory caches. These caches are contained within the secure module. Cached keys are stored in a decrypted form, ready for use. Whenever a cached key is reused by a later CCA request, the coprocessor uses the cached copy. This avoids the overhead associated with validating a key token and decrypting the key. For retained PKA keys, the cache eliminates the overhead of retrieving the key from the internal Flash EPROM memory.
As a result of using key caching, applications that reuse a common set of keys can run much faster than those that use different keys for each transaction. Typical CCA applications use a common set of AES, DES, and PKA keys. This makes caching very effective at improving throughput.
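The effect of key reuse can be illustrated with a small model. The sketch below is not the coprocessor's implementation: the class name KeyCacheModel, the capacity, and the least-recently-used eviction policy are all assumptions made for illustration, and the "decryption" step is a placeholder. It only shows why a workload that reuses a few keys avoids most of the per-key overhead.

```python
from collections import OrderedDict


class KeyCacheModel:
    """Toy model of a key cache (assumed LRU policy, for illustration)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._keys = OrderedDict()
        self.hits = 0
        self.misses = 0

    def use_key(self, token):
        if token in self._keys:
            # Cached copy is reused; no validation or decryption needed.
            self._keys.move_to_end(token)
            self.hits += 1
            return self._keys[token]
        # Cache miss: pay the validation and decryption overhead once.
        self.misses += 1
        clear_key = self._validate_and_decrypt(token)
        self._keys[token] = clear_key
        if len(self._keys) > self.capacity:
            self._keys.popitem(last=False)  # evict least recently used
        return clear_key

    @staticmethod
    def _validate_and_decrypt(token):
        return "clear:" + token  # placeholder for the expensive step


# Workload A: 100 requests that share 3 keys.
shared = KeyCacheModel(capacity=4)
for i in range(100):
    shared.use_key(f"key{i % 3}")

# Workload B: 100 requests that each use a different key.
unique = KeyCacheModel(capacity=4)
for i in range(100):
    unique.use_key(f"key{i}")
```

In this model, workload A pays the expensive step only 3 times, while workload B pays it on every one of its 100 requests.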
Public keys, which have very little processing overhead, and AES keys that are in the clear are not cached.
Note: To improve performance, the CCA implementation caches key records obtained from key storage within the CCA host code. However, each host process has its own host cache, which can be a problem if different host processes access the same key record: an update that one process makes to a key record is not reflected in the key caches held by other processes. To avoid this problem, you can suppress caching of key records within the key storage system so that all processes access the most current key records. To suppress caching of key records, use the export command to set the environment variable CSUCACHE to 'N' or 'n' or to any string that begins with the character 'N' or 'n' (for example, export CSUCACHE=N).
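If a host program is started from a launcher script rather than from a shell, the same setting can be applied to the child's environment. The following sketch uses illustrative helper names that are not part of CCA, and it assumes that the CCA host code reads CSUCACHE from the process environment when the program starts. The first function simply mirrors the rule stated in the note.

```python
import os
import subprocess


def host_key_cache_suppressed(env=None):
    """Return True if CSUCACHE begins with 'N' or 'n' (rule from the note)."""
    env = os.environ if env is None else env
    return env.get("CSUCACHE", "")[:1] in ("N", "n")


def run_with_cache_suppressed(argv):
    """Run a child program with CSUCACHE=N added to its environment.

    argv is the command line of your own CCA host program; this helper
    does not call any CCA function itself.
    """
    child_env = dict(os.environ, CSUCACHE="N")
    return subprocess.run(argv, env=child_env, check=True)
```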