IBM i Bidirectional CCSID Information
When Arabic and Hebrew support was originally added to the IBM i, the system was used in a standalone environment. The designers of this support decided to store bidirectional (Bidi) data in visual form, meaning that the data is stored in memory in the order you would see it on the display or printer. The advantage was that no special processing was needed to format the data for presentation, because it was already in presentation form. Since the data existed only on the IBM i, it did not matter which form was used.
When Arabic and Hebrew support was added to PC systems, the designers of that support decided to store Bidi data in logical form, meaning that the data is stored in memory in the order it is typed, not the order in which it is displayed. The advantage was that Bidi data looked like "normal" data to non-Bidi applications. The disadvantage was that the system needed to format the data for presentation. Since the data existed only on the PC, it did not matter which form was used.
To illustrate, consider a string of seven Arabic characters preceded by two blank spaces.
Using the letters a through i to stand for these nine positions, the same characters are stored in memory either as "abcdefghi" (logical) or as "ihgfedcba" (visual).
However, as time went on, customers began to interchange data between the IBM i and PC applications. They then discovered that even though the same characters were used, the data was not the same, because the characters were stored in a different order. The data needed to be reordered between the visual and logical forms.
For several releases, customers asked that the IBM i address these issues and "do the correct processing" of Bidi data. To address this, several new CCSIDs were created so that customers can tell the system what processing they want performed on their data.
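The reordering described above can be sketched in Python. This is a minimal illustration under simplifying assumptions, not the IBM i conversion routine: it handles a single right-to-left line, treats only ASCII letters and digits (optionally separated by single spaces) as embedded left-to-right text, and ignores punctuation, numeric separators, mirroring, and Arabic shaping. The function and constant names are hypothetical. For text made only of right-to-left characters, as in the "abcdefghi" sample where the letters stand for Arabic characters, the result is simply the reversed string.

```python
import re

# Hypothetical rule for this sketch: an embedded left-to-right run is one or
# more ASCII letters/digits, optionally joined by single spaces (e.g. "IBM i").
LTR_RUN = re.compile(r"[A-Za-z0-9]+(?: [A-Za-z0-9]+)*")


def logical_to_visual(line: str) -> str:
    """Reorder one right-to-left line from logical (typed) order into
    visual order, i.e. the left-to-right order of characters on screen.

    Simplified model, NOT the full Unicode Bidirectional Algorithm:
    reverse the whole line, then restore the original order inside each
    embedded Latin/digit run.
    """
    reversed_line = line[::-1]
    return LTR_RUN.sub(lambda m: m.group(0)[::-1], reversed_line)


SHALOM = "\u05e9\u05dc\u05d5\u05dd"  # Hebrew "shalom": shin, lamed, vav, final mem

# Pure right-to-left text: visual order is just the reversed string.
print(logical_to_visual(SHALOM) == SHALOM[::-1])  # True

# Mixed text: the embedded "IBM i" keeps its left-to-right order.
print(logical_to_visual(SHALOM + " IBM i") == "IBM i " + SHALOM[::-1])  # True
```

In this simplified model, applying the function twice returns the original line, so the same routine also converts visual data back to logical order. Real conversions between visual and logical CCSIDs must handle many more cases than this.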
Beginning in V4R4M0, PTFs were made available to give customers the option, and the time, to transition to the new CCSID behavior for Hebrew and Arabic CCSIDs. This behavior incorporates the distinction between visual and logical data. With the PTFs, and in the base operating system from V5R1M0 onward, the IBM i recognizes the Bidi behavior associated with CCSIDs such as 916 and 1255.
There are several reasons to move data to logical form. For example, sorting and searching generally work better when the data is in logical order.
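To illustrate why searching and sorting behave differently, here is a minimal Python sketch. It assumes purely right-to-left Hebrew words, so that the visual form is just the reversed string, and it compares strings by plain Unicode code point rather than by a real Hebrew collation. The helper and constant names are hypothetical.

```python
def to_visual(word: str) -> str:
    # Assumption: the word contains only right-to-left letters, so its
    # visual (left-to-right display) form is simply the reversed string.
    return word[::-1]


ALEF_BET = "\u05d0\u05d1"  # alef, bet (alphabetically first)
BET_ALEF = "\u05d1\u05d0"  # bet, alef
SHALOM = "\u05e9\u05dc\u05d5\u05dd"  # shin, lamed, vav, final mem
SHIN_LAMED = "\u05e9\u05dc"  # a search string the user types in logical order

# Search: the query is in logical order, so it only matches logical data.
found_in_logical = SHIN_LAMED in SHALOM
found_in_visual = SHIN_LAMED in to_visual(SHALOM)

# Sort: order the stored forms by code point.
sorted_logical = sorted([BET_ALEF, ALEF_BET])
sorted_visual = sorted([to_visual(BET_ALEF), to_visual(ALEF_BET)])
# Convert back to logical so the two results can be compared word for word.
sorted_visual_as_logical = [to_visual(w) for w in sorted_visual]

print(found_in_logical, found_in_visual)  # True False
print(sorted_logical == [ALEF_BET, BET_ALEF])  # True
print(sorted_visual_as_logical == [BET_ALEF, ALEF_BET])  # True
```

The search string matches the logical data but not the visual copy. The visual copy also sorts by the last letter of each word instead of the first, which reverses the expected alphabetical order.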
Retain the previous behavior
Customers who do not want their data reordered, and who wish to retain the previous behavior, can switch to a CCSID that provides that behavior.
More IBM i Information:
- CCSID mappings that result in Bidi processing.
- Hebrew CCSIDs, string types, and related information.
- Arabic CCSIDs, string types, and related information.
- For Arabic data only, the ligature Lamalif issue.
- What are the string types? What do they mean?
More general bidirectional information
Complex-Text Languages - An Overview (link resides outside of ibm.com) presents an overview of aspects of the writing systems of a large family of languages that are collectively called complex-text languages. Its "Complex-text Languages" section is an introduction to these languages, and its "Bidirectional Languages" section discusses bidirectional languages, in particular Arabic and Hebrew, including their use in a data processing environment.
The Open Group has a set of APIs that perform this type of Bidi processing. Information on them can be found here (link resides outside of ibm.com).