IBM i globalization

When Arabic and Hebrew support was originally added to the IBM i, the system was used in a standalone environment. The designers of this support decided to store the Bidi data in a visual form. This means that the data is stored in memory the way you would see it on the display or printer. This had the advantage that no special processing was needed to format the data for presentation, since it was already in presentation form. Since the data only existed on the IBM i, it did not matter what form was used.

When Arabic and Hebrew support was added to the PC systems, the designers of this support decided to store the Bidi data in a logical form. This means that the data is stored in memory in the order it is typed, not the order in which it is displayed. This had the advantage that Bidi data looked like "normal" data to non-Bidi applications. The disadvantage was that the system needed to format the data for presentation. Since the data only existed on the PC, it did not matter what form was used.

To show this, here is a set of seven Arabic characters preceded by two blank spaces.

ihgfedcba   abcdefghi

In this sample, the same characters are stored in memory as either "abcdefghi" (logical) or "ihgfedcba" (visual).
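The relationship between the two storage forms can be sketched in Python for the simplest case, a run made up only of right-to-left characters. The function names are hypothetical, and the Latin letters stand in for Arabic characters as in the sample. Mixed-direction text needs a full bidirectional reordering algorithm, which this sketch does not attempt.

```python
def logical_to_visual(rtl_run: str) -> str:
    """Turn a purely right-to-left run from typing (logical) order
    into left-to-right visual storage order. Illustrative only."""
    return rtl_run[::-1]


def visual_to_logical(visual_run: str) -> str:
    """Inverse of logical_to_visual for a purely right-to-left run."""
    return visual_run[::-1]


logical = "abcdefghi"                 # order in which the characters were typed
visual = logical_to_visual(logical)   # order in which they appear on the display
print(visual)                         # ihgfedcba
```

Because the two functions are inverses, the characters are identical in both forms; only their order in memory differs.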

However, as time went on, customers began to interchange data between IBM i and PC applications. They then discovered that even though the same characters were used, the data was not the same. The data needed to be reordered from visual to logical form.

For several releases, customers have requested that the IBM i address these issues and "do the correct processing" of Bidi data. To solve this situation, several new CCSIDs were created to allow the customer to tell the system what type of action they wished to occur.

Beginning in V4R4M0, PTF(s) were made available to allow customers the option and time to transition to new CCSID behavior for Hebrew and Arabic CCSIDs. This behavior incorporates the concept of visual vs. logical. With the PTF(s) and now with base V5R1M0 (and later releases), the IBM i recognizes the behavior of CCSIDs such as 916, 1255, etc.

There are several reasons to move the data to a logical view. For example, sort and search generally work better when the data is in logical format.

Retain the previous behavior

If customers do not wish to have the data reordered but wish to retain the previous behavior, they can change to a CCSID that provides the desired action.

More IBM i Information:

  1. CCSID mappings that result in Bidi processing.
  2. Hebrew: CCSID, string type and info.
  3. Arabic: CCSID, string type and info.
  4. For Arabic data only, the ligature Lamalif issue.
  5. What are the string types? What do they mean?

Arabic: CCSID, string type and info

CCSID   String Type   Code Page   Description
420     4             420         EBCDIC (Original CCSID for Arabic data)
425     5             425         EBCDIC with POSIX chars, like [ ] { } etc.
864     5             864         PC Data
1046    5             1046        Old Windows 3.1
1089    5             1089        ISO 8859-6
1256    5             1256        MS Windows
8612    5             420         EBCDIC
12708   8             420         EBCDIC
62218   4             864         PC Data
62224   6             420         EBCDIC
62228   6             1256        MS Windows
62251   6             425         EBCDIC with POSIX chars, like [ ] { } etc.

The system maps 420 to:
00037, 00256, 00500, 00720, 00737, 00775, 00819, 00850, 00864, 00937, 01008, 01046, 01089, 01112, 01122, 01208, 01256, 04960, 08612, 09030, 09056, 12708, 13488, 28709, 61952, 62218, 62224, 62228

The system maps 425 to:
00037, 00500, 00819, 00864, 01046, 01089, 01252, 01256, 08612, 13488, 61952, 62224, 62228

In the cursive languages, ligatures use one glyph to represent two or more specific letters. For example, the ligature Lamalif is used to represent the frequently used pair of letters Lam and Alif.

Since Lam followed by Alif is a very common occurrence, the designers of the IBM i code page 420 support decided to create one hex code point that stores both of these characters in the combined form. The code page also has an isolated Lam and an isolated Alif at two other code points. So the combined LamAlif is hex B8, the Lam is hex B1, and the Alif is hex 56.

Since the data only existed on the IBM i, it did not matter what form was used and the combined form was used quite often.

However, the designers of the Windows Bidi support chose another method. The user enters a Lam followed by an Alif, and the system shows a combined LamAlif in one position on the display, although the data still occupies two positions in the buffer. Since the data only existed on the Windows machine, it did not matter what form was used.

However, a problem arises when data is sent back and forth between the two machines. On the IBM i side, a five-character field could contain "abLc ", where L stands for the combined LamAlif. When it is transferred to the Microsoft® Windows® side, the combined LamAlif can be mapped in one of four ways.

  1. Map the LamAlif to a Lam. This is not a good choice because:
  2. Map the LamAlif to an Alif. This is not a good choice because:
  3. Map the LamAlif to a substitution character to indicate that a character has been "lost". This is a better choice but not perfect because:
  4. Map the LamAlif to a Lam followed by an Alif if there is space in the buffer. This is the best choice but not perfect because you may not have blank places at the end of the buffer to use.
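The fourth option, with the third as a fallback, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the field is assumed to be in logical order and padded with EBCDIC blanks (hex 40), the EBCDIC substitute character (hex 3F) is assumed as the "lost character" marker, and the later conversion from code page 420 to the Windows code page is left out. The LamAlif, Lam, and Alif code points are the ones given above.

```python
LAMALIF = 0xB8  # combined LamAlif in code page 420 (from the text)
LAM = 0xB1      # isolated Lam in code page 420 (from the text)
ALIF = 0x56     # isolated Alif in code page 420 (from the text)
BLANK = 0x40    # EBCDIC blank (assumed padding character)
SUB = 0x3F      # EBCDIC substitute character (assumed fallback)


def expand_lamalif(field: bytes) -> bytes:
    """Expand each LamAlif into Lam + Alif while trailing blanks remain.

    Every expansion uses up one trailing blank, so the field keeps its
    fixed width. When no blank is left, the LamAlif is replaced by SUB.
    """
    stripped = field.rstrip(bytes([BLANK]))
    spare = len(field) - len(stripped)   # trailing blanks available
    out = bytearray()
    for code_point in stripped:
        if code_point != LAMALIF:
            out.append(code_point)
        elif spare > 0:
            out += bytes([LAM, ALIF])
            spare -= 1
        else:
            out.append(SUB)
    out += bytes([BLANK]) * spare        # restore unused padding
    return bytes(out)


# 0x81, 0x82, 0x83 are arbitrary placeholder bytes for "a", "b", "c".
print(expand_lamalif(bytes([0x81, 0x82, 0xB8, 0x83, 0x40])).hex())  # 8182b15683
print(expand_lamalif(bytes([0x81, 0x82, 0xB8, 0x83])).hex())        # 81823f83
```

The second call shows the limitation described in the fourth option: with no blank at the end of the buffer, the character cannot be expanded and is marked as lost instead.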

Hebrew: CCSID, string type and info

CCSID   String Type   Code Page   Description
424     4             424         EBCDIC (Original CCSID for Hebrew data)
916     5             916         ISO 8859-8
1255    5             1255        MS Windows
62210   4             916         ISO 8859-8
62211   5             424         EBCDIC
62215   4             1255        MS Windows
62222   6             916         ISO 8859-8
62223   6             1255        MS Windows
62235   6             424         EBCDIC
62238   10            916         ISO 8859-8
62239   10            1255        MS Windows
62245   10            424         EBCDIC

The system maps 424 to:
00037, 00256, 00500, 00737, 00775, 00819, 00850, 00862, 00916, 00937, 01112, 01122, 01208, 01255, 04952, 09030, 13488, 28709, 61952, 62210, 62211, 62215, 62222, 62223, 62235, 62238, 62239, 62245

Listed below are the CCSID pairs that will result in bidirectional processing.

  1. 424, 916
  2. 424, 1255
  3. 424, 62245
  4. 424, 13488
  5. 424, 61952
  6. 424, 62211
  7. 424, 62222
  8. 424, 62223
  9. 424, 62235
  10. 424, 62238
  11. 424, 62239
  12. 862, 916
  13. 862, 1255
  14. 862, 62245
  15. 862, 13488
  16. 862, 61952
  17. 862, 62211
  18. 862, 62235
  19. 62235, 862
  20. 62245, 62210
  21. 62245, 62215
  22. 62210, 62211
  23. 62210, 62235
  24. 62211, 62215
  25. 62215, 62235
  26. 420, 864
  27. 420, 1046
  28. 420, 1089
  29. 420, 1256
  30. 420, 8612
  31. 420, 12708
  32. 420, 13488
  33. 420, 61952
  34. 420, 62224
  35. 420, 62228
  36. 1256, 12708
  37. 8612, 12708
  38. 12708, 13488
  39. 12708, 61952
  40. 12708, 62224
  41. 425, 420
  42. 425, 12708

Table 1. Bidirectional Language String Types and Associated Attributes

String Type   Text Type   Numeric Shaping   Orientation        Text Shaping   Symmetrical Swapping
4             Visual      pass-through      LTR                Shaped         Off
5             Implicit    Arabic            LTR                Unshaped       On
6             Implicit    Arabic            RTL                Unshaped       On
7*            Visual      pass-through      Contextual*        Unshaped-Lig   Off
8             Visual      pass-through      RTL                Shaped         Off
9             Visual      pass-through      RTL                Shaped         On
10            Implicit                      Contextual Left                   On
11            Implicit                      Contextual Right                  On
12            Implicit    Arabic            RTL                Shaped         Off

Note: (*) Field Orientation is left-to-right (LTR) when the first alphabetic character is a Latin one, and right-to-left (RTL) when it is a Bidi character; characters are unshaped, but LamAlif ligatures are kept, and not broken into constituents.
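One way to read Table 1 is as a lookup from string type to layout attributes. The Python sketch below encodes the table and lists which attributes differ between two string types, which hints at the kinds of transformation a conversion between them involves. The class and function names are hypothetical, empty strings stand for the cells the table leaves blank, and this is an illustration rather than how the system implements it.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StringType:
    text_type: str
    numeric_shaping: str
    orientation: str
    text_shaping: str
    symmetric_swapping: str


# Values copied from Table 1; "" marks a cell the table leaves blank.
STRING_TYPES = {
    4: StringType("Visual", "pass-through", "LTR", "Shaped", "Off"),
    5: StringType("Implicit", "Arabic", "LTR", "Unshaped", "On"),
    6: StringType("Implicit", "Arabic", "RTL", "Unshaped", "On"),
    7: StringType("Visual", "pass-through", "Contextual", "Unshaped-Lig", "Off"),
    8: StringType("Visual", "pass-through", "RTL", "Shaped", "Off"),
    9: StringType("Visual", "pass-through", "RTL", "Shaped", "On"),
    10: StringType("Implicit", "", "Contextual Left", "", "On"),
    11: StringType("Implicit", "", "Contextual Right", "", "On"),
    12: StringType("Implicit", "Arabic", "RTL", "Shaped", "Off"),
}


def differing_attributes(source: int, target: int) -> list:
    """Return the names of the attributes that differ between two string types."""
    a, b = STRING_TYPES[source], STRING_TYPES[target]
    return [f.name for f in fields(StringType)
            if getattr(a, f.name) != getattr(b, f.name)]


# CCSID 420 is string type 4 and CCSID 1256 is string type 5 (see the Arabic table).
print(differing_attributes(4, 5))
# ['text_type', 'numeric_shaping', 'text_shaping', 'symmetric_swapping']
```

For the 420 to 1256 example, everything except the orientation differs, so reordering, numeric shaping, text shaping, and symmetric swapping all come into play.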

Orientation: In bidirectional languages, some characters, such as English letters, are considered to have a strong left-to-right orientation. Other characters, such as the Arabic characters, are considered strong right-to-left characters. Still other characters, such as punctuation marks and spaces, do not have a strong direction associated with them. The orientation can also be contextual. In this situation, the global orientation is set according to the direction of the first significant (strong) character.

Numeric Shaping: In Arabic, it is common to use Hindi numerals instead of Arabic numerals. "1", "2", and so on are the Arabic form of the numbers.
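As a small illustration of numeric shaping, the Python sketch below converts between the two digit sets in a Unicode string. It assumes that the Hindi digits correspond to the Unicode characters U+0660 through U+0669 (which Unicode names ARABIC-INDIC DIGIT ZERO through NINE); the function names are hypothetical.

```python
ARABIC_DIGITS = "0123456789"
# Assumption: "Hindi" digits are the Unicode Arabic-Indic digits U+0660..U+0669.
HINDI_DIGITS = "".join(chr(0x0660 + i) for i in range(10))

TO_HINDI = str.maketrans(ARABIC_DIGITS, HINDI_DIGITS)
TO_ARABIC = str.maketrans(HINDI_DIGITS, ARABIC_DIGITS)


def shape_digits_hindi(text: str) -> str:
    """Replace the digits 0-9 with their Hindi counterparts."""
    return text.translate(TO_HINDI)


def shape_digits_arabic(text: str) -> str:
    """Replace Hindi digits with the digits 0-9."""
    return text.translate(TO_ARABIC)


shaped = shape_digits_hindi("12")
print([hex(ord(ch)) for ch in shaped])   # ['0x661', '0x662']
print(shape_digits_arabic(shaped))       # 12
```

A "pass-through" setting, as listed for the visual string types in Table 1, would correspond to leaving the digits untouched.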

Text Shaping: Specifies the shaping: that is, choosing (or composing) the correct shape of the input or output text.

Note: This value is important, in particular for languages where the shapes of the characters, when presented, correspond to code points that may be different from the code points of the characters stored for processing. In languages such as Arabic or Farsi, the character can have up to four different shapes (see Shapes of the Arabic Characters). In these languages the character is most frequently (but not always) stored and processed using a code point related to a basic shape. Often the basic shape chosen is the isolated shape.

An Arabic script character often has an initial form, a middle form, a final form, and an isolated form.
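The four shapes can be inspected with Python's standard `unicodedata` module. This sketch uses the Unicode presentation forms of the letter Beh as an example (the choice of letter and the use of Unicode rather than an IBM i code page are assumptions made for illustration). It also shows that compatibility normalization maps every shaped form back to a single basic code point, in the spirit of the stored-versus-presented distinction described in the note above.

```python
import unicodedata

# Unicode presentation forms of the Arabic letter Beh (example letter only).
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}

BASIC_BEH = "\u0628"  # the unshaped code point normally used for storage


def unshape(text: str) -> str:
    """Map shaped presentation forms back to basic code points (NFKC)."""
    return unicodedata.normalize("NFKC", text)


for shape, char in BEH_FORMS.items():
    print(shape, unicodedata.name(char), unshape(char) == BASIC_BEH)
```

Each line of output names one shaped form and reports `True`, since all four normalize to the same stored character.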

Symmetrical Swapping: The Swapping descriptor specifies whether symmetric swapping is applied to the text. A list of symmetric swapping characters is given in the ISO/IEC 10646 standard. For example, without symmetric swapping, the string "(1)" might become ")1(".
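The effect can be sketched in Python by reversing a run of text with and without swapping. The function name is hypothetical and the table covers only a few bracket pairs, not the full list of symmetric swapping characters in ISO/IEC 10646.

```python
# A small, assumed subset of the symmetric swapping characters.
SWAP_TABLE = str.maketrans("()[]{}<>", ")(][}{><")


def reverse_run(run: str, swap: bool = True) -> str:
    """Reverse a run of text, optionally swapping symmetric characters."""
    reversed_run = run[::-1]
    return reversed_run.translate(SWAP_TABLE) if swap else reversed_run


print(reverse_run("(1)", swap=False))  # )1(
print(reverse_run("(1)", swap=True))   # (1)
```

With swapping on, each opening bracket that ends up on the wrong side after the reversal is replaced by its mirror image, so the parentheses still enclose the digit.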
