ASCII to EBCDIC conversion


Typical problem areas

When porting a program to z/OS UNIX System Services, you must keep an eye out for these areas where the ASCII to EBCDIC conversion may cause problems:

  • Hard-coded ASCII characters in C code as well as shell scripts

    Avoid hard-coded character values, and avoid depending on the numeric values of characters. For example, a program might use '\012' (octal) instead of '\n'. A program might also use characters as indices into arrays that were populated using ASCII values as indices (for example, hash_table['a'] is not the same as hash_table[0x61] in EBCDIC).

  • Using the high-order bit of a character for some special purpose

    You can do this in ASCII because only 7 bits are necessary for all the printable characters, but that is not true in EBCDIC.

  • Assuming the alphabet ('a'...'z') is contiguous

    This is true in ASCII, but not in EBCDIC where there are three noncontiguous groups of letters. Even seemingly harmless code like the following probably needs to be changed: char c; for (c='a'; c<='z'; c++) { ... }

  • Using code generated by lex or yacc

    Often, packages contain C code that was generated by the lex or yacc utilities. This code will probably contain ASCII dependencies and won't work on OS/390. The code needs to be generated on OS/390 by re-running the utilities. Note that this may introduce EBCDIC dependencies, making the code less portable to other systems, but at least it will work on OS/390.

    For example, y.tab.c is typically generated by yacc and there should be commands in the package's makefile instructing make how to invoke yacc to rebuild y.tab.c. There should also be a comment in y.tab.c that specifies the source file that yacc processed to generate y.tab.c.

  • Applications that talk to arbitrary remote systems via sockets (such as an ftp client)

    These applications typically have to assume all text they receive is ASCII and they send out all text as ASCII. They have to convert the data locally as they go along.

    Consider an ftp client: a user will type a command such as

    dir foobar
    

    The ftp server does not want to see these characters in EBCDIC, so the client must convert the data to ASCII before it is written to the socket. Likewise, if the client simply writes the data coming from the server to the user's screen, the output will be meaningless because it is in ASCII. The client must first convert the data to EBCDIC. This is true even if the server is running on an EBCDIC system, such as OS/390 or VM.

    However, you must be careful to convert only text data. Some applications may mix binary and text in a data stream. For instance, the server might send 2 bytes of binary data preceding a block of text to represent the number of bytes in the block, a checksum value, etc.

  • Relying on the byte order of data

    Code that relies on the byte order of data may not be portable. PC systems are "little endian" (that is, the first, lowest-addressed byte is the least significant); S/390 and most UNIX systems are "big endian." This typically affects integer and floating point data. If an application is responsible for transferring such data between platforms, you need to either (1) write data exchange logic or (2) translate to text, transfer as text, and then recreate as binary.
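
Several of the pitfalls in this list can be sidestepped in C. The following is a minimal sketch, not a definitive implementation: letter_index(), naive_loop_span(), and read_be16() are hypothetical helper names introduced for illustration. It spells out the alphabet rather than assuming it is contiguous, and it assembles a 2-byte length prefix byte by byte (assuming the sender transmits the most significant byte first), so the result does not depend on the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Spell the alphabet out instead of assuming 'a'..'z' is contiguous.
   The compiler encodes these literals in the native code set. */
static const char LOWERCASE[] = "abcdefghijklmnopqrstuvwxyz";

/* Return the 0-based position of c in the alphabet, or -1 if c is not
   a lowercase letter. Behaves the same in ASCII and EBCDIC. */
int letter_index(char c)
{
    const char *p;

    if (c == '\0')
        return -1;              /* strchr() would match the terminator */
    p = strchr(LOWERCASE, c);
    return p ? (int)(p - LOWERCASE) : -1;
}

/* Number of code points a naive for (c='a'; c<='z'; c++) loop visits. */
int naive_loop_span(void)
{
    return (unsigned char)'z' - (unsigned char)'a' + 1;
}

/* Decode a 2-byte binary length prefix, assumed to arrive most
   significant byte first, without relying on the host byte order. */
unsigned int read_be16(const unsigned char *p)
{
    assert(p != NULL);
    return ((unsigned int)p[0] << 8) | (unsigned int)p[1];
}
```

In an ASCII environment naive_loop_span() returns 26. In EBCDIC, where 'a' is 0x81 and 'z' is 0xA9, it returns 41, so a loop from 'a' to 'z' would also visit 15 code points that are not letters.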


 

Functions that support ASCII input/output

The OS/390 C/C++ run-time library functions support EBCDIC characters. The libascii package and V1R3.0 C/C++ __STRING_CODE_SET="ISO8859-1" predefined macro provide an ASCII-like application environment on OS/390.

As of OS/390 V2R8, the libascii functions are integrated into the base of the Language Environment. If you are running on an earlier release of OS/390, you can download our libascii package, which provides an ASCII interface layer for some of the more commonly used C/C++ run-time library functions. libascii supports ASCII input and output characters by performing the necessary iconv() translations before and after invoking the C/C++ run-time library functions. The __STRING_CODE_SET="ISO8859-1" predefined macro generates ASCII characters, constants, and strings.
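
The translate-before-and-after approach described above can be sketched with the standard iconv_open(), iconv(), and iconv_close() calls. This is a minimal illustration, not libascii's actual code: convert_text() is a hypothetical helper, and the code set names passed to it must be names that the local iconv implementation recognizes (IBM-1047 and ISO8859-1 are the z/OS spellings and may differ on other platforms).

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Convert in_len bytes of text from code set "from" to code set "to".
   Returns the number of bytes written to out, or (size_t)-1 on failure.
   convert_text() is a hypothetical helper, not a libascii function. */
size_t convert_text(const char *to, const char *from,
                    const char *in, size_t in_len,
                    char *out, size_t out_cap)
{
    iconv_t cd;
    char *src = (char *)in;     /* iconv() takes a non-const pointer */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_cap;
    size_t rc;

    assert(in != NULL && out != NULL);

    cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;      /* this code set pair is not supported */

    rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &dst, &dst_left);  /* flush any state */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;      /* invalid input or out buffer too small */
    return out_cap - dst_left;
}
```

On z/OS, a call such as convert_text("ISO8859-1", "IBM-1047", buf, len, out, sizeof out) would be expected to turn EBCDIC text into ASCII, for example before the text is written to a socket. Binary fields in the same stream should not be passed through it.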

  

Setting a variable to convert text files in an archive

You can set an environment variable in your .profile to handle conversion from ASCII to EBCDIC for text files contained in archives. Here is an example showing how to set an environment variable called A2E and then use it:

  $ export A2E='-ofrom=ISO8859-1,to=IBM-1047'
  .
  .
  .
  $ pax $A2E -rzf foobar.tar.Z
  

Commands and functions that handle conversion

There are shell commands, TSO/E commands, and C functions that handle ASCII to EBCDIC conversion.

Here are two shell commands that are useful:

iconv:

For example, the command:


iconv -f IBM-1047 -t ISO8859-1 words.txt >converted.txt

converts the file words.txt from the IBM-1047 standard code set to the ISO 8859-1 standard code set and stores it in the file named converted.txt.

pax:

For example, the command:

pax -wf tpgm.pax -ofrom=IBM-1047,to=ISO8859-1 /tmp/posix/tpgm

backs up the /tmp/posix/tpgm directory, which is in the IBM-1047 code set, into an archive file that is targeted to an ASCII code set (ISO8859-1).

The TSO/E commands OPUT, OGET, and OCOPY let you convert files between ASCII and EBCDIC.

The C functions __atoe(), __atoe_l(), __etoa(), and __etoa_l() also perform ASCII-EBCDIC conversion.

As of z/OS V1R2, you may be able to avoid converting files with pax or iconv when porting applications. EBCDIC programs can rely on autoconversion and file tagging instead. Autoconversion and file tagging can be controlled without changing the program or the file's contents.

  

Enhanced ASCII considerations

As of z/OS V1R2, some improvements were made to make porting applications to the z/OS platform easier. These improvements are collectively known as Enhanced ASCII. The following list summarizes the Enhanced ASCII additions to z/OS:

  • Enhanced ASCII is the suggested improvement over the libascii method. That is, C/C++ programs should be compiled with the ASCII compiler option instead of CONVLIT(ISO8859-1) or __STRING_CODE_SET="ISO8859-1". One reason is that more C run-time library (CRTL) functions are supported with ASCII.
  • It is now easier for z/OS applications to read and write ASCII files, including those that may be shared with other systems. Previously, a porting effort might require adding code to convert ASCII to EBCDIC. If, however, the file can be tagged, it can be automatically converted without writing new code. This is one of SAP's goals.
  • Converting ASCII files with pax or iconv might now be avoidable. EBCDIC programs can rely on autoconversion and file tagging instead. Autoconversion and file tagging can be controlled without changing the program or the file's contents.

 
