Table of Contents
Introduction
Data Set Naming Guidelines
Things Not To Include In The Data Set Name
Common Applications — Naming Conventions
Introduction
This document was written by John Tyrrell of IBM's Storage Systems Division. John is one of the original senior architects of DFSMS. He also invented TMM (Tape Management Methodology) and is the author of the Volume Mount Analyzer program which is part of DFSMSdfp. He is the senior architect/inventor of the DFSMS Optimizer product, one of the major components of DFSMS. John interacts with a wide variety of IBM customers, and has personally been to over 600 customer data centers and participated in over 350 tape and DASD studies to better understand and address IBM customer needs.
John spent 10 years in application development of a system with 6 million lines of code which ran in over 60 IBM data centers. Many of the data set naming techniques in this document were developed based on both John's application experience as well as working with IBM customers.
This document is intended to offer suggestions on the naming of data sets such that one would be able to fully exploit the technology of both DFSMS (TM) and MVS/ESA (TM). It does not imply that one must rename all of their data, nor does it imply that these are the only conventions that will work under DFSMS and MVS/ESA.
These naming conventions are suggestions resulting from many visits and interactions with IBM internal and IBM customer application areas. These suggestions reflect both the opinion and the experience of the author, and as such, do not necessarily represent the view of the IBM Corporation.
As with all standards, they should act as a guide for the reader. The reader, in this case, should really be the application designer. This is the person who will set up all of the JCL, CLISTs, ISPF panels, JCL skeletons, JOB networks, etc. in order to run the major application. He or she is usually the person who sets up the standards to be applied to all procedures involved with running a major application.
It is recommended that the reader go through the entire document, and then build his or her own set of rules based on the suggestions offered here as applied to the particular application in question.
Data Set Naming Guidelines
The purpose of this document is to offer some guidelines as to what constitutes proper data set naming conventions. This would allow customers to easily exploit the functions of DFSMS (TM) for the proper assignment of System Managed Storage (SMS) policies to manage the data. Although the ACS (Automatic Class Selection) Routines will allow the Storage Administrator to filter on more than just the data set name, the name of the data set is fundamentally important. Some of the advantages of a meaningful data set name are listed below:
- It allows more flexibility for the assignment of levels of service to data sets
- It is easier to write/maintain ACS Routines
- The ACS Routines are more durable (i.e., meaningful over time)
- Data set filters can be useful for other storage management techniques (e.g., ISMF, DFDSS, etc.)
There are two basic pieces of information that one should be able to obtain from the data set name:
- Who owns it?
- What is it?
The following sections will highlight all of the basic components that could potentially be used in a data set naming standard.
Components of a Data Set Name
High - Level Qualifier (HLQ)
Relative Importance
File Contents
User Name
Data Set Level
Components of a Data Set Name
Not all of the following levels of qualification are necessary for naming data sets. Instead, these represent some common levels of qualification that one tends to find in a good and meaningful data set naming convention. Some of the qualifications make sense for certain types of data while other levels don't. This list is intended to be a superset of all possible types of qualification levels.
Also, not all of these levels have to be coded as a separate level of qualification (i.e., separated by periods). Other possibilities are via positional characters within a given qualifier. The one exception to this is the High-Level Qualifier (HLQ). One should not create unnecessary numbers of these due to positional characters. This is explained in more detail in the next section.
High-Level Qualifier (HLQ)
The HLQ should identify who "owns" the data. The purpose of this could be for things like billing purposes, or simply for locating the owner in case of a problem. It may represent a userid, a project, an application, a business unit, a group, etc. It may also represent a sharing of the same set of data by a set of individuals from a security standpoint (e.g., like the notion of RACF userid and groupid).
There should be no other levels of qualification imbedded in this portion that would tend to artificially multiply the number of HLQs in an installation. The goal should always be to minimize the number of HLQs to the point that they serve the management purpose (i.e., billing, identification, etc.).
It may be important to even have a standard convention within the HLQ. For example, all TSO userids begin with a "$" as the first character -- this would allow the Storage Administrator to easily avoid filter collisions in ACS Routines. Again, the goal of doing this would be to allow the Storage Administrator to get his job done more easily in that all TSO data could be easily filtered out. The trade-off here, of course, would be with the usability of the TSO LOGON ids.
Another trade-off here would be the intent of managing groups of application data. It may be more important, for example, to associate certain TSO LOGONs with the particular application area so that large applications could easily be identified and moved by filtering on the first character of the HLQ. There is also a usability problem here in that the TSO user would have to keep changing his id if his job changed from one large application to a different one. This also means that electronic mail might be a problem if individual users had a lot of LOGON id changes -- this also has some security implications as well (electronic mail being sent to the wrong id).
Note: As a personal recommendation from the author, it has been found that one tends to cause more problems by choosing userids that will definitely change due to career changes, etc. A better way is to have individuals keep a standard LOGON id, and change the set of filters instead of the ids.
As an example of this set of conventions, consider a common problem of constantly shifting workloads. If the Storage Administrator was always faced with getting the data for a given application and moving it to another system, then it would make good sense to have a naming convention for the HLQ that would allow him to easily accomplish this via filtering techniques. Other reasons are for data portability to other installations for disaster/recovery situations that would cause an application to be brought up on an alien system.
One example of a naming convention for HLQs might be the following:
First Character:
- A - Accounting Support
- D - Documentation
- E - Engineering
- F - Field Support
- M - Marketing Support
- P - Programming
- $ - TSO userid
Note: An alternative here is to use one of the characters above as the first character of the TSO userid to allow movement of all data within an application (including the TSO data). This is not the way recommended by the author -- the preference would be a standard code for TSO, such as "$".
The above list does not represent all of the possibilities. For example, a bank might have separation of HLQs by major application such as checking, savings, mortgage loans, investments, ATM, etc. An insurance company might have applications such as life, auto, personal insurance, major medical, corporate accounts, etc.
Remaining Characters = Project Name or Code Number
Note: This might have been chosen because a project code might exist for programming, engineering, documentation, and accounting, but it does not imply that one must do it that way. It has been the experience of the author that similar project codes would not be common.
A more natural idea is to have the actual project name following the first character. For example, the remaining portion of the "E" project could be the code name of the machine being designed, while the "P" project could be the name of the programming application or product name. Some examples might be:
- E3090M150
- E3090M200
- E3090M300
- E3090M400
- E3090M500
- E3090M600
- E3380K
- E3380J
- E3380D
- E3380S
- E3990M3
- E3990M2
The above example would allow various filtering techniques the flexibility of recognizing different sets of data easily. Some examples of this follow:
- HLQ = E* -- All of the Engineering Data
- HLQ = E3090* -- All of the 3090 CPU Family of Designs
- HLQ = E3380* -- All of the 3380 DASD Family of Designs
- HLQ = PF* -- All of the Programming Financial Area
- HLQ = PP* -- All of the Production Control Applications
- HLQ = *1234 -- All of the "1234" project
- HLQ > $* -- All of the non-TSO data
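The filtering idea above can be tried out with a minimal Python sketch. It is only an approximation: it uses Python's `fnmatchcase` wildcard matching rather than real ACS or ISMF filter syntax, and the sample HLQs (`PFLEDGER`, `PPSCHED`, `A1234`, `$JSMITH`) and the helper names are hypothetical, chosen to fit the first-character convention described earlier.

```python
from fnmatch import fnmatchcase

# Hypothetical sample HLQs following the first-character convention above.
HLQS = ["E3090M150", "E3380K", "E3990M3", "PFLEDGER", "PPSCHED", "A1234", "$JSMITH"]


def select_hlqs(pattern, hlqs):
    """Return the HLQs that match a simple '*' wildcard pattern."""
    return [h for h in hlqs if fnmatchcase(h, pattern)]


def non_tso(hlqs):
    """Return the HLQs that do not start with the TSO marker '$'."""
    return [h for h in hlqs if not fnmatchcase(h, "$*")]


if __name__ == "__main__":
    for pattern in ("E*", "E3380*", "PF*", "*1234"):
        print(pattern, "->", select_hlqs(pattern, HLQS))
    print("non-TSO ->", non_tso(HLQS))
```

Because the owner category sits in the first character and the project name follows it, each management question maps to a single short pattern.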
Relative Importance
This level of qualification might indicate things like:
- Production Data
- Development Data
- Test Data
In general, it would be important to be able to recognize the distinction between production data and test data. Other types of levels could be:
- Master Data
- Update Data
- Work Data
File Contents
This level of qualification should state what the data is. For example, an application strips some information out of the master data base and builds a work file for subsequent processing. This file contains the employee id number and his job code. One might then call this file the "employee-job code file". Other examples might include such things as:
- Telephone call log (TELPHLOG)
- Parts inventory file (PRTINVEN)
- Parts unit cost file (PRTUCOST)
- Payroll file (PAYROLL)
- Checking account transaction file (CHKXACTN)
- CADAM circuit design file (CKTDESGN)
- Heat dissipation statistics file (HEATSTAT)
- Simulation result file (SIMRSLTS)
- Program source (PGMSRCE)
- Life insurance account file (LIFEACCT)
- User's Manual script file (USMANUAL)
- Input manufacturing file (INPMANUF)
- Transportation bill of lading file (XPRTBILL)
All of the above examples describe what the data is. There should be a unique character code for each data set type within a given application. This concept is demonstrated with each of the examples above.
Note: Eight characters have been used as the standard for the data set type in all of the above examples. This is probably a reasonable number of characters, although not mandatory. It may not even be a good idea to make file names too readable, as with a file called MASTER.PAYROLL.WEEKLY. This might be just too tempting to your average system hacker. A better choice might be MR.PY.WK.
User Name
This qualifier should allow the end creator to assign their own unique name to identify the particular set of a certain type of data. For example, with a master circuit file, there would be a distinction between MYPART and YOURPART, or MYPART1 versus MYPART2. Some other examples of these are:
- Part number (#0135678)
- Print program (PRTPROGM)
- New York Area (NEWYORK)
- X15 Model (X15MODEL)
- Geological site #458 (GEO#458)
- Branch office #57 (BROFC057)
The intent of this level of qualification is to uniquely identify one piece of a type of data from another piece of the same type of data. For example, the distinction between "TELPHLOG.NEWYORK" versus "TELPHLOG.CALIF" or between "PGMSRCE.MYPGM1" versus "PGMSRCE.MYPGM2". This should use the entire qualifier (i.e., all of the eight characters).
Data Set Level
Another level of qualification that is sometimes useful in application areas is the level of the piece of data. Some examples of this are listed below:
- Design level or version or release level (e.g., for engineering, programming, documentation)
- Change number, an arbitrary number to indicate a constantly increasing number for subsequent improvements on a piece of data.
- Cyclic level (e.g., yearly, monthly, weekly, daily)
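To show how the components of this section could fit together, here is a minimal Python sketch that splits a name into the five levels described above. The fixed five-qualifier layout (`hlq.importance.contents.username.level`) is a hypothetical choice for illustration only; the text stresses that not every level is needed. The syntax checks (44 characters in total, qualifiers of 1 to 8 characters starting with a letter or one of `#`, `@`, `$`) are the usual MVS data set name rules, which this document does not itself spell out.

```python
import re
from typing import NamedTuple

# One qualifier: 1-8 characters, first one alphabetic or national (#, @, $).
QUALIFIER = re.compile(r"[A-Z#@$][A-Z0-9#@$-]{0,7}")


class DataSetName(NamedTuple):
    hlq: str          # who owns the data
    importance: str   # e.g. PROD, DEVL, TEST
    contents: str     # what the data is
    user_name: str    # which instance of that type of data
    level: str        # version, change number, or cycle


def parse(dsname: str) -> DataSetName:
    """Split a name into the five hypothetical levels, checking basic syntax."""
    if len(dsname) > 44:
        raise ValueError("data set name longer than 44 characters")
    parts = dsname.split(".")
    if len(parts) != 5:
        raise ValueError("this sketch expects exactly five qualifiers")
    for qualifier in parts:
        if not QUALIFIER.fullmatch(qualifier):
            raise ValueError(f"invalid qualifier: {qualifier!r}")
    return DataSetName(*parts)


if __name__ == "__main__":
    print(parse("E3380K.PROD.CKTDESGN.MYPART1.V2"))
```

A name that parses this way answers both basic questions directly: the `hlq` field says who owns the data, and the `contents` field says what it is.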
Things Not To Include In The Data Set Name
There are certain pieces of information that should never be part of the data set name. The general category of this data is that information which is very likely subject to change. This type of qualification usually doesn't add anything meaningful in terms of identifying the data for storage management reasons. Some examples of this are shown in subsequent sections.
Department Number
This is a piece of information that is sure to change either due to re-organization, or movement of projects or individuals.
Application Location
If this application ever got moved to another site, then all of the data sets would have to be renamed. In some cases, it could be important to name the data by a location name, for example, CHICAGO. This could be perfectly acceptable in certain cases. Suppose there were two telephone directory files, "TELPHDIR.CHICAGO" and "TELPHDIR.PODUNK". This would be okay. That's because it is very unlikely that either Chicago or Podunk would ever move.
Application workload is another story. For example, if an application currently runs in Chicago, or Podunk, one might be tempted to include this as part of the data set name for disaster/recovery purposes. If there is any chance of moving these applications to run in a different location, then it would be a bad idea to imbed it into a name.
Management Criteria
Some people recommend identifying generic management criteria, such as disaster/recovery data or vital records, within the data set name. In general, it is not a good idea to imbed management criteria within the data set name, since they are mainly driven by the technology at hand. They may also change for business reasons (e.g., a change in the state laws).
If either the technology or the business reason changes, then the data set would have to be renamed to match it. One should simply name the data for what it is and keep its management policy separate.
Therefore, it is not a good idea to specify qualifiers of "DR" or "VITALREC" etc. By always naming the data for what it is, the policy for managing it can be kept separately without ever needing to rename the data.
Output Device Type
This is a general class of information that is also based on a certain level of technology. As technology improves, one may very well decide to change the way it is managed. For example, one might put a data set qualifier of MICROFIC to indicate that this data set should be put out to microfiche -- whereas somewhere in the future, one could envision this same piece of data being put to a high-speed link which is connected to some automated high-capacity storage device.
Qualifiers such as "TAPE" or "DISK" or "T3480" or "T3490" should never be coded in data set names. This type of qualifier is destined for change as newer device technologies get created.
Expiration Date
The EXPDT and RETPD allocation keywords associated with the data set are another form of management criteria in that they specify the purge date for data. These keywords clearly put storage management in the hands of the end users.
To change the policy, one must change the date associated with the data set. This is just as bad as forcing the application to go back and re-figure the new management criteria. It represents a policy of sorts and therefore, should be separated from the name itself. This type of information should be allowed to change without actually changing the name of the data.
Access Method
At one point in time, many installations adopted a policy of distinguishing VSAM data from non-VSAM data. The reality behind this standard was that there were many functions that were not supported for VSAM and this was an easy way to recognize that data.
The main reason for this distinction was because of old VSAM catalogs and the "ownership" of the volume and data by VSAM. In order to avoid these problems, one had to separate by catalog, the VSAM data from the non-VSAM data. We have come a long way since then and this notion is no longer needed with the use of ICF catalogs. There should be NO distinction because of access method.
Job Name
Some installations have used the name of the job which created this data set. This is certainly a piece of information which is very likely to change. It usually says very little about what this piece of data actually is, since the job usually creates all sorts of data set types. It is not a good practice to include this information as a part of the data set name.
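The specific qualifiers named in this section can be turned into a simple check. The following Python sketch is one hypothetical way to flag them; the function name `lint` and the idea of a lookup table are assumptions, and the table covers only the qualifiers the text spells out (it does not try to detect department numbers, locations, or job names, which have no fixed spelling).

```python
# Qualifiers the text names explicitly, mapped to the category it gives.
DISCOURAGED = {
    "DR": "management criteria",
    "VITALREC": "management criteria",
    "MICROFIC": "output device type",
    "TAPE": "output device type",
    "DISK": "output device type",
    "T3480": "output device type",
    "T3490": "output device type",
}


def lint(dsname):
    """Return (qualifier, reason) pairs for each discouraged qualifier found."""
    return [
        (qualifier, DISCOURAGED[qualifier])
        for qualifier in dsname.upper().split(".")
        if qualifier in DISCOURAGED
    ]


if __name__ == "__main__":
    print(lint("PAYROLL.MASTER.TAPE.DR"))
    print(lint("E3380K.PROD.CKTDESGN"))
```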
Common Applications -- Naming Conventions
This section will focus on some of the common MVS applications that tend to put out a fair amount of data on the system and the suggestions for an associated naming convention.
TSO Naming Conventions
VSAM Data Set Naming Conventions
DB2 Naming Conventions
Generation Data Sets
TSO Naming Conventions
TSO certainly has a very recognizable data set naming convention. The rules are fairly simple and easy to understand:
- Three levels of qualification
- userid.dsname.dstype
- Standard set of data set types (e.g., CLIST, FORT, PLI, CNTL, etc.)
All of the TSO functions, commands, etc. and also the ISPF/PDF functions tend to complement this naming convention. Therefore, for ease of use and transportability, it is generally a good idea to maintain this convention.
Some applications run with production-type data under TSO using the standard JCL PROCs, CLISTs, PANELS, etc. that would be used for the normal production data. For example, consider a programming application that had a naming convention of:
library.dstype.project.version.release
where,
- library: (PROD, DEVL, or TEST)
- dstype: (SOURCE, MACRO, LOAD, OBJ, JCL, etc.)
- project: (APPLIC1, APPLIC2, . . .)
- version: (V1, V2, . . .)
- release: (R1, R2, . . .)
It would certainly be acceptable for unit testing on certain data to be done under a TSO userid such that the userid would be substituted for the "library" and all of the remaining qualifiers would stay the same. The key point here is the usability of the system and the need to manage it differently based on what set of data this actually is.
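The substitution described above can be sketched in a few lines of Python. The function name `to_tso_name` and the sample userid `$JSMITH` are hypothetical; the library values come from the example convention.

```python
# Library qualifiers from the example convention above.
LIBRARIES = {"PROD", "DEVL", "TEST"}


def to_tso_name(dsname, userid):
    """Replace the leading 'library' qualifier with a TSO userid."""
    library, _, rest = dsname.partition(".")
    if library not in LIBRARIES or not rest:
        raise ValueError(f"not a library.dstype... name: {dsname!r}")
    return f"{userid}.{rest}"


if __name__ == "__main__":
    # Hypothetical userid using the "$" prefix suggested earlier.
    print(to_tso_name("TEST.SOURCE.APPLIC1.V1.R2", "$JSMITH"))
```

Only the first qualifier changes, so JCL PROCs and CLISTs that build names from the remaining qualifiers can serve both the production libraries and the unit-test copies.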
VSAM Data Set Naming Conventions
Many companies have decided that it is a good idea to associate all of the VSAM components with the base cluster by some recognition pattern. Some of the reasons behind this have to do with billing, and also with the usability of doing catalog locates, etc. to find all of the associated components easily with available software technology.
The standard adopted in most cases is to use the cluster name as the first portion of each name, followed by a component qualifier of DATA, INDEX, or the AIX name. For example, consider a VSAM cluster of X.Y.Z that was a KSDS with two AIXs which were also keyed data sets.
Also, assume that there were two path names defined over the alternate indexes. The set of names would be:
- X.Y.Z
- X.Y.Z.DATA
- X.Y.Z.INDEX
- X.Y.Z.PATH1
- X.Y.Z.AIX1
- X.Y.Z.AIX1.DATA
- X.Y.Z.AIX1.INDEX
- X.Y.Z.PATH2
- X.Y.Z.AIX2
- X.Y.Z.AIX2.DATA
- X.Y.Z.AIX2.INDEX
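The pattern in the list above is regular enough to generate. The following Python sketch reproduces it for a cluster with any number of alternate indexes; the function name `vsam_names` is hypothetical, and the `PATHn` and `AIXn` qualifiers simply follow the example rather than any required VSAM rule.

```python
def vsam_names(cluster, aix_count):
    """List the component names for a KSDS cluster and its alternate indexes."""
    names = [cluster, f"{cluster}.DATA", f"{cluster}.INDEX"]
    for n in range(1, aix_count + 1):
        aix = f"{cluster}.AIX{n}"
        # One path per alternate index, then the AIX and its own components.
        names += [f"{cluster}.PATH{n}", aix, f"{aix}.DATA", f"{aix}.INDEX"]
    return names


if __name__ == "__main__":
    for name in vsam_names("X.Y.Z", 2):
        print(name)
```

Since every component begins with the cluster name, a single prefix filter on `X.Y.Z` finds the whole set.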
DB2 Naming Conventions
The standard data set naming convention for all DB2 data sets is the following:
hlq.DSNDBx.dbname.tblspacename.I0001.A00n
where,
- DSNDBx is one of:
  - DSNDBC -- cluster
  - DSNDBD -- data component
  - DSNDBI -- index component
- dbname -- data base name (user selected)
- tblspacename -- table space name (user selected)
- I0001 -- hard-coded constant
- A00n -- where "n" is system generated for extensions to the table space
Although the DB2 naming convention is certainly distinguishable, it is difficult to associate different management policies for different sets of data within a particular data base. For example, in a programming application, one can easily imagine different management policies for things like (SOURCE, MACRO, SCRIPT, etc.) versus things like (OUTLIST, LISTING, TESTLIST, LIST, ASMLIST, etc.).
Not all data within an application has the same management criteria -- in the case of DB2 applications, it is difficult to place distinguishable characteristics within the data set name so that ACS Routines can easily ascertain one piece of data from another. The only alternative to this is to place a very restrictive convention on the qualifiers that can be specified.
For example, one could break down the table space name by having positional characters represent different levels of qualification for the data. The following is one possibility:
- 1st character: P -- production, T -- test
- 2nd character: R -- report, I -- intermediate file, M -- master DB, etc.
- 3rd character: W -- weekly, D -- daily, M -- monthly
- 4th-8th characters: table space name
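As an illustration of the positional scheme above, this Python sketch pulls the table space qualifier out of a DB2 data set name and decodes its first three characters. The function names, the sample name `DB2P.DSNDBD.PAYDB.PRWSALES.I0001.A001`, and the "unknown" fallback for codes the list leaves open ("etc.") are all assumptions made for the example.

```python
# Positional codes taken from the list above.
IMPORTANCE = {"P": "production", "T": "test"}
KIND = {"R": "report", "I": "intermediate file", "M": "master DB"}
CYCLE = {"W": "weekly", "D": "daily", "M": "monthly"}


def tablespace_of(dsname):
    """Return the 4th qualifier of hlq.DSNDBx.dbname.tblspacename.I0001.A00n."""
    parts = dsname.split(".")
    if len(parts) != 6:
        raise ValueError(f"not a six-qualifier DB2 name: {dsname!r}")
    return parts[3]


def decode_tablespace(name):
    """Decode the three positional characters of a table space name."""
    if not 4 <= len(name) <= 8:
        raise ValueError(f"table space name must be 4-8 characters: {name!r}")
    return {
        "importance": IMPORTANCE.get(name[0], "unknown"),
        "kind": KIND.get(name[1], "unknown"),
        "cycle": CYCLE.get(name[2], "unknown"),
        "name": name[3:],
    }


if __name__ == "__main__":
    ts = tablespace_of("DB2P.DSNDBD.PAYDB.PRWSALES.I0001.A001")
    print(ts, decode_tablespace(ts))
```

The cost of the scheme is visible in the result: only five characters remain for the table space name itself.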
Until this restrictive naming convention is lifted, the only other alternative is to use the available free characters to distinguish the data within the data base application.
Generation Data Sets
The notion of a GDG (Generation Data Group) is that for a given name, say "A.B.C", there could be many generations. Each generation data set (GDS) would have the following name form:
A.B.C.GnnnnV00
where,
nnnn = next number in sequence (may wrap around).
The only naming convention difference here is that the application loses one level of qualification, mainly the lowest level. Other than that, there is no real distinction between GDSs and other data set names. It is generally not a good idea, however, to use the generation to indicate a different type of data. For example, the idea of the "+1" version being a report, the "+2" version being an intermediate file, etc. The first portion of the name should state what the data is and the generation should represent a later level or generation of that data. A better idea would be to have a separate GDG for reports, intermediate files, etc.
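The name form above can be sketched in Python as follows. The function names are hypothetical. The sketch assumes generation numbers run from 0001 to 9999 and wrap back to 0001, and that the base name may be at most 35 characters so that the 9-character `.GnnnnV00` suffix still fits within a 44-character data set name; neither limit is stated in this document.

```python
def gds_name(base, generation):
    """Build the absolute name of one generation, e.g. A.B.C.G0001V00."""
    if not 1 <= generation <= 9999:
        raise ValueError("generation must be between 1 and 9999")
    if len(base) > 35:
        # Assumed limit: the ".GnnnnV00" suffix needs 9 of the 44 characters.
        raise ValueError("GDG base name longer than 35 characters")
    return f"{base}.G{generation:04d}V00"


def next_generation(current):
    """Return the next number in sequence, wrapping from 9999 back to 1."""
    return current % 9999 + 1


if __name__ == "__main__":
    print(gds_name("A.B.C", 1))
    print(gds_name("A.B.C", next_generation(9999)))
```

Following the advice above, reports and intermediate files would each get their own base name (for example, a hypothetical `A.B.REPORT` and `A.B.INTERMED`) rather than sharing one GDG.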
(c) International Business Machines Corporation 1998
IBM Storage Systems Division
5600 Cottle Road
San Jose, CA 95193
www.ibm.com/storage
Printed in the United States
4-98
All Rights Reserved
The following are trademarks or registered trademarks of the IBM Corporation in the United States, other countries, or both: IBM, DFSMS, DFSMShsm, DFSMSdfp and MVS/ESA
Other product names are trademarks or registered trademarks of their respective companies.
References in this publication to IBM products, programs, or services do not imply that IBM intends to make them available in all countries in which IBM operates.