Section 1 - Installation and Configuration (5%) - Describe how to properly install and configure DataStage EE
- Describe users and groups
- Describe the environment (e.g., dsenv, ODBC)
- Setup database connectivity
- Describe OS configuration/kernel
- Describe USS configuration
- Identify required components for server
- C++ compiler
- Identify the tasks required to create and configure a project to be used for EE jobs.
- Project location
- Assign DataStage EE roles
- Environment defaults
- Given a configuration file, identify its components and its overall intended purpose.
- Project location
- Assign DataStage EE roles
- Environment defaults
- List the steps necessary to start/stop DataStage EE properly.
- netstat -a|grep ds
Section 2 - Metadata (5%) - Demonstrate knowledge of Orchestrate schema.
- Distinguish internal data type (Orchestrate schema) vs external data type
- Describe how to set extended properties for table definition
- Import metadata using plug-ins vs orchdbutil
- Explain nullable mapping rules (e.g., source vs target)
- NLS data types
- Identify the method of importing metadata.
- Flat sources (e.g., sequential file, Orchestrate schema, ODBC, etc.)
- COBOL CopyBook
- XML
- Given a scenario, demonstrate knowledge of runtime column propagation.
- Usage
- Impact on stage mapping and target databases
Section 3 - Persistent Storage (10%) - Given a scenario, explain the process of importing/exporting data to/from framework (e.g., sequential file, external source/target).
- Explain use of various file stages (e.g., file, CFF, fileset, dataset) and where appropriate to use
- If USS, define the native file format (e.g., EBCDIC, VSDM)
- Given a scenario, describe proper use of a sequential file.
- Read in parallel (e.g., reader per node, multiple files)
- Handle various formats (e.g., fix vs variable, delimeted vs nondelimited, etc.)
- Describe how to import and export nullable data
- Explain how to identify and capture rejected records (e.g., log counts, using reject link, options for rejection)
- Given a scenario, describe proper usage of CFF (native not plug-in).
- Explain how to import data from a file that has multiple varying record types (e.g., COBOL CopyBook, EBCDIC to ASCII)
- Describe proper usage of FileSets and DataSets.
- Explain differences and similarities of FileSet and DataSet (e.g., header data file segments, internal DS vs external format (FS))
- Determine which tools can be used to manage FileSets and DataSets (GUI and CLI)
- Describe use of FTP stage for remote data (e.g., how to parallel, plug-in vs enterprise).
- Restructure stages (e.g., column import/export)
- Identify importing/exporting of XML data.
- XML stage options and usage for XPATH and XLS
Section 4 - Parallel Architecture (10%) - Given a scenario, demonstrate proper use of data partitioning and collecting.
- Partitioning goals (e.g., data distribution, meeting data grouping requirements)
- Explain the differences and usage of round robin, entire, same, and hash partitioning
- Explain the differences between auto, ordered, and sort collector
- Identify partitioning type, parallel/sequential by analyzing a DataStage EE screen shot
- Explain the differences between partitioning keys and sorting (stage) keys through a scenario where these do not match
- Demonstrate proper use of partitioning methods based on given business requirements and DataStage EE technical requirements
- Explain the differences between a funnel state and a collector
- Describe input of partitioning and re-partitioning in an MPP/cluster environment
- Given a scenario, demonstrate knowledge of parallel execution.
- Given a job design and configuration file, provide estimates of the number of processes generated at runtime
- Explain the purpose and use of resource and node pools
- Given a source DataSet, describe the degree of parallelism using auto and same partitioning
Section 5 - Databases (15%) - Given a scenario, demonstrate proper selection of database stages and database specific stage properties.
- DB2
- Based on DB2 connectivity requirements, select appropriate stage (e.g., DB2 API, DB2 Enterprise)
- List environment variables needed to successfully run a DB2 job
- Teradata
- Based on functional requirements, select appropriate Teradata stage (e.g., TD Enterprise, TD MultiLoad (target only), TD API)
- Requested sessions and total sessions
- DRS
- ODBC
- Informix
- Sybase
- Oracle
- Remote Server
- Partition table
- Given a scenario using Oracle, load target and proper use of index mode and $API_ORA_LOAD_OPTIONS
- Identify source database options.
- Identify which stages/options read in parallel
- Explain the degree of parallelism for Oracle Enterprise and DB2 Enterprise
- Identify the use of "sparse" vs "normal" lookup
- When to use (e.g., 1:100)
- How to construct sparse lookup and SQL statements
- Given a scenario, demonstrate knowledge of target database options.
- Articulate benefits and limitations of using parallel load method (e.g., constraints, speed, exclusive locks, etc.)
- Explain the differences between upsert and load methods on target database stages
- Describe how to control restart of DB target (separate the "L" from ETL)
- Use OPEN, CLOSE, create temp, Load, SELECT FROM... INSERT INFO, DROP
- Identify the impact of RCP when target has fewer columns than job flow
- Separate update and insert records before target upsert
- Given a scenario, describe how to design EE ETL jobs that will extract data from DB2 (or any DBMS), combine with data from another source and load to another DBMS target.
- Demonstrate knowledge of working with NLS database sources and targets.
Section 6 - Data Transformation (15%) - Given a scenario, demonstrate knowledge of default type conversions, output mappings, and associated warnings.
- Demonstrate appropriate use of modify for NULL handling, string trim, non-default conversions
- Given a scenario, demonstrate proper selections of Transformer stage vs other stages.
- Copy stage/output mapping
- No BASIC Transformer
- Filter vs Transformer constraint (e.g., filter vs interpreted)
- Given a scenario, describe Transformer stage capabilities (including stage variables, link variables, DataStage macros, constraints, system variables, link ordering, @PART NUM, functions).
- Explain NULL handling within parallel transformer (e.g., reject rule, NULL functions)
- Demonstrate the use of Transformer stage variables (e.g., to identify key grouping boundaries on incoming data
- Use a Transformer only to generate a sequence of numbers in parallel
- Identify processes to add functionality not provided by existing DataStage stages (e.g., wrapper, buildops, user def functions/routines).
- Given a scenario, demonstrate no closed loop scenario when updating a reference table.
Section 7 - Combining and Sorting Data (10%) - Demonstrate knowledge of Join, Lookup, and Merge stages.
- Explain the differences between Lookup, Join, and Merge stages and demonstrate when to use each
- Explain link (e.g., input/output/reject) requirements
- Explain sorting requirements
- Explain full outer join usage
- Demonstrate understanding of link ordering with JOIN and MERGE stages
- Lookup - continue (also outer JOIN) with non-nullable input columns
- Explain what happens with duplicate input column names on JOIN and MERGE
- Given a scenario, demonstrate knowledge of SORT stage.
- Describe the differences between parallel SORT, SORT, and sequential SORT
- Describe the differences between unique option in osrt, RemDup stage
- Demonstrate understanding of SORT key column properties (e.g., don't sort, previously sorted)
- Demonstrate understanding of framework inserted sorts
- Examine score
- Set $APT_SORT_INSERTION_CHECK_ONLY
- Distinguish between SQL ORDER BY vs parallel sort
- Explain NLS collation sequences
- Given a scenario, demonstrate understanding of Aggregator stage.
- Memory impact of sort vs hash
- Demonstrate use of Aggregator properties
- Count vs calculation
- Explicitly create output columns and map
- Take output of Aggregator and join with detail records
- Describe the proper usage of change capture/change apply.
- Separate delta rows (e.g., inserts vs updates)
Section 8 - Automation and Production Deployment (10%) - Given a scenario, articulate the change control process.
- "Dev to QA to prod"
- Explain the differences between compile and force compile
- Use of protected projects
- Import/Export
- Including compiled Transformer
- Managing custom components (e.g., buildops, wrappers)
- USS deployment topics (e.g., remote deployment, how to deploy)
- Identify the use of the dsjob command line utility.
- Explain how a DataStage job may be run using a third party scheduler (e.g., autosys, control U, CRON, etc.)
- Explain how performance run statistics from a job run can be captured
- Capture log records to a text file (XML format)
- Explain how to use parameters to simplify the creation and maintenance of job designs.
- Explain the processes necessary to run multiple copies of the source (job multi-instance)
- Given a scenario, demonstrate the ability to use job sequencers (e.g., exception hunting, re-startable, dependencies, passing return value from routing, parameter passing and job status)
- Create different designs using Job Sequencer
Section 9 - Monitoring and Troubleshooting (10%) - Given a scenario, demonstrate knowledge of parallel job score.
- Show how it is captured and interpret its results
- Identify and interpret its contents
- Number of processes
- Number of nodes
- Number of DataSets
- Combined operators
- Parallel/Sequential execution
- Inserted buffer and sort operators
- Given a scenario, identify and define environment variables that control EE with regard to added functionality and reporting.
- Articulate ability to identify runtime metadata vs design metadata
- APT_PM_SHOW_PIDS, etc.
- Given a process list, identify conductor, section leader, and player processes.
- Given a scenario, identify areas that may improve performance (e.g., buffer size, repartitioning, config files, operator combination, etc.).
Section 10 - Job Design (10%) - Demonstrate knowledge of shared containers.
- Reuse using RCP
- Using usage analysis and multi-job compile to recompile jobs that use a modified shared container
- Given a scenario, describe how to minimize SORTS and repartitions.
- Demonstrate knowledge of creating restart points across jobs using intermediate DataSets.
- Given a scenario, demonstrate proper use of standards.
- Naming conventions for stages and links
- Differences between annotation and description annotation
|