Data Definition Analysis

Objectives

Data definition analysis is key to most redevelopment planning efforts. Analysts attempting to assess a system-wide migration, integration or redesign effort without adequate knowledge of that system's data usage are at a major disadvantage. Specific objectives of this task include:

· Inventory and cross reference system-wide data usage at the Input/Output (I/O) level

· Determine data usage quality across each system

· Extract "primary" data elements required to recreate this system or group of systems

· Assess inter-system data usage in support of integration or related redevelopment objectives

· Determine level of data usage redundancy and/or inconsistency

· Assess data usage impact on system maintainability and redevelopment

· Provide key input to Year 2000 or similar field & record size expansion planning efforts

· Highlight data stores and elements impacted by physical data conversion or field length expansion projects

Entrance Criteria

The entrance criteria for the data definition analysis task are listed below.

· Completion of environmental analysis for all systems and sub-systems of interest

· Access to program/Copy source code, run JCL/tables and data dictionaries, if available

· Access to physical data stores for all systems and sub-systems of interest

· Completion of Environmental Counts and Scores on Technical Assessment Form 003A

· Requirement to assess system-wide data usage

Roles/Skills

The personnel and skill requirements necessary to meet the data definition analysis task objectives are identified below.

· Data Definition Analyst

- Knowledge of how to assess cross-system data definitions, including producing and interpreting tool output

· Current Systems Expert

- Knowledge of existing application programming environment, data organization, usage, standards and conventions

· Metric Analyst

- Ability to assess & record data definition metrics

· Systems Programmer

- Ability and security access to system libraries

· Repository Administrator

- Ability to load and manage open system repository within scope of this project

Input Requirements

The system components and related inputs required to initiate and complete the data definition analysis task are listed below.

· Program, Copy (or Include), JCL and on-line table definition source code

· Environmental analysis component cross reference results in report, repository or other accessible format

· Application data dictionaries (required if they are the vehicle for accessing Copy/Include members)

· Application data stores

· Completed environmental analysis, Technical Assessment Form 003A

· Technical Assessment Form 003C and maintenance/redevelopment Rating Summary Form 008A

Optional, if the legacy transition meta-model (LTM) is being used for this project:

· Populated repository model, using LTM (see technology section of this task), during environmental analysis

Tool/Technology Support

Technologies supporting the data definition analysis task include a data definition analyzer, date impact analyzer, physical data analyzer, open systems repository, spreadsheet and word processing tools. These tools are used to represent information as required by this task.

Data definition analyzer

It is nearly impossible to perform large scale data definition analysis without automated tools. In an IBM MVS or similar mainframe environment, comparing record layouts across numerous systems and program libraries is very time consuming. The main role this tool plays within this task is to rapidly and automatically analyze logical data descriptions to provide the user with information about their level of redundancy and quality.

Tool attributes include physical definition cross reference ability, size and data structure comparison capability, control transfer summary analysis, in-context usage analysis, redundancy analysis and selective name usage analysis.

Simple data name tracing tools that lack sophisticated cross-system tracking facilities are inadequate and inappropriate for purposes of this analysis.
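To make the size and structure comparison and redundancy analysis attributes more concrete, the following minimal Python sketch groups record layouts that share an identical physical structure and reports field names defined with more than one size. It is an illustration only, not a substitute for an analyzer tool. The layout names, field names and the in-memory `LAYOUTS` format are hypothetical; a real tool would parse them from Copy/Include members.

```python
from collections import defaultdict

# Hypothetical record layouts (assumed for illustration):
# "system/member" -> ordered list of (field name, picture, byte length).
LAYOUTS = {
    "BILLING/CUSTREC": [("CUST-NO", "9(8)", 8), ("CUST-NAME", "X(30)", 30), ("OPEN-DATE", "9(6)", 6)],
    "ORDERS/CUSTOMER": [("CUST-NO", "9(8)", 8), ("CUST-NAME", "X(30)", 30), ("OPEN-DATE", "9(6)", 6)],
    "SHIPPING/CUSTMST": [("CUST-NO", "9(10)", 10), ("CUST-NAME", "X(30)", 30)],
}


def structure_signature(fields):
    """Reduce a layout to its physical shape: ordered (picture, length) pairs."""
    return tuple((pic, length) for _name, pic, length in fields)


def find_redundant_layouts(layouts):
    """Group members whose layouts have an identical physical structure."""
    groups = defaultdict(list)
    for member, fields in layouts.items():
        groups[structure_signature(fields)].append(member)
    return [sorted(members) for members in groups.values() if len(members) > 1]


def find_inconsistent_fields(layouts):
    """Report field names defined with more than one picture/length."""
    definitions = defaultdict(set)
    for fields in layouts.values():
        for name, pic, length in fields:
            definitions[name].add((pic, length))
    return {name: sorted(defs) for name, defs in definitions.items() if len(defs) > 1}


redundant = find_redundant_layouts(LAYOUTS)
inconsistent = find_inconsistent_fields(LAYOUTS)
print(redundant)
print(inconsistent)
```

In this sample, the two identical customer layouts are flagged as redundant, and CUST-NO is flagged as inconsistent because it is defined with two different lengths.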

Date impact analyzer

It is nearly impossible to perform large scale date impact analysis without automated tools. In an IBM MVS or similar mainframe environment, identifying all potential date related records/fields is very time consuming. The main role this tool plays within this task is to rapidly and automatically identify how dates are used by programs and to provide a complete cross-reference of those components affected by a change - including source code, copybooks, files, JCL, CICS tables, etc.

Requirements include identifying date usage in programs (e.g., MOVEs, COMPAREs), tracing elements throughout an application, supporting COBOL 88 levels and REDEFINES, and identifying indirect date references, synonyms and missing or obsolete components. Date impact analyzer tools should provide a set of default date search criteria that can be modified and expanded depending on company standards.
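The idea of modifiable default search criteria and indirect date references can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the criteria list, the sample COBOL fragment and all function names are hypothetical, and the sketch handles only single-operand MOVE statements, not 88 levels, REDEFINES or synonyms.

```python
import re

# Default date search criteria (assumed for illustration); a site would
# modify or extend these to match its own naming standards.
DEFAULT_CRITERIA = ["DATE", "YEAR", "YY", "DT"]

# Hypothetical COBOL fragment used as sample input.
SAMPLE_SOURCE = """\
       01  ORDER-REC.
           05  ORDER-NO        PIC 9(6).
           05  ORDER-DATE      PIC 9(6).
           05  SHIP-YY         PIC 99.
       01  WS-HOLD             PIC 9(6).
       PROCEDURE DIVISION.
           MOVE ORDER-DATE TO WS-HOLD.
           IF SHIP-YY > 50
               MOVE ORDER-NO TO WS-HOLD.
"""

NAME_RE = re.compile(r"[A-Z][A-Z0-9-]*")
MOVE_RE = re.compile(r"\bMOVE\s+([A-Z][A-Z0-9-]*)\s+TO\s+([A-Z][A-Z0-9-]*)")


def build_matcher(criteria):
    """Match any criterion as a whole hyphen-delimited token of a data name."""
    return re.compile(r"(?:^|-)(?:%s)(?:-|$)" % "|".join(criteria))


def find_date_fields(source, criteria=DEFAULT_CRITERIA):
    """Return (direct, indirect) sets of suspected date fields."""
    matcher = build_matcher(criteria)
    direct = {name for name in NAME_RE.findall(source) if matcher.search(name)}
    indirect = set()
    changed = True
    while changed:  # propagate through MOVE statements until nothing new is found
        changed = False
        for src, dst in MOVE_RE.findall(source):
            tainted = src in direct or src in indirect
            if tainted and dst not in direct and dst not in indirect:
                indirect.add(dst)
                changed = True
    return direct, indirect


def cross_reference(source, fields):
    """Map each affected field to the line numbers that mention it."""
    xref = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name in NAME_RE.findall(line):
            if name in fields:
                xref.setdefault(name, []).append(lineno)
    return xref


direct, indirect = find_date_fields(SAMPLE_SOURCE)
xref = cross_reference(SAMPLE_SOURCE, direct | indirect)
print(sorted(direct), sorted(indirect))
print(xref)
```

Here WS-HOLD has no date-like name but is reported as an indirect reference because it receives ORDER-DATE in a MOVE, which is the kind of case that simple name searches miss.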

Physical data analyzer

As with data definition analysis, it is nearly impossible to perform data integrity analysis without an automated tool. The volume of data in even the smallest organization would preclude any more than a cursory manual effort. Access to a physical data analyzer is highly recommended.

Within this task, the role of the physical data analyzer is to automatically scan selected data and produce a sampling of defective data. There are three common methods for determining defective data: comparison of the data to user-defined rules, comparison of the data to system definitions (either hard-coded in the source or defined in the dictionary) and analysis of the data alone to discover inherent rules and dependencies.

Some tools review all the data in the selected data stores, while others sample the data in some pre-defined or random pattern. Some physical data analyzers can also produce management reports on the quality of the data.
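The first of the three methods, comparison of the data to user-defined rules, combined with either a full scan or a random sample, might look like the following Python sketch. The rules, field names, sample records and the YYMMDD date format are all assumptions made for illustration; they do not describe any particular physical data analyzer.

```python
import random

# Hypothetical user-defined rules: field name -> predicate that returns
# True when the value is acceptable. open_date is assumed to be YYMMDD.
RULES = {
    "cust_no": lambda v: v.isdigit() and len(v) == 8,
    "state": lambda v: v in {"CA", "NY", "TX"},
    "open_date": lambda v: v.isdigit() and len(v) == 6 and 1 <= int(v[2:4]) <= 12,
}

# Hypothetical records read from a data store.
RECORDS = [
    {"cust_no": "00012345", "state": "CA", "open_date": "950312"},
    {"cust_no": "0001234X", "state": "NY", "open_date": "961101"},
    {"cust_no": "00098765", "state": "ZZ", "open_date": "971305"},
    {"cust_no": "00055555", "state": "TX", "open_date": "      "},
]


def check_record(record, rules):
    """Return the names of the fields in this record that violate a rule."""
    return [name for name, is_valid in rules.items() if not is_valid(record.get(name, ""))]


def scan(records, rules, sample_size=None, seed=0):
    """Check every record, or a random sample when sample_size is given.

    Returns a mapping of record index -> list of defective field names.
    """
    indexes = range(len(records))
    if sample_size is not None and sample_size < len(records):
        indexes = sorted(random.Random(seed).sample(indexes, sample_size))
    defects = {}
    for i in indexes:
        failed = check_record(records[i], rules)
        if failed:
            defects[i] = failed
    return defects


full_scan = scan(RECORDS, RULES)
sampled = scan(RECORDS, RULES, sample_size=2)
print(full_scan)
print(sampled)
```

A full scan reports every defective record, while the sampled run inspects only two records and so may miss some defects; that trade-off is the difference between the two classes of tools described above.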

Open systems repository

A repository provides an important, yet optional, capability to link data definitions to other definitions and to physical objects using a formal model. Use of a repository in this task requires that baseline meta-data was established during enterprise-wide assessment.

For large systems or cross functional expansion efforts, this repository model can be maintained as a sophisticated mechanism for managing project efforts. Requirements include the ability to reflect system components as objects within the repository model and populate that model from a legacy environment.

A recommended legacy transition meta-model (LTM) that supports this analysis is shown in the Appendix section of the Comsys-TIM product. A secondary, and optional, requirement involves accepting an automated load format based on tools that parse and analyze legacy environments.
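As a rough illustration of reflecting system components as linked objects, the Python sketch below stores components with their references and answers a simple impact question. It is not the LTM or any actual repository product; the `Component` and `Repository` classes, the component kinds and all component names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A system component held as an object in the (hypothetical) model."""
    name: str
    kind: str  # e.g. "program", "copy", "data_store"
    uses: set = field(default_factory=set)  # names of components it describes or references


class Repository:
    def __init__(self):
        self.components = {}

    def add(self, name, kind, uses=()):
        self.components[name] = Component(name, kind, set(uses))

    def used_by(self, target):
        """Return names of components that reference target, directly or indirectly."""
        result = set()
        frontier = [target]
        while frontier:
            current = frontier.pop()
            for comp in self.components.values():
                if current in comp.uses and comp.name not in result:
                    result.add(comp.name)
                    frontier.append(comp.name)
        return result


# Populate the model; a real project would load this from parsing tools.
repo = Repository()
repo.add("CUSTMAST", "data_store")
repo.add("CUSTREC", "copy", uses=["CUSTMAST"])
repo.add("BILL010", "program", uses=["CUSTREC"])
repo.add("ORD020", "program", uses=["CUSTREC"])
repo.add("RPT030", "program")

impacted = sorted(repo.used_by("CUSTMAST"))
print(impacted)
```

Following the links from the CUSTMAST data store identifies the Copy member that describes it and the two programs that include that member, while the unrelated program is left out.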

Spreadsheet

Spreadsheet tools offer a convenient format for recording much of the information gathered throughout this stage. Referenced Comsys-TIM Forms have been pre-loaded into certain Spreadsheet tools (see the step level tool guidelines for specific tools) to facilitate data entry and analysis. While highly desirable, Spreadsheets are not essential to this task.

Word processor

A word processor is required to record the analysis narrative and, in the absence of a Spreadsheet tool, metric results.

Task Steps

The data definition analysis task consists of the following task steps:

Perform System-Wide Data Definition Analysis
Perform Distributed Systems Data Definition Analysis
Perform Data Definition Complexity Analysis
Perform Homonym Analysis
Perform Field Size Expansion Analysis
Perform Distributed Field Expansion Analysis
Identify Procedural Date Utilization
Perform Physical Data Analysis
Assign Data Definition Metric Counts
Calculate Data Definition Metric Scores
Produce Data Definition Narrative Summary
Review Data Definition Analysis Results
Assess Multi-System Data Definition Usage