There are three critical success factors that each company needs to identify before addressing data quality:
- Commitment by senior management to the quality of corporate data
- Definition of data quality
- Quality assurance of data
Senior management commitment to maintaining the quality of corporate data can be achieved by instituting a data administration department that oversees data management standards, policies, procedures, and guidelines.
Defining Data Quality
Data quality means data that is complete, timely, accurate, valid, and consistent. The definition of data quality must describe the degree of quality required for each element loaded into the data warehouse.
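The five dimensions above can be made operational as concrete checks. The following is a minimal sketch, assuming a hypothetical customer record with illustrative field names, rules, and thresholds (none of which come from the text):

```python
from datetime import datetime, timedelta

# Hypothetical rules for one record type; field names and the 30-day
# freshness window are illustrative assumptions.
REQUIRED_FIELDS = {"customer_id", "name", "state", "updated_at"}
VALID_STATES = {"TX", "NY", "CA"}  # sample domain of valid values

def quality_issues(record, now=None):
    """Return a list of data-quality violations for one record,
    covering completeness, validity, and timeliness."""
    now = now or datetime(2024, 1, 1)
    issues = []
    # Completeness: every required field present and non-empty
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            issues.append(f"incomplete: missing {field}")
    # Validity: value drawn from the agreed domain
    if record.get("state") and record["state"] not in VALID_STATES:
        issues.append(f"invalid: state {record['state']!r}")
    # Timeliness: record refreshed within the agreed window
    ts = record.get("updated_at")
    if ts and now - ts > timedelta(days=30):
        issues.append("stale: last update older than 30 days")
    return issues
```

Accuracy and consistency usually cannot be checked from a single record; they require comparison against a reference source or across systems, which is why they are the harder dimensions to assure.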
The quality assurance of data refers to verifying the accuracy of the data and correcting it where necessary, which may involve cleansing existing data. Since no company can rectify all of its unclean data, procedures have to be put in place to ensure data quality at the source.
This task can only be achieved by modifying business processes and designing data quality into the system. By identifying every data item and its usefulness to the end users of that data, data quality requirements can be established. Increasing the quality of data as an after-the-fact task is five to ten times more costly than capturing it correctly at the source.
If companies want to use a data warehouse for competitive advantage and reap its benefits, data quality is critical. Only when data quality is recognized as a corporate asset by every member of the organization will the benefits of data warehousing and CRM initiatives be realized.
Unreliable and inaccurate information in the data warehouse causes numerous problems. First and foremost, the confidence of the users in the validity and reliability of this technology will be severely impaired. Furthermore, if the data is used for strategic decision making, unreliable data affects the entire organization and will affect senior management’s view of the data warehousing project.
An excellent example of the damage that can be caused by inaccurate data occurred in the early 1980s, when banks held incorrect risk exposure data on Texas-based businesses. When the oil market slumped, those banks with many Texas accounts suffered significant losses. In other cases, manufacturing firms scaled down their operations and took actions to eliminate excess inventory. Relying on inaccurate market data, they overestimated their excess inventory and sold off critical business equipment.
Erroneous data should be captured and corrected before it enters the warehouse. Capture and correction are handled programmatically in the process of transforming data from the source system to the data warehouse. An example might be a field stored in lowercase that needs to be stored in uppercase. A final means of handling errors is to replace erroneous data with a default value. If, for example, the date February 29 of a non-leap year is defaulted to February 28, there is no loss of data integrity.
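Both corrections described above — case normalization and defaulting an impossible date — can be sketched as a single transformation step. The field names (`product_code`, `birth_date`) are hypothetical illustrations, not taken from the text:

```python
from datetime import date

def clean_record(rec):
    """Apply programmatic corrections while transforming a source
    record for the warehouse. Field names are illustrative."""
    out = dict(rec)
    # Case correction: codes are stored in upper case in the warehouse
    if isinstance(out.get("product_code"), str):
        out["product_code"] = out["product_code"].upper()
    # Default value: replace an impossible date with the nearest valid
    # one, e.g. Feb 29 of a non-leap year becomes Feb 28
    ymd = out.get("birth_date")
    if ymd is not None:
        y, m, d = ymd
        try:
            out["birth_date"] = date(y, m, d)
        except ValueError:
            if m == 2 and d == 29:
                out["birth_date"] = date(y, 2, 28)
            else:
                out["birth_date"] = None  # flag for manual review
    return out
```

Unrecoverable errors are set to `None` here rather than silently defaulted, since a default value is only safe when, as in the leap-year case, it costs no data integrity.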
Consistency of Data
In analyzing the characteristics of data required for data warehousing and data mining applications, the quality of the data is of extreme importance. The challenge for data managers is to ensure the consistency of data entering the system. In some organizations, data is stored in so-called 'flat' files and a variety of relational databases. Moreover, different systems designed for various functions contain the same terms but with different meanings.
If care is not taken to reconcile this terminology during data warehouse construction, the result is misleading management information. The logical consequence of this requirement is that management has to agree on the data definitions for elements in the warehouse. Those who use the data in the short term and the long term must have input into the process and know what the data means.
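One way to record such agreed definitions is a shared mapping from each source system's term to the warehouse's canonical name. A minimal sketch, assuming two hypothetical systems that both use the field "revenue" with different meanings (the systems and field names are illustrative):

```python
# Agreed warehouse definitions for terms that mean different things
# in different source systems (hypothetical example).
FIELD_MAP = {
    ("sales_system", "revenue"): "gross_revenue",
    ("finance_system", "revenue"): "net_revenue",
}

def to_warehouse_terms(source, record):
    """Rename a source record's fields to the agreed warehouse
    definitions; unmapped fields pass through unchanged."""
    return {
        FIELD_MAP.get((source, field), field): value
        for field, value in record.items()
    }
```

Keeping the mapping as explicit data, rather than buried in transformation code, gives the short-term and long-term users of the data one place to review and agree on what each element means.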