Data Quality Control
Data pollution in critical data sources is a serious issue. Data quality problems can lead to erroneous conclusions in system flows and decision support processes, and may even force a decision to discard the data. Data quality control processes are therefore crucial for correct operation and for productivity.
Data Pollution: A decrease in the reliability of data caused by various anomalies. The anomalies that cause data problems are as follows:
Validity anomalies
- Key inconsistencies
- Differences in defined ranges
- Data type differences
- Missing essential fields
- Duplication
- Format differences
- Spelling differences
- Unnecessary records
- Cross-field validity
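
Several of the validity anomalies listed above can be detected with simple rule checks. The following is a minimal sketch, not a description of how Triton implements its checks. The record layout, the field names (id, name, age, birth_date, signup_date) and the age limits are all hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical rules for an illustrative customer record; none of these
# field names or limits come from the source text.
REQUIRED_FIELDS = ("id", "name", "birth_date", "signup_date")
AGE_RANGE = (0, 120)

def find_validity_anomalies(records):
    """Return a list of (record_index, anomaly_description) tuples."""
    anomalies = []
    seen_ids = set()
    for i, rec in enumerate(records):
        # Missing essential fields: required field absent or empty.
        for field in REQUIRED_FIELDS:
            if rec.get(field) in (None, ""):
                anomalies.append((i, f"missing field: {field}"))
        # Data type and range checks on the optional age field.
        age = rec.get("age")
        if age is not None:
            if not isinstance(age, int):
                anomalies.append((i, "type: age is not an integer"))
            elif not AGE_RANGE[0] <= age <= AGE_RANGE[1]:
                anomalies.append((i, "range: age out of bounds"))
        # Duplication: the same key value appears more than once.
        key = rec.get("id")
        if key is not None:
            if key in seen_ids:
                anomalies.append((i, "duplicate key"))
            seen_ids.add(key)
        # Cross-field validity: a signup date cannot precede the birth date.
        birth, signup = rec.get("birth_date"), rec.get("signup_date")
        if isinstance(birth, date) and isinstance(signup, date) and signup < birth:
            anomalies.append((i, "cross-field: signup before birth"))
    return anomalies
```

Each check appends to a shared report instead of stopping at the first failure, so a single pass shows every anomaly found in a record.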
Consistency anomalies
- Inconsistencies between data from different sources
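
A consistency anomaly of this kind can be surfaced by comparing records that share a key across two sources. The sketch below assumes each source is a list of dictionaries and that both use a common key field named id. The function name and the data layout are hypothetical.

```python
def find_source_conflicts(source_a, source_b, key="id"):
    """Return (key, field, value_in_a, value_in_b) for every disagreement."""
    # Index the second source by key for constant-time lookup.
    index_b = {rec[key]: rec for rec in source_b}
    conflicts = []
    for rec_a in source_a:
        rec_b = index_b.get(rec_a[key])
        if rec_b is None:
            continue  # the record exists in only one source; nothing to compare
        # Compare only the fields that both sources provide.
        for field in sorted(rec_a.keys() & rec_b.keys()):
            if rec_a[field] != rec_b[field]:
                conflicts.append((rec_a[key], field, rec_a[field], rec_b[field]))
    return conflicts
```

Records present in only one source are skipped here; whether those count as anomalies depends on the sources involved.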
Triton Data Cleaning Process
- Data quality control and anomaly identification
- Definition and development of the cleaning flow
- Implementation of the cleaning process
- Controls
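
The four stages above can be read as a detect, clean, re-check loop. The following is a rough sketch under that reading, not Triton's actual implementation. The function run_cleaning and its parameters detect and steps are hypothetical names.

```python
def run_cleaning(records, detect, steps):
    """Detect anomalies, apply the cleaning steps in order, then re-check."""
    # Stage 1: quality control and anomaly identification.
    before = detect(records)
    # Stages 2 and 3: the defined cleaning flow, applied step by step.
    cleaned = list(records)
    for step in steps:
        cleaned = step(cleaned)
    # Stage 4: controls, here a second run of the same detection.
    after = detect(cleaned)
    report = {"anomalies_before": len(before), "anomalies_after": len(after)}
    return cleaned, report
```

Running the same detection before and after cleaning gives a simple measure of whether the flow actually reduced the number of anomalies.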
Methods used in the Triton Data Cleaning Process
- Decomposition
- Record matching
- De-duplication
- Transformation
- Validation
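
To make the five methods concrete, the sketch below applies each of them to a small list of contact strings using only the Python standard library. It is an illustration under stated assumptions, not Triton's method. The input format ("Name <email>"), the 0.85 similarity threshold and all function names are hypothetical.

```python
import re
from difflib import SequenceMatcher

def decompose(raw):
    """Decomposition: split 'Name <email>' into separate fields."""
    match = re.fullmatch(r"\s*(.*?)\s*<([^<>]+)>\s*", raw)
    if not match:
        return {"name": raw.strip(), "email": ""}
    return {"name": match.group(1), "email": match.group(2)}

def transform(rec):
    """Transformation: normalise whitespace and letter case."""
    return {
        "name": " ".join(rec["name"].split()).title(),
        "email": rec["email"].strip().lower(),
    }

def is_match(a, b, threshold=0.85):
    """Record matching: same e-mail, or names that are nearly identical."""
    if a["email"] and a["email"] == b["email"]:
        return True
    return SequenceMatcher(None, a["name"], b["name"]).ratio() >= threshold

def deduplicate(records):
    """De-duplication: keep the first record of each matching group."""
    unique = []
    for rec in records:
        if not any(is_match(rec, kept) for kept in unique):
            unique.append(rec)
    return unique

def validate(rec):
    """Validation: a deliberately simple e-mail shape check."""
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", rec["email"]) is not None

# Illustrative input; the rows are made up for this example.
raw_rows = [
    "ada  LOVELACE <Ada@Example.com>",
    "Ada Lovelace <ada@example.com>",
    "Alan Turing <alan@example>",
]
cleaned = deduplicate([transform(decompose(row)) for row in raw_rows])
invalid = [rec for rec in cleaned if not validate(rec)]
```

Transformation runs before matching so that differences in case and spacing do not hide duplicates. Validation runs last and flags records for review instead of deleting them.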