Data quality, and the resulting quality of information, is a key issue in any organization dealing with large data flows. Taking data quality seriously is a necessity, especially now that it is part of standard regulatory requirements. Let us illustrate the need for data quality assurance, and its integration into your company’s Enterprise Information Flow, with the simple case of Solvency II. Solvency II is a European Union directive for insurance companies, created primarily to prevent insolvency, but it also addresses data quality management. What if your company is based in the United States or elsewhere in the world? Two answers: 1) Solvency II applies to some U.S.-based insurance companies too. 2) There is nothing wrong with getting your data quality processes straight; your company can only benefit from it.
Requirements of Solvency II
Article 82 – Data quality and application of approximations, including case-by-case approaches, for technical provisions

Member States shall ensure that insurance and reinsurance undertakings have internal processes and procedures in place to ensure the appropriateness, completeness and accuracy of the data used in the calculation of their technical provisions.

Where, in specific circumstances, insurance and reinsurance undertakings have insufficient data of appropriate quality to apply a reliable actuarial method to a set or subset of their insurance and reinsurance obligations, or amounts recoverable from reinsurance contracts and special purpose vehicles, appropriate approximations, including case-by-case approaches, may be used in the calculation of the best estimate.
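To make the requirement concrete: “internal processes and procedures” ultimately boil down to explicit, auditable checks on the data feeding the technical-provisions calculation. Below is a minimal sketch of such a check, assuming a hypothetical policy-record structure and field names (none of this comes from the directive itself): completeness is checked by requiring every field to be populated, and accuracy by simple plausibility rules.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical record structure; real technical-provisions inputs
# carry far more fields than this.
@dataclass
class PolicyRecord:
    policy_id: str
    premium: Optional[float]    # written premium
    inception: Optional[date]   # policy start date
    sum_insured: Optional[float]

REQUIRED_FIELDS = ("policy_id", "premium", "inception", "sum_insured")

def check_record(rec: PolicyRecord) -> list:
    """Return data quality findings for one record: completeness
    (every required field populated) and accuracy (plausibility rules)."""
    findings = []
    for name in REQUIRED_FIELDS:
        if getattr(rec, name) in (None, ""):
            findings.append(f"{rec.policy_id}: missing {name}")
    if rec.premium is not None and rec.premium < 0:
        findings.append(f"{rec.policy_id}: negative premium")
    if rec.inception is not None and rec.inception > date.today():
        findings.append(f"{rec.policy_id}: inception date in the future")
    return findings

def validate_batch(records) -> list:
    """An empty result means the batch passes this deliberately simple
    appropriateness gate; any findings must be investigated before the
    data enters the actuarial calculation."""
    return [f for rec in records for f in check_record(rec)]
```

In practice such rules would be driven by a documented data directory and signed off by the actuarial function, but the principle is the same: every check is explicit, documented, and reviewable.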
The requirements themselves do not sound too bad, right? They are specified in more detail in the relevant implementing documents. Basically, you need to know, document, and be able to prove:
This shows the necessity of connecting the employees who create data with those who consume it, and of educating both groups to anticipate and handle errors. It also shows the necessity of using identical, precise metadata at both the technological and the business level. In practice, data quality management is handled poorly in many companies, even though the theory is well established. All the way back in 2002, Leo Pipino, Yang Lee, and Richard Wang identified sixteen dimensions of data quality [pdf]:
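Accessibility, appropriate amount of data, believability, completeness, concise representation, consistent representation, ease of manipulation, free-of-error, interpretability, objectivity, relevancy, reputation, security, timeliness, understandability, and value-added.

The same paper also proposes functional forms for turning these dimensions into objective, measurable metrics; the most basic is the “simple ratio”: one minus the number of undesirable outcomes divided by total outcomes. Here is a minimal sketch of that idea, with hypothetical records and field names that are not from the paper:

```python
def simple_ratio(undesirable: int, total: int) -> float:
    """Pipino, Lee, and Wang's 'simple ratio' functional form:
    1 - (number of undesirable outcomes / total outcomes)."""
    return (1.0 - undesirable / total) if total else 0.0

# Hypothetical customer records; None marks a missing value.
records = [
    {"name": "Alice", "email": "alice@example.com", "birth_date": "1980-01-01"},
    {"name": "Bob",   "email": None,                "birth_date": "1975-06-15"},
    {"name": "Carol", "email": "carol@example.com", "birth_date": None},
]

fields = ("name", "email", "birth_date")
missing = sum(1 for r in records for f in fields if r[f] is None)

# Completeness measured as a simple ratio over all field values.
completeness = simple_ratio(missing, len(records) * len(fields))
print(f"completeness = {completeness:.2f}")  # 7 of 9 values present -> 0.78
```

Each dimension needs its own operationalization (the paper defines timeliness, for instance, in terms of currency and volatility), but the simple ratio already covers completeness, free-of-error, and similar dimensions that reduce to a yes/no test per value.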
There are also other ways to classify data quality issues, most notably by the logical level at which the data is used. Problems caused by low-quality data in different parts of an organization are listed in the following table: