mosaicQA – A General Approach to Facilitate Basic Data Quality Assurance for Epidemiological Research

Journal: Methods of Information in Medicine
Subtitle: A journal stressing, for more than 50 years, the methodology and scientific fundamentals of organizing, representing and analyzing data, information and knowledge in biomedicine and health care
ISSN: 0026-1270
Issue: 2017 (Vol. 56): Methods Open
Pages: e67-e73
Ahead of Print: 2017-04-29

M. Bialke (1), H. Rau (1), T. Schwaneberg (1), R. Walk (2), T. Bahls (1), W. Hoffmann (1)

(1) Institute for Community Medicine, Section Epidemiology of Health Care and Community Health, University Medicine Greifswald, Greifswald, Germany; (2) Institute for Community Medicine, Section GANI_MED, University Medicine Greifswald, Greifswald, Germany


Keywords: medical data management, data quality assurance


Background: Epidemiological studies are based on a considerable amount of personal, medical and socio-economic data. To answer research questions with reliable results, epidemiological research projects face the challenge of providing high-quality data. Consequently, gathered data have to be reviewed continuously during the data collection period.

Objectives: This article describes the development of the mosaicQA-library, a set of reusable R functions aimed at users who are not statistical experts. The library supports basic data quality assurance for a wide range of application scenarios in epidemiological research.

Methods: To generate valid quality reports for various scenarios and data sets, a general and flexible development approach was needed. As a first step, a set of quality-related questions, targeting quality aspects on a more general level, was identified. The next step included the design of specific R-scripts to produce proper reports for metric and categorical data. For more flexibility, the third development step focussed on the generalization of the developed R-scripts, e.g. by extracting characteristics and parameters. As a last step, the generic characteristics of the developed R functionalities and generated reports were evaluated using different metric and categorical datasets.

Results: The developed mosaicQA-library generates basic data quality reports for multivariate input data. If needed, more detailed results for single-variable data, including the definition of units, variables, descriptions, code lists and categories of qualified missings, can easily be produced.

Conclusions: The mosaicQA-library enables researchers to generate reports for various kinds of metric and categorical data without the need for computational or scripting knowledge. At the moment, the library focusses on data structure quality and supports the assessment of several quality indicators, including the frequency, distribution and plausibility of research variables as well as the occurrence of missing and extreme values. To simplify the installation process, mosaicQA has been released as an official R-package.
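
The quality indicators named above (missing values, plausibility and extreme values) can be illustrated with a minimal sketch for a single metric variable. It is written in Python and is not the mosaicQA implementation or its API: the function name, its parameters, the example data and the use of the 1.5 × IQR rule for extreme values are all assumptions made for illustration.

```python
from statistics import mean, quantiles


def metric_quality_report(values, plausible_min, plausible_max):
    """Summarise basic quality indicators for one metric variable.

    `values` may contain None for missing observations. The function name
    and parameters are hypothetical; they are not taken from mosaicQA.
    """
    observed = [v for v in values if v is not None]

    # Assumption: extreme values are flagged with the 1.5 * IQR rule.
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    return {
        "n_total": len(values),
        "n_missing": len(values) - len(observed),
        "mean": mean(observed),
        # Plausibility: values outside the range defined by the study.
        "implausible": [
            v for v in observed if not plausible_min <= v <= plausible_max
        ],
        # Extreme values: outside the IQR-based fences.
        "extreme": [v for v in observed if v < lower or v > upper],
    }


# Hypothetical body heights in cm, with two missing entries and two typos.
heights_cm = [172, 168, None, 181, 175, 17.5, 169, None, 178, 260]
report = metric_quality_report(heights_cm, plausible_min=50, plausible_max=250)
print(report)
```

For a categorical variable, the corresponding report would rest on a frequency table per category (including categories of qualified missings) rather than on range and distribution checks.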
