Digital Epidemiology

Data quality for e-health applications in epidemiological research

Department for Epidemiology, Helmholtz Center for Infection Research (HZI), German Center for Infection Research (DZIF), and Hannover Biomedical Research School (HBRS)

Modern epidemiological research increasingly works with data from new mobile devices and sensors. This changes how the trustworthiness of data sources must be assessed. Data are no longer collected only by researchers or medical devices: users also contribute patient-reported outcomes and automatically generated data outside direct doctor-patient contact. This can limit the internal validity of such data. It is therefore crucial to clean the data sets systematically and transparently, to apply feature engineering, and to evaluate the data quality (DQ) prospectively. Only then should methods such as machine-learning (ML) algorithms be applied to draw conclusions about a biomedical hypothesis.

Learning Objectives: Within HiGHmed, the HZI is responsible for developing algorithms to detect outbreaks and infection clusters in German hospitals. Accordingly, the course focuses on three topics:

  • Module I: Evaluation of Data Quality (starting in January 2019)
  • Module II: Methods of Statistical Learning (starting at the end of 2019)
  • Module III: Signal Detection (starting at the end of 2020)

Module I introduces the process of exploratory data mining and the methods used to evaluate DQ. Data completeness, accuracy, and currency will be explored at the level of variables, observations, and time constructs. Students will combine all identified DQ conflicts into a DQ report, a crucial step in giving stakeholders the feedback they need to improve their DQ. Building on this knowledge of data analysis and research communication, Module II explores ML methods. Students learn methods of statistical learning to make predictions or to “uncover hidden insights” in the data. In Module III, the knowledge acquired in Modules I and II is applied to outbreak detection. Students will explore surveillance systems and methods for detecting outbreaks and performing network analyses within hospitals.

Target Group: The blended course is designed for bachelor’s, master’s, or PhD students with a background in information technology, computer science, or life science. MD students are also welcome. Basic knowledge of medical terminology and experience in data analysis are recommended.

Embedding: Students are required to attend classroom sessions in person; course materials are provided via an online platform. The course is taught in English. It is integrated into the PhD program Epidemiology of the HBRS and the HZI.