PASTA-4-PHT: A Pipeline for Automated Security and Technical Audits for the Personal Health Train

Abstract

Background: With the introduction of data protection regulations, the need for innovative privacy-preserving approaches to process and analyse sensitive data has become apparent. One approach is the Personal Health Train (PHT), which brings analysis code to the data and processes the data on the data holder's premises. Despite its demonstrated success in various studies, executing external code in sensitive environments, such as hospitals, introduces new research challenges because the interactions of the code with sensitive data are often opaque. Such interactions threaten data integrity and expand the attack surface, exposing the system to risks including code injection, software supply chain vulnerabilities, and unauthorised runtime network communication.
Results: To address this issue, this work presents a PHT-aligned security and audit pipeline inspired by DevSecOps principles, called Pipeline for Automated Security and Technical Audits for the Personal Health Train (PASTA-4-PHT). The automated pipeline comprises multiple phases that detect vulnerabilities, such as unintentionally or intentionally introduced weaknesses in the code of the PHT, before its deployment. To study its versatility, we evaluate PASTA-4-PHT in two ways. First, in a controlled evaluation, we deliberately introduce vulnerabilities into a PHT. Second, we apply our pipeline to five PHTs that have been used in real-world studies to audit them for potential vulnerabilities. The controlled evaluation confirmed detection of all injected vulnerability types, showing that the audit pipeline is effective. In the real-world audit of the five Trains, the image analysis phase identified up to 35 critical vulnerabilities per Train, indicating that container images were the most significant threat vector in our evaluation.
Conclusions: Our evaluation demonstrates that our pipeline successfully identifies potential vulnerabilities and can be applied to real-world studies. In compliance with the requirements of the General Data Protection Regulation (GDPR) for data management, documentation, and protection, our automated approach supports researchers using the PHT in their data-intensive work and reduces manual overhead. PASTA-4-PHT can be used as a decision-making tool to assess and document potential vulnerabilities in code for data processing. The associated artefacts of this article, along with the pipeline configuration, are available online for adaptation and reuse. Ultimately, our work contributes to increased security and overall transparency of data processing activities within the PHT framework.
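
As a rough illustration of how findings from several audit phases might feed a deployment decision, the following minimal Python sketch aggregates findings per phase and blocks approval when a critical finding is present. It is not the PASTA-4-PHT implementation: the `Finding` type, the `audit_report` function, the phase label "code-analysis", the finding identifiers, and the blocking threshold are all hypothetical and chosen only for illustration. Only the existence of an image analysis phase is taken from the abstract.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class Finding:
    phase: str          # hypothetical phase label, e.g. "image-analysis"
    identifier: str     # hypothetical finding ID, not a real advisory
    severity: Severity


def audit_report(findings, block_at=Severity.CRITICAL):
    """Count findings per phase and decide whether deployment is approved.

    Assumption: any finding at or above `block_at` blocks the Train.
    """
    per_phase_counts = {}
    for finding in findings:
        per_phase_counts[finding.phase] = per_phase_counts.get(finding.phase, 0) + 1
    blocking = [f.identifier for f in findings if f.severity >= block_at]
    return {
        "per_phase_counts": per_phase_counts,
        "blocking": blocking,
        "approve": not blocking,
    }


# Illustrative input only; these are not results from the article.
findings = [
    Finding("code-analysis", "EXAMPLE-001", Severity.MEDIUM),
    Finding("image-analysis", "EXAMPLE-002", Severity.CRITICAL),
]
report = audit_report(findings)
```

In this sketch the report doubles as documentation of the audit: the per-phase counts record what was found, and the `approve` flag records the resulting decision.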

Publication
BMC Medical Informatics and Decision Making
Sascha Welten
Karl Kindermann
Ahmet Polat
Martin Görz
Maximilian Jugl
Laurenz Neumann
Alexander Neumann
Dr. rer. nat. Jan Pennekamp
Postdoctoral Researcher / Staff Scientist
Stefan Decker