Transformer-Based Integrative Patient Representations from Single-Cell RNA Data

Abstract

Single-cell RNA sequencing (scRNA-Seq) is a powerful tool for exploring cellular heterogeneity in healthy and diseased states, yet its translation into clinical insights has been limited. To bridge the gap between detailed cellular analysis and broader patient-level representations usable for phenotyping, we introduce a novel transformer-based architecture that maps single-cell data to meaningful patient-level embeddings. The approach uses a self-supervised learning phase to construct integrative patient representations, which are then refined with contrastive learning. On a dataset covering 7 million cells across 1,223 individuals with diverse disease states, we show that the learned embeddings are meaningful representations for a variety of downstream analytical tasks. Our approach proves robust to unbalanced datasets and shows indications of learning similarities between related diseases, such as COVID-19 and flu.
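
The abstract does not spell out the architecture or the training objectives, so the following is only a rough illustration of the general idea: pooling a variable number of cell vectors into one patient vector, then scoring two views of each patient with a contrastive loss. Everything in it is an assumption made for illustration and not the authors' method: the single-query attention pooling stands in for a transformer, the two views are created by random cell subsampling, the InfoNCE loss is one common contrastive objective, and the names `pool_cells`, `info_nce`, and `make_view` are hypothetical.

```python
import numpy as np

def pool_cells(cells, query):
    """Attention-pool an (n_cells, d) matrix into one d-dimensional patient vector.

    A single learned query vector is an assumed stand-in for a transformer encoder.
    """
    scores = cells @ query / np.sqrt(cells.shape[1])
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ cells

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: row i of `anchors` should be most similar to row i of `positives`."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
dim = 16
query = rng.normal(size=dim)  # would be a trained parameter in practice

# Synthetic patients, each with a different number of cells (random data, not real scRNA-Seq).
patients = [rng.normal(size=(int(rng.integers(50, 200)), dim)) for _ in range(4)]

def make_view(cells):
    """Assumed augmentation: embed a random half of the patient's cells."""
    idx = rng.choice(len(cells), size=len(cells) // 2, replace=False)
    return pool_cells(cells[idx], query)

view_a = np.stack([make_view(c) for c in patients])
view_b = np.stack([make_view(c) for c in patients])
loss = info_nce(view_a, view_b)
```

In this sketch, each patient is reduced to a fixed-size vector regardless of cell count, and the loss is small when the two views of the same patient are closer to each other than to views of other patients.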

Publication
Learning Meaningful Representations of Life Workshop (LMRL '25)
Benedikt von Querfurth
Dr. rer. nat. Jan Pennekamp
Postdoctoral Researcher
Tore Bleckwehl
Rafael Kramann
Klaus Wehrle
Head of Group
Sikander Hayat