Methods for the integrated classification of ependymomas using computational pathology and omics data
Translated title
Methoden zur integrierten Klassifikation von Ependymomen mittels computergestützter Pathologie und omics Daten
Publication date
2024-10-02
Document type
Dissertation
Author
Advisor
Referee
Neumann, Julia
Granting institution
Helmut-Schmidt-Universität / Universität der Bundeswehr Hamburg
Exam date
2024-09-30
Organisational unit
Part of the university bibliography
✅
DDC Class
500 Natural sciences
000 Computer science, information & knowledge, general works
Keyword
Brain tumor
Artificial intelligence
Ependymoma
Batch effect
Abstract
With about ten million deaths per year, the diagnosis and therapy of cancer represent one of the most important medical challenges to date. In the central nervous system, ependymomas are a rare yet highly relevant tumor entity; they affect patients of all age groups and pose unique diagnostic challenges. In particular, they exhibit heterogeneous histomorphological and molecular characteristics, which are used along with other properties to define 10 ependymoma types. These types are associated with variable prognosis and clinical outcome, so their accurate diagnosis is crucial for patient-specific treatment decisions. While diagnoses of ependymomas were traditionally based on histomorphological patterns, neuropathologists nowadays manually integrate these patterns with other sources of information, in particular DNA methylation profiles. However, DNA methylation data have been found to be inconsistent with histological assessment in a fraction of cases and are additionally too expensive for worldwide use in routine diagnostics. Thus, the field requires a unified view of molecular and morphological analyses of ependymomas, e.g., via prediction of DNA methylation types from histological images.
In the future, further improvements in the diagnosis and treatment of ependymomas may arise from additionally considering protein profiles of the tumor. To date, however, measurement biases (batch effects) and missing values prevent the integration and quantitative comparison of independently acquired proteome profiles, making novel and efficient data integration and classification algorithms necessary.
In this work, an interpretable method for the prediction of ependymoma DNA methylation types from histological whole-slide images is developed using self-supervised and multiple-instance learning approaches. The approach is characterized on spinal cord ependymomas from the University Medical Center Hamburg-Eppendorf and is found to outperform the diagnoses of experienced neuropathologists. Moreover, the algorithm generalizes to data from other medical facilities with human-grade performance. Further characterization studies demonstrate that the approach can be applied to other common ependymoma types and that it scales to large datasets. Exploiting the interpretability of the algorithm, novel morphological evidence of major DNA methylation types of ependymomas is extracted. In comparison to other studies, the presented approach is the first to use neural networks to provide a unified view of the molecular and histomorphological landscape of clinically relevant ependymoma types from multiple anatomical compartments.
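As a rough illustration of how multiple-instance learning can aggregate many image tiles into one slide-level representation, the following minimal sketch uses softmax attention pooling. The abstract does not state which pooling scheme the dissertation uses, so attention pooling is an assumption here, and the function name, the toy embeddings, and the fixed attention vector are hypothetical. In a trained model the attention parameters would be learned, and the per-tile weights are what would make such a model inspectable.

```python
import math

def attention_mil_pool(tile_embeddings, attention_vector):
    """Pool tile embeddings into one slide embedding with softmax attention.

    Hypothetical sketch: the attention vector is fixed here, whereas a real
    model would learn it. Returns (slide_embedding, attention_weights).
    """
    # One scalar relevance score per tile (dot product with the attention vector).
    scores = [sum(w * x for w, x in zip(attention_vector, tile))
              for tile in tile_embeddings]
    # Numerically stable softmax over the tile scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Slide embedding = attention-weighted average of the tile embeddings.
    dim = len(tile_embeddings[0])
    slide = [sum(w * tile[d] for w, tile in zip(weights, tile_embeddings))
             for d in range(dim)]
    return slide, weights

# Toy example: three tiles with two-dimensional embeddings.
tiles = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
attention = [1.0, 0.0]
slide_embedding, weights = attention_mil_pool(tiles, attention)
```

The returned weights sum to one, so tiles with larger weights can be read as contributing more to the slide-level prediction.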
With respect to the integration of proteomic datasets, a novel and computationally efficient algorithm for batch-effect correction of incomplete data is presented. Extensive parameter studies show that, in comparison to existing approaches, the new algorithm offers improved tolerance to missing values and greater flexibility with respect to imbalanced data. It is demonstrated that the method scales to large data integration tasks and can leverage the multi-core architecture of modern computers. The unique suitability of the method for the integration of proteomic and even transcriptomic data is demonstrated, and the benefit of dataset integration for (diagnostic) classification algorithms is explored.
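To make the problem of batch effects in incomplete data concrete, the sketch below applies a simple location-only adjustment: for each feature, every batch is shifted so that its mean over the observed values matches the overall mean, and missing values are left untouched. This is a deliberately simplified stand-in and not the algorithm developed in the dissertation; the function name `center_batches` and the toy data are hypothetical.

```python
def center_batches(batches):
    """Shift each batch so its per-feature mean matches the overall mean.

    batches: dict mapping batch name -> list of samples, where each sample is
    a list of floats or None (None marks a missing value).
    Hypothetical, location-only sketch; missing values stay missing.
    """
    n_features = len(next(iter(batches.values()))[0])
    corrected = {name: [list(sample) for sample in samples]
                 for name, samples in batches.items()}
    for j in range(n_features):
        # Overall mean of feature j, computed from observed values only.
        observed_all = [s[j] for samples in batches.values()
                        for s in samples if s[j] is not None]
        if not observed_all:
            continue  # feature never observed: nothing to correct
        overall_mean = sum(observed_all) / len(observed_all)
        for name, samples in batches.items():
            observed = [s[j] for s in samples if s[j] is not None]
            if not observed:
                continue  # feature missing in this whole batch
            shift = overall_mean - sum(observed) / len(observed)
            for i, s in enumerate(samples):
                if s[j] is not None:
                    corrected[name][i][j] = s[j] + shift
    return corrected

# Toy example: batch "B" is offset by +10 and batch "A" has a missing value.
batches = {
    "A": [[1.0, 10.0], [3.0, None]],
    "B": [[11.0, 20.0], [13.0, 22.0]],
}
corrected = center_batches(batches)
```

After the adjustment, the first feature takes the values 6.0 and 8.0 in both batches, while the missing entry in batch "A" is preserved rather than imputed.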
Finally, this work investigates how incomplete molecular data (e.g., from proteome analyses) can be used to further improve classification performance. To this end, it introduces a novel classification method based on average pairwise correlations. This method is found to yield improved classification results compared to other correlation-based approaches and allows the results of the aforementioned, newly developed algorithms to be combined into an integrated approach to ependymoma diagnostics. The benefit of this integrated method over the independent consideration of histological images or proteome data is demonstrated.
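One plausible reading of classification by average pairwise correlations is sketched below: a query profile is correlated with every labelled reference profile using only the features observed in both, the correlations are averaged per class, and the class with the highest average is chosen. The details of the dissertation's method are not given in the abstract, so this is an assumption-laden illustration; the function names, the minimum overlap of three features, and the toy profiles are hypothetical.

```python
import math

def pearson(x, y, min_overlap=3):
    """Pearson correlation over features observed (not None) in both profiles.

    Returns None if fewer than min_overlap shared features exist or if one
    profile is constant on the shared features.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < min_overlap:
        return None
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)

def classify_by_average_correlation(query, references):
    """references: list of (label, profile). Returns (best_label, averages)."""
    per_class = {}
    for label, profile in references:
        r = pearson(query, profile)
        if r is not None:
            per_class.setdefault(label, []).append(r)
    if not per_class:
        raise ValueError("no reference profile shares enough features with the query")
    averages = {label: sum(rs) / len(rs) for label, rs in per_class.items()}
    return max(averages, key=averages.get), averages

# Toy example with missing values (None) in the query and in the references.
references = [
    ("X", [1.0, 2.0, 3.0, 4.0, 5.0]),
    ("X", [2.0, 4.0, None, 8.0, 10.0]),
    ("Y", [5.0, 4.0, 3.0, 2.0, 1.0]),
    ("Y", [10.0, None, 6.0, 4.0, 2.0]),
]
query = [1.0, 2.5, 3.0, None, 5.5]
label, averages = classify_by_average_correlation(query, references)
```

Because each correlation is computed only on shared observed features, no imputation is needed, at the cost of correlations that rest on different feature subsets for different reference samples.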
In summary, this work is the first to present multiple novel algorithmic approaches for the integrated classification of tumors. In particular, the presented methods make it possible to address the unique diagnostic challenges of ependymomas by integrating proteomic and histological data. In the future, this work will allow researchers and clinical practitioners to obtain a better, integrated understanding of the histo-molecular characteristics of the diseases under consideration and thus to improve their diagnosis and therapy.
Version
Accepted version
Access right on openHSU
Open access