Multiprobabilistic prediction in early medical diagnoses

Nouretdinov, I., Devetyarov, D., Vovk, V., Burford, B., Camuzeaux, S., Gentry-Maharaj, A., Tiss, A., Smith, C., Luo, Z., Chervonenkis, A., Hallett, R., Waterfield, M., Cramer, R., Timms, J. F., Jacobs, I., Menon, U. and Gammerman, A. (2015) Multiprobabilistic prediction in early medical diagnoses. Annals of Mathematics and Artificial Intelligence, 74 (1-2). pp. 203-222. ISSN 1573-7470

Full text not archived in this repository.

It is advisable to refer to the publisher's version if you intend to cite from this work. See Guidance on citing.

To link to this item DOI: 10.1007/s10472-013-9367-5


This paper describes a methodology for providing multiprobability predictions for proteomic mass spectrometry data. The methodology is based on a newly developed machine learning framework called Venn machines, which allows a valid probability interval to be output. For demonstrative purposes, we applied this methodology to MALDI-TOF data sets in order to predict the diagnosis of heart disease and the early diagnosis of ovarian cancer and breast cancer. The experiments showed that the probability intervals are narrow; that is, the output of the multiprobability predictor is similar to a single probability distribution. In addition, the probability intervals produced for the heart disease and ovarian cancer data were more accurate than the output of the corresponding probability predictor. When Venn machines were forced to make point predictions, the accuracy of these predictions was, for most of the data, better than that of the underlying algorithm, which outputs a single probability distribution over labels. Application of this methodology to MALDI-TOF data sets empirically demonstrates its validity. The accuracy of the proposed method on the ovarian cancer data rises from 66.7 % at 11 months before the moment of diagnosis to up to 90.2 % at the moment of diagnosis. The same approach was applied to heart disease data without time dependency, although the achieved accuracy was not as high (up to 69.9 %). The methodology allowed us to confirm mass spectrometry peaks previously identified as carrying statistically significant information for discrimination between controls and cases.
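
To illustrate how a Venn machine turns a set of probability distributions into a probability interval, here is a minimal sketch in Python. It is not the paper's implementation: the nearest-neighbour taxonomy, the one-dimensional feature (standing in for, say, a single peak intensity), and the names `nearest_label` and `venn_predict` are all assumptions made for illustration. The general scheme is the standard one: try each possible label for the new sample, group the examples into categories, and read off the label frequencies in the new sample's category.

```python
from collections import Counter


def nearest_label(i, xs, ys):
    """Assumed taxonomy: the category of example i is the label of its
    nearest other example (1-D features, absolute distance)."""
    best = min((j for j in range(len(xs)) if j != i),
               key=lambda j: abs(xs[j] - xs[i]))
    return ys[best]


def venn_predict(train_x, train_y, x_new, labels):
    """Return {label: (lower, upper)} probability intervals for x_new."""
    rows = {}
    for y_hyp in labels:
        # Tentatively assign the hypothetical label to the new sample.
        xs = list(train_x) + [x_new]
        ys = list(train_y) + [y_hyp]
        cats = [nearest_label(i, xs, ys) for i in range(len(xs))]
        # Labels of all examples sharing the new sample's category
        # (the new sample itself is included).
        same = [ys[i] for i in range(len(xs)) if cats[i] == cats[-1]]
        counts = Counter(same)
        rows[y_hyp] = {k: counts[k] / len(same) for k in labels}
    # One distribution per hypothetical label; the interval for each
    # label spans the smallest and largest probability assigned to it.
    return {k: (min(r[k] for r in rows.values()),
                max(r[k] for r in rows.values())) for k in labels}


# Toy data: label 0 = control, label 1 = case (made-up values).
intervals = venn_predict([0, 2, 4, 20, 22, 24], [0, 0, 0, 1, 1, 1], 1, [0, 1])
print(intervals)
```

A point prediction can then be forced by, for example, choosing the label with the largest lower bound; this choice is likewise an assumption rather than the rule used in the paper.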

Item Type: Article
Divisions: Interdisciplinary centres and themes > Chemical Analysis Facility (CAF); Life Sciences > School of Chemistry, Food and Pharmacy > Department of Chemistry
ID Code: 58185
