Please use this identifier to cite or link to this item: http://hdl.handle.net/10889/9244
Title: Image analysis of HDL molecules for risk estimation of coronary heart disease and decision support
Authors: Kraus, Benedikt
Keywords: High Density Lipoproteins (HDL)
Decision support
Coronary heart disease
Image analysis
Risk estimation
Statistical learning
Machine learning
Keywords (translated from Greek): Coronary heart disease
Diagnostic support
Heart diseases
Image analysis
Risk estimation
Statistical learning
Abstract: A number of learning schemes were applied and the results recorded. Counting each parameter set as a separate learner, a total of over 2800 learners were applied to each of the data sets. For the data set bayesw, 79 learners had an accuracy over 80%. Most of them were of the SVM family (POLY: 35, PUK: 38, RBF: 1), whilst five were KNN classifiers. For the data set matchedw, 248 learners yielded accuracies over 80%. They belong to the families SVM (POLY: 75, PUK: 75, RBF: 4), decision trees (J48: 7), rules (JRIP: 71, PART: 8) and KNN (8). For the data set bayes, a total of 23 learners yielded an efficiency over 90%. All of them belong to the category of kernel machines, specifically to the polynomial kernel or the Pearson universal function kernel. For the data set matched, 19 learners yielded an efficiency over 90%; here, in addition to the PUK and POLY schemes, the JRIP algorithm had a high accuracy. If one considers efficiencies over 80%, there are 200 learners of all except three types (see table 6) for the matched set and 109 for the bayes set (all kernel machines and k nearest neighbours). This goes to show that there is room for improvement in the object detection for the HDL particles, to improve classification accuracy for CHD risk. Another reason why improvement in HDL particle detection is crucial is the inclusion of the feature n_hoo (number of human observer objects) in many subsets (see tables 12 and 13 in the appendix). The inclusion of this feature is problematic, as it would generally not be available in an automated application for decision support and thus cannot be relied on. However, by improving the efficiency of HDL detection one can approximate this number with the feature n_ado and thus substitute it. Furthermore, the classification of objects in the images was done using Matlab, as the Weka data mining suite proved too restrictive for further processing of the image after classification.
On the other hand, Matlab does not have the versatility of Weka when it comes to machine learning, and thus it can reasonably be assumed that the classification method used for objects in the binary images is not optimal. This might be remedied by using a more integrated approach for image processing, analysis and machine learning. So far, classification of CHD risk was undertaken using only the EM image of the HDL particles. No clinical information such as BMI, smoking, age or sex was used. Using clinical information might improve the classification results. It should also be mentioned that the domains of high efficiency for the different learners have not yet been examined. This means that, if there is diversity among a sufficiently large number of learners, a combination of them should be considered to improve classification efficiency and make it more robust. The best results for each learner for the different data sets are presented in tables 5, 6, 7 and 8. A summary of frequently appearing features in the optimal subsets, as determined by univariate best-first search with classification accuracy under tenfold cross-validation as the criterion, is given in tables 9 and 10. That the frequencies are not quite congruent is to be expected, as the datasets are quite different owing to the different methods of selecting the HDL particles, which are the basis for the subsequent classification. It is worth noting that for the matched dataset, in which the number of HDL particles can be expected to be reflected most accurately, features that depend on that number (number_hoo, number_ado and hdl_concentration) seem to play a minor role in comparison with other features that might depend on HDL quality, such as av_Eccentricity, std_unfiltered_min_intensity or std_Extent.
If one were to design an automated system for decision support for similar images using these results and wanted to pinpoint the 'best' classifier, it would be the SVM classifier utilizing the Pearson Universal Kernel that is listed in table 5. It has the highest classification accuracy, and its feature subset 12g, with seven features, is just small enough to provide a Sample Feature Ratio SFR = 22/7 greater than three. However, it has to be kept in mind that the chosen filtering method, the Laplacian of Gaussian filter with its specific parameters, has become part of the dataset used for training this classifier. Thus, using one specific classifier with one specific dataset presupposes using the filtering method used to train the classifier. The methods developed in this study are far from efficient or elegant; however, judging by the results, it seems feasible to utilize EM images of HDL particles for risk estimation of coronary heart disease, and further study seems justified.
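As a rough illustration of two quantities named in the abstract, the following Python sketch computes the Pearson VII universal kernel (PUK) in its commonly published form and the Sample Feature Ratio. It is not the thesis implementation: the function names, the default values of `sigma` and `omega`, and the reading of the ratio as 22 samples over 7 features are assumptions made here for illustration only.

```python
import math


def puk_kernel(x, y, sigma=1.0, omega=1.0):
    """Pearson VII universal kernel between two equal-length vectors.

    sigma and omega are placeholder values (assumptions), not the
    parameters tuned in the thesis.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Euclidean distance between the two feature vectors
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    # Scaled distance term of the Pearson VII function
    scaled = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega


def sample_feature_ratio(n_samples, n_features):
    """Number of samples divided by the number of selected features."""
    return n_samples / n_features


# Assumed reading of the abstract: 22 samples, 7 features in subset 12g
sfr = sample_feature_ratio(22, 7)
print(f"SFR = {sfr:.2f}, greater than three: {sfr > 3}")
```

Identical vectors give a kernel value of 1, and the value decays towards 0 as the distance grows; `sigma` controls the width and `omega` the tailing of that decay.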
Abstract (translated from Greek): Automatic analysis of electron microscopy images of HDL particles in blood samples from young people who survived a myocardial infarction. This analysis enables the creation of variables for the classification of cardiac risk, which is carried out.
Appears in Collections: Department of Medicine (Master's theses)

Files in This Item:
File: Kraus(med).pdf
Size: 5.36 MB
Format: Adobe PDF

