Can Support Vector Machines Extract Predictive Power
from Single Nucleotide Polymorphisms?

 

 

Project Members:
 

Trevor Cohen
Hui Nar Quek
Chani Weinreb

 
Project Advisors:

Dr Christina Leslie
Dr Victoria Haghighi
   

Overview

Many believe that enumerating the Single Nucleotide Polymorphisms (SNPs) that account for much of human genetic variation will profoundly enhance our ability to understand and treat disease. However, because the bulk of common diseases have complex, multifactorial etiologies, it is difficult to quantify the influence of individual SNPs. Much work in this area has used statistical methods to link particular SNPs to specific diseases. Our literature search did not reveal any publications in which machine-learning techniques were applied to SNP analysis. We explore the degree to which Support Vector Machines (SVMs) can predict a phenotypic trait from combined SNP data across candidate causative genes. Specifically, we investigate the accuracy of an SVM in classifying cases against controls for a high LDL-to-HDL ratio, using only their SNP profiles across genes associated with HDL cholesterol metabolism. Our results demonstrate that the SVM had limited ability to classify this data set: recorded accuracy ranged from 48.78% to 61.13%.
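
To make the classification setup concrete, the following is a minimal sketch and not the project's actual pipeline. It assumes each SNP genotype is encoded as a count of minor alleles (0, 1, or 2), generates a purely synthetic cohort of cases and controls, and trains a linear SVM with a Pegasos-style sub-gradient method in plain Python. The encoding, the synthetic data, the linear kernel, and every function name below are illustrative assumptions; the overview does not state which encoding, kernel, or software the project used.

```python
import random

def encode_genotype(genotype, minor_allele):
    """Count minor alleles in a two-letter genotype, e.g. 'AG' with minor 'G' gives 1.
    (Assumed encoding; the report does not specify one.)"""
    return sum(1 for allele in genotype if allele == minor_allele)

def predict(w, b, x):
    """Return +1 (case) or -1 (control) from the sign of the linear decision function."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

def accuracy(w, b, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for xi, yi in zip(X, y) if predict(w, b, xi) == yi)
    return correct / len(y)

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the L2-regularised hinge loss.
    Labels must be +1 or -1. Returns (weights, bias); the bias is left unregularised."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    t = 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # Shrink the weights (gradient of the L2 penalty).
            w = [(1.0 - eta * lam) * wj for wj in w]
            # If the sample violates the margin, step towards classifying it correctly.
            if margin < 1:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def make_synthetic_cohort(n_samples, n_snps, seed=0):
    """Hypothetical data: random genotypes, with case status weakly tied to SNP 0 only."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        profile = [rng.choice([0, 1, 2]) for _ in range(n_snps)]
        p_case = 0.35 + 0.15 * profile[0]  # weak, noisy single-SNP effect
        X.append(profile)
        y.append(1 if rng.random() < p_case else -1)
    return X, y

if __name__ == "__main__":
    X, y = make_synthetic_cohort(200, 10)
    w, b = train_linear_svm(X[:150], y[:150])  # train on the first 150 profiles
    print("held-out accuracy:", accuracy(w, b, X[150:], y[150:]))
```

Because the synthetic label depends only weakly on a single SNP, held-out accuracy in a sketch like this would be expected to sit not far above chance, which is the kind of regime the reported 48.78% to 61.13% range describes.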
