Overview
Many believe that enumerating the
single nucleotide polymorphisms (SNPs) that account for
much of human genetic variation will profoundly enhance
our ability to understand and treat disease. However, because
most common diseases have complex multifactorial
etiologies, it is difficult to quantify the influence of
individual SNPs. Much work in this area has attempted to
link particular SNPs to specific diseases using statistical
methods. Our literature search did not reveal
any publications in which machine-learning techniques were
applied to SNP analysis. We explore the degree to which
support vector machines (SVMs) can be used to predict a
phenotypic trait from combined SNP data across candidate
causative genes. Specifically, we investigate the accuracy
of an SVM in classifying cases against controls for a high
LDL-to-HDL ratio using only their SNP profiles across
genes associated with HDL cholesterol metabolism.
Our results demonstrate the limited ability
of an SVM to classify this data set. Recorded
accuracy ranged from 48.78% to 61.13%.
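
To make the classification setup concrete, the following is a minimal sketch of training a linear SVM on SNP profiles and measuring held-out accuracy. It is not the implementation used in this work. The genotype encoding (count of minor alleles, 0 to 2), the linear kernel, the Pegasos-style stochastic subgradient training with an added unregularized bias term, the function names, and the randomly generated cohort are all assumptions made for illustration.

```python
import random


def encode_genotype(genotype, minor_allele):
    """Count copies of the minor allele in a two-letter genotype such as 'AG'."""
    return sum(1 for allele in genotype if allele == minor_allele)


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Stochastic subgradient descent on the L2-regularized hinge loss
    (Pegasos-style step size 1 / (lam * t)); labels must be +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0  # assumption: simple unregularized bias term
    t = 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # Shrink the weights: subgradient of the L2 penalty.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss subgradient applies only when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def predict(w, b, x):
    """Return +1 (case) or -1 (control) for one SNP profile."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


def accuracy(w, b, X, y):
    """Fraction of profiles whose predicted label matches the true label."""
    correct = sum(1 for xi, yi in zip(X, y) if predict(w, b, xi) == yi)
    return correct / len(y)


def make_synthetic_cohort(n_people=200, n_snps=20, seed=1):
    """Generate hypothetical genotype data; NOT the study's data set."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_people):
        profile = [
            encode_genotype(rng.choice(["AA", "AG", "GA", "GG"]), "G")
            for _ in range(n_snps)
        ]
        # Hypothetical weak signal: the first SNP nudges the case probability.
        p_case = 0.4 + 0.1 * profile[0]
        X.append(profile)
        y.append(1 if rng.random() < p_case else -1)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_cohort()
    split = 150  # first 150 individuals for training, the rest held out
    w, b = train_linear_svm(X[:split], y[:split])
    print(f"held-out accuracy: {accuracy(w, b, X[split:], y[split:]):.2%}")
```

With a weak or absent signal in the features, a classifier of this kind would be expected to score close to chance on held-out individuals, which is the regime the accuracy range above describes.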