The field currently called Bioinformatics (the term has several alternative
definitions) emerged recently as a merger of several distinct
scientific disciplines (Figure 1):
Structural biology
Statistical genetics
Statistical evolutionary biology
Genomics-oriented computer science
Genomics-oriented statistics
Biochemical kinetics analysis
Medical informatics
Structural biology deals with the physical properties and geometry of
the large molecules found in living cells. Deoxyribonucleic acid (DNA),
ribonucleic acid (RNA), and proteins are responsible for the majority of functions
associated with perpetuating and sustaining life; thus, these molecules have received
the lion's share of the research community's attention. The Holy Grail pursued
by certain structural-biology researchers is ab initio prediction of the three-dimensional
structure of a macromolecule from its primary structure, that is,
from nucleic acid or amino acid sequence data. This problem seems to be somewhat
more tractable for RNA and DNA molecules, but is thus far unsolved for proteins.
Statistical genetics deals with probabilistic mathematical models of
inheritance of human traits. One of the practical goals of this field is providing
tools for mapping genes. In this context, mapping means estimating where on a
chromosome a gene of interest is situated, based on statistical analysis of the
co-occurrence of various observed traits in human families. The rationale for
the technique is provided by the observation that the probability of a recombination
between two genes on the same chromosome is roughly proportional to
the distance between those genes, at least for genes that lie close together.
Recombination is a genetic event that occurs prior to maturation of gametes
in the majority of nonbacterial species. In the process of
recombination, each pair of homologous chromosomes that came from the two parents
exchanges regions, such that each oocyte or spermatozoon receives a single set
of all genes, randomly sampled from the two parents. The place on the chromosome
where the DNA string from the mother joins the DNA string from the father is
called the recombination breakpoint. Thus, if two genes sit very close
to each other on a chromosome, it is improbable that we will observe a recombination
that has a breakpoint between them; the resulting unusually frequent co-occurrence of
traits corresponding to such linked genes can be detected by statistical analysis.
A more general purpose of statistical genetics is understanding the mechanisms of
inheritance in living creatures.
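The relationship between recombination and distance can be illustrated with a
minimal sketch. It assumes that offspring have already been classified as
recombinant or non-recombinant for a pair of traits, and it uses Haldane's map
function, a standard textbook model that is brought in here for illustration and
is not discussed in the text above. The function names and the counts in the
usage note are hypothetical.

```python
import math


def recombination_fraction(recombinant: int, total: int) -> float:
    """Estimate the fraction of offspring in which two traits were separated."""
    if total <= 0 or not 0 <= recombinant <= total:
        raise ValueError("need 0 <= recombinant <= total and total > 0")
    return recombinant / total


def haldane_distance_morgans(theta: float) -> float:
    """Convert a recombination fraction to map distance in Morgans.

    Assumption: Haldane's model (crossovers occur independently), under which
    d = -0.5 * ln(1 - 2 * theta) for 0 <= theta < 0.5.
    """
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must satisfy 0 <= theta < 0.5")
    return -0.5 * math.log(1.0 - 2.0 * theta)


if __name__ == "__main__":
    # Hypothetical counts: 10 recombinant offspring out of 100 observed.
    theta = recombination_fraction(10, 100)
    print(f"theta = {theta:.3f}, distance = {haldane_distance_morgans(theta):.4f} Morgans")
```

For small recombination fractions the estimated distance is close to the fraction
itself, which is the approximate proportionality that gene mapping relies on; as
the fraction approaches 0.5 the two genes behave as if unlinked and the distance
estimate grows without bound.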
Statistical evolutionary biology has goals related to those of statistical
genetics but on a different timescale. Whereas statistical genetics deals with
mutations, individuals, and populations over a few generations, evolutionary
biology deals with accepted mutations (i.e., substitutions or mutations that
occur in every individual of a population; a mutation usually occurs only in a
portion of a population), species, and changes in genes and chromosomes over
millions or billions of generations or years. The goal of the field is to perform
mathematical inference regarding the most likely history of rearrangements in
the genomes of ancestral species that led to the present-day species.
Genomics-oriented computer science deals
with hard problems in the analysis of experimental data produced by the genomics community.
For example, one of the currently popular techniques for deciphering whole genomes
is called shotgun sequencing. This barbaric technique starts with random fragmentation
of a genome into small (i.e., manageable by current methods) subsequences. The
resulting small subsequences (usually hundreds of nucleotides long) are sequenced
(i.e., converted into a series of letters "A," "C," "G,"
and "T" on a computer disk through expensive wet-lab wizardry). The
result is a huge collection of small sequences that need to be "assembled"
back into a complete genome through a process that bears a striking resemblance
to the assembly of a jigsaw puzzle. Automated solution of such puzzles is one of
the many problems that occupy computer scientists working permanently in
genomics. In the past few years, computer scientists have also become interested in
other aspects of biology that require computation (including structural and
evolutionary biology, and statistical genetics), and have contributed substantially
to these areas. Even computer scientists who work exclusively within biology
still love to prove theorems that cross their path, but hate to write user-friendly
programs.
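The jigsaw-puzzle analogy can be made concrete with a toy sketch. It assumes
error-free reads taken from a single strand, with no read wholly contained in
another, and it uses a simple greedy strategy (repeatedly merge the pair of reads
with the longest suffix-to-prefix overlap). This is only one illustrative
heuristic, not a description of how production assemblers work; the function
names, the `min_len` parameter, and the example reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` characters exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads greedily by longest overlap until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge; return the separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    # Hypothetical reads sampled from the made-up sequence ATGCGTACGTTAGC.
    print(greedy_assemble(["GTTAGC", "ATGCGTAC", "GTACGTTA"]))
```

Real genomes contain repeats, sequencing errors, and reads from both strands, so
a greedy merge of this kind can join fragments incorrectly; the sketch is meant
only to show why assembly is naturally posed as an overlap-finding problem.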
Genomics-oriented statistics percolates through each of the other fields
listed. Probability theory is essential for the majority of applications, and
the participation of professional statisticians in Bioinformatics research is
invariably valuable.
Biochemical kinetics analysis was considered an esoteric field just
a few years ago; interest bloomed tremendously when it became clear that simulation
of complex cellular machinery in silico would be of major importance for drug
discovery and theoretical work. The field is now well defined, with its own
specialized conferences, journals, and graduate-level university programs.
Medical informatics, the most recent entrant in our list of specialties,
deals with computational issues relevant to analysis of patient records, outputs
of medical instruments, computer interfaces that facilitate interactions between
physicians and patients, patient monitoring and education, decision support,
quality control, guideline development and adherence, protocol implementation,
outcomes-based care, and many others. Given that the biggest promise of Bioinformatics
lies in revolutionizing health care by developing "designer drugs"
that are fine-tuned to the needs of individual patients, research on the marriage
of medical informatics and bioinformatics has a great future.