Bioinformatics Predecessors
 

The field currently called Bioinformatics (for which several alternative definitions exist) emerged recently as a merger of several distinct scientific disciplines (Figure 1):

Structural biology
Statistical genetics
Statistical evolutionary biology
Genomics-oriented computer science
Genomics-oriented statistics
Biochemical kinetics analysis
Medical informatics

Structural biology deals with the physical properties and geometry of the large molecules found in living cells. Deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and proteins are responsible for the majority of functions associated with perpetuating and sustaining life; thus, these molecules have received the lion's share of the research community's attention. The Holy Grail pursued by certain structural-biology researchers is ab initio prediction of the three-dimensional structure of a macromolecule from its primary structure, that is, from nucleic acid or amino acid sequence data. This problem seems to be somewhat more tractable for RNA and DNA molecules, but is thus far unsolved for proteins.

Statistical genetics deals with probabilistic mathematical models of the inheritance of human features. One of the practical goals of this field is to provide tools for mapping genes. In this context, mapping means estimating where on a chromosome a gene of interest is situated, based on statistical analysis of the co-occurrence of various observed traits in human families. The rationale for the technique is the observation that the probability of a recombination between two genes on the same chromosome grows with the distance between those genes and, for closely spaced genes, is roughly proportional to that distance.

Recombination is a genetic event that occurs prior to maturation of gametes in the majority of nonbacterial species. In the process of recombination, each pair of homologous chromosomes that came from the two parents exchanges regions, such that each oocyte or spermatozoon receives a single set of all genes, randomly sampled from the two parents. The place on the chromosome where the DNA string from the mother joins the DNA string from the father is called the recombination breakpoint. Consequently, if two genes sit very close to each other on a chromosome, it is improbable that we will observe a recombination with a breakpoint between them, and the unusually frequent co-occurrence of the traits corresponding to such linked genes can be detected by statistical analysis. A more general purpose of statistical genetics is understanding the mechanisms of inheritance in living creatures.
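The linkage argument above can be made concrete with a small calculation. The following is a minimal sketch, not a method described in this text: it estimates the recombination fraction between two genes from hypothetical offspring counts and converts it to a map distance using Haldane's map function, a standard textbook model that assumes crossovers occur independently. The function names and the example counts are illustrative assumptions.

```python
import math


def recombination_fraction(recombinants, total):
    """Fraction of offspring in which a breakpoint fell between the two genes."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= recombinants <= total:
        raise ValueError("recombinants must lie between 0 and total")
    return recombinants / total


def haldane_distance_cm(r):
    """Map distance in centimorgans under Haldane's map function.

    Assumes independent crossovers; r must satisfy 0 <= r < 0.5,
    because unlinked genes recombine with probability 0.5.
    """
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


# Hypothetical family data: 10 recombinant offspring out of 100 scored.
r = recombination_fraction(10, 100)
distance = haldane_distance_cm(r)
print(f"r = {r:.2f}, estimated distance = {distance:.1f} cM")
```

For small recombination fractions the estimated distance is close to 100 times r, which is the "roughly proportional" regime; as r approaches 0.5 the distance grows without bound and the two genes behave as if unlinked.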

Statistical evolutionary biology has goals related to those of statistical genetics but on a different timescale. Where statistical genetics deals with mutations, individuals, and populations over a few generations, evolutionary biology deals with accepted mutations (i.e., substitutions, or mutations that occur in every individual of a population; a mutation usually occurs only in a portion of a population), species, and changes in genes and chromosomes over millions or billions of generations or years. The goal of the field is to perform mathematical inference regarding the most likely history of rearrangements in the genomes of ancestral species that led to the present-day species.

Genomics-oriented computer science deals with hard problems in the analysis of experimental data produced by the genomics community. For example, one of the currently popular techniques for deciphering whole genomes is called shotgun sequencing. This barbaric technique starts with random fragmentation of a genome into small (i.e., manageable by current methods) subsequences. The resulting small subsequences (usually hundreds of nucleotides long) are sequenced (i.e., converted into a series of letters "A," "C," "G," and "T" on a computer disk through expensive wet-lab wizardry). The result is a huge collection of small sequences that need to be "assembled" back into a complete genome through a process that bears a striking resemblance to the assembly of a jigsaw puzzle. Automating the solution of such puzzles is one of many problems that occupy computer scientists who have taken up permanent residence in genomics. In the past few years, computer scientists have also become interested in other aspects of biology that require computation (including structural and evolutionary biology, and statistical genetics), and have contributed to these areas substantially. Even computer scientists who work exclusively within biology still love to prove theorems that cross their path, but hate to write user-friendly programs.
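The jigsaw-puzzle analogy can be illustrated with a toy program. The sketch below is an assumption-laden illustration rather than the algorithm used by any real assembler: it repeatedly merges the two reads with the longest suffix-to-prefix overlap (a greedy heuristic), and it assumes error-free reads from a single strand with no repeated regions. The names `overlap` and `greedy_assemble`, the `min_len` threshold, and the example reads are all hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` characters exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads greedily by largest overlap; return the remaining contigs."""
    # Drop exact duplicates, then reads fully contained in another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]

    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining pair overlaps enough; stop with several contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical reads sampled from the made-up sequence ATGCGTACGTTAGC.
fragments = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
print(greedy_assemble(fragments))
```

The greedy choice keeps the example short, but it can misassemble genomes that contain repeats, which is one reason the real problem is hard.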

Genomics-oriented statistics percolates through each of the other fields listed. Probability theory is essential for the majority of applications, and the participation of professional statisticians in Bioinformatics research is invariably valuable.

Biochemical kinetics analysis was considered an esoteric field just a few years ago; interest bloomed tremendously when it became clear that simulation of complex cellular machinery in silico would be of major importance for drug discovery and theoretical work. The field is now well defined, with its own specialized conferences, journals, and graduate-level university programs.
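As a very small taste of what in silico simulation means here, the sketch below integrates the mass-action equations for a single enzyme reaction, E + S <-> ES -> E + P, with a fixed-step forward Euler scheme. It is only an illustration under stated assumptions: the reaction, the rate constants, the initial concentrations, and the function name `simulate_enzyme_reaction` are all hypothetical, and realistic models of cellular machinery involve many coupled reactions and more careful numerical methods.

```python
def simulate_enzyme_reaction(s0=10.0, e0=1.0, k_on=1.0, k_off=0.5,
                             k_cat=0.3, dt=0.001, t_end=50.0):
    """Forward-Euler integration of E + S <-> ES -> E + P.

    All parameter values are illustrative, in arbitrary units.
    Returns a list of (time, S, E, ES, P) tuples.
    """
    s, e, es, p = s0, e0, 0.0, 0.0
    steps = int(round(t_end / dt))
    trajectory = [(0.0, s, e, es, p)]
    for n in range(1, steps + 1):
        bind = k_on * e * s      # rate of E + S -> ES
        unbind = k_off * es      # rate of ES -> E + S
        convert = k_cat * es     # rate of ES -> E + P
        s += dt * (unbind - bind)
        e += dt * (unbind + convert - bind)
        es += dt * (bind - unbind - convert)
        p += dt * convert
        trajectory.append((n * dt, s, e, es, p))
    return trajectory


trajectory = simulate_enzyme_reaction()
t, s, e, es, p = trajectory[-1]
print(f"t = {t:.1f}: substrate = {s:.3f}, product = {p:.3f}")
```

A useful sanity check on any such simulation is conservation: substrate in all its forms (S + ES + P) and enzyme in all its forms (E + ES) should stay constant throughout the run.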

Medical informatics, the most recent entrant in our list of specialties, deals with computational issues relevant to analysis of patient records, outputs of medical instruments, computer interfaces that facilitate interactions between physicians and patients, patient monitoring and education, decision support, quality control, guideline development and adherence, protocol implementation, outcomes-based care, and many other topics. Given that the biggest promise of Bioinformatics lies in revolutionizing health care by developing "designer drugs" that are fine-tuned to the needs of individual patients, research on the marriage of medical informatics and Bioinformatics has a great future.