Mathematical Problems in DNA Sequence Analysis and Applications

Olgica Milenkovic
Engineering Center
University of Colorado

Genomic data analysis is currently one of the fastest-growing scientific fields and a focal research problem for a large group of experts in molecular biology, mathematics, computer science and coding theory. It not only provides partial answers to complex biological problems, such as the evolutionary pathway of our species or the treatment of genetic diseases, but also introduces new research topics in applied mathematics, information and coding theory, and electrical and biological engineering in general.

The goal of this talk is to introduce several mathematical problems arising in molecular biology, DNA compression and DNA computing, and to show how they can be approached using well-developed techniques borrowed from statistics, combinatorics, and information and coding theory. In this context, we will discuss some new ideas and results regarding:

  1. Random Boolean Function Networks (RBFN) for modeling patterns of gene interactions and their connection to codes on graphs; the treatment of this subject involves some concepts from dynamical systems theory and error-control coding theory.
  2. Statistical DNA analysis techniques and DNA distance measures with application to wavelet or grammar-based DNA compression; the treatment of this subject is based on ideas from classical source coding and fractal sequence analysis.
  3. Coding for DNA computing, including the design of DNA codes with constant GC-content and the reverse-complement property, as well as codes for DNA microarrays; the treatment of this subject is based on classical results from combinatorics and algebraic coding theory.
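As a rough illustration of the constraints named in item 3, the sketch below checks whether a set of DNA words has constant GC-content and satisfies a reverse-complement distance condition. The abstract does not state its exact definitions, so the ones used here are assumptions: GC-content is the number of G and C symbols in a word, and the reverse-complement property requires the Hamming distance between any codeword and the reverse complement of any codeword (including itself) to be at least a chosen minimum distance. The function names and the example code are hypothetical.

```python
from itertools import combinations

# Watson-Crick complement of each base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def gc_content(word):
    """Number of G or C symbols in the word (assumed definition)."""
    return sum(base in "GC" for base in word)


def reverse_complement(word):
    """Reverse the word and complement every base."""
    return "".join(COMPLEMENT[base] for base in reversed(word))


def hamming(x, y):
    """Hamming distance between two words of equal length."""
    return sum(a != b for a, b in zip(x, y))


def check_code(code, gc_weight, min_dist):
    """Return True if every word has GC-content gc_weight, distinct words
    are at distance >= min_dist, and every word is at distance >= min_dist
    from the reverse complement of every word (itself included)."""
    if any(gc_content(w) != gc_weight for w in code):
        return False
    if any(hamming(x, y) < min_dist for x, y in combinations(code, 2)):
        return False
    return all(
        hamming(x, reverse_complement(y)) >= min_dist
        for x in code
        for y in code
    )


# Hypothetical toy code of length 4 with GC-content 2.
toy_code = ["AACC", "ACAC"]
print(check_code(toy_code, gc_weight=2, min_dist=2))
```

A word such as ACGT equals its own reverse complement, so under these assumed definitions it cannot belong to any code with a positive minimum distance.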

This is joint work with B. Vasic, University of Arizona, Tucson.