The many books published on bioinformatics tend toward one of two extremes: those that feature computational details with a great deal of mathematics, written for computational scientists and mathematicians, and those that treat bioinformatics as a giant black box, written for biologists. This is the first book to use comprehensive numerical illustrations of the mathematical techniques and computational algorithms that bioinformatics uses to convert molecular data into organized biological knowledge.

lysogenic cycle by integrating its DNA into the circular bacterial chromosome and having its genome co-replicated with the host genome. In its integrated form, the viral genome is called a provirus (or prophage). Not all dsDNA viruses have a lysogenic cycle.

Figure 4-1. Schematic illustration of the lytic and lysogenic cycles of the phage. (Figure labels: phage λ virion; phage DNA; host DNA; bacterial cell membrane; lytic cycle, infect other cells; lysogenic cycle, phage DNA integrated into the host genome and known as prophage.)

bacterial species would be inefficient in replicating their DNA because few building blocks are available. Below we develop a more formal argument to show that the genomic AT% of a bacterial species is indicative of cellular AT availability. It is important to know something about the cellular environment inside a bacterial cell, especially from a phage's perspective. It would be a fatal mistake for a phage to squeeze its AT-rich genome into a bacterial host with few A and T nucleotides available. The importance of the
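Genomic AT% is the fraction of A and T among all nucleotides of a genome. A minimal sketch, assuming the genome is given as a plain string and that ambiguous characters such as N are left out of the denominator (our choice, not necessarily the book's); the function name at_percent is hypothetical:

```python
def at_percent(seq):
    """Percentage of A+T among the A, C, G and T characters in seq."""
    seq = seq.upper()
    counts = {b: seq.count(b) for b in "ACGT"}
    total = sum(counts.values())  # ambiguous symbols (e.g. N) are ignored
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (counts["A"] + counts["T"]) / total


print(at_percent("ATGCATTANNAT"))  # → 80.0
```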

PS = 4, update (a)
Base   1   2   3   4
A      0   1   1   1
C      1   1   0   1
G      1   0   1   1
T      1   1   1   0

NEG2: GGCC, PS = 2, update (b)
POS1: ACGT, PS = 2, no update (c)
POS2: GCGC, PS = 2, no update (d)

Matrix shown in each of panels (b), (c) and (d):
Base   1   2   3   4
A      0   1   1   1
C      1   1  -1   0
G      0  -1   1   1
T      1   1   1   0

We can proceed with the POS1 and POS2 sequences, but both have PS = 2 and, according to the rules for updating W, no change should be made. At this point, no input sequence will lead to
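The update steps in panels (a) to (d) can be sketched in Python. This is a minimal sketch under stated assumptions: W is stored as a dictionary keyed by base, PS is the sum of W[base][position] over the bases of a sequence, and a sequence is predicted positive when PS > 0. The threshold of 0 is our assumption, chosen because it reproduces the update/no-update decisions shown; likewise, adding 1 for a misclassified positive sequence is the standard perceptron rule and is assumed here, since the panels only show a negative update. The helper names position_score and perceptron_step are hypothetical. Starting from the matrix in panel (a) and feeding NEG2, POS1 and POS2 reproduces the matrix of panels (b) to (d).

```python
BASES = "ACGT"


def position_score(W, seq):
    """PS: sum of W[base][position] over the bases of seq."""
    return sum(W[base][i] for i, base in enumerate(seq))


def perceptron_step(W, seq, is_positive, threshold=0):
    """Update W in place if seq is misclassified; return (PS, updated)."""
    ps = position_score(W, seq)
    predicted_positive = ps > threshold  # assumed decision rule
    if predicted_positive == is_positive:
        return ps, False  # correctly classified: no update
    delta = 1 if is_positive else -1  # reward positives, penalize negatives
    for i, base in enumerate(seq):
        W[base][i] += delta
    return ps, True


# Panel (a): rows are bases, columns are positions 1-4.
W = {
    "A": [0, 1, 1, 1],
    "C": [1, 1, 0, 1],
    "G": [1, 0, 1, 1],
    "T": [1, 1, 1, 0],
}

training = [("NEG2", "GGCC", False), ("POS1", "ACGT", True), ("POS2", "GCGC", True)]
log = []
for name, seq, label in training:
    ps, updated = perceptron_step(W, seq, label)
    log.append((name, ps, updated))
    print(name, seq, "PS =", ps, "update" if updated else "no update")

for base in BASES:
    print(base, W[base])
```

Only NEG2 is misclassified (PS = 2 > 0 for a negative sequence), so it is the only one that changes W.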

structure is more likely to continue across this site instead of switching to some other secondary structure. We have now covered two of the three major tasks associated with an HMM, i.e., training an HMM and reconstructing the sequence of hidden states. We now deal with the last task, i.e., computing the probability of the observed sequence of symbols by the forward algorithm (Rabiner, 1989).

3.4 Forward algorithm

The forward algorithm is for computing the probability of
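The three steps of the forward algorithm (initialization, induction and termination, in Rabiner's formulation) can be sketched in Python. This is a minimal sketch, not the book's own implementation: the two hidden states S1 and S2, the symbols x and y, and all probabilities are hypothetical, and brute_force is a helper added only to check the result by summing over every hidden-state path.

```python
from itertools import product


def forward(obs, states, start_p, trans_p, emit_p):
    """Return P(obs | model) computed by the forward algorithm."""
    # Initialization: alpha_1(s) = pi_s * b_s(o_1)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1})
    for symbol in obs[1:]:
        alpha = {
            j: sum(alpha[i] * trans_p[i][j] for i in states) * emit_p[j][symbol]
            for j in states
        }
    # Termination: P(O) = sum over s of alpha_T(s)
    return sum(alpha.values())


def brute_force(obs, states, start_p, trans_p, emit_p):
    """Sum P(obs, path) over every hidden-state path (exponential; for checking only)."""
    total = 0.0
    for path in product(states, repeat=len(obs)):
        p = start_p[path[0]] * emit_p[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans_p[path[t - 1]][path[t]] * emit_p[path[t]][obs[t]]
        total += p
    return total


# Hypothetical two-state model; all numbers are illustrative only.
states = ("S1", "S2")
start_p = {"S1": 0.6, "S2": 0.4}
trans_p = {"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}}
emit_p = {"S1": {"x": 0.5, "y": 0.5}, "S2": {"x": 0.1, "y": 0.9}}

obs = "xyyx"
print(forward(obs, states, start_p, trans_p, emit_p))
print(brute_force(obs, states, start_p, trans_p, emit_p))
```

The forward algorithm needs on the order of N²T operations for N states and T symbols, whereas brute-force enumeration needs on the order of N^T paths. For long sequences the alpha values shrink geometrically, so practical implementations add scaling or work in log space; the sketch omits this for clarity.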

278, 279, 230, and 248, respectively. The 4×6 matrix occupying the last six columns of Table 7-1 will be referred to as the C matrix. The C matrix is tabulated from the 29 random motifs, whereas the C0 vector is tabulated from the nucleotides outside the motifs. Thus, the sums of the first, second, third, and fourth rows should equal FA, FC, FG, and FT, respectively. Also note that each of the six columns in the C matrix should add up to 29, because each of the 29 motifs contributes exactly one nucleotide to each site.

Table 7-1. Site-specific distribution of nucleotides
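The bookkeeping behind the C matrix and the C0 vector can be sketched in Python. A minimal sketch on hypothetical inputs: the three sequences, their motif start positions and the motif width below are invented for illustration (the book's table uses 29 motifs of width 6), and the function name tabulate is ours. Each nucleotide inside a motif is counted in the column of its site in C; every other nucleotide goes to C0.

```python
BASES = "ACGT"


def tabulate(seqs, starts, width):
    """Count nucleotides site by site inside motifs (C) and outside them (C0)."""
    C = {b: [0] * width for b in BASES}  # 4 x width site-specific counts
    C0 = {b: 0 for b in BASES}           # background counts
    for seq, start in zip(seqs, starts):
        for i, base in enumerate(seq):
            if start <= i < start + width:
                C[base][i - start] += 1  # nucleotide at site (i - start) of the motif
            else:
                C0[base] += 1            # nucleotide outside the motif
    return C, C0


# Hypothetical data: three sequences, their motif start positions, width 6.
seqs = ["ACGTACGTAA", "TTGACCGTAC", "GGATCCATGC"]
starts = [2, 3, 1]
width = 6
C, C0 = tabulate(seqs, starts, width)

for base in BASES:
    print(base, C0[base], C[base])
```

With this layout, every column of C sums to the number of motifs, and for each nucleotide the motif counts plus the background count equal its total count in all sequences, which is how we read the statement about FA, FC, FG and FT.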