MUTCOMP / MUTPROF                                            27.03.00

I have written two programs for the comparison of mutation profiles.

The first one, MUTPROF, is a Markov chain Monte Carlo implementation of the hypergeometric test for sparse 3-dimensional contingency tables. It is designed in the spirit of Fisher's exact test and allows different categories of mutation to be considered.

The format of the input file is as follows:

Line 1: 3 integers, separated by spaces, namely

    NPOS  the number of nucleotide positions (max 200)
    NPRO  the number of profiles (max 4)
    NCAT  the number of categories per profile (max 3)

Positions where none of the profiles has a mutation should be excluded, since they provide no information to distinguish profiles. Thus, 200 positions should do for most comparisons. If not, profiles have to be split up into smaller chunks. The maximum number of categories is 3, which suffices to encode the actual nucleotide change, since each nucleotide can be replaced by one of three others.

Lines 2 to NPOS+1: the observed numbers of mutations at each position, separated by spaces. One line is entered per position. Within a line, enter the categories of the 1st profile first, then the categories of the 2nd profile, etc.

Example (NPOS = 10, NPRO = 2, NCAT = 3):

    10 2 3
    9 15 18 11 15 12
    17 15 10 15 11 7
    13 13 4 11 7 3
    13 5 7 6 8 10
    8 8 6 8 9 12
    4 9 13 10 12 15
    9 10 12 13 20 11
    12 15 13 12 14 9
    15 10 7 11 10 4
    15 9 9 6 3 9

The other program, MUTCOMP, compares relative mutation rates. The program requires two input files, each containing the nucleotide substitution matrix for a given profile.

The format of the input files is as follows: 4 lines, 5 columns. All entries are separated by spaces. The first four columns contain, as integers, the observed frequencies of the different substitutions. The 5th column contains the overall frequency of the nucleotide for the respective line. This can be an integer (i.e. an absolute number) or a decimal number (i.e. a relative frequency). The order T, C, A, G is used for coding and is referred to in the output. In principle, however, the order does not matter.
Example:

    0 24 17 2 1234
    22 0 16 8 1452
    21 13 0 9 1322
    14 7 3 0 945

The output gives several statistics and the corresponding p values.

The statistics in the first column compare the overall mutability of the respective nucleotide between the two profiles. To this end, the observed frequency is divided by the expected frequency (5th column of the input), and the four resulting values are normalized so that they sum to one. The statistic is the difference between the two values determined for each nucleotide in the two profiles.

The other three statistics in each line compare the relative rates of each individual substitution event. These are estimated by dividing the observed frequency of a cell by the sum taken over the respective line. The statistic is the difference between the individual cell values obtained for each profile.

All p values are determined by bootstrapping over the whole mutation sample (5000 simulations).

Best wishes
MK
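The two kinds of MUTCOMP statistic described above can be illustrated with a short Python sketch. This is not the MUTCOMP source, only a reading of the description under stated assumptions: the "observed frequency" of a nucleotide is taken to be the sum of its line in the substitution matrix, and all function names are hypothetical. The second profile is invented purely for illustration. The bootstrap p values are not reproduced, because the resampling scheme is not specified in enough detail here.

```python
# Sketch of the MUTCOMP statistics as described in the text (not the original program).
NUCLEOTIDES = "TCAG"  # coding order stated in the text

# Example matrix from the text: 4 substitution counts, then the nucleotide frequency.
EXAMPLE_A = """
0 24 17 2 1234
22 0 16 8 1452
21 13 0 9 1322
14 7 3 0 945
"""

# Hypothetical second profile, invented for illustration only.
EXAMPLE_B = """
0 18 20 5 1100
25 0 10 6 1500
15 17 0 11 1300
9 9 6 0 1000
"""


def parse_profile(text):
    """Split a 4-line, 5-column matrix into substitution counts and nucleotide frequencies."""
    rows = [line.split() for line in text.strip().splitlines()]
    counts = [[int(x) for x in row[:4]] for row in rows]
    freqs = [float(row[4]) for row in rows]  # integer or decimal, as the text allows
    return counts, freqs


def mutability(counts, freqs):
    """Observed / expected per nucleotide, normalized to sum to one.

    Assumption: 'observed' is the line total of the substitution counts.
    """
    raw = [sum(row) / f for row, f in zip(counts, freqs)]
    total = sum(raw)
    return [r / total for r in raw]


def relative_rates(counts):
    """Each cell divided by the sum over its line (0.0 for a line without mutations)."""
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


def statistics(profile_a, profile_b):
    """Differences (profile A minus profile B) of mutabilities and of relative rates."""
    counts_a, freqs_a = profile_a
    counts_b, freqs_b = profile_b
    mut_a, mut_b = mutability(counts_a, freqs_a), mutability(counts_b, freqs_b)
    mut_diff = [a - b for a, b in zip(mut_a, mut_b)]
    rates_a, rates_b = relative_rates(counts_a), relative_rates(counts_b)
    rate_diff = [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(rates_a, rates_b)]
    return mut_diff, rate_diff


if __name__ == "__main__":
    mut_diff, rate_diff = statistics(parse_profile(EXAMPLE_A), parse_profile(EXAMPLE_B))
    for nuc, m, row in zip(NUCLEOTIDES, mut_diff, rate_diff):
        print(nuc, f"{m:+.4f}", " ".join(f"{r:+.4f}" for r in row))
```

Each printed line holds the mutability difference first, followed by the differences in relative substitution rates. The diagonal entry of each line is always zero, which leaves the three substitution statistics per nucleotide mentioned above.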