FINDSIRE 13.04.99

The program FINDSIRE emerged from a large-scale paternity testing project performed on the rhesus macaques of Cayo Santiago (Nuernberg et al. 1998, Am J Primatol 44, 1-18). The software identifies mothers or sires by comparing a large number of potential parents, typed at single-locus DNA markers, with a given infant or parent-infant duo. Potential parents with more than a certain number of mismatches in their DNA profiles are excluded from parenthood, whilst those not excluded are reported for further analysis.

Information regarding the markers employed is logged in a file called the marker file. This file has the following format:

1. line; NMARK, a single positive integer equal to the total number of markers tested.
following NMARK lines; the designation of each marker.

    5
    D2S367
    D8S601
    D20S206
    D6S493
    D12S67

All genotype data are logged in a file called the data file. The data file also contains the demographic information necessary to define filters for the exclusion exercise. One line is expected per animal. In each line, the data format is as follows (please note that no identifier may contain blanks; leading blanks or blanks following identifiers are, of course, allowed and may be necessary to fill up columns; also remember that all identifiers are case-sensitive):

Column 1-5; an identifier of the cohort to which the animal belongs (e.g. the birth cohort).
Column 6-10; the identifier of the animal itself.
Column 11; the sex of the animal (m, f, u).
Column 12-19; the date of birth in the format YYYYMMDD. This information must be included for all animals, otherwise the program will terminate with a DATE ERROR.
Column 20-27; the date of death in the format YYYYMMDD, if available; if unavailable, leave blank.
Column 28-32; the identifier of the mother, if available.
Column 33-37; the identifier of the sire, if available.
Column 38; an identifier of the group to which the animal belongs.
    This information is required for all animals. Use a dummy if no groups are defined.
Column 39-42; the identifier(s) of up to four groups for which the respective animal can be a parent; leave blank if not applicable.
rest of line; 2*NMARK integers denoting the genotypes of the respective animal at the NMARK markers (two zeros are logged for unknown genotypes).

    1995 01E m1994122619970117X91 II 0601 0601 170 174 260 268 141 145 128 136
    1996 01F m1996032519980127I70 II 0 0 166 182 260 266 145 149 0 0
    1992 02A f19920110 G68 I 0 0 166 174 260 266 141 141 136 136
    1994 02C m19940117 D92 N60 II 0601 1503 174 178 262 268 141 149 128 136
    1995 02D m1994120119970115D98 II 0 0 162 174 260 266 0 0 0 0
    1995 02E m1994121019970115N41 II 1801 1802 166 166 268 328 141 145 136 144
    1992 08A m19920127 C12 II 0601 0601 170 170 262 268 141 141 128 136
    1994 08C f19940217 V60 I37 I 0601 1806 170 170 256 260 137 141 128 128
    1995 08D f1994122819970124O96 K85 I 1801 1802 162 170 260 266 133 149 128 136

In order to allow the program to distinguish between confirmed and newly detected parent/offspring mismatches (and to mark them accordingly in the output file), a mismatch file may be supplied. In this file, enter one line per mismatch, comprising the marker designation and the identifiers of infant and parent, separated by at least one blank.

    D20S476 75D G17
    D12S67 75D G17
    D14S255 S81 D92
    D5S1470 14D O27

For every non-excluded parent, the program provides the probability of not excluding an unrelated animal from the same parenthood, allowing for the genotypes of the infant and the parent of the opposite sex (if available), and considering only the markers at which the potential parent has been typed and shows no mismatch. For these calculations to be feasible, the program requires the allele frequencies of all loci included in the marker file. These data are logged in the allele frequency file, which has the format given below.
The allele frequency file must include all markers and alleles occurring in the data and marker files, but it can be more comprehensive.

top line; NMARK2, an integer indicating the number of markers included in the allele frequency file.
Repeat the following entries NMARK2 times, once per marker:
1. line; NALL, the number of alleles
2. line; marker designation
next NALL lines; a positive integer for the allele designation and the frequency of that allele

    3
    4
    DQB
    601 0.500
    1801 0.100
    1806 0.150
    1802 0.250
    4
    D2S367
    141 0.400
    139 0.350
    115 0.150
    143 0.100
    3
    D8S601
    228 0.350
    226 0.250
    224 0.400

The execution of FINDSIRE is directed by a master file of the format given below.

1. line; name and location of the data file
2. line; name and location of the output file. This is the name of the file containing the list of non-excluded parents. A second file will be generated automatically under the same name and location, but with the filename extension ".mgp". This MGP ("mismatched genotypes of parent") file contains all mismatched genotypes for potential parents with no more than MAXMIS2 mismatches (see below).
3. line; name and location of the marker file
4. line; name and location of the mismatch file (leave a blank line if no such file is required)
5. line; name and location of the allele frequency file
6. line; identifier of the cohort of infants for which parents are to be identified (leave a blank line if parents should be identified for all animals in the database).
7. line; a single character identifying the group of infants for which parents are to be identified. A group identifier has to be entered here. Only animals with the same character occurring in columns 39-42 of their data file entry will be considered as potential parents.
8. line; eight integers, separated by blanks, namely
    FINDS, = 0 if males are to be checked for sirehood, and = 1 if females are to be checked for motherhood.
        If the parent of the non-contrived sex (the sex not being searched for) is known and included in the data file, this information is used to facilitate the exclusion exercise.
    SUPPR, = 0 if mismatches due to homozygosity of a potential parent should be considered, and = 1 if such mismatches should be ignored
    MAXMIS, the maximum number of mismatches allowed for a potential parent
    MAXMIS2, the maximum number of mismatches allowed for a potential parent to be included in the MGP file
    MININF, the minimum number of loci at which an infant must be typed
    MINSIR, the minimum number of loci at which a potential parent must be typed
    MINAGE, the minimum difference in days between the dates of birth of an infant and a potential parent. If dates of birth are not available, simply enter, in the data file, an identical dummy date (e.g. 20000101) for all animals. Setting MINAGE=-1 ensures that all animals of the right sex will be considered as potential parents for all infants.
    MAXDEATH, the maximum time (in days) allowed to have elapsed between the death of a potential parent and the birth of an infant.

    c:\works\rhes_01.dat
    c:\works\ex1993i.res
    c:\works\mark_1.dat
    c:\works\mima_1.dat
    c:\works\allf.dat
    1993
    I
    0 1 2 1 5 5 1250 200

The name of the master file is either given on the DOS command line (e.g. C:\>findsire mast_01.dat) or is requested by the program during execution.

The output file includes a header reporting the parameters used in the analysis. Below this header, each case is reported with the infant, its parent of the non-contrived sex (if available), and every non-excluded potential parent allocated one line each. In each line, the identifier of the animal is followed by the number of markers at which it has been typed. Then, one integer is present per marker, where 0 = not typed, 1 = homozygous, 2 = heterozygous. Negative numbers mark mismatching genotypes.
Below the infant and its parent of non-contrived sex, a line is included marking mismatches between the two as either confirmed ("c") or not confirmed ("!"). Each line for a non-excluded potential parent ends with a real number. This figure is the probability of not excluding an unrelated animal from parenthood, using the markers at which the actual candidate parent did not show a mismatch. The lower this value, the higher the confidence in the parenthood of the animal tested.

    FINDSIRE version 4/99 (search for sires)
    Cohort = 1995
    Group = I
    Number of markers tested = 16 (from c:\works\affen\marker.dat)
    Minimum number of genotypes required for infant = 5
    Minimum number of genotypes required for parent = 5
    Maximum number of mismatches allowed = 1
    Minimum age of parent at infant birth (in days) = 1250
    Maximum time since death of parent (in days) = 200
    -------------------------------------------------------
    00D 10   2  0  0  1  1  2  2  1  1  1  1  2  0  0  0  0
    S54 12   2  0  0  2  1  2  2 -2  2  1  1  2  0  2  0  2
    : ! 3 215
    A61  9   2  1  2  0  2  0  0  2  2  0  1  1  0  2  0  0   0  5.3405E-04
    I78 14   2  1  2  2  2  2 -2  2  2  1  2  2  0  2  0  2   1  1.8922E-05
    L70  5   0  2  0  0  1  0  0  0  2  2 -2  0  0  0  0  0   1  8.0828E-02

    14D 10   2  0  0  2  1  2  1  2  2  2  2  1  0  0  0  0
    O27 11   2  0  0  2  2  1  1  2  2  2  2 -1  0  2  0  0
    : c 3 216
    S76 16  -1  2  2  1  2  2  2  1  2  2  1  2  2  2  2  2   1  9.1114E-05
    O08  5  -2  2  0  0  2  0  0  0  0  2  2  0  0  0  0  0   1  3.1052E-01
    L70  5   0  2  0  0  1  0  0  0 -2  2  2  0  0  0  0  0   1  3.1052E-01

In the MGP file, the genotypes of the infant and the parent of non-contrived sex are followed by the mismatching genotype of the non-excluded potential parent. An MGP file for the above example would look as follows.

    00D S54
    . D20S476 132 148 128 148 - I78 128 144
    . D5S820 185 185 185 185 - L70 189 197
    14D O27
    . D12S67 116 124 112 116 - L70 112 128
    . DQB 605 1806 601 1806 - O08 601 1802
    . DQB 605 1806 601 1806 - S76 601 601

Best wishes
MK
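
For readers who want to check a data file before running FINDSIRE, the column layout of the data file described above can be read with a few lines of Python. This is only a minimal sketch and not part of FINDSIRE itself: the names Animal and parse_animal are hypothetical, the error raised for a missing date of birth merely imitates the DATE ERROR mentioned in the text, and the sample line is rebuilt from the first example record with its blank padding assumed from the column description.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Animal:
    """One data-file record (hypothetical container, not FINDSIRE's own)."""
    cohort: str
    ident: str
    sex: str
    birth: str                         # YYYYMMDD
    death: Optional[str]               # YYYYMMDD, None if left blank
    mother: Optional[str]
    sire: Optional[str]
    group: str
    parent_groups: str                 # up to four group characters, may be empty
    genotypes: List[Tuple[int, int]]   # one allele pair per marker, (0, 0) = untyped


def parse_animal(line: str, nmark: int) -> Animal:
    """Split one line at the 1-based, inclusive columns given in the text."""
    line = line.rstrip("\r\n")

    def field(first: int, last: int) -> Optional[str]:
        text = line[first - 1:last].strip()
        return text or None

    birth = field(12, 19)
    if birth is None or len(birth) != 8 or not birth.isdigit():
        # Imitates the DATE ERROR described in the text (wording is assumed).
        raise ValueError("DATE ERROR: date of birth missing or malformed")

    # Everything after column 42 holds 2*NMARK integers.
    alleles = [int(token) for token in line[42:].split()]
    if len(alleles) != 2 * nmark:
        raise ValueError(f"expected {2 * nmark} allele values, found {len(alleles)}")
    genotypes = list(zip(alleles[0::2], alleles[1::2]))

    return Animal(
        cohort=field(1, 5) or "",
        ident=field(6, 10) or "",
        sex=field(11, 11) or "",
        birth=birth,
        death=field(20, 27),
        mother=field(28, 32),
        sire=field(33, 37),
        group=field(38, 38) or "",
        parent_groups=field(39, 42) or "",
        genotypes=genotypes,
    )


# First example record, padded to the stated columns (padding is an assumption).
sample = (
    f"{'1995':<5}{'01E':<5}m"
    "19941226" "19970117"
    f"{'X91':<5}{'':<5}"
    f"I{'I':<4}"
    " 0601 0601 170 174 260 268 141 145 128 136"
)

record = parse_animal(sample, nmark=5)
print(record.ident, record.mother, record.sire, record.genotypes[0])
```

Slicing by fixed positions rather than splitting on blanks matters here, because blank fields (date of death, mother, sire) would otherwise shift every later field to the left.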