Overview:
The program CoaCC.c simulates a case-control study using a
coalescent framework. It assumes a haploid sample of cases
and a second haploid sample of controls. The genealogy of these
two samples is generated according to the user-specified population history.
From this genealogy a distribution of marker haplotypes
is generated by allowing for marker mutation and recombination
between marker and gene as well as between markers. The difference
between the allelic distributions in the case and the control sample,
which reflects the linkage disequilibrium (LD) between marker and
disease locus, is measured by the d-value.
The simulation is repeated a user-defined number of times (option -l)
to obtain the distribution of d-values as a function of the simulation
conditions.

The marker map can be configured to cover one or both flanks
of the disease locus. Up to 5 markers can be placed on each
side of the disease locus. The markers are numbered

   5 - 4 - 3 - 2 - 1 - 0 - 1 - 2 - 3 - 4 - 5
   Flank 2       disease locus      Flank 1

As long as option -L is not set, the alleles of each marker have
identical frequencies and all markers are in mutual linkage equilibrium.
		
The simulation is described in detail in S. Zoellner and A. Von Haeseler,
"A coalescent approach to study linkage disequilibrium between SNPs",
Am. J. Hum. Genet. 66:615-628, 2000. Please cite this paper
if you use the program in a publication.
 
Technical detail:
This C program was written by Sebastian Zoellner
(zoellner@eva.mpg.de) for a Linux system (Red Hat 6.0
distribution). If you have any questions, please contact me.

It can be compiled with the command

   cc CoaCC.c -lm -o program.name

Application:

Usage: program.name [option1] [option2] ...

Example: program.name -pc -e0.001 -f2

If an option that requires more than one parameter is invoked,
the parameters are requested after the program starts.

Options:
-a		All possible two-marker haplotypes are treated
		as markers, and the LD is calculated for each.
-b              In each run the marker with the highest d-value
		is stored. These best markers
		are analyzed jointly. The position of
		the best marker is also stored and its
		distribution is shown.
-c		allows the input of a complex
		phenotype-genotype interaction as
		described in our paper.
-d              All generated haplotypes are written to the same
		output file as the analysis.
		Using this option may result in a large output file.
-e<distance>    sets a uniform genetic distance (as
		recombination probability) between
		neighboring markers. The distance
		between marker 1 and the disease
		gene (0) is half this distance.
		If different distances between markers are required,
		use option -r.
-f<number>      defines whether markers are simulated on one (1)
		or both (2) flanks of the disease locus.
-h/-H           shows a help message
-l<number>     	sets the number of runs, i.e. the number of
		times the simulation is repeated. All runs
		are analyzed jointly. The maximum is 100000.
-L              enables input of haplotype
		frequencies for all possible marker-haplotypes.
		With this option markers in mutual LD can be 
		simulated.   
-M<probability> sets the mutation probability of all markers.
		If no value is given, the mutation probability
		of each marker can be defined separately at
		the start of the program.
-m<rate>        sets the per-generation mutation rate of the disease
		gene from the control to the case state.
		Back-mutations are not possible.
-n<number>      determines the number of markers
		simulated on each flank (at most 5).
-o<name>        sets the name of the output file
-p<model>      	selects population model. <model> can be set
		to c for CONST, r for REC-EXP and o for
		OLD-EXP as specified in our paper. If no model is 
		specified, the
		population parameters are asked for at the
		start of the program.
-r             	allows input of variable
		recombination frequencies between any two adjacent loci.
-S            	triggers silent mode  
-s              allows specification of sample
		sizes different from the default



Default Values:
     
-c		no complex phenotype-genotype interaction 
-e<distance>    distance = 0.0001 (10 kb)
-f<number>      number = 1
-l<number>     	number = 1000
-m<rate>        rate = 0.000001
-n<number>      number = 1
-o<name>        name = out_file
-p<model>      	no default; the model must be given with -p or entered at the start of the program.
-s              default is 200 cases and 200 controls 
-M<probability> probability = 0.000001

Output:

The name of the output file is defined by the -o option. At the
beginning of the file, all simulation conditions are summarized.
The results section shows, for each marker, the average d-value
generated by the simulation together with its variance
and standard deviation. The discretized distribution of d-values
is displayed afterwards.
For marker 1 on flank 1, the number of uninformative simulations is
also stated. These are the simulations that were
repeated because this marker was not polymorphic in the
sample.
If option -a was invoked, the same statistics are given
for all possible two-marker haplotypes.
If option -b is set, the statistics for the best marker are then
given, followed by the distribution of its location.  
