
# Generalized Venn Diagrams

*Supplementary Information*
## Evolutionary strategy for error function optimization

### Constant mutation rate

With a constant (non-adaptive) mutation rate, the choice of the mutation parameter is critical for
convergence of the optimization process, as Figure I shows.
For these simulations, the data set consisting of
1b-2_reguliert_genes.txt and
1b-2_reguliert_summary.txt
was imported with the settings *minimum total genes* >= 100 and *p-value* <= 0.05.
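A constant-mutation-rate evolution strategy can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the VennMaster implementation: it assumes the "mutation rate" is the standard deviation of Gaussian noise added to every polygon coordinate, uses (mu + lambda) plus-selection, and replaces the actual Venn diagram error function with a hypothetical placeholder `toy_error`. All function names are illustrative, and the stopping criterion from Figure I is omitted because its exact definition is not given here.

```python
import random


def toy_error(coords):
    # Hypothetical stand-in for the Venn diagram error function:
    # squared distance of all coordinates from 0.5.
    return sum((c - 0.5) ** 2 for c in coords)


def mutate_fixed(coords, rate, rng):
    # Add Gaussian noise with a constant standard deviation `rate`
    # to every coordinate (x and y alike).
    return [c + rng.gauss(0.0, rate) for c in coords]


def evolve_fixed(n_coords, rate=0.02, pop_size=100, max_steps=500, seed=0):
    """Plus-selection ES with a fixed mutation rate (illustrative sketch)."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_coords)] for _ in range(pop_size)]
    for _ in range(max_steps):  # upper bound on the number of steps
        # Each parent produces one mutated offspring.
        offspring = [mutate_fixed(ind, rate, rng) for ind in population]
        # Keep the best pop_size individuals of parents and offspring combined.
        population = sorted(population + offspring, key=toy_error)[:pop_size]
    best = population[0]
    return best, toy_error(best)
```

Because parents compete with their offspring, the best error never increases from one generation to the next; only the fixed `rate` determines how quickly (or whether) the search reaches small errors.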

**Figure I:** (Fixed mutation rate, `1b-2_reguliert`)
Median, lower and upper quartile of the error values (y-axis) over 100 simulations for
different mutation rates (x-axis). A mutation rate of 0.02 was optimal for the given
data set (46 elements in 7 groups). The number of simulation steps was limited to 500.
The stopping criterion was set to 80.
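The summary statistics reported in Figure I (median, lower and upper quartile over 100 simulations per mutation rate) can be computed along these lines. This is a sketch that assumes a hypothetical callable `run_once(rate, seed)` returning the final error of one simulation; it uses Python's standard `statistics.quantiles`, whose default interpolation method may differ slightly from the one used to produce the original figure.

```python
import statistics


def summarize(errors):
    # Lower quartile, median and upper quartile of the final error values.
    q1, median, q3 = statistics.quantiles(errors, n=4)
    return q1, median, q3


def sweep(run_once, rates, n_runs=100):
    # run_once(rate, seed) -> final error of one simulation (hypothetical callable).
    # Returns {rate: (q1, median, q3)} over n_runs independent simulations.
    return {
        rate: summarize([run_once(rate, seed) for seed in range(n_runs)])
        for rate in rates
    }
```

Using a different seed per run keeps the 100 simulations for a given rate independent while making the whole sweep reproducible.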

### Self-adapting mutation rate

In the self-adapting strategy, each individual carries its own mutation rate
for each parameter pair (x and y position of the polygons).
The convergence of the self-adapting evolutionary strategy is only weakly
affected by changes of the meta mutation parameter (see Figure II).
The mean mutation rate of the whole population converges automatically to the value that was optimal
for the fixed strategy (Figures III and IV).
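One common way to realize such self-adaptation is the log-normal rule from evolution strategies, sketched below. The source does not state the exact update rule, so this is an assumption: each (x, y) pair has one mutation rate sigma that is first perturbed as sigma' = sigma * exp(tau * N(0, 1)), with tau playing the role of the meta mutation parameter, and sigma' is then used to mutate both coordinates. The function names and data layout are hypothetical.

```python
import math
import random


def mutate_self_adaptive(points, sigmas, tau, rng):
    """Mutate one individual under an assumed log-normal self-adaptation rule.

    points: list of (x, y) pairs; sigmas: one mutation rate per pair.
    """
    new_points, new_sigmas = [], []
    for (x, y), sigma in zip(points, sigmas):
        # Meta mutation: perturb the strategy parameter multiplicatively.
        s = sigma * math.exp(tau * rng.gauss(0.0, 1.0))
        # Use the new rate for both coordinates of the pair.
        new_points.append((x + rng.gauss(0.0, s), y + rng.gauss(0.0, s)))
        new_sigmas.append(s)
    return new_points, new_sigmas


def mean_mutation_rate(population):
    # population: list of (points, sigmas) tuples.
    # Averages over all rates of all individuals, analogous to the
    # per-generation mean mutation rate described for Figure III.
    rates = [s for _, sigmas in population for s in sigmas]
    return sum(rates) / len(rates)
```

Since offspring inherit rates that produced good coordinates, selection on the error alone indirectly tunes sigma, which is why the result depends only weakly on tau.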

**Figure II:** (Self-adapting mutation rate, `1b-2_reguliert`)
Median, lower and upper quartile of the error values (y-axis) over 100 simulations for
different meta parameters (x-axis). The results are robust to the choice of meta parameter.

**Figure III:** (Self-adapting mutation rate, `1b-2_reguliert`)
The mean mutation rate of the whole population (100 individuals) for each optimization step (generation).
The mean mutation rate converges to about 0.02, which was the optimum for the fixed strategy (compare Figure I).

**Figure IV:** (Self-adapting mutation rate, `1b-2_reguliert`)
Distribution of the mean mutation rate over 100 simulations with the self-adapting strategy.
Each simulation contributes one value: the mean over all mutation rates of the population in the final generation.

### Simulation over different data sets

In addition, the behaviour of both optimization methods was evaluated on different (GoMiner) data sets.
For each data set, the median and the lower and upper quartiles of the error values over 100 simulations
are shown in Figure V. The error values are shifted by 1 so that they can be displayed on the logarithmic axis.
The following parameter settings were used:
- Blue bars: Constant mutation rate 0.02
- Red bars: Self-adapting mutation rate with meta parameter tau = 0.5
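The shift by 1 is needed because a logarithmic axis cannot show an error of exactly 0. A minimal sketch of the transformation, assuming the plotted position is log10(error + 1) (the function name is hypothetical):

```python
import math


def log_axis_position(error):
    # Shift by 1 before taking the logarithm: an error of 0 lands at
    # log10(1) = 0 instead of being undefined (log10(0) diverges).
    return math.log10(error + 1.0)
```

The shift barely changes large errors but compresses errors far below 1 toward 0, which is why zoomed views are useful for the small-error range.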

For every data set, the self-adapting algorithm yielded smaller error values than the
non-adaptive algorithm. The p-values of the one-sided Wilcoxon rank sum test (exact p-values) were all
below 6.28e-5 (Bonferroni correction of a significance level of 0.05 gives an individual significance level of 0.0015 for the given data).
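The test and correction used above can be illustrated with a small pure-Python sketch of the exact one-sided Wilcoxon rank sum test and the Bonferroni per-test level. It assumes no ties in the pooled sample and counts rank subsets by dynamic programming, which is practical only for small samples; for 100 vs. 100 simulations a library routine would be the better choice. Function names are illustrative.

```python
from math import comb


def rank_sum_exact_less(x, y):
    """Exact one-sided Wilcoxon rank sum p-value for H1: x tends to be smaller than y.

    Sketch assuming no ties in the pooled sample.
    """
    pooled = sorted(x + y)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch assumes no ties")
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    w = sum(rank[v] for v in x)  # observed rank sum of x
    m, n_total = len(x), len(pooled)
    max_sum = sum(range(n_total - m + 1, n_total + 1))
    # ways[k][s]: number of k-element subsets of ranks 1..r with sum s
    ways = [[0] * (max_sum + 1) for _ in range(m + 1)]
    ways[0][0] = 1
    for r in range(1, n_total + 1):
        for k in range(min(r, m), 0, -1):  # descending k: each rank used once
            for s in range(max_sum, r - 1, -1):
                ways[k][s] += ways[k - 1][s - r]
    # P(W <= w) under H0, where all C(n_total, m) rank subsets are equally likely
    return sum(ways[m][: w + 1]) / comb(n_total, m)


def bonferroni_level(alpha, n_tests):
    # Per-test significance level keeping the family-wise error rate at alpha.
    return alpha / n_tests
```

A comparison is then called significant when its p-value falls below `bonferroni_level(alpha, n_tests)`.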

**Figure V:** Lower and upper quartiles and medians for the fixed mutation rate simulations (blue)
and the self-adapting simulations (red) for 17 different data sets.
Uncorrected p-values (one-sided Wilcoxon test) are shown above each pair of simulations.
Zoomed views: [0, 3] and [0, 0.02].

## Application examples (gene expression data)

In the first example,
genes differentially expressed between a specialized mesenchymal cell type (stellate cells)
and normal skin fibroblasts (minimum total: 100; max p-value: 0.05) were analyzed.
The following figure shows the GoMiner analysis tree.

Although a total of
9 GO categories were reported to be significantly overrepresented among the changed genes,
VennMaster analysis revealed that these categories strongly overlap and form a single large cluster of cell
surface/extracellular matrix related categories, as shown in the following figure.
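VennMaster conveys this overlap visually. As a simple numeric complement, which is not part of VennMaster or GoMiner and is shown here only for illustration, the overlap between categories can be quantified by the pairwise Jaccard index of their gene sets. The input format (a dict from category name to a set of gene identifiers) is a hypothetical assumption.

```python
from itertools import combinations


def pairwise_jaccard(categories):
    """categories: dict mapping a GO category name to its set of genes (hypothetical input).

    Returns {(a, b): |A & B| / |A | B|} for every pair of categories.
    """
    return {
        (a, b): len(categories[a] & categories[b]) / len(categories[a] | categories[b])
        for a, b in combinations(sorted(categories), 2)
    }
```

Values close to 1 indicate categories that share most of their genes, which would appear as heavily overlapping polygons in the diagram.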

In the second example, genes overexpressed (A) or underexpressed (B) in pancreatic ductal carcinoma compared with normal pancreatic
duct cells were analyzed (minimum total: 50; max p-value: 0.05). VennMaster analysis identifies distinct clusters
of biologically relevant GO categories overrepresented among the overexpressed and the underexpressed genes, respectively.


Hans A. Kestler
Last modified: Mon Oct 11 15:25:57 CEST 2004