Generalized Venn Diagrams

Supplementary Information

Evolutionary strategy for error function optimization

Constant mutation rate

For non-adaptive mutation rates, the mutation parameter is critical for the convergence of the optimization process, as can be seen in Figure I. For these simulations, the data set (1b-2_reguliert_genes.txt, 1b-2_reguliert_summary.txt) was imported with the settings minimum total genes >= 100 and p-value <= 0.05.

Figure I: (Fixed mutation rate, ib-2_reguliert) Median, lower and upper quartile of the error values (y-axis) for 100 simulations are shown for different mutation rates (x-axis). The mutation rate of 0.02 was optimal for the given data set (46 elements in 7 groups). An upper bound of 500 simulation steps was used. The stopping criterion was set to 80.
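The role of a fixed mutation rate can be illustrated with a minimal sketch. It is not the VennMaster implementation: the error function toy_error, the plus-selection scheme, the population size and the helper names evolve_fixed and summarize are all assumptions made for illustration. Only the upper bound of 500 steps is taken from the caption above.

```python
import random
import statistics

def toy_error(position):
    # Hypothetical stand-in for the layout error: squared distance from the origin.
    return sum(v * v for v in position)

def evolve_fixed(mutation_rate, dims=4, pop_size=20, max_steps=500, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(dims)]
                  for _ in range(pop_size)]
    for _ in range(max_steps):
        # Each parent yields one child; all coordinates share the same fixed rate.
        children = [[v + rng.gauss(0.0, mutation_rate) for v in parent]
                    for parent in population]
        # Assumed plus-selection: keep the best pop_size of parents and children.
        population = sorted(population + children, key=toy_error)[:pop_size]
    return min(toy_error(p) for p in population)

def summarize(mutation_rate, runs=10):
    # Lower quartile, median and upper quartile of the final error over several runs.
    errors = [evolve_fixed(mutation_rate, seed=s) for s in range(runs)]
    return statistics.quantiles(errors, n=4)
```

Calling summarize for a grid of rates (for example 0.005, 0.02 and 0.1) yields the three values per rate that a plot like Figure I displays. On this toy problem the best rate need not coincide with the 0.02 reported for the real data set.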

Self-adapting mutation rate

For the self-adapting strategy, each individual has a different mutation rate for each parameter pair (x and y position of the polygons). The convergence of the self-adapting evolutionary strategy is only weakly affected by changes of the meta mutation parameter (see Figure II). The mean mutation rate of the whole generation converges automatically to an optimum (Figures III and IV).
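A minimal sketch of such a scheme follows, under stated assumptions. The log-normal update of the rates (rate times exp(tau times a standard normal sample)) is the rule commonly used in evolution strategies and is assumed here, since the text does not spell out the update. The error function toy_error, the plus-selection, the initial rate of 0.1 and all function names are likewise hypothetical.

```python
import math
import random

def toy_error(points):
    # Hypothetical error: total squared distance of all points from the origin.
    return sum(x * x + y * y for x, y in points)

def mutate(points, rates, tau, rng):
    # Assumed log-normal update; each (x, y) pair owns one mutation rate.
    new_rates = [r * math.exp(tau * rng.gauss(0.0, 1.0)) for r in rates]
    new_points = [(x + rng.gauss(0.0, r), y + rng.gauss(0.0, r))
                  for (x, y), r in zip(points, new_rates)]
    return new_points, new_rates

def evolve_self_adapting(tau=0.5, n_points=3, pop_size=20, max_steps=200, seed=0):
    rng = random.Random(seed)
    population = [
        ([(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_points)],
         [0.1] * n_points)
        for _ in range(pop_size)
    ]
    for _ in range(max_steps):
        # Children inherit mutated rates, so useful rates spread through selection.
        children = [mutate(points, rates, tau, rng) for points, rates in population]
        population = sorted(population + children,
                            key=lambda ind: toy_error(ind[0]))[:pop_size]
    best_error = min(toy_error(points) for points, _ in population)
    mean_rate = sum(sum(rates) for _, rates in population) / (pop_size * n_points)
    return best_error, mean_rate
```

The second return value is the mean mutation rate of the final population, the quantity tracked in Figures III and IV. Because the rates are inherited together with the positions, no rate has to be tuned by hand; only tau remains as a meta parameter.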

Figure II: (Self-adapting mutation rate, ib-2_reguliert) Median, lower and upper quartile of the error values (y-axis) for 100 simulations are shown for different meta parameters (x-axis). The result is stable across these values.

Figure III: (Self-adapting mutation rate, ib-2_reguliert) The mean mutation rate of the whole population (100 individuals) is shown for each optimization step (generation). The mean mutation rate converges to about 0.02, which was the optimum for the fixed strategy (compare Figure I).

Figure IV: (Self-adapting mutation rate, ib-2_reguliert) Distribution of the mean mutation rate over 100 simulations with the self-adapting strategy. Each simulation results in one value (mean over all mutation rates of that population in the terminal generation).

Simulation over different data sets

Additionally, the behaviour of both optimization methods was evaluated over different (GoMiner) data sets. For each data set, the median, lower and upper quartiles of the error values (the error values are shifted to 1 to correct for the logarithmic plot) over 100 simulations are shown in the diagram below. The following parameter settings were used:
  1. Blue bars: Constant mutation rate 0.02
  2. Red bars: Self-adapting mutation rate with meta parameter tau = 0.5
For every data set the self-adapting algorithm resulted in smaller error values than the non-adaptive algorithm. The p-values of the one-sided Wilcoxon rank sum test (exact p-values) were all below 6.28e-5 (Bonferroni correcting a significance level of 0.05 results in an individual significance level of 0.0015 for the given data).
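The comparison above can be sketched as an exact one-sided rank sum test combined with a Bonferroni-corrected threshold. This is an illustration only: it assumes that no two error values are tied, the function names are hypothetical, and the number of tests is left as a parameter rather than taken from the text.

```python
from math import comb

def rank_sum_p_less(x, y):
    """Exact one-sided p-value that x tends to be smaller than y (no ties assumed)."""
    pooled = sorted(x + y)  # values are assumed to be distinct
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    w = sum(rank[v] for v in x)  # observed rank sum of x
    n, total = len(x), len(pooled)
    max_sum = sum(range(total - n + 1, total + 1))
    # ways[k][s]: number of k-element subsets of the ranks seen so far with sum s
    ways = [[0] * (max_sum + 1) for _ in range(n + 1)]
    ways[0][0] = 1
    for r in range(1, total + 1):
        for k in range(min(r, n), 0, -1):
            for s in range(max_sum, r - 1, -1):
                ways[k][s] += ways[k - 1][s - r]
    # Probability of a rank sum at most as large as the observed one.
    return sum(ways[n][: w + 1]) / comb(total, n)

def bonferroni_level(alpha, n_tests):
    # Each individual test is judged against alpha divided by the number of tests.
    return alpha / n_tests
```

Here x would hold the final error values of the self-adapting runs for one data set and y those of the fixed-rate runs. A data set counts as significant when rank_sum_p_less(x, y) falls below bonferroni_level(0.05, n_tests).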

Figure V: Lower/upper quartiles and the medians for the fixed mutation rate simulation (blue) and the self-adapting simulation (red) are shown for 17 different data sets. The uncorrected p-values (Wilcoxon one-sided test) are shown on top of the simulation pairs. Zoom in [0,3], [0,0.02].

Application examples (gene expression data)

Example 1

In the first example, genes differentially expressed between a specialized mesenchymal cell type (stellate cells) and normal skin fibroblasts (minimum total: 100; max p-value: 0.05) were analyzed.

The following picture shows the analysis tree of GoMiner.

GoMiner example 1

Although a total of 9 GO categories were reported to be significantly overrepresented among the changed genes, VennMaster analysis revealed that these categories strongly overlap and form a single large cluster of cell surface/extracellular matrix related categories, as can be seen in the following figure.

VennMaster example 1

Example 2

Genes overexpressed (A) or underexpressed (B) in pancreatic ductal carcinoma as compared to normal pancreatic duct cells (minimum total: 50; max p-value: 0.05): VennMaster analysis identifies distinct clusters of biologically relevant GO categories overrepresented among the overexpressed and underexpressed genes, respectively.


Hans A. Kestler
Last modified: Mon Oct 11 15:25:57 CEST 2004