A two-stage scheme for visual object recognition based on selective attention

Günther Palm, Ulrich Kaufmann, Rebecca Fay

We present a two-stage scheme for visual object recognition. First, a window of attention is determined in a picture using low-resolution colour and shape information. Then high-resolution visual features (such as edges, corners or T-junctions) are extracted from this window and fed into a trained neural network (a hierarchically organized RBF network) for object recognition.
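The two stages can be illustrated with a minimal, dependency-free sketch. It makes several simplifying assumptions that are not taken from the paper. Stage 1 uses colour only: it picks the low-resolution cell whose mean colour is closest to a given target colour, and shape cues are omitted. Stage 2 uses two crude edge-energy features as a stand-in for the edge, corner and T-junction detectors. A single flat Gaussian RBF layer replaces the hierarchical RBF network. All function names, parameters (`f`, `win`, `sigma`) and class labels are hypothetical.

```python
import math

def downsample(img, f):
    """Average non-overlapping f x f blocks (img: rows of (r, g, b) in [0, 1])."""
    h, w = len(img) // f, len(img[0]) // f
    low = []
    for by in range(h):
        row = []
        for bx in range(w):
            block = [img[by * f + y][bx * f + x] for y in range(f) for x in range(f)]
            row.append(tuple(sum(p[c] for p in block) / len(block) for c in range(3)))
        low.append(row)
    return low

def attention_window(img, f, target, win):
    """Stage 1 (colour only): find the low-res cell whose mean colour is closest
    to `target`, then return (top, left) of a win x win high-res window on it."""
    low = downsample(img, f)
    cells = [(y, x) for y in range(len(low)) for x in range(len(low[0]))]
    by, bx = min(cells, key=lambda c: math.dist(low[c[0]][c[1]], target))
    cy, cx = by * f + f // 2, bx * f + f // 2        # cell centre in high-res pixels
    top = min(max(cy - win // 2, 0), len(img) - win)  # clip window to the image
    left = min(max(cx - win // 2, 0), len(img[0]) - win)
    return top, left

def edge_features(img, top, left, win):
    """Stage 2 features (simplified): mean horizontal and vertical intensity
    change inside the window, a stand-in for edges/corners/T-junctions."""
    g = [[sum(img[top + y][left + x]) / 3 for x in range(win)] for y in range(win)]
    n = win * (win - 1)
    gx = sum(abs(g[y][x + 1] - g[y][x]) for y in range(win) for x in range(win - 1)) / n
    gy = sum(abs(g[y + 1][x] - g[y][x]) for y in range(win - 1) for x in range(win)) / n
    return gx, gy

def rbf_classify(feat, prototypes, sigma=0.1):
    """Flat Gaussian RBF layer: each class sums the activations of its centres."""
    def score(centres):
        return sum(math.exp(-math.dist(feat, c) ** 2 / (2 * sigma ** 2)) for c in centres)
    return max(prototypes, key=lambda label: score(prototypes[label]))

# Usage: 16x16 grey image with a 4x4 red object at rows/cols 8..11.
img = [[(0.2, 0.2, 0.2)] * 16 for _ in range(16)]
for y in range(8, 12):
    for x in range(8, 12):
        img[y][x] = (1.0, 0.0, 0.0)
top, left = attention_window(img, f=4, target=(1.0, 0.0, 0.0), win=8)
print(top, left)  # 6 6
feat = edge_features(img, top, left, win=8)
print(rbf_classify(feat, {"ball": [(0.02, 0.02)], "goal": [(0.2, 0.0)]}))  # ball
```

Only the 8x8 window is examined at full resolution in stage 2. This is where the two-stage design saves computation: the cost of high-resolution feature extraction scales with the window size, not with the whole picture.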
This two-stage process reflects some properties of human or monkey vision (eye movements guided by visual attention, high-resolution processing in the fovea, resolution decreasing towards the periphery). It also saves computational power, making sophisticated object recognition possible in real time.
We are currently applying this scheme to soccer-playing robots (our RoboCup team) and in the MirrorBot project, where a robot has to grasp different kinds of fruit. In these scenarios we can select important features for top-down guidance of the first (attention) stage. We will present the selected windows and the recognition performance for various pictures, demonstrating the importance of top-down selection of saliency features in practical applications.
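One common way to realize top-down guidance is to compute several low-resolution feature maps and combine them with task-specific weights. This lets the same scene draw attention to different locations depending on the task. The sketch below assumes that formulation, which is a generic weighted-sum saliency model rather than the authors' confirmed method. The two colour-opponency maps, the task names and their weights are all hypothetical and were chosen by hand for illustration.

```python
# Hypothetical task-specific weights, picked by hand for illustration only.
TASK_WEIGHTS = {
    "find_red_object": {"redness": 1.0, "greenness": -0.5},
    "find_green_object": {"redness": -0.5, "greenness": 1.0},
}

def colour_feature_maps(low):
    """Per-cell colour-opponency maps on a low-res image of (r, g, b) tuples."""
    return {
        "redness": [[r - (g + b) / 2 for (r, g, b) in row] for row in low],
        "greenness": [[g - (r + b) / 2 for (r, g, b) in row] for row in low],
    }

def saliency(maps, weights):
    """Top-down saliency: weighted sum of the feature maps selected for the task."""
    ref = maps[next(iter(weights))]
    h, w = len(ref), len(ref[0])
    return [[sum(wt * maps[name][y][x] for name, wt in weights.items())
             for x in range(w)] for y in range(h)]

def most_salient(sal):
    """Return the (row, col) of the maximum saliency value."""
    cells = [(y, x) for y in range(len(sal)) for x in range(len(sal[0]))]
    return max(cells, key=lambda c: sal[c[0]][c[1]])

# Usage: 3x3 low-res scene, red patch top-right, green patch bottom-left.
low = [[(0.5, 0.5, 0.5)] * 3 for _ in range(3)]
low[0][2] = (0.9, 0.1, 0.1)
low[2][0] = (0.1, 0.8, 0.1)
maps = colour_feature_maps(low)
print(most_salient(saliency(maps, TASK_WEIGHTS["find_red_object"])))    # (0, 2)
print(most_salient(saliency(maps, TASK_WEIGHTS["find_green_object"])))  # (2, 0)
```

In this example the feature maps stay fixed across tasks. Only the weight vector changes, and that change alone moves the window of attention from one object to the other.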
We intend to compare our findings with measurements of human eye movements in demanding sensorimotor virtual-reality tasks.