A more pragmatic approach is to combine a classical plan generation process with the plan's execution. The execution component continuously reports the planning system about the progress being made, including action failures and unexpected world changes. The transferred information enables the planning system to adapt (partly) committed plans or to generate new (sub-)plans. All of the presented approaches agree, that this is neccessary for the flexibility and robustness of the overall system. Another benefit is the way domain models are created: Failures do not have to be foreseen explicitly.
Surprisingly, this technique was not adopted in many planning and execution systems. Only few up-to-date planners are capable of continuous interaction with their execution clients.
As a rule, the control of autonomous hard- or software agents is done today by reactive systems. We find architectures with symbolic represenations like reactive action packages in RAP[9] and that used in the procedural reasoning system PRS[16]. Also very popular in Robotics is SAPHIRA[29] with its fuzzy controllers and derivates of the subsumption architecture[5], which are characterized by layers of competence.
All these approaches have in common the notion of behaviours of the execution components. The term behaviours addresses fixed pre-compiled patterns of actions which are selected depending on the actual situation of the executor. As a first step towards the integration of a deliberative planning process into such a plan execution environment, there are systems where the executors send ``off-line'' queries to the built-in planners, e.g. [10], [15] or [30].
An alternative system design puts the execution control into the planner itself. Such systems typically generate plans to perform the entire task and monitor their realisation afterwards. One of the first planners following this approach was PLANEX[8]. It uses the STRIPS planning engine and controls plan execution by monitoring state changes at execution time. In case of an execution failure, it continues to execute independent, unaffected parts of the plan. If no such independent sub-plan is left, a new (complete) plan is generated. A more recent representative of this technique is the execution agent of O-Plan[6].
Many researchers argued that repair plans have to be provided to enhance performance. In the case of execution failures, they allow to complete the original task, starting from the current failed state. Unfortunately, planners in most frameworks typically get unspecific feedback information like ``action x has failed''. This means that computational expensive ``from-scratch'' re-planning has to be done. To make re-planning more tractable, some systems provide their execution layer with reactive plan repair capabilities, e.g. [31].
In [20] planning problems are regarded as specifications for complex actions. The generation of plans corresponds to their execution in form of logic programs.
IPEM [1] (Integrated Planning, Execution and Monitoring) was the first system to integrate non-linear planning and plan execution. Partial plans are completed stepwise using classical non-linear techniques. After every such step a sequence of executable actions is determined. Their execution leads to a new current state which is the basis for the next completion step in the planning sub-system.
The Continuous Planning and Execution Framework CPEF [21] is a system with emphasis on the interaction of planning and execution in an asynchronously working architecture. It relies on the planning system SIPE-2 [33] and PRS [16] for execution management. In this framework plans can be generated to arbitrary levels of refinement and then be manipulated at runtime by the executor component. Manipulating means translating into sequences of executable actions or repairing of flaws.
The SAGE-system [17] tightly integrates plan execution into the planning process. This is done by extending the non-linear planner UCPOP [26] by two new types of flaws: un-executed and executing. The latter is introduced to express the fact that actions are not instantaneous. An action cannot be executed until all of its preconditions have been satisfied and achieved by executing preceeding actions. After an action is marked executable, SAGE delays its execution as long as possible in order to avoid committing prematurely to a partially constructed plan. The planner runs continously and returns results as soon as they are obtained.
SAGE provides an advanced re-planning capability. It introduces user-defined and domain-specific failure handlers which remove a failed portion of the plan and update the domain model accordingly. This avoids repeating failures during re-planning and re-executing.
The reactive planning system BURTON [34] is developed for autonomous spacecraft control. It uses compiled temporal logical descriptions of state transition models of the various spacecraft components and generates from them reliably correct instruction sequences. It is the first system to use consequently symbolic representation of the application domain down to the lowest level of plan execution.
The New Millennium Remote Agent NMRA [25], the predecessor of Burton, is the first AI planning system to be used in an autonomous spacecraft . The system controls Deep Space One, which has been launched in October 1998 to explore the asteroid 1992KD. NMRA does not have the elaborated modelling approach fo BURTON yet, but although some modules of the probe are controlled exclusively by an on-board execution module, the high-level control is the planning system itself and has been auccessfully tested several times during its mission.
It is of particular importance for robot planning systems to keep knowledge about the environment consistent and up to date. In this context abstract and artificial looking cost-models known from agent-literature get very concrete and obviously mission-critical. The costs for obtaining information are directly recognisable and understandable: It may take a robot many actions to perceive knowledge about its environment. Furthermore, some of the sensing actions may be irreversible. An example for information gathering is moving to a distant location and then using an optic sensor to check an object's presence. But the reader may also think of performing some chemical experiment to determine the concentration of a volatile toxin.
Two basic methodologies will be discussed in the following two sections: we will refer to them as Planning for Sensing and Sensing for Planning.
.
The semantics of action nodes and conditional branches are typically that of possible worlds in an epistemic modal logic. Unknown domain variables induce several successor worlds, one for each possible value. Sensory actions allow to separate the successor worlds, i.e. in terms of the logic to ``know'' that an expression has a given value.
An example: The planner inserts a sensing action for an unknown boolean-valued information in the current partial plan. This creates two conditional branches, representing a partitioning of the successor worlds: One in which the value of the fact is known to be true, and one in which it is known to be false. Execution of this sensing action enables the agent to know at execution time in which partition of the possible worlds it is and to pursue the correct plan path.
The logic based planner presented in [12] can be viewed as the implementation of the methodology described above. Actions are modelled in an epistemic description logic framework [11]. It distinguishes between moving actions and sensing actions. The moving actions are formalised as in classical planning by domain constraints and precondition, effect and frame axioms. The sensing actions are using additional axioms involving epistemic operators. Therefore, the formalism can express that after performing a sensing action an agent knows whether some property holds or not, as well as that the sensing itself has side effects.
Related to this point of view is the one taken up in SAGE [17]. It is based on UCPOP [26] and serves as a query planner for the SIMS information mediator [2], which aims at providing flexible and efficient access to large numbers of information sources. The domain is modelled in a KL-ONE type language to determine the relevant sources for a given query, i.e. the contents and characteristic features of a database or knowledge source are represented in a description logic. SAGE incorporates a mechanism called run-time variables [1]. These place holders are located in effects of operators to collect values returned by their actions at execution time. Once the values are bound to such variables by successfully executed sensing actions they can be used by the plan execution component in the following plan steps. SAGE delays working on any open condition that involves such run-time variables, and waits for sensed information to update its domain model.
A very important feature of SAGE is its ability to introduce ``unforced'' additional sub-goals to queries. Thus the system is actively seeking for new information [18], in order to adapt its formal cost model for information sources. This mechanism enables SAGE to compare query plans and to prune the sets of databases relevant to subsequent sub-goals more effectively.
PUCCINI [13] is another partial order planner that performs sensing. PUCCINI serves like its well known predecessor XII as a planning engine for an Internet soft-bot which has correct but massively incomplete information about its environment. The algorithms are based on the idea that a precondition is true either because it is observed to be true or it has been made true by an action. The authors introduce observational links that do not induce additional ordering constraints: this enables preconditions to be supported by prior observation or later verification.
The action representation language SADL [14] is used, a derivate of ADL [24]. It supports universally quantified information goals and universally quantified, conditional, and observational effects. The effects are world state changes and world state reports, the latter assign values to run-time variables (see above). The use of a three-valued logic allows to model uncertainty explicitly, which is again described in terms of a possible world semantics.
The authors introduce three annotations for literals in preconditions and goals. One indicates classical goals with the intention to achieve them ``by whatever means possible''. If they contain run-time variables the agent has to learn the variable's truth value . The second annotation is used to describe so-called maintenance goals, i.e. fluents that must not be changed. The last annotation indicates to sense a fluent at the time the goal is introduced in the plan. Together with the first annotation this can formalise tidiness goals like ``restore the initial value''. Any un-annotated literal is marked to depend only on the state of the world and not on the agent's knowledge.
An important aspect of this work is the identification of a special, but large class of problem domains, wherein the presented formalism is regarded tractable, because actions can be encoded without knowledge preconditions: the knowledge-free Markov domains. In these domains the actions' effects only depend on the state of the world at the time of plan execution. Therefore, all knowledge sub-goals will be of the same form: The agent needs to know the value of a fluent exactly at the time the action is to be executed, and it does not matter if the agent affects the fluent by side effects of sensing.
CASSANDRA [28] is also based on the UCPOP [26] algorithm and is able to handle uncertainty about actions' outcomes. As in classical conditional planning, secondary preconditions represent context-dependent effects of the actions. Together with ``unknown'' preconditions this models uncertain effects. It introduces condition-action rules called decision steps and information gathering steps. The decision steps make the agent's choices explicit which conditional branches in the plan it should follow. The decisions are based on the knowledge provided by the execution of information gathering steps. Therefore, decision preconditions are treated as knowledge goals and are added to plans each time new sources of uncertainty are encountered. The rule base is built up incrementally during the planning process from those uncertainties that are used to establish preconditions in the given conditional branches.
CASSANDRA's labelling system for uncertainties in a plan is similar to that of PUCCINI (see above). Positive labels are propagated to denote plan elements that contribute to goal achievement in the current conditional branch and negative ones that for those which prevent goal achievement. Unlabelled elements mark ``context neutral'' actions which may or may not be executed on this path. Threat resolution is done in a classical manner by promotion, demotion, separation via non-codesignation, and preservation, i.e. generation of a sub-goal to disable the threatening effect by placing it in a separate conditional branch (conditioning in [27]). In contrast to most other conditional planners, CASSANDRA is able to reunify the proper conditional branches in partial plans.
Sensory Graph-Plan SGP [32] is an extension of the well-known GRAPH-PLAN system [4]. It handles planning problems with uncertain initial states and with actions that combine causal and sensory effects. The underlying semantics is again that of possible worlds which the system tries to make distinguishable for an plan-executing component. To this end, specific epistemic formulas are used to express that an agent knows in which world it currently is.
Each action has the usual causal effects plus zero or more observational effects denoted by arbitrary logical expressions composed of GRAPH-PLAN propositions. Such effects return one bit of information when executed: True if the logical expression is true in the world immediately prior to execution, and False otherwise. Consequently, action instances are linked to preconditions in two separate conditional contexts. This situation is mirrored in a layered planning graph with each layer representing one possible world.
Another planner that has been extended to handle contingencies and
information acquisition is the probabilistic planner BURIDAN
[19]. The resulting system [7], allows
actions with causal and informational effects. The input is a
probabilistic distribution over the initial states, a goal expression,
a set of probabilistic action descriptions
and a probability threshold. The
system produces a contingent plan that makes the goal expression true
with a probability of at least the threshold. Again, the possible
worlds produced by observations and uncertain action effects are
divided in equivalence classes by a context-based labelling mechanism.
Each action in the plan is annotated with a context label which is
generated by observations and inherited by previous steps. Labels
determine under which circumstances an action should be performed.
The methodologies used in this system differ in some way from conventional techniques. In particular, all operations on the plan aim at increasing the probability for goal satisfaction. Threat resolution in this context is done not only by the classical promotion and demotion mechanisms: Confrontation adopts the triggers of ``benign'' consequences as goals which has the effect of decreasing the probability of the threatening consequence. Conditioning is a technique from classical conditional planning [27]. It ensures the threatening step never to be executed in the same execution trace as the producer or consumer of the threatened link. This technique is used within branches that connect information producing actions to subsequent actions, indicating which observation labels of the first permit execution of the second: The system can separate the context of the threatening step from the context of either the link's producer or consumer.
In general, search space for conventional conditional planning explodes for relatively small ranges of the contingencies. There are many examples, we give a simple one here: The robot has to get the right key for a door identifiable by an in advance unknown 5-digit number. Even a few doors make this problem combinatorically intractable. A pragmatic approach in this situation would abstract from the key's identification and solve the contingency by ``looking at'' the door lock at execution time. Then it could plan how to get the key - of course assumed, that this can all be done without getting stuck in dead-ends.
But there is made another assumption by all systems that is not trivial at all and very hard to relax: The sensory input is given as logical predicates. The question how to build symbolic information out of sensor readings is a wide field of research but out of the scope of this document.
In [3] the authors use constraint satisfaction techniques to determine the value of preconditions in a UCPOP-like planner [26]. The CSPs are formulated to ensure consistency of the partially ordered plan-sequences during the planning process.
The system incorporates interactive constraints for acquiring unknown domain values at run-time. Such binary constraints are treated as follows: If no variable is associated with a domain, then the constraint is suspended. If both are already associated, then the constraint is propagated as in a classical CSP. In case of only one variable being ``unknown'' knowledge acquisition is performed which leads either to some characterisation of the unknown variable's domain or to a failure. Finally, both variables can be instantiated and the boolean value of the constraint can be calculated.
BUMP [22] is an agenda-based planning system that follows a more classical planning paradigm. Sensing actions are represented in the same way as non-sensory processes. The preconditions of such sensor operators can be used for set-up actions. Following a STRIPS-style representation, Add-lists contain at least the sensed information and Delete-lists refer to additional side-effects of the sensors.
BUMP uses special dummy constants for values that will be acquired by sensor readings. Any subsequently processed goals that refer to one of these constants will be deferred until the executor has obtained the reading. Execution begins as soon as all goals in the partial plan have been either solved or deferred. When selecting the actions for execution the controller prefers sensor processes over other parallel plan steps. Once a requested sensor reading is obtained, BUMP is restarted immediately with the new information. The new plan is returned, and the process proceeds until BUMP has found a complete plan.
In this context the authors discuss the problems in finding a suitable and efficient sensing strategy: The planner has to select the appropriate goals for deferal at the appropriate time. Therefore, they present an execution cost-model for plans with sensing actions and its empirical evaluation [23].
This document was generated using the LaTeX2HTML translator Version 99.1 release (March 30, 1999)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html -split 2 -dir RP-Roadmap -local_icons -no_footnode RP-roadmap.tex
The translation was initiated by Bernd Schattenberg on 2000-07-18