Sapienza - Università di Roma
Dottorato di Ricerca in Ingegneria Informatica
XXI Ciclo – 2009
Adaptive Process Management
in Highly Dynamic and Pervasive Scenarios
Massimiliano de Leoni
Thesis Committee
Prof. Tiziana Catarci
Prof. Giuseppe De Giacomo
Dr. Massimo Mecella

Reviewers
Dr. Alfredo Gabaldon
Prof. Jan Mendling
Author’s address:
Massimiliano de Leoni
Dipartimento di Informatica e Sistemistica Antonio Ruberti
Sapienza Università di Roma
Via Ariosto 25, I-00185 Roma, Italy
e-mail: [email protected]
www: http://www.dis.uniroma1.it/~deleoni
Acknowledgements
Now that this thesis is completed and another step of my life has been taken, I cannot help looking back over all these years, from when, as a freshly graduated bachelor, Prof. Catarci proposed that I become a teaching assistant to Dr. Mecella. I accepted, and from then on this adventure began. Therefore, I wish to thank Prof. Catarci, who allowed all of this to begin and to keep going on until today. It was 2003 when our collaboration started, even though initially only for teaching purposes.
I need to thank Dr. Massimo Mecella infinitely: without him I could never have written this thesis, nor grown so much, both personally and professionally. Massimo has been more than his role would have required... He has also been a friend, and a support in the moments of discouragement during the years of the Ph.D. programme. Thanks! Thanks! Thanks so much!
I need to say "thanks" to Prof. De Giacomo, as well, for his scientific support and for the time he devoted to me; he has been a precious mentor for many of the topics touched on in this thesis. I also wish to thank Dr. Sardina, a really delightful person, who was always available and prompt to help when the techniques conceived in this thesis had to be concretely realised. Without him, SmartPM could never have been developed.
Moreover, I cannot help expressing my gratitude to Prof. ter Hofstede, who hosted me in his research group for six months and devoted a lot of his time to me. During the (unfortunately) short period there, I grew professionally much more than I had hoped to. A warm greeting also goes to Australia, which remains in my heart of hearts and will forever be my second country.
I wish to thank the external referees for their valuable comments on the content and the presentation of this dissertation. Many thanks to all the collaborators and Master's/Bachelor's students who contributed over the years to the development of the many practical aspects of this thesis; a warm hug to all of my friends and colleagues at the Dipartimento di Informatica e Sistemistica. I do not want to name anyone explicitly, for fear of forgetting somebody, and that would not be fair.
Furthermore, I wish to thank Sara; she started this adventure with me and supported me along the way; unfortunately, her "role" has meanwhile changed, for reasons greater than us.
I wish to show my appreciation to my parents, Pierfrancesco and Maria Rosa, and to my brother Fabrizio: although they did not approve of my choice, they supported me anyway, "without throwing a spanner in the works".
Last, but not least, I wish to thank my beloved Mariangela. She entered my life only recently, but soon enough to turn on its light, which little by little had been going off.
Contents

1 Introduction
  1.1 Problem Statement
  1.2 Original Contributions
  1.3 Publications and Collaborations
  1.4 Outline of the Thesis

2 Rationale

3 Literature Review
  3.1 Process Modelling Languages
    3.1.1 Workflow Nets
    3.1.2 Yet Another Workflow Language (YAWL)
    3.1.3 Event-driven Process Chains (EPCs)
    3.1.4 π-calculus
    3.1.5 Discussion
  3.2 Related Works on Adaptability
  3.3 Case Handling

4 Framework for Automatic Adaptation
  4.1 Preliminaries
  4.2 Execution Monitoring
  4.3 Process Formalisation in Situation Calculus
  4.4 Monitoring Formalisation
  4.5 A Concrete Technique for Recovery
  4.6 Summary

5 The SmartPM System
  5.1 The IndiGolog Platform
    5.1.1 The top-level main cycle and language semantics
    5.1.2 The temporal projector
    5.1.3 The environment manager and the device managers
    5.1.4 The domain application
  5.2 The SmartPM Engine
    5.2.1 Coding processes by the IndiGolog interpreter
    5.2.2 Coding the adaptation framework in IndiGolog
    5.2.3 Final discussion
  5.3 The Network Protocol
    5.3.1 Protocols and implementations
    5.3.2 Testing Manets
    5.3.3 Final Remarks
  5.4 Disconnection Prediction in Manets
    5.4.1 Related Work
    5.4.2 The Technique Proposed
    5.4.3 Technical Details
    5.4.4 Experiments
  5.5 The OCTOPUS Virtual Environment
    5.5.1 Related Work
    5.5.2 Functionalities and Models
    5.5.3 The OCTOPUS Architecture
  5.6 Summary

6 Adaptation of Concurrent Branches
  6.1 General Framework
  6.2 The adaptation technique
    6.2.1 Formalization
    6.2.2 Monitoring-Repairing Technique
  6.3 An Example from Emergency Management
  6.4 A summary

7 Some Covered Related Topics
  7.1 Automatic Workflow Composition
    7.1.1 Conceptual Architecture
    7.1.2 A Case Study
    7.1.3 The Proposed Technique
    7.1.4 Final remarks
  7.2 Visual Support for Work Assignment in PMS
    7.2.1 Related Work
    7.2.2 The General Framework
    7.2.3 Fundamentals
    7.2.4 Available Metrics
    7.2.5 Implementation
    7.2.6 The YAWL system
    7.2.7 The User Interface
    7.2.8 Architectural Considerations
    7.2.9 Example: Emergency Management
    7.2.10 Final Remarks
  7.3 A summary

8 Conclusion

A The Code of the Running Example
Chapter 1
Introduction
1.1 Problem Statement
Nowadays, organisations constantly try to improve the performance of the processes they are part of. It does not matter whether such organisations deal with classical static business domains, such as loans, bank accounts or insurance, or with pervasive and highly dynamic scenarios. The demand is always the same: more efficient processes, reducing the time and the cost of their execution.
According to the definition given by the Workflow Management Coalition¹, a workflow is "the computerised facilitation or automation of a business process, in whole or part". The Workflow Management Coalition defines a Workflow Management System as "a system that completely defines, manages and executes workflows through the execution of software whose order of execution is driven by a computer representation of the workflow logic". Workflow Management Systems (WfMSs) are also known as Process Management Systems (PMSs), and the two terms are used interchangeably throughout this thesis. Accordingly, this thesis often uses the word "process" in place of "workflow", although the original meaning of the former does not intrinsically refer to its computerised automation.
The idea of Process Management Systems as information systems aligned in a process-oriented way was born in the late 1980s, with the aim of improving process performance. PMSs are still growing in importance, since the demand for efficiency and effectiveness is more and more crucial in a highly competitive world.
PMSs improve efficiency while providing better process control [46, 136]. The use of computer systems prevents process executions from being improvised and guarantees a more systematic process execution, which finally translates into an
¹ http://wfmc.org
overall improvement of the response time.
This thesis does not deal with classical business scenarios, which have been extensively researched; we turn our attention instead to highly dynamic and pervasive scenarios. In pervasive scenarios, information processing is thoroughly integrated with the physical environment and its objects. As such, people cannot carry out activities remotely: they need to interact actively with the environment and make physical changes to it. Pervasive scenarios comprise, for instance, emergency management, health care and home automation (a.k.a. domotics). The physical interaction with the environment increases the frequency of unexpected contingencies with respect to classical scenarios. Since pervasive scenarios are very dynamic and turbulent, PMSs should provide a higher degree of operational flexibility/adaptability to suit them.
According to Andresen and Gronau [3], adaptability can be seen as the ability to change something to fit occurring changes. Adaptability is to be understood here as the ability of a PMS to adapt/modify processes efficiently and quickly to changed circumstances. If processes were not adapted, they could not be carried out in the changed environment.
In pervasive settings, efficiency and effectiveness in carrying out processes are a strong requirement. For instance, in emergency management, saving minutes could mean saving injured people or preventing buildings from collapsing. Similarly, pervasive health-care processes can cause permanent harm to patients when not executed by given deadlines. In order to improve the effectiveness of process execution, adaptation ought to be as automatic as possible and require minimal manual human intervention. Indeed, human intervention causes delays, which might not be acceptable.
The main concern of this thesis is to research how to improve the degree of automatic adaptation, so as to react to the very frequent changes in the execution environment and adjust processes accordingly.
Let us consider a scenario for emergency management, where processes typically show a complexity comparable to that of business settings. It is therefore worthwhile to use a PMS to coordinate the activities of emergency operators within teams. The members of a team are equipped with PDAs and are coordinated through the PMS residing on a leader device (usually an ultra-mobile laptop). In such a PMS, process schemas (in the form of enriched Activity Diagrams) are defined, describing different aspects, such as tasks/activities, control and data flow, task assignment to services, etc. Every task is associated with a set of conditions which ought to be true for the task to be performed; conditions are defined on the control and data flow (e.g., a previous task has to be completed, or a variable needs to be assigned a value in a specific range). Devices communicate with each other through ad hoc networks. A Mobile Ad hoc NETwork (manet) is a P2P network of mobile nodes capable of communicating with each other without an underlying infrastructure. Nodes can
communicate with their neighbors (i.e., nodes in radio range) directly by wireless links. Non-neighbor nodes can communicate as well, by using other intermediate nodes as relays that forward packets toward destinations. The lack of a fixed infrastructure makes this kind of network suitable in all scenarios where a network needs to be deployed quickly but the presence of access points is not guaranteed, as in emergency management [91].
The execution of the emergency management process requires such devices to be continually connected to the PMS. However, this cannot be guaranteed: the environment is highly dynamic, and the movement of nodes (that is, devices and related operators) within the affected area, while carrying out assigned tasks, can cause disconnections and, thus, unavailability of nodes. The collection of actual user requirements [35, 66, 67] shows that typical teams are formed by a few nodes (fewer than 10 units), and therefore a simple task reassignment is frequently not feasible. Indeed, there may not be two "similar" services available to perform a given task.
In this case, adaptability might consist in recovering from the disconnection of a node X, which can be achieved by assigning a task "Follow X" to another node Y in order to maintain the connection. When the connection has been restored, the process can progress again.
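The recovery idea just described can be sketched as a toy fragment; the class, method and node names below are purely illustrative assumptions, not part of any system presented in this thesis. When the monitor senses that a node X is no longer connected, the PMS injects a "Follow X" task into the worklist of a still-connected node Y:

```java
import java.util.*;

// Toy sketch of the "Follow X" recovery idea; all names are invented.
public class FollowRecovery {

    /** Nodes currently reachable from the coordinator device. */
    static Set<String> connected = new HashSet<>(Arrays.asList("Y", "Z"));

    /** Tasks the PMS has assigned so far, per node. */
    static Map<String, List<String>> worklist = new HashMap<>();

    static void assign(String node, String task) {
        worklist.computeIfAbsent(node, k -> new ArrayList<>()).add(task);
    }

    /** If node x is sensed as disconnected, pick any connected node
     *  and assign it the recovery task "Follow x"; returns the follower. */
    static String recoverDisconnection(String x) {
        if (connected.contains(x)) return null; // no deviation: nothing to repair
        String y = connected.iterator().next(); // any connected node can follow x
        assign(y, "Follow " + x);
        return y;
    }

    public static void main(String[] args) {
        String follower = recoverDisconnection("X"); // X is not in 'connected'
        System.out.println(follower + " assigned " + worklist.get(follower));
    }
}
```

Once Y has moved close enough for X to rejoin the network, the monitor would mark the deviation as repaired and the original process would resume.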
1.2 Original Contributions
The definitions of adaptability currently available in the literature are too generic for our purposes. This thesis comes up with a more precise definition of process adaptability, which stems from the field of robotics and agent programming [31] and is adapted here for process management.
Adaptability can be seen as the ability of the PMS to reduce the gap between the virtual reality, the (idealised) model of reality that the PMS uses to deliberate, and the physical reality, the real world with the actual values of conditions and outcomes. For instance, in the aforementioned emergency management scenario, in the virtual reality the PMS assumes nodes to be always connected; but in the physical reality, when nodes are moving they can lose the wireless connection and, hence, may be unable to communicate.
The reduction of this gap requires sufficient knowledge of both kinds of reality (virtual and physical). Such knowledge, harvested by the services performing the process tasks, allows the PMS to sense deviations and to deal with their mitigation.
In theory there are three possibilities to deal with deviations:
1. Ignoring deviations – this is, of course, not feasible in general, since the new situation might be such that the PMS is no longer able to carry out the process instance.
2. Anticipating all possible discrepancies – the idea is to include in the process schema the actions to cope with each such failure. This can be seen as a try-catch approach, as used in some programming languages such as Java. The process is defined as if exogenous actions could not occur, that is, as if everything ran fine (the try block). Then, for each possible exogenous event, a catch block is designed, giving the method to handle the corresponding event. As already touched on, and as widely discussed in Chapter 3, most PMSs use this approach. For simple and mainly static processes this is feasible and valuable; but, especially in mobile and highly dynamic scenarios, it is practically impossible to take into account all exceptional cases.
3. Devising a general recovery method able to handle any kind of exogenous event – considering again the try/catch metaphor, there is just one catch block, able to handle any exogenous event, including unexpected ones. The catch block activates the general recovery method to modify the old process P into a process P′ such that P′ can terminate in the new environment and its goals are included in those of P. This approach relies on the execution monitor (i.e., the module intended for execution monitoring), which detects the discrepancies that prevent the process instance from terminating. When they are sensed, the control flow moves to the catch block. An important challenge here is to build a monitor able to identify which exogenous events are relevant, i.e. which ones prevent processes from being completed successfully, as well as to automatically synthesize P′ during the execution itself.
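The contrast between the second and third approach can be made concrete in Java, the language the try-catch metaphor is borrowed from. The exception types and the "recovery" strings below are hypothetical placeholders invented for this sketch, standing in for a real deviation model and a real recovery planner:

```java
// Hypothetical sketch contrasting approaches 2 and 3; all names are invented.
public class AdaptationStyles {

    static class NodeDisconnected extends RuntimeException {}
    static class DeviceOutOfBattery extends RuntimeException {}

    // Approach 2: one catch block per deviation foreseen at design time.
    static String anticipateAll(Runnable process) {
        try {
            process.run();
            return "completed";
        } catch (NodeDisconnected e) {
            return "recovered: assign a 'follow' task";
        } catch (DeviceOutOfBattery e) {
            return "recovered: reassign tasks to another device";
        }
        // Any deviation NOT foreseen at design time still escapes unhandled.
    }

    // Approach 3: a single general catch block invoking a recovery method.
    static String generalRecovery(Runnable process) {
        try {
            process.run();
            return "completed";
        } catch (RuntimeException anyDeviation) {
            // Here a planner would synthesise P' from the sensed current state.
            return "recovered from " + anyDeviation.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        Runnable p = () -> { throw new NodeDisconnected(); };
        System.out.println(anticipateAll(p));
        System.out.println(generalRecovery(p));
    }
}
```

The design difference is exactly the one argued in the text: the first method grows one branch per anticipated failure and silently misses the rest, while the second delegates every deviation, expected or not, to one general recovery step.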
This thesis aims at achieving adaptability through the third approach, which seems to be the most appropriate for scenarios where the frequency of unexpected exogenous events is relatively high.
After an investigation of possible techniques for automatic adaptation, we focussed our attention on well-established techniques and frameworks in Artificial Intelligence, such as Situation Calculus [119] and automated planning. Those techniques were born to coordinate robots and intelligent agents, i.e. in application fields that are far from the main topic of this thesis. Therefore, their application to process management has required a significant effort in terms of conceptualisation and formalisation. We have then proposed a proof-of-concept implementation, namely SmartPM, which is based on the IndiGolog interpreter developed at the University of Toronto and RMIT University, Melbourne. The use of an available platform born for coordinating robots has raised critical issues when used to integrate generic automatic services and humans, and solving these issues has required a tight collaboration with its conceivers and developers.
Actions are modelled in IndiGolog [121], a logic-based language for robot and agent programming. Fluents denoting the world properties of interest are modelled in Situation Calculus, as are the pre- and post-conditions of actions. Such formalisms make it possible to reason over exogenous events and determine (i) when such events invalidate the execution of a certain process and (ii) how to recover from them and take the original process back on track. Specifically, when a deviation is sensed that makes the physical reality deviate from the virtual one, we make use of planning mechanisms to find and enact a set of activities that recovers from the mismatch.
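For illustration only, a Situation Calculus precondition axiom stating when a service may start a task could read as follows; the fluent and action names are invented for this sketch and are not taken from the SmartPM domain theory:

```latex
% Illustrative precondition axiom (invented fluent/action names):
% service srv may start task t in situation s iff srv is free and capable of t.
\[
\mathit{Poss}(\mathit{start}(t, \mathit{srv}), s) \;\equiv\;
  \mathit{Free}(\mathit{srv}, s) \,\land\, \mathit{Capable}(\mathit{srv}, t)
\]
```

Axioms of this shape are what allow the PMS to project, before acting, whether the remaining process can still be executed in the current situation.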
The first framework proposed is able to deal with any well-structured process, with no restrictions (see Chapter 4). We have later devised a second framework that, on the one hand, is more efficient but, on the other hand, poses some restrictions on the structure and the characteristics of the processes and, hence, cannot always be used (see Chapter 6).
In sum, the contribution of this thesis to the field of automatic process adaptability is manifold:²
• The collection of actual requirements from users acting in such pervasive and dynamic scenarios. Requirements collection guarantees that the resulting system is really useful for end users [66, 67, 23, 22, 35, 24].
• The analysis of existing work on adaptability (a.k.a. flexibility), exception handling and process modelling, in order to analyse and systematise available modelling languages and approaches to process adaptability.
• The evaluation of possible alternative approaches. We tried other approaches that are valuable but partly fail when dealing with unexpected deviations. Finally, we moved beyond the borders of the process management field, turning to agent and robot programming. Through this analysis and evaluation, we have also been able to give a precise characterisation of the notion of process adaptability in terms of the gap between the virtual and the physical reality [36, 7, 34].
• The conceptualisation and formalisation of a first set of techniques for the automatic adaptation of any well-structured process [37]. In order to achieve that, we provide some sub-contributions:
– The definition of precise semantics for formally defining the process structure and the activity conditions. These semantics have been obtained by tailoring Situation Calculus and IndiGolog to process management. Formalising processes using Situation Calculus and IndiGolog has required a significant effort, since such formalisms are not intended for that purpose.
² The references below concern papers by the candidate addressing such topics.
– The formalisation of the concept of equivalence of two processes through bisimulation: a process P running in an environment E is said to be equivalent to a process P′ running in an environment E′ if P achieves the same goals as P′ when P is executed in E and P′ in E′.
– The reduction of the adaptability issue to the problem of finding a plan that recovers from discrepancies, eliminating the mismatch between the physical and the virtual reality.
– The formal proof of the correctness and completeness of the proposed approach.
• The development of SmartPM, a proof-of-concept implementation of the adaptation framework, based on the IndiGolog interpreter developed at the University of Toronto and RMIT University, Melbourne [39]. The use of a platform specifically intended for robot and agent programming has required a tight collaboration with its conceivers and developers to tailor it to process management. The aim of such an implementation has been to demonstrate the practical feasibility and effectiveness of the approach, beyond the formal proof of soundness. For the sake of testing in a context of mobile ad hoc networks, we have also provided other contributions, specifically to the field of mobile networking:
– The conception and development of a proper manet emulator, namely octopus, which overcomes some issues that are significant for our testing. Section 5.5 describes octopus and motivates its conception [28].
– The development of a proper manet layer that really works on low-profile devices. Many implementations are in theory available but, in fact, they either do not work on low-profile devices or are only partially fledged (see Section 5.3) [14].
– The development of sensors able to detect deviations. Specifically, we have developed a module that is able to predict node disconnections before they actually happen [38, 41].
• The conception of a second technique, which aims at overcoming some limitations of the first framework. It turns out to be more efficient in computing recovery plans, since it is able to fix individually the parts affected by discrepancies without having to block the whole process. On the other hand, this approach is applicable only under more restrictive conditions on the structure and the characteristics of processes [33].
We have also contributed to other topics in the field of process management more generally. These topics address other challenging issues concerning pervasive scenarios. Specifically:
• The formalisation of a first step towards distributing the process orchestration among the devices of the involved services/participants, as well as towards synthesising the process specification on the basis of the available services. Indeed, in pervasive scenarios any device may go down at any moment because of the environment, including the device hosting the engine. The only way to prevent the engine from being a single point of failure is to distribute the orchestration and the coordination among all available devices. In addition, processes might often be provided only as templates, whose concrete instances are created, on the basis of the available services, when the process has to be enacted [53].
• The conceptualisation and implementation of an innovative "client" tool to distribute tasks to process participants in a way that aids them in choosing the next task to work on. This tool aims to overcome the current limitations of the worklist handlers of state-of-the-art Process-aware Information Systems. These worklist handlers typically show a sorted list of work items, comparable to the way e-mails are shown in mail agents. Since the worklist handler is the dominant interface between the system and its users, it is worthwhile to provide a more advanced graphical interface that uses information about work items and users, as well as about process cases which are completed or still running. The worklist handler proposed aims to provide process participants with a deeper insight into the context in which processes are carried out. This way, participants can be assisted in the selection of the next work item to perform. The approach uses the "map metaphor" to visualise work items and resources (e.g., participants) in a sophisticated manner. Moreover, depending on the "distance notion" chosen, work items are visualised differently. For example, urgent work items of a type that suits the user are highlighted. The underlying map and distance notions may be of a geographical nature (e.g., a map of a city or an office building), but may also be based on the process design, organisational structures, social networks, due dates, calendars, etc. [42]
1.3 Publications and Collaborations
The following publications have been produced while researching this thesis:
• M. de Leoni, F. De Rosa, M. Mecella
“MOBIDIS: A Pervasive Architecture for Emergency”
In Proceedings of the 15th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE 2006), University of Manchester, UK, June 26th -28th, 2006. AWARDED AS “BEST
PAPER” OF DMC 2006 WORKSHOP.
• T. Catarci, M. de Leoni, M. Mecella, M. Angelaccio, S. Dustdar et
al.
“WORKPAD: 2-Layered Peer-to-Peer for Emergency Management through
Adaptive Processes”
In Proceedings of The 2nd International IEEE Conference on Collaborative
Computing: Networking, Applications and Worksharing (COLLABORATECOM 2006), Atlanta, Georgia, USA, November 17th - 20th, 2006.
• M. de Leoni, A. Marrella, F. De Rosa, M. Mecella, A. Poggi, A.
Krek, F. Manti
“Emergency Management: from User Requirements to a Flexible P2P Architecture”
In Proceedings of the 4th International Conference on Information Systems for
Crisis Response and Management (ISCRAM’07 ), Delft, the Netherlands, May
13th-16th, 2007.
• F. D’Aprano, M. de Leoni, M. Mecella
“Emulating Mobile Ad-hoc Networks of Hand-held Devices. The OCTOPUS
Virtual Environment”
In Proceedings of the ACM Workshop on System Evaluation for Mobile Platforms: Metrics, Methods, Tools and Platforms (MobiEval), co-located with MobiSys 2007, Puerto Rico, 11-14 June 2007.
• M. de Leoni, M. Mecella, R. Russo
“A Bayesian Approach for Disconnection Management”
In Proceedings of the 16th IEEE International Workshops on Enabling
Technologies: Infrastructures for Collaborative Enterprises (WETICE-2007),
GET/INT, Paris, France, June 18-20, 2007
• T. Catarci, M. de Leoni, M. Mecella, S. Dustdar, L. Juszczyk et al.
”The WORKPAD P2P Service-Oriented Infrastructure for Emergency Management”
In Proceedings of the 16th IEEE International Workshops on Enabling
Technologies: Infrastructures for Collaborative Enterprises (WETICE 2007),
GET/INT, Paris, France, June 18-20, 2007
• G. De Giacomo, M. de Leoni, M. Mecella, F. Patrizi
“Automatic Workflow Composition of Mobile Services”
In Proceedings of the IEEE International Conference on Web Services (ICWS
2007 ), Salt Lake City, USA, July, 2007.
• M. de Leoni, M. Mecella, G. De Giacomo
“Highly Dynamic Adaptation in Process Management Systems through Execution Monitoring”
In Proceedings of the 5th International Conference on Business Process Management (BPM 2007 ), Brisbane, Australia, 24-28 September 2007.
• M. de Leoni, F. De Rosa, M. Mecella, S. Dustdar
“ Resource Disconnection Management in MANET Driven by Process Time
Plan”
In Proceedings of the First International ACM Conference on Autonomic Computing and Communication Systems (AUTONOMICS’07 ), Rome, Italy, 28-30
October 2007.
• T. Catarci, M. de Leoni, M. Mecella, G. Vetere, S. Dustdar et al.
“Pervasive and Peer-to-Peer Software Environments for Supporting Disaster Responses”
“IEEE Internet Computing” Journal, Special Issue on Crisis Management, January 2008.
• M. de Leoni, S. R. Humayoun, M. Mecella, R. Russo
“A Bayesian Approach for Disconnection Management in Mobile Ad-hoc Network”
“Ubiquitous Computing and Communication” Journal, March 2008.
• G. Bertelli, M. de Leoni, M. Mecella, J. Dean
“Mobile Ad hoc Networks for Collaborative and Mission-critical Mobile Scenarios: a Practical Study”
In Proceedings of the 17th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE 2008), 23-25 June 2008, Rome, Italy.
• M. de Leoni, A. Marrella, M. Mecella, S. Valentini, S. Sardina
“Coordinating Mobile Actors in Pervasive and Mobile Scenarios: An AI-based Approach”
In Proceedings of the 17th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE 2008), 23-25 June 2008, Rome, Italy.
• M. de Leoni, W. M. P. van der Aalst, A.H.M. ter Hofstede
“Visual Support for Work Assignment in Process-aware Information Systems”
In Proceedings of the 6th International Conference on Business Process Management (BPM 2008), Milan, Italy, 1-4 September 2008.
• T. Catarci, F. Cincotti, M. de Leoni, M. Mecella, G. Santucci
“Smart Homes for All: Collaborating Services in a for-All Architecture for Domotics”
In Proceedings of the 4th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom’08), Orlando, USA, 13-16 November 2008.
• D. Battista, A. De Gaetanis, M. de Leoni et al.
“ROME4EU: A Web Service-based Process-aware Information System for Smart devices”
In Proceedings of the International Conference on Service Oriented Computing (ICSOC 2008), Sydney, Australia, 1-4 December 2008.
• M. de Leoni, Y. Lespérance, G. De Giacomo, M. Mecella
“On-line Adaptation of Sequential Mobile Processes Running Concurrently”
In Proceedings of the 24th ACM Symposium on Applied Computing (SAC’09), 8-12 March 2009, Honolulu, Hawaii, USA. Special Track “Coordination Models, Languages and Applications”.
• S. R. Humayoun, T. Catarci, M. de Leoni, A. Marrella, M. Mecella,
M. Bortenschlager, R. Steinmann
“Designing Mobile Systems in Highly Dynamic Scenarios. The WORKPAD Methodology”
Springer’s International Journal on Knowledge, Technology & Policy, Volume
22, Number 1 / March, 2009.
• S. R. Humayoun, T. Catarci, M. de Leoni, A. Marrella, M. Mecella,
M. Bortenschlager, R. Steinmann
“The WORKPAD User Interface and Methodology: Developing Smart and Effective Mobile Applications for Emergency Operators”
In Proceedings of the 13th International Conference on Human-Computer Interaction (HCI International 2009), 19-24 July 2009, San Diego, USA. Session
“Designing for Mobile Computing”.
• F. Cardi, M. de Leoni, M. Adams, W. M. P. van der Aalst, A.H.M.
ter Hofstede
“Visual Support for Work Assignment in YAWL”
In Proceedings of the Demonstration Track of 7th International Conference on
Business Process Management (BPM 2009), September 2009, Ulm, Germany.
To appear.
The work described in Section 7.2 has been mostly produced during an
internship of Mr. Massimiliano de Leoni at the BPM Group of the Faculty
of Information Technology of Queensland University of Technology, Brisbane
(Australia). The visit commenced on September 17th, 2007 and concluded on April 7th, 2008, and was supervised by Prof. Arthur H. M. ter Hofstede, co-leader of this group.
The implementation of the adaptation framework has been developed in cooperation with Dr. Sebastian Sardina, research assistant at the Agent Group of RMIT University, Melbourne, Australia. In particular, Mr. de Leoni visited the group from December 7th, 2008 to December 17th, 2008, with the aim of finalising the details of the proof-of-concept implementation.
Mr. Massimiliano de Leoni has also co-chaired a workshop on Process
Management for Highly Dynamic and Pervasive Scenarios (PM4HDPS) held
in Milan on September 1st, 2008, in conjunction with the 6th International Conference on Business Process Management (BPM’08).3 The workshop aimed at providing a forum to draw attention to Highly Dynamic and Pervasive settings and to exchange the latest individual research and development ideas. The valuable outcomes are summarized in [37].

3
Web site: http://pm4hdps.deleoni.it

1.4 Outline of the Thesis

Figure 1.1: Outline of the Thesis and relationship among Chapters

Figure 1.1 diagrams the structure of this Thesis document. Specifically:
• Chapter 2 illustrates in detail the rationale behind the need for the new approach to process adaptability that this Thesis deals with. In particular, it highlights why the currently proposed approaches fail when dealing with Highly Dynamic and Pervasive Scenarios.
• Chapter 3 surveys the literature and describes the works, systems and techniques that have already been proposed in the field of process
adaptation. Specifically, it compares the choice of IndiGolog as modelling
language with respect to the other languages that are nowadays used by
various Process Management Systems. Moreover, it discusses the levels
of support for process adaptability/flexibility and exception handling
in several of the leading commercial products and in some academic
prototypes. Finally, it concludes by arguing that Case Handling is inappropriate as an approach to managing pervasive processes.
• Chapter 4 shows a first approach to handle unexpected exogenous events and to recover process instance executions when exogenous events make their termination impossible.
• Chapter 5 describes the most salient points of the concrete implementation based on the IndiGolog platform developed by the University of
Toronto and RMIT University.
• Chapter 6 illustrates a more efficient adaptation technique, which however holds under more restrictive conditions than the one proposed in Chapter 4.
• Chapter 7 introduces some research topics related to process management in pervasive scenarios. The first deals with the problem of synthesizing a process schema according to the available services and distributing the orchestration among all of them. The second touches on the topic of supporting process participants in choosing the next task to work on among the several they may be offered.
• Chapter 8 concludes the thesis, surveying the outcomes and sketching future improvements in the field of process adaptation.
Chapter 2
Rationale
Over the last decade there has been increasing interest in Process Management Systems (PMSs), also known as Workflow Management Systems (WfMSs). A PMS is, according to the definition in [46], “a software that manages and executes operational processes involving people, applications, and information sources on the basis of process models”.
PMSs are driven by process specifications, i.e., computerized models of the processes to be enacted. A model defines the tasks (also referred to simply as activities) that are part of the processes, as well as their pre- and post-conditions. Pre-conditions are typically defined on the so-called control and data flows. The control flow defines the admissible ordering of task executions: some tasks can be assigned to participants only when others have already been completed. The data flow specifies how the values of process variables change/evolve over time, as well as which variables specific tasks are allowed to read and/or write. Process specifications can define decision points to choose one branch among alternative ones; such choices are driven by formulas over process variables, which are evaluated at run-time by taking into account the actual variable values. When processes are to be run, instances are created, each possessing its own copy of the defined variables.
In the PMS literature, instances are often referred to as cases. To be more precise, tasks are never executed directly: tasks are defined inside the process schema, and when process schemas are instantiated as cases, tasks are instantiated as well. A work item is a task instance inside a case and is created as soon as the case reaches the corresponding task in the schema. Work items represent the real pieces of work that participants execute. For instance, if there exists a task “Approve travel request” in a flight-booking process, a possible work item might be “Approve travel request XYZ1234” of case “Flight booking XYZ1234”. It is worth noting that many work items referring to the same
task may be instantiated for a single case. Unless needed, throughout this thesis we do not distinguish between the concepts of task/activity and work item, bearing in mind, however, that a difference does exist.
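The task/work-item distinction can be made concrete with a small sketch; the class and method names below are illustrative, not part of any actual PMS API:

```python
# Illustrative sketch: a task is defined once in the schema; each case
# instantiates its own work items from that task.

class Task:
    def __init__(self, name):
        self.name = name

class Case:
    def __init__(self, case_id, schema_tasks):
        self.case_id = case_id          # e.g. "XYZ1234"
        self.schema_tasks = schema_tasks  # tasks defined in the schema

    def reach(self, task_name):
        # A work item is created as soon as the case reaches the task.
        return f"{task_name} {self.case_id}"

booking = Case("XYZ1234", [Task("Approve travel request")])
print(booking.reach("Approve travel request"))
# Approve travel request XYZ1234
```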
At the heart of a PMS there is an engine that manages the process routing and decides which tasks are enabled for execution by taking into account the control flow, the values of variables and other aspects. Once a task can be assigned, the PMS is also in charge of assigning it to proper participants; this step is performed by taking into account the participants’ skills required by single tasks, as well as their roles in their respective organisations. Indeed, a task will be assigned to all of those participants that provide every required skill or hold a certain organisational role.
Human participants are provided with a client application, often named work-list handler, which is intended to receive notifications of task assignments. Participants can then use this application to pick, from the list of assigned tasks, which one to work on next.
SmartPM, the adaptive Process Management System conceptualised, formalised and developed in this thesis work, abstracts from the possible participants that it can coordinate. We name them generically services. SmartPM provides a client interface that services can invoke in order to exchange data and to coordinate the process execution. We assume communication to be one-way: services send requests to SmartPM and close the communication without standing by for a prompt response. When the response is ready, SmartPM is in charge of contacting the service and delivering the response. When communicating with clients, SmartPM assumes services to provide well-known and established interfaces, which SmartPM uses to send back responses. Therefore, services have to provide these interfaces, either directly, if services are built for SmartPM, or through a specific wrapper, if services are legacy: a handler that provides the proper interfaces to SmartPM and internally transforms the messages into the form that legacy services are able to understand.
We envision two classes of services. The first class includes the automatic services, i.e., those which can execute tasks with no human intervention, whereas the second comprises the human services. For this second class, we envision a client application, named in the literature work-list handler, that acts as the service. On the one hand, it handles the communication with the SmartPM engine, receiving notifications of task assignments and informing upon task completion, as any service would do. On the other hand, it is equipped with a Graphical User Interface to inform the human user of the next task to work on. The human users are the real executors of the work that the service is supposed to perform.
Process Management for Highly Dynamic and Pervasive Scenarios:
Why current solutions do not work properly.
Nowadays, Process Management Systems (PMSs) are widely used in many business scenarios, e.g., by government agencies, insurance companies and banks. Despite this widespread usage, the typical application of such systems is predominantly in the context of static scenarios, rather than pervasive and highly dynamic ones. Nevertheless, pervasive and highly dynamic scenarios may be as complex as business scenarios and, therefore, could also benefit from the use of PMSs. Some examples of Highly Dynamic and Pervasive scenarios are:
Emergency situations. Several devices, robots and/or sensors must be coordinated in accordance with a process schema (e.g., based on a disaster
recovery plan) to cope with environmental disasters.
Pervasive healthcare. The purpose is to make healthcare available to anyone, anytime, and anywhere by removing location, time and other constraints while increasing both the coverage and quality of healthcare.
Ambient intelligence. In this vision, devices/robots work in concert to support people in carrying out their everyday life activities, tasks and rituals
in an easy, natural way using information and intelligence that is hidden
in the network connecting these devices. Devices and robots are intelligent agents that act and react to external stimuli. Domotics, sometimes
also referred to as Home Automation, is a specialised application area in
this field.
In classical PMSs applied to business scenarios, the procedure for handling possible run-time exceptions is generally subject to acknowledgement by the person responsible for the process. This authorization may be provided at run-time for handling deviations caused by a single exceptional event; or, conversely, the person may give the “go-ahead” for all exceptions in a certain class, defining the correct protocol by which they should be handled. In any case, the adaptation is manual and requires human intervention.
Conversely, this thesis addresses pervasive and dynamic scenarios, which are characterized by being very unstable. In such scenarios, unexpected events may happen that break the initial conditions and prevent the executing processes from being carried on and terminating successfully. These unforeseen events are quite frequent and, hence, processes can often be invalidated. Deviations are frequent and, due to deadline constraints, they must often be handled very quickly. For instance, in the management of an earthquake, offering first aid to injured victims ought to be as fast as possible: saving minutes might result in saving people’s lives. Such
a requirement rules out waiting for a person’s acknowledgement: adaptation
must be as automatic and autonomic as possible.
From the surveys in Chapter 3, it emerges that all major commercial PMSs and academic prototypes are unable to automatically synthesize a recovery plan/process to deal with exogenous events, unless event handlers have been foreseen and designed at design-time. This is feasible in classic, mostly static scenarios where exogenous events occur quite rarely. Sometimes manual adaptation, or automatic adaptation only for pre-planned event classes, is even mandatory since, as argued before, handling deviations may require a proper authorization or a specific protocol to exist.
This thesis work deals with the issue of devising a set of techniques that can be beneficial for Process Management Systems, so that PMSs can handle any exogenous event, even unforeseen ones, and create proper recovery plans/processes. These techniques have then been concretely implemented in SmartPM, an adaptive Process Management System that is specifically intended for pervasive scenarios.
The user requirements and their consequences on the SmartPM task life-cycle
The SmartPM system is under development in the context of the European-funded project WORKPAD, which concerns devising a two-level software infrastructure for supporting rescue operators of different organisations during emergency management operations [23]. In the context of this project, the whole SmartPM system has been devised in cooperation with real end users, specifically “Protezione Civile della Calabria” (Civil Protection and Homeland Security of Calabria). Indeed, the rest of this thesis will explain the various introduced techniques through examples stemming from emergency management, although their exploitation comprises many other possible pervasive scenarios (such as those described above). According to the Human-Computer Interaction methodology, different prototypes have been proposed to users, who provided feedback [66, 67]. At each iteration cycle the prototype has been refined according to such feedback until finally meeting the users’ complete satisfaction.
From the analysis with final users, we learnt that processes for pervasive scenarios are highly critical and time-demanding, and often need to be carried out within strictly specified deadlines. Therefore, it is inapplicable to use a pull mechanism for task assignment, where SmartPM would assign every task to all process participants qualified for it, letting them decide autonomously which task to execute next. Consequently, SmartPM aims at improving the overall effectiveness of the process execution by assigning each task to just one member and, vice versa, at most one task to each member.
Moreover, these processes are created in an ad-hoc manner upon the occurrence of certain events, and are designed on demand starting from provided templates or simple textual guidelines. In the light of that, these processes are used only once, for the specific setting for which they were created. Moreover, process participants are asked to face one situation and, hence, they take part in only one process at a time.

Figure 2.1: The life-cycle model in SmartPM
Taking into account the considerations above, the SmartPM life-cycle
model, depicted in Figure 2.1, is specialized with respect to those of other
PMSs [120]:
1. When all pre-conditions over data and control flow hold, the SmartPM engine assigns the task to a service, human or automatic, that guarantees the highest effectiveness. The task moves to the Assigned state.
2. The service notifies SmartPM when the corresponding member is willing to begin executing. The task moves to the Running state.
3. The service executes the task, possibly invoking external applications.
4. When the task is completed, the service notifies SmartPM. The task moves to the final state Completed.
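The four steps above amount to a small state machine. The sketch below mirrors the states of Figure 2.1; the class, event and service names are purely illustrative and not SmartPM’s actual interface:

```python
# Sketch of the task life-cycle described above. A task is assigned to
# exactly one service, started by that service, and finally completed.
# State names follow Figure 2.1; everything else is illustrative.

class TaskInstance:
    TRANSITIONS = {
        ("Initial", "assign"): "Assigned",
        ("Assigned", "start"): "Running",
        ("Running", "complete"): "Completed",
    }

    def __init__(self, name):
        self.name = name
        self.state = "Initial"
        self.service = None

    def fire(self, event, service=None):
        key = (self.state, event)
        if key not in self.TRANSITIONS:
            raise ValueError(f"illegal transition {event!r} from {self.state}")
        if event == "assign":
            self.service = service  # at most one service per task
        self.state = self.TRANSITIONS[key]

task = TaskInstance("Remove debris")
task.fire("assign", service="operator-1")  # step 1: engine assigns the task
task.fire("start")                         # step 2: service notifies start
task.fire("complete")                      # step 4: service notifies completion
print(task.state)  # Completed
```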
Chapter 3
Literature Review
The idea that Information Systems have to be aligned in a process-oriented way had its roots in the 1970s. Nowadays, such systems are often referred to as Workflow Management Systems (WfMSs) or Process Management Systems (PMSs).
Competition in a globalized world has become much harder over the last decade and, hence, PMSs are gaining more and more momentum. As a consequence, on the one hand, many software companies have developed commercial PMSs; on the other hand, many scientific research groups have focused (and are still focusing) their efforts on coming up with new ideas to improve certain aspects and to provide new features for the next PMS generations.
In order to provide effective process support, PMSs should capture real-world processes adequately, avoiding any mismatch between the computerised processes and those in reality. With this intent, several models have been proposed for representing real processes in a form that captures as many aspects of real processes as possible while remaining manageable by software systems.
Any PMS envisions the figure of the Process Designer, who is in charge of modelling business processes by communicating with business domain experts. Process Designers may have no strong theoretical background, nor even be computer scientists. Many proposed process models have tried to balance the need to represent real processes precisely with that of being easily comprehensible and manageable by non-theoretical people. Section 3.1 gives an overview of the most used formalisms for process modelling, from which it emerges that many of them lack solid theoretical foundations. The process adaptability framework proposed in this thesis requires strong reasoning on the process model, for instance to recognise when adaptation is needed or to automatically synthesize a recovery plan. That is why we use IndiGolog, a logic programming language used in robotics, which has a strong
theoretical basis in the Situation Calculus (SitCalc).

Figure 3.1: Overview of the chapter structure
Another aspect of PMSs, when dealing with real processes, is to provide enough adaptability to realign processes when exogenous events produce deviations. Section 3.2 illustrates how such adaptability, often also referred to as flexibility, is achieved by many PMSs, as well as new techniques and approaches to deal with deviations. Unfortunately, most approaches require experts in charge of manually adapting processes whenever needed. That is applicable in traditional business domains, where exceptional events are infrequent. Manual adaptation may even be mandatory in some cases (e.g., when the recovery requires the explicit authorisation of responsible unit heads). It is not feasible in highly dynamic and pervasive scenarios, where exogenous events (and, hence, recovery plans) are really frequent.

A different approach to deal with flexibility is Case Handling, which focuses mainly on cases, the running instances of processes. The Case Handling approach poses fewer constraints on case executions and, hence, intrinsically deals better with providing adaptability. However, Case Handling is driven by the artifacts produced by cases, and in many pervasive scenarios it is not always possible to represent every process outcome as a well-defined artifact; its applicability is thus limited. Section 3.3 discusses these points in more detail.
3.1 Process Modelling Languages
The frameworks for automatic adaptation proposed in this thesis are based on strong reasoning and on other key features that the languages currently proposed for process modelling do not enable. While those languages are valuable in other contexts, they seem inappropriate in the light of certain requirements of the adaptation techniques proposed in this thesis.
Firstly, appropriate languages for our techniques need to be characterized by a sound and formal semantics. Indeed, activity pre- and post-conditions need to be specified in a formal and unambiguous way, thus allowing process management systems to reason about the successful completion of process instances. Secondly, appropriate languages need to enable both structural and semantic soundness: processes do not only need to complete, but they also have to achieve the outcomes they have been designed for. Moreover, appropriate languages should model the non-atomic execution of activities: the techniques proposed for execution monitoring and recovery should be able to check activities even while they are executing; it is insufficient to model only the states before and after the execution. We also rely on planning features: for the techniques to be feasible in practice, languages for which planners are unavailable are inappropriate. Finally, execution monitoring concerns the state; event-based languages should not be considered, preferring the state-based ones. Indeed, when using event-based languages, the state is implicit, and making it explicit would require an additional step, which would need to be repeated continuously.
This section discusses the most used languages for modelling processes, showing their inappropriateness in the light of the aforementioned requirements. Sections 3.1.1–3.1.4 present such languages, while Section 3.1.5 discusses their pros and cons in the light of the requirements above.
3.1.1 Workflow Nets
The most widely used language for defining process specifications is that of Workflow nets [131, 136]. Workflow nets allow one to define unambiguous specifications, to formally reason on them, and to check for specific properties.
The Workflow net language is a subclass of the well-known Petri nets [108, 136]. Petri nets consist of places, transitions, and directed arcs connecting places and transitions. Petri nets are bipartite graphs in the sense that two places or two transitions cannot be directly connected. There is a graphical notation where places are represented by circles, transitions by rectangles, and connections by directed arcs. Tokens are used to represent the dynamic state and reside on
certain places. Each place may contain several tokens: their number and locations identify the current state.

Figure 3.2: An example of Petri Net

Figure 3.2 shows an example of a Petri net, where places and transitions are respectively depicted as circles and rectangles. The black dots on the places represent tokens and their locations.
An input place of a transition t is one that has an outgoing arc toward t; vice versa, an output place of t has an incoming arc from t. When a transition t fires, one token is removed from each of its input places and one token is placed in each of its output places. Of course, a transition can fire only if it is enabled, that is, if each input place contains at least one token.
In the context of process management, transitions represent activities and their firing represents their execution. Places and connecting arcs represent the process instance state as well as the process constraints. For instance, in the Petri net above, the locations of the two tokens indicate that transitions Send Acknowledgement and Request and check payment are enabled. Therefore, the corresponding activities are ready to be assigned to participants and executed.
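The firing rule just described can be sketched in a few lines of code; the encoding of markings as dictionaries and the place names are illustrative:

```python
# Minimal Petri net sketch: a marking maps each place to its token count.
# A transition fires by consuming one token per input place and
# producing one token per output place.

def enabled(marking, inputs):
    # A transition is enabled iff every input place holds at least one token.
    return all(marking.get(p, 0) >= 1 for p in inputs)

def fire(marking, inputs, outputs):
    if not enabled(marking, inputs):
        raise ValueError("transition not enabled")
    new = dict(marking)
    for p in inputs:
        new[p] -= 1
    for p in outputs:
        new[p] = new.get(p, 0) + 1
    return new

# Hypothetical fragment: places p1 and p2 each hold a token, so a
# transition with input p1 (and another with input p2) is enabled.
marking = {"p1": 1, "p2": 1, "p3": 0}
assert enabled(marking, inputs=["p1"])
marking = fire(marking, inputs=["p1"], outputs=["p3"])
print(marking)  # {'p1': 0, 'p2': 1, 'p3': 1}
```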
For the sake of brevity, here we formally introduce only an extension by Kurt Jensen [70], named coloured Petri nets, which is better tailored to process management. Coloured Petri nets associate “colours” with tokens. The data types associated with tokens are called colour sets, where the colour set of a token represents the set of values that the token may have. Just as in programming languages data values of a certain type are associated with variables, in coloured Petri nets colours of a certain colour set are associated with tokens. Colours are meant to hold application data, including process instance identifiers. Places may have different colour sets, since some additional data can become available while tokens are passing through the net (i.e., as activities are executed).
A coloured Petri net is a tuple (Σ, P, T, A, N, C, G, E, I) where:
• Σ is a finite set of non-empty types, called colour sets
• P is a finite set of places
• T is a finite set of transitions
• A is a finite set of arc identifiers, such that P ∩ T = P ∩ A = T ∩ A = ∅
• N : A → (P × T ) ∪ (T × P ) is a node function mapping each arc identifier to the pair (start node, end node) of the arc.
• C : P → Σ is a colour function that associates each place with a colour
set.
• G : T → BoolExpr is a guard function that maps each transition to a
boolean expression BoolExpr over the token colour.
• E : A → Expr is an arc expression function that evaluates to a multi-set over the colour set of the connected place.
• I is an initial marking of the coloured Petri net, i.e., the initial positions of the tokens with their respective values.
In coloured Petri nets, the enabling of a certain transition is determined not only by the existence of tokens in the input places, but also by the values of the colour sets of such tokens. A transition is enabled if its guard function evaluates to true and the arc expressions are satisfied. When a transition fires, the respective tokens are removed from the input places and others are placed in the output places, as dictated by the arc expressions of the ingoing and outgoing arcs, respectively.
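A sketch of coloured enabling, under a deliberately simplified encoding where each place holds a multi-set of token values and the guard is a Boolean function over one token per input place (names and data are illustrative, and arc expressions are omitted):

```python
# Simplified coloured enabling: tokens carry data values ("colours"),
# and a transition is enabled only if some choice of input tokens
# satisfies its guard.
from itertools import product

def enabled(marking, inputs, guard):
    # marking: place -> list of token colours (a multi-set)
    # A binding picks one token per input place; the transition is
    # enabled if at least one binding satisfies the guard.
    pools = [marking.get(p, []) for p in inputs]
    return any(guard(*binding) for binding in product(*pools))

marking = {"requests": [{"amount": 80}, {"amount": 500}]}
small_enough = lambda tok: tok["amount"] <= 100  # guard over the colour

print(enabled(marking, ["requests"], small_enough))  # True
```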
In order to represent the dynamic status of coloured Petri nets, there exists a marking function which returns, for each place p ∈ P and for each possible colour value q ∈ C(p), the number of tokens in p with value q:
Let PN = (Σ, P, T, A, N, C, G, E, I). For each p ∈ P , let σp = C(p) be its colour set, and let Mp : σp → N count, for each colour, the tokens residing in p. A marking function for PN is defined as follows:

    M (p, q) = Mp (q)   if q ∈ σp
    M (p, q) = 0        otherwise
Petri nets must satisfy specific structural restrictions in order to be properly used for process management; in that case, they are named workflow nets:
A Petri net PN = (Σ, P, T, A, N, C, G, E, I) is called a workflow net iff the following conditions hold:
• There is a distinguished place i ∈ P , named initial place, that has no incoming edge.
• There is a distinguished place o ∈ P , named final place, that has no outgoing edge.
• Every place and transition is located on a firing path from the initial to the final place.
24
CHAPTER 3. LITERATURE REVIEW
Papers [131, 132] have studied the problem of checking soundness. A process definition is said to be sound if no run-time execution of its cases may lead to situations of deadlock (the process is not completed but no activity can be executed) or livelock (the process cycles forever, always executing the same activities and never terminating). In those papers, soundness is defined as follows:1
Definition 3.1 (Soundness). Let PN = (Σ, P, T, A, N, C, G, E, I) be a Workflow Net with initial place i and final place o. PN is structurally sound if and only if the following properties hold:

Termination. For every state M reachable from i there exists a firing sequence leading from M to o:

    ∀M : i →* M ⇒ M →* o

Proper termination. State o is the only state reachable from state i with at least one token in place o:

    ∀M : (i →* M ∧ M ≥ o) ⇒ M = o

No dead transitions. Each transition t ∈ T can contribute to at least one process instance:

    ∀t ∈ T, ∃M, M′ : i →* M −t→ M′
In some cases designers are only interested in checking whether a process specification allows each defined activity to be executed in some run. When the final state is reached, there may be tokens left in the net, possibly stuck in deadlock situations. For these concerns, the soundness criterion appears to be too restrictive. In the light of this, paper [43] has introduced the notion of Relaxed Soundness:
Definition 3.2 (Relaxed Soundness). Let PN = (Σ, P, T, A, N, C, G, E, I) be a Workflow Net with initial place i and final place o. PN is relaxed sound if and only if each transition participates in at least one legal process instance starting from the initial state and reaching the final one:

    ∀t ∈ T, ∃M, M′ : i →* M −t→ M′ →* o
1
The state of a workflow net is here defined in terms of the associated marking function. If ∃q ∈ C(o) such that M (o, q) > 0, then M ≥ o. In addition, if M ≥ o and M (p, q) = 0 for every p ∈ P \ {o} and q ∈ C(p), then M = o.
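For a small uncoloured net, the Termination property of Definition 3.1 can be checked by exhaustive state-space exploration. The following is a brute-force sketch, not the verification algorithm of [131, 132]; the encoding of the net as a dictionary of transitions is illustrative:

```python
# Brute-force check of Termination for a small, uncoloured workflow net:
# every marking reachable from i must still be able to reach o.

def successors(marking, transitions):
    # Yield every marking reachable by firing one enabled transition.
    for t, (ins, outs) in transitions.items():
        if all(marking.get(p, 0) >= 1 for p in ins):
            new = dict(marking)
            for p in ins:
                new[p] -= 1
            for p in outs:
                new[p] = new.get(p, 0) + 1
            yield frozenset(new.items())

def reachable(start, transitions):
    # Standard graph search over the (finite, here) state space.
    seen, frontier = {start}, [start]
    while frontier:
        m = frontier.pop()
        for n in successors(dict(m), transitions):
            if n not in seen:
                seen.add(n)
                frontier.append(n)
    return seen

# Toy net: i -t1-> p -t2-> o
transitions = {"t1": (["i"], ["p"]), "t2": (["p"], ["o"])}
start = frozenset({"i": 1}.items())
final = frozenset({"i": 0, "p": 0, "o": 1}.items())

# Termination: from every reachable marking, the final marking is reachable.
ok = all(final in reachable(m, transitions) for m in reachable(start, transitions))
print(ok)  # True
```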
Figure 3.3: Basic nodes of the YAWL’s extended workflow nets (from [133])
3.1.2 Yet Another Workflow Language (YAWL)
Yet Another Workflow Language (YAWL) [133] has been developed in order to overcome the lack of a single language that supports all control-flow patterns [134]. It is currently used as the modelling language of the homonymous Process Management System (see Section 7.2.6 for further details). Process specifications are defined in YAWL through so-called extended workflow nets, composed of nodes of the types in Figure 3.3. An extended workflow net is a tuple (C, i, o, T, F, split, join, rem, nofi) such that:
• C is a set of conditions;
• i ∈ C and o ∈ C are the initial and final conditions;
• T is a set of tasks, such that C and T are disjoint;
• F ⊆ (C \ {o} × T ) ∪ (T × C \ {i}) ∪ (T × T ) is a flow relation such that every node in C ∪ T is on a directed path from i to o;
• split : T ↛ {And, Xor, Or} is a partial mapping assigning a split behaviour to tasks;
• join : T ↛ {And, Xor, Or} is a partial mapping assigning a join behaviour to tasks;
• rem : T ↛ 2^(T ∪ C\{i,o}) is a partial mapping specifying which part of the extended workflow net is cleaned (i.e., which tokens are removed) when a certain task completes;²
²The notation 2^S denotes the power set of S.
26
CHAPTER 3. LITERATURE REVIEW
• nofi : T ↛ N × N∞ × N∞ × {dynamic, static} is a partial function specifying, for each task, the number of instances (minimum, maximum, threshold for continuation) and whether instance creation is dynamic or static.³
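A sketch of how the static constraints on such a tuple might be checked (the class and method names here are ours, for illustration, and are not the API of the YAWL system):

```python
from dataclasses import dataclass, field

def _reachable(start, adjacency):
    """Nodes reachable from start by following adjacency edges."""
    seen, stack = {start}, [start]
    while stack:
        for n in adjacency.get(stack.pop(), ()):
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return seen

@dataclass
class ExtendedWorkflowNet:
    C: set                                       # conditions
    i: str                                       # initial condition
    o: str                                       # final condition
    T: set                                       # tasks
    F: set                                       # flow relation: set of pairs
    split: dict = field(default_factory=dict)    # partial: task -> And/Xor/Or
    join: dict = field(default_factory=dict)     # partial: task -> And/Xor/Or
    rem: dict = field(default_factory=dict)      # partial: task -> cancellation set
    nofi: dict = field(default_factory=dict)     # partial: task -> (min, max, thr, mode)

    def validate(self):
        assert self.i in self.C and self.o in self.C
        assert not (self.C & self.T), "C and T must be disjoint"
        for x, y in self.F:
            assert x != self.o and y != self.i, "no arc leaves o or enters i"
            assert ((x in self.C and y in self.T) or
                    (x in self.T and y in self.C) or
                    (x in self.T and y in self.T)), f"illegal arc {(x, y)}"
        succ, pred = {}, {}
        for x, y in self.F:
            succ.setdefault(x, set()).add(y)
            pred.setdefault(y, set()).add(x)
        nodes = self.C | self.T
        # every node must lie on a directed path from i to o
        assert _reachable(self.i, succ) >= nodes
        assert _reachable(self.o, pred) >= nodes
        # the partial mappings may only mention tasks
        assert set(self.split) <= self.T and set(self.join) <= self.T
        return True
```

Validating a simple sequence net (i → t1 → c → t2 → o) passes all the checks; dropping one of its arcs breaks the path condition.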
Extended workflow nets are a flavour of workflow nets able to handle:
Multiple instances. YAWL can concurrently enable multiple instances of specific tasks. The exact number may be determined at run-time according to variables/conditions evaluated on the process instance that the multi-task is part of.
Advanced Synchronization Patterns. YAWL handles some patterns (such as the or split/join) in a more natural way than workflow nets. Workflow nets can specify most of them, but only through artifices that require complex and prolix definitions.
Non-local Firing Behaviour. Workflow nets determine whether a transition can fire on the basis of its input places alone. YAWL can enable activities considering tokens in other places as well, and it allows transitions to delete tokens [146] through the definition of the function rem.
YAWL also allows one to divide an extended workflow net into sub-nets, which are independent of the main net they are integrated in; therefore, sub-nets can be reused in different specifications. The YAWL execution semantics of activities are well-defined state transition systems: every atomic task actually passes through four states: (i) task instance active; (ii) enabled but not yet running; (iii) currently executing; (iv) completed. Moreover, YAWL allows one to define so-called composite tasks, which are links to other extended workflow nets. Composite tasks facilitate the modularisation of complex specifications and make existing ones easier to read.
3.1.3 Event-driven Process Chains (EPCs)
Event-driven Process Chains (EPCs) are a rather informal notation developed as part of a holistic modelling approach named the ARIS framework [82, p. 35]. There are several formalisations of the EPC syntax, as the original paper introduces EPCs in an informal way. Here we specifically use the definition given in [96]:⁴
A tuple EPC = (E, F, C, l, A) is an Event-driven Process Chain if:
³The notation N∞ denotes the set of natural numbers extended with infinity.
⁴The EPC syntax has also been extended with the data and resource perspectives, i.e. process participants and data objects manipulated by activities; here we do not consider such extensions worth describing.
• E, F, C are disjoint, finite, non-empty sets;
• l : C → {and, or, xor};
• A ⊆ (E ∪ F ∪ C) × (E ∪ F ∪ C);
Elements of E, F, C are respectively named events, functions and connectors. The mapping l assigns to each connector a specific type, representing the or, and, xor semantics.
Moreover, some conditions have to hold:
• The graph (E ∪ F ∪ C, A) has to be connected;
• Every function has exactly one incoming and one outgoing edge;
• There exists at least one start and one end event. Start events have exactly one outgoing edge and no incoming edge; vice versa, end events have no outgoing edge and one incoming edge;
• Each event that is neither a start nor an end event has exactly one incoming and one outgoing edge;
• Each event can be followed only by functions and each function only by events. Events can be followed by multiple functions (and functions by multiple events) if there are intermediate connectors;
• Events cannot be followed by an or or xor split connector.
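These syntactic conditions are mechanical enough to sketch as a checker. The function name and encoding below are ours, not from [96], and the sketch checks only the local conditions (it omits graph connectivity):

```python
def check_epc(E, F, C, l, A):
    """Check the well-formedness conditions above for an EPC.
    E: events, F: functions, C: connectors (disjoint sets),
    l: connector -> 'and'|'or'|'xor', A: set of directed edges."""
    nodes = E | F | C
    assert E and F and C and not (E & F or E & C or F & C)
    assert set(l) == C and set(l.values()) <= {'and', 'or', 'xor'}
    succ = {n: {y for x, y in A if x == n} for n in nodes}
    pred = {n: {x for x, y in A if y == n} for n in nodes}
    # every function has exactly one incoming and one outgoing edge
    assert all(len(pred[f]) == 1 and len(succ[f]) == 1 for f in F)
    starts = {e for e in E if not pred[e] and len(succ[e]) == 1}
    ends = {e for e in E if not succ[e] and len(pred[e]) == 1}
    assert starts and ends, "at least one start and one end event"
    # inner events have exactly one incoming and one outgoing edge
    for e in E - starts - ends:
        assert len(pred[e]) == 1 and len(succ[e]) == 1
    # events must not be followed by an or/xor split connector
    for e in E:
        for y in succ[e]:
            if y in C and l[y] in ('or', 'xor') and len(succ[y]) > 1:
                raise ValueError(f"event {e} precedes an {l[y]} split")
    return True
```

A minimal alternating chain e1 → f1 → c → e2 → f2 → e3 with an and-connector passes all the checks.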
3.1.4 π-calculus
One of the main problems of Workflow Nets is that they offer no suitable operators to compose several nets concurrently. Concurrency can anyway be obtained by clever artifices. Unfortunately, such artifices make the model more complex, with consequences for formal verification, which becomes more difficult; verification of a large model may even become computationally infeasible. The use of π-calculus overcomes the problem: it provides tools for building a high-level system by composing its sub-systems through concurrency operators.
The π-calculus was introduced by Milner [100] so as to represent concurrent mobile systems and their interactions. The term mobility refers to the way in which process execution evolves. Milner began studying how computer processes are embodied in computer systems and networks. He observed that computer processes merge together elements for computing and for communicating; as a result, processes are made known only through the data they exchange. For instance, CPU computations are visible to external components only through the information stored in the registers.
The syntax of π-calculus
π-calculus is a CCS flavour and, as CCS, is based on the concept of name: the channels through which different sub-systems communicate are named, and so are variables and data. The important improvement with respect to CCS is that π-calculus does not distinguish among the names of the different elements. Therefore, it is possible to send through a channel a name representing another channel; the receiver can then parameterise its communication channels on the basis of the name received. In π-calculus everything is considered as a process that exchanges data with other processes exclusively through channels. Specifically, here we refer to the polyadic π-calculus, an extended version that allows one to send and receive tuples of values through channels. The logical conjunction points between processes and channels are named ports.
In this section, processes are always uppercase whereas names are lowercase. Moreover, m̃ = (m1, m2, . . . , mn) refers to any sequence of names. The following constructs are the basics of π-calculus:
The input prefix. Process a(x̃).P receives a sequence x̃ of names on port a; then, it behaves as P.
The output prefix. Process ā⟨x̃⟩.P sends the sequence x̃ of names on port a; then, it behaves as P.
The summation. Process P1 + P2 behaves as either P1 or P2. The choice is nondeterministic and works similarly to the nondeterministic choice between actions in ConGolog and IndiGolog.
The composition. Process P1 | P2 performs both processes P1 and P2 in parallel; they can communicate with each other through channels. The abbreviation ∏_{i=1}^{m} Pi = P1 | P2 | . . . | Pm denotes the composition of m processes.
The restriction. Process (νy)P behaves like P, but where y is a so-called restricted name; that is to say, y cannot be used as a channel for communicating with the external environment (for example, other processes).
The matching. The [x = y].P process behaves like P if x and y are the same name. Otherwise, it behaves like the 0 process, i.e. the process that does nothing.
The replication. The !P process behaves like the one obtained by re-executing process P an arbitrary number of times.
Moreover, the expression P[ã/b̃] in the π-calculus refers to the process obtained from P by substituting each name ai ∈ ã for the corresponding name bi ∈ b̃.
Modeling workflow using π-calculus
A first significant effort in modelling processes in π-calculus is given in [44]. The approaches that formally model processes by π-calculus share the idea that everything is a process: resources, activities, work lists and so on. The interaction between process participants and the engine is also modelled in this way. In our opinion, such a fine granularity is not needed; rather, it causes the production of specifications that are less readable.
Paper [114] proposes an alternative approach, which produces specifications that are more slender (and, hence, more readable) than those generated by the aforementioned approach. In addition, this approach seems to be more solid and feasible, as the paper introduces a mapping into π-calculus for several different control-flow patterns. In [114], every activity is an independent π-calculus process, and coming-before relationships are modelled by values read and written on channel ports. The complete process definition for a basic activity A is:
A ≝ x.[ã = b̃].τA.ȳ.0
That means the process receives a trigger through port x, mapping to an event (e.g., the completion of a preceding activity). Then, the process makes a certain comparison [ã = b̃], performs some internal work τA and, later, notifies its completion by writing on a certain channel port y. Of course, that is the case of a single activity in a sequence. In general, an activity can be enabled only after several others complete (say m); in addition, its completion can enable o subsequent activities. Therefore, supposing also n conditions to be checked, the general formalisation of an activity A is the following:

A ≝ {xi}_{i=1}^{m} . {[ãi = b̃i]}_{i=1}^{n} . τA . {ȳi}_{i=1}^{o} . 0
In this way, all of the basic control-flow patterns can be mapped; a more comprehensive discussion of the mappings is given in [114]. Finally, the whole process specification is built by composing all the different nodes A1, . . . , An:

P ≝ ∏_{i=1}^{n} Ai
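The behaviour of this encoding can be imitated with ordinary message passing. The sketch below simulates two activities in sequence using asyncio queues as channel ports; it is our own illustrative translation, not the formal semantics of [114]:

```python
import asyncio

async def activity(name, triggers, guards, notifications, log):
    # A = {x_i}.{[a_i = b_i]}.tau_A.{y_i}.0
    for x in triggers:                    # input prefixes: wait for every trigger
        await x.get()
    if all(a == b for a, b in guards):    # matching constructs [a = b]
        log.append(name)                  # tau_A: the internal work
        for y in notifications:           # output prefixes: enable successors
            y.put_nowait('done')

async def main():
    log = []
    c1, c2, c3 = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    c1.put_nowait('start')                # initial trigger for the first activity
    # P = A1 | A2, composed in parallel; A1 signals A2 over channel c2
    await asyncio.gather(
        activity('A1', [c1], [], [c2], log),
        activity('A2', [c2], [], [c3], log),
    )
    return log

print(asyncio.run(main()))                # ['A1', 'A2']
```

Although both activity processes run in parallel, A2 blocks on its input port until A1 notifies its completion, which is exactly how the coming-before relationship is enforced in the encoding above.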
As far as checking for soundness is concerned, Puhlmann [113] provides means to characterise different soundness properties, such as relaxed and classical soundness, using bisimulation equivalence. Uppsala Universitet has independently developed the MWB (Mobility Workbench [138]) for manipulating and analysing mobile concurrent systems described in π-calculus, including business processes.
Language              | Formal      | Structural Soundness | Semantic Soundness | Non-Atomic Execution | Planning    | State vs Event Based
Workflow Net          | Yes         | Yes                  | No                 | No                   | Early Stage | State
YAWL                  | Yes         | Yes                  | No                 | Yes                  | No          | State
EPC                   | Semi-formal | Yes                  | No                 | No                   | No          | Event
π-calculus            | Yes         | Yes                  | No                 | Partially            | No          | Event
Graph-based languages | Partially   | No                   | No                 | No                   | No          | Event

Table 3.1: A comprehensive comparison
3.1.5 Discussion
Table 3.1 summarizes the assessment made in the light of the requirements described at the beginning of this section. An analysis of the results is given below, where every language is discussed separately, taking the requirements into account. As the table points out, no language addresses all the features that the framework proposed in this thesis requires, including the necessity of being based on a notion of state.
Workflow Nets. This is a sound formalism for representing business processes which is formal enough to enable reasoning and process verification. Current research directions in terms of verification have been limited to checking structural soundness according to Definition 3.1. Such checking does not consider the actual environment where processes are enacted. As a consequence, running process instances may get stuck, since some activities might require certain environmental conditions that do not currently hold. No work is currently trying to address such a kind of execution monitoring.
In theory, Workflow Nets are suitable as the process modelling language for the adaptation framework in this thesis: indeed, pre- and post-conditions can be formally specified, and it is precisely and unambiguously defined when a certain state is final and how to pass from one state to another.
But there are some drawbacks which limit its application:
1. When transitions fire, tokens are consumed from their input places and others are put on the output places. These steps are atomic, in the sense that nothing can happen in the meanwhile (e.g., the firing of other transitions). Considering that a transition firing represents the performance of an activity, such atomicity is somehow a contradiction. During the activity execution, events can happen and change the environment, and that may cause started activities to be unable to complete. Our adaptation framework has to be able to monitor even during activity performances. Workflow nets cannot be directly used, unless some artifices are introduced, which would make the complexity of the model explode.
2. Algorithmically, Workflow Nets would allow designers to define processes, and probably also to monitor and recover, as there exists research on Petri-Net based planners (e.g., [63]). Nevertheless, IndiGolog additionally allows one to encode the whole framework by itself (see next chapter). Indeed, the aspects of monitoring and recovering are directly modelled through IndiGolog procedures in a very natural way. Workflow Nets are a "low-level" formalism and, as such, cannot easily achieve the same results. For instance, we could concretely code the whole framework through the sole IndiGolog interpreter (see Chapter 5), whereas using Workflow Nets would have required different parts to be developed using different languages, and additional effort would have been needed, without gaining concrete benefits. It is also worth saying that Petri-Net based planning techniques are not as mature and efficient as those based on logics. Indeed, to the best of our knowledge, there exist no planners which take as input any form of Petri Net encoding.
YAWL. An extended workflow net (C, i, o, T, F, split, join, rem, nofi) can be reduced to a usual workflow net (C, T, F), obviously losing the additional features. It follows that most of the limitations of workflow nets also hold for YAWL. The YAWL formalism overcomes only the limitation concerning the modelling of the temporal aspects of activity executions, since it explicitly models the different states in which an activity can be. On the other hand, there exist no planners for YAWL.
EPC. Event-driven Process Chains allow process designers to model processes from a user-oriented perspective. The alternation of events and functions yields process representations that may become very complex. That is the reason why they are generally used to model processes at a high level, where representations generally remain reasonably small. These high-level process specifications are meant to be read and evaluated by humans and cannot serve as input for Process Management Systems.
As also argued in [74], the informal nature of EPCs cannot be directly translated into a proper semantics. Describing in detail the pre- and post-conditions of functions (i.e., the tasks/activities to perform) would result in huge specifications. Moreover, there are no standards to formalise conditions in a way that they can be used for reasoning, as our adaptation framework would require.
As a matter of fact, most of the research work on EPCs is currently addressing the problem of verifying the structural soundness of specifications. For instance, Mendling et al. [97] analyse 604 SAP reference models
in order to check whether some of them contain structural errors. Specifically, their correctness criterion is based on relaxed soundness. Indeed, EPCs are argued to be frequently used to capture the expected behaviour without considering unwanted executions leading to deadlocks or livelocks. Mendling et al. [96] define a new, enriched formal definition of EPCs which enables checking for structural soundness.
Apart from the verification of formal correctness at design-time, a process represented as an EPC has to be executed at run-time, achieving the outcomes it has been devised for. There exists no research work for verifying whether such processes achieve the expected results in the actual real-world scenario. We are confident that this would in any case be hard, since it is difficult to access the activity semantics: pre- and post-conditions of activities are not formally represented in a proper way.
π-calculus. This is a formal and sound formalism to represent business processes and reason on them. As for the other formalisms, most research work has been focused on verifying the structural soundness of process specifications. But nothing is said on how to monitor at run-time the progression of running instances specified in π-calculus and check whether they can successfully terminate in the current state of the world. Moreover, π-calculus is event-based: transitions are modelled explicitly, whereas the states between subsequent transitions are only modelled implicitly. That introduces many critical issues. Firstly, it is difficult to monitor deviations, since these concern the gap between the expected state and the monitored one. Secondly, planners are generally state-based and, hence, a certain definition of state needs to be rebuilt from the message model of π-calculus. This step, though feasible, requires additional effort without gaining a real benefit.
Graph-based languages. These drawbacks are also shared by most graph-based languages. Graph-based languages, which are not described in detail in the preceding sub-sections, are a collective name for languages which are used by, or simply meant for, commercial or prototypal Process Management Systems in order to define process specifications. This class comprises, for instance, BPMN [105] or the languages used by AgentWork [101] or AdeptFlex [117]. Process elements, such as activities, joins or splits, are represented by nodes that are connected by proper edges. These languages are typically event-based, which represents a serious issue as said before, and typically allow designers to represent specifications in a more formal way than EPCs. However, these languages are anyway still too informal to check whether deviations happened in the environment that require a recovery plan.
[Figure 3.4 shows a taxonomy tree: Process Adaptation splits into Adaptation of Process Specifications (manual adaptation; migration of running instances; checking for structural and semantic soundness) and Ad-hoc Adaptation of Single Instances (manual adaptation vs. automatic adaptation, the latter either pre-planned or unplanned). Our approach falls under automatic, unplanned adaptation.]

Figure 3.4: A taxonomy of the adaptation techniques
3.2 Related Works on Adaptability
This section discusses the levels of support for process adaptability/flexibility and exception handling in several of the leading commercial products and some academic prototypes. Figure 3.4 shows a taxonomy of the adaptation techniques. Changes to a process can be classified in two main groups: evolutionary and exceptional changes.
Orthogonally, there is the issue of verifying the soundness of updated process specifications and/or of running instances adapted to occurred changes. Whereas there is a lot of research on structural soundness, as widely discussed in Section 3.1, little work has been done on the semantic soundness of process changes [125, 87]. The most valuable approach is implemented in ADEPT (see later in this section), but activity conflicts are defined manually and not inferred automatically from the activity pre- and post-conditions.
Evolutionary changes concern a planned migration of a process to an updated specification which, for instance, implements new legislation, policies or practices in business organisations, hospitals, emergency management, etc. Typically the inclusion of new evolutionary aspects is made manually by the process designer. When dealing with process specification changes, there
34
CHAPTER 3. LITERATURE REVIEW
Pre-planned:

try {
    activity1;
    activity2;
    activity3; || subProcess();
}
catch(Disconnection) { ... }
catch(Devices Down) { ... }
catch(Exception1) { ... }
catch(Exception2) { ... }
catch(Exception3) { ... }
catch(Exception4) { ... }

Unplanned:

try {
    activity1;
    activity2;
    activity3; || subProcess();
}
catch(AnyException)
{ /* Generic method */ }

Table 3.2: A Java-like model for describing Automatic Adaptation
is the issue of managing running instances and, possibly, migrating such instances to the updated specification. Simple solutions, such as aborting running instances or continuing with the old specification, may not work, for obvious reasons: aborting, when possible, would cause some work already done to be lost, and using the old specification may result in applying old legislation and, hence, would be inappropriate and impracticable. Casati et al. [19] define a complete, minimal, consistent and sound set of modification primitives to modify specifications. This paper also describes the constraints under which running instances can be migrated to new updated specifications. Unfortunately, it does not cover the issue of applying the changes automatically and, hence, a domain expert is supposed to apply them manually. Weske [142] goes beyond this and provides a technique that is able to adapt running cases by adding, deleting and moving activities in order to adhere to a new specification. This technique has then been implemented in WASA [139]. Similarly to Casati et al. [19], Weber et al. [140] suggest a set of change patterns (such as inserting, deleting or moving process fragments) that may be useful when modifying specifications. The set proposed is wider than that in [19]; in addition, the paper reports how many of the change patterns are actually implemented in the most widespread Process Management Systems.
On the other side, there are the exceptional changes, which are characterised by events, foreseeable or unforeseeable, during process instance executions, which may require instances to be adapted in order to be carried out. Since such events are exceptional, process specifications do not need any modification. There are two ways of handling exceptional events. The adaptation can be manual: once events are detected, a responsible person, expert in the process domain, manually modifies the affected instance(s). The adaptation can be automatic: when exceptional events are sensed, the PMS is able to change the schema of the affected instance(s) accordingly, in a way they can still be completed.
Automatic adaptation techniques can be broken down into two further groups: pre-planned and unplanned. Using pre-planned adaptation techniques, a responsible person should foresee at design-time all possible exceptional events and, for each of them, should define a proper handling strategy. This kind of pre-planned approach is named Flexibility by design in [123]. The same paper also introduces Flexibility by underspecification: in certain cases, the designer may be aware that certain exceptional events can occur, but the recovery strategies cannot be known in advance and are only defined at run-time. Several proposals have been made to define pre-planned policies, such as Control ICN [16] or Event-Condition-Action rules [20]. Unplanned adaptation techniques, conversely, do not require anticipating all of the possible expected events: there exists only one strategy, which is able to recover from any deviation.
Table 3.2 is meant to clarify the differences between the two groups of techniques by using a Java metaphor. The left-hand side represents pre-planned adaptation, where the process is put in a try block and there exist several catches, one for each expected exceptional event. Each catch block implements the strategy for recovering from the corresponding event. The right-hand side describes unplanned adaptation, where, by contrast, only one catch exists, which describes the generic strategy to recover from any possible event.
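In the same metaphor, the single generic catch of unplanned adaptation amounts to a monitor-and-repair loop around the engine. The following sketch makes the idea concrete; all names are ours, and it is not the code of any of the cited systems:

```python
def run_adaptive(process, execute, sense, plan_repair):
    """Execute 'process' (a list of activities) with unplanned adaptation.

    execute(activity) runs one activity and returns the state the engine
    expects afterwards; sense() observes the actual state of the world;
    plan_repair(actual, expected) synthesises a recovery sequence of
    activities, or None if no recovery exists.
    """
    pending = list(process)
    while pending:
        activity = pending.pop(0)
        expected = execute(activity)
        actual = sense()
        if actual != expected:                    # deviation detected
            repair = plan_repair(actual, expected)
            if repair is None:
                raise RuntimeError("unrecoverable deviation")
            pending = list(repair) + pending      # prepend the recovery plan
    return "completed"
```

With a toy engine where executing an activity returns that activity as the expected state, a scripted sensor that deviates once, and a planner that issues a single fixing activity, the loop terminates normally: the one generic handler copes with a deviation that was never foreseen at design-time.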
The remainder of this section is devoted to enumerating how some commercial products and academic prototypes address process adaptation. Table 3.3 summarises the comparison of existing approaches as far as ad-hoc adaptation of single instances is concerned. The last row shows how SmartPM is envisioned in this categorisation. Rows having no checkmark refer to the PMSs that do not allow changing running instances directly during execution. In those systems, ad-hoc adaptation is done indirectly: such PMSs allow modifying specifications, and the corresponding changes are then propagated to running instances.
Product    | Manual | Pre-planned                            | Unplanned
YAWL       |        | The right policy chosen at run-time    |
COSA       | X      | X                                      |
Tibco      | X      | X                                      |
WebSphere  | X      | X                                      |
SAP        | X      |                                        |
OPERA      | X      | X                                      |
ADEPT2     | X      |                                        |
ADOME      | X      | X                                      |
AgentWork  |        | X                                      |
CBRFlow    |        |                                        |
WASA       |        |                                        |
SmartPM    |        |                                        | X

Table 3.3: Features provided by the most widespread PMSs for ad-hoc adaptation of single instances
Discussion. SmartPM can be classified as belonging to the group of adaptation strategies that are automatic and unplanned. We are not interested in the problem of migrating process instances to updated models. Indeed, such a problem is generally related to long-term processes; pervasive processes, such as those of emergency management or pervasive health-care, are short-term, as they complete relatively quickly. For instance, the process of saving people trapped under debris, or of providing medical assistance to injured people, has to be carried out very quickly to limit the risks to the persons involved.
We cannot manage adaptation manually or by pre-planned techniques, for several reasons. Firstly, in pervasive scenarios, the environment is continually changing and, therefore, events that require processes to be adapted are not exceptional but very frequent. It is thus not feasible to rely on a responsible person who is devoted to adapting process instances manually on a very frequent basis. Moreover, this would delay the completion of process instances, and that should be avoided as much as possible, since pervasive processes are typically time-constrained.
Pre-planned techniques should be avoided as well. Indeed, pervasive process management systems, such as SmartPM, are expected to be used in environments where a great deal of exceptional events can occur. Foreseeing all of them is not feasible; even if we handled many exceptional events, we would forget to consider others that may occur.
This section has broadly discussed existing approaches concerning process adaptation, and Table 3.3 summarises the discussion. It is easy to see that almost all approaches address the problem by manually adapting the process instances or through various flavours of pre-planned techniques. The lack of unplanned techniques can be partially motivated by the fact that the majority of PMSs are intrinsically intended for traditional business processes, where every change and every recovery policy is generally subject to managerial approval.
YAWL provides one of the most interesting approaches to adaptability [1] (more details about YAWL are given in Section 7.2.6). In YAWL, each activity may be associated with an extensible repertoire of actions, one of which is chosen at run-time to carry out the activity. These actions are named "worklets": a worklet is a small, self-contained, complete process which handles one specific task (action) in a larger, composite process. On the basis of hierarchical Ripple-Down rules defined contextually, an activity is dynamically replaced by the most appropriate of the available worklets. This approach is pre-planned: the substitution rules are defined at design-time, possibly manually updated at run-time, and never inferred from the process instance state.
The WASA system is totally manual [139] and concerns modifying process specifications according to evolutionary changes. As discussed above in this section, it also focuses on checking whether running instances can be migrated to updated specifications [142].
COSA allows associating sub-processes with external "triggers" and events [27]. But the adaptation policies are pre-planned: associations of triggers with sub-processes have to be defined at design-time. COSA also allows manual instance adaptation at run-time by using change patterns such as reordering, skipping, repeating, postponing or terminating steps.
Tibco iProcess Suite provides constructs called "event nodes" [127]. They allow designers to define handlers for expected events, to be activated when those events occur. Policies comprise the possibility of suspending processes either indefinitely or until a deadline occurs. All exceptions for which no handler exists are forwarded to the "default exception queue" to be handled manually.
WebSphere MQ Workflow supports deadlines and, when they occur, branches to pre-planned exception paths and/or sends notification messages to a specific administrator [68]. Administrators can manually suspend, restart or terminate processes, and they can also reallocate tasks.
SAP Workflow allows defining exception-handling processes [73]. Although they are defined at design-time, they cannot be associated with exceptional events at that time. At run-time, when an event occurs, these handling processes are proposed to the administrator, who can manually select the most appropriate one. There is no way to define properties in order to filter out handlers on the basis of the case and the occurred event.
The OPERA prototype allows associating, at design-time, a handler for a certain event with single tasks [59]. Such a handler is launched only when the event occurs and blocks those tasks. OPERA also allows defining more general handlers for certain events, associated with all tasks. When an exceptional event occurs, the process is stuck and the corresponding handler for that event and that task, if any, is invoked. If it cannot recover the execution, or there is no specific handler for that task and event, the general handler for that event is used. If that does not exist either, or cannot solve the problem, manual adaptation can be used.
ADEPT is able to handle both exceptional events and evolutionary changes [20]. All changes can be achieved by manual interventions, although ADEPT provides minimal support to facilitate such operations. As far as evolutionary changes are concerned, it also supports migrating running instances to the updated specifications. Version 2 introduces new features [118, 54], such as the structural and semantic correctness of the changes [87], but the ADEPT approach should still be considered totally manual. ADEPT is one of the few works dealing with the issue of checking for semantic correctness, and this aspect is very valuable for Process Management Systems like SmartPM, where adaptation is meant to be completely automatic. Unfortunately, the semantic correctness relies on a significant configuration effort. Indeed, checking is not computed automatically on the basis of pre- and post-conditions of activities; it relies on semantic constraints that are defined manually by designers at design-time. This is also related to the semi-formality of the process modelling language of ADEPT, which does not allow automatic reasoning. ADEPT2 relies on two relations between pairs of activities, namely dependency and exclusion. These relations allow specifying, respectively, (i) which activities depend on each other (the first activity in a pair can be executed only if the second has also been executed) and (ii) which activities are not compatible in achieving the outcomes of a process instance.
The ADOME system provides some support for manual changes as well as for pre-planned policies [26]. An exception handler is linked to a certain task, instead of being associated with events. When a given exceptional event makes it impossible to execute a certain task, the recovery policy for that task, if any, is used.
AgentWork provides the ability to modify process instances by dropping and adding individual tasks based on Event-Condition-Action rules [101]. The rules are formally defined in ACTIVETFL, which combines Frame Logic, a logic based on the notion of objects, with some features of temporal logics.
Consequently, AgentWork belongs to the pre-planned approaches. Since the graph-based model used by AgentWork is not very formal, there is no way to check for semantic soundness and conflicts. Therefore, some rules may generate incompatible recovery actions.
CBRFlow uses a case-based reasoning approach to support the adaptation of
workflow specifications to changing circumstances [141]. Case-based reasoning
(CBR) is the process of solving new problems based on the solutions of similar
past problems: users are supported in adapting process specifications by taking
into account how other specifications have been modified in the past to follow
evolutionary changes.
3.3 Case Handling
Case Handling aims at providing less rigidity than usual Process Management
Systems by leveraging process orientation and data orientation to route the
execution of processes [129, 130]. Flower [12] is one of the few systems that
use the Case Handling paradigm.
Case Handling meets the requirements of some application scenarios where
process participants are highly skilled. In such cases, organizations want
participants to be more autonomous in driving and controlling case executions.
Being more rigid, traditional Process Management technologies do not allow
expert participants to make processes deviate from the prescribed schema. In
some scenarios, on the one hand business process management is still valuable,
but on the other hand participants should be able to perform a broad
range of activities and, consequently, drive how processes are carried out.
Let us consider the scenario of a process for the delivery of certain goods.
At a certain stage, participants may be required to fill in a form to
provide some data, which also include information for the mail delivery
(e.g., the mail address, the post code). The next step consists in sending
the goods and could be performed as soon as the delivery information is available.
With traditional Process Management technology, goods can be sent only when the
whole form is filled in, including information that is not directly needed for
the delivery. Case Handling approaches enable the next step as soon
as the delivery information is entered into the form and committed, even though
the form itself is not yet wholly completed. Indeed, work-item enablement
is mainly driven by data availability in the case handling approach, whereas
process management steers enablement by the control flow (coming-before
relationship).
In case handling, every activity is associated with at least one data object
definition. There exist two main types of association: mandatory and restricted.
If a data object is mandatory for a certain activity, then it has to
be entered in order to complete that activity. A data object is restricted to a
certain activity set if it can be entered only by activities in that set. The
mandatory and restricted associations are orthogonal; for instance, if a data object
is both mandatory for and restricted to a given activity, it is going to be entered
by that activity (and by no other). Data objects may also be free, in the sense
that they can be entered at any moment. More information about case handling
can be found in the aforementioned papers.
On the one hand, highly dynamic and pervasive scenarios could benefit
from less rigidity, which is somehow intrinsic to these settings. Furthermore,
the weak activity boundaries and the data-driven nature of Case Handling prevent in
several cases the need for changes. Finally, the environments of processes
running in dynamic scenarios are often continuously changing and,
hence, it is really valuable for a participant to be able to go beyond the old process
schema and decide to handle multiple activities in one go.
On the other hand, Case Handling systems suffer from some shortcomings:
1. Case Handling is data-driven: a certain state is reached when some data
become available. Consequently, handling such cases has to be performed
mainly within the system itself: the activity outcomes are always
represented by the data produced. In many pervasive scenarios, such
as healthcare or emergency management processes, the main effects of
activities are not represented in the systems themselves. For instance,
the outcomes of saving or aiding victims are not naturally definable as
data updated or manipulated.
2. The nature of case handling makes it quite difficult to modify cases which
are already running, as also argued in [129, 130, 56]. Generally, adaptability
in case handling is designed to affect new cases created after the modification.
In highly dynamic and pervasive scenarios, changes should affect only running
cases, as they are fired by exogenous events which somehow invalidated them.
3. Activities are no longer pushed to participants by the system. The system
becomes a discreet support for accomplishing activities rather than a
way to control case progression mechanically. This might be
problematic in some highly dynamic and pervasive scenarios where process
participants have to be continuously pushed by the PMS to perform the
assigned work as fast as possible.
Therefore, the Case Handling approach is not actually feasible for the scenarios
at which we are aiming. Günther et al. [56] start discussing a possible
integration of case handling and typical Process Management. They
explore how ideas from case handling can be introduced into Process Management
Systems to gain the corresponding benefits and what changes would
be required. Unfortunately, the discussion is still at an initial stage and,
hence, unable to address the concerns of process management for pervasive
scenarios.
Chapter 4
A General Framework for Automatic Adaptation
This chapter describes a general conceptual framework for
SmartPM, the adaptive Process Management System (PMS) that is the object of this
thesis. The chapter aims at presenting a practical technique for solving
adaptation, which is based on planning in AI. Moreover, we prove the correctness
and completeness of the approach.
In SmartPM, a process specification associates every task with a set of
capabilities that the service executing it has to provide. Every task can
be assigned to a given service that provides all the required capabilities, if a set
of conditions holds. Conditions are defined on the control and data flow (e.g., a
previous task has to be finished, a variable needs to be assigned a value in a
specific range, etc.). Conditions of this kind can be considered as
"internal": they are handled internally by the PMS and, thus, are easily
controllable. Another type of conditions exists, the "external" ones: they depend on
the environment where process instances are carried out. These conditions are
more difficult to keep under control, and continuous monitoring is required to
detect discrepancies. Indeed, we can distinguish between a physical reality
and a virtual reality [31]: the physical reality consists of the actual values of the
conditions, whereas the virtual reality is the model of reality that the PMS uses
in making deliberations. A PMS builds the virtual reality by assuming that the
effects of tasks/actions meet expectations (i.e., they modify conditions correctly)
and that no exogenous events capable of modifying conditions break out.
When the PMS realizes that one or more events caused the two kinds of
reality to deviate, there are three possibilities to deal with such a discrepancy:
1. Ignoring deviations – this is, of course, not feasible in general, since the
new situation might be such that the PMS is no longer able to carry out
the process instance.
2. Anticipating all possible discrepancies – the idea is to include in the process
schema the actions to cope with each such failure. As discussed
in Chapter 3, most PMSs use this approach. For simple and mainly
static processes, this is feasible and valuable; but, especially in mobile
and highly dynamic scenarios, it is quite impossible to take into account
all exceptional cases.
3. Devising a general recovery method able to handle any kind of exogenous
event. As discussed in Chapter 3, the process is defined as if
exogenous actions causing deviations could not occur, that is, as if everything
ran fine (the try block). Whenever the execution monitor (i.e., the
module intended for execution monitoring) detects discrepancies that prevent
the process instance from terminating, the control flow moves to the
(unique) catch block. The catch block activates the general recovery
method to modify the old process P into a process P′ so that P′ can terminate
in the new environment and its goals include those of P.
Here the challenge is to automatically synthesize P′ during the execution
itself, without specifying a priori all the possible catches.
In summary, this chapter aims (i) at introducing a general conceptual
framework in accordance with the third approach described above, and
(ii) at presenting a practical technique, in the context of this framework, that
is able to automatically cope with anomalies. We prove the correctness and
completeness of this technique, which is based on planning techniques in
AI. The chapter extends the framework published in [40] and revises it
in the light of the subsequent operationalisation, which was devised after the
paper.
Section 4.1 introduces some preliminary notions, namely the Situation Calculus
and IndiGolog, which are used as formalisms to reason about processes
and exogenous events. This section is not meant to give an all-comprehensive
and fully formal introduction to these notions; it aims mostly at giving an overall
insight to readers who are not expert in such topics.
Section 4.2 presents the general conceptual framework to address adaptivity
in highly dynamic scenarios and introduces a running example. Section
4.3 presents the proposed formalization of processes, and Section 4.4 deals
with adaptiveness. Section 4.5 presents the specific technique and proves
its correctness and completeness. The chapter also introduces and carries on
a concrete example, which is continually extended to cover and better explain
the different concepts introduced. This example will then be operationalised
in Chapter 5.
4.1 Preliminaries
SmartPM uses the situation calculus (SitCalc) to formalize adaptation. The
SitCalc is a logic formalism designed for representing and reasoning about
dynamic domains [119].
We will not go over the situation calculus in detail; we merely note the
following components: there is a special constant S0 used to denote the initial
situation, namely that situation in which no actions have yet occurred; there
is a distinguished binary function symbol do, where do(a, s) denotes the successor situation to s resulting from performing the action a; relations (resp.
functions) whose values vary from situation to situation, are called fluents,
and are denoted by predicate (resp. function) symbols taking a situation term
as their last argument. There is a special predicate Poss(a, s) used to state
that action a is executable in situation s.
We abbreviate with do([a1, . . . , an−1, an], s) the term
do(an, do(an−1, . . . , do(a1, s) . . . )), which denotes the situation obtained from
s by performing the sequence of actions a1, . . . , an.
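Operationally, this abbreviation is just a fold of do over the action list. The following toy term representation (an assumption of this sketch, not part of the thesis formalism) makes that concrete:

```python
# Situations as terms: S0 is a constant symbol; do(a, s) is the
# situation reached by performing action a in situation s.
S0 = "S0"

def do_(a, s):
    """Build the term do(a, s)."""
    return ("do", a, s)

def do_seq(actions, s):
    """do([a1, ..., an], s) = do(an, do(an-1, ..., do(a1, s)...))."""
    for a in actions:
        s = do_(a, s)
    return s
```

For instance, do_seq(["a1", "a2"], S0) builds the nested term do(a2, do(a1, S0)).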
Within this language, we can formulate domain theories which describe
how the world changes as the result of the available actions. Here, we use
action theories of the following form:
• Axioms describing the initial situation, S0 .
• Action precondition axioms, one for each primitive action a, characterizing Poss(a, s).
• Successor state axioms, one for each relational fluent F . The successor
state axiom for a particular fluent F captures the effects and non-effects
of actions on F and has the following form:
F (~x, do(a, s)) ⇔ ΦF (~x, a, s)
(4.1)
where ΦF (~x, a, s) is a formula fully capturing the truth-value of fluent F
on objects ~x when action a is performed in situation s (~x, a, and s are
all the free variables in ΦF ).
• Unique names axioms for the primitive actions.
• A set of foundational, domain independent axioms for situations Σ as in
[119].
A formula is uniform in situation s if s is the only situation term
that appears in it. Sometimes, we use situation-suppressed formulas; these
are uniform formulas with the situation argument suppressed (e.g., G denotes
the situation-suppressed expression for G(s)). Finally, we can introduce an
ordering among situations:

s ≤ s′ ⇔ ∃[a1, . . . , an−1, an]. do([a1, . . . , an−1, an], s) = s′

Construct                        Meaning
a                                primitive action
φ?                               wait for a condition
δ1 ; δ2                          sequence
δ1 | δ2                          nondeterministic branch
π x. δ                           nondeterministic choice of argument
δ∗                               nondeterministic iteration
if φ then δ1 else δ2             conditional
while φ do δ                     while loop
δ1 ‖ δ2                          concurrency with equal priority
δ‖                               concurrent iteration
⟨φ → δ⟩                          interrupt
proc P(~x) δ endProc             procedure definition
P(~θ)                            procedure call
Σ(δ)                             search operator

Table 4.1: IndiGolog constructs
On top of these theories of action, one can define complex control behaviors
by means of high-level programs expressed in Golog-like programming
languages. Specifically, we focus on IndiGolog [121], which provides a set of
programming constructs sufficient for defining every well-structured process
as defined in [72].
IndiGolog is a logic-based language born to program the behaviour of intelligent
agents and robots. It derives from ConGolog, to which it basically adds
the lookahead search operator. This operator allows simulating the execution
of a process with the aim of searching for a successful termination before
actually performing the program in the real world. In its turn, ConGolog extends
the original Golog by introducing constructs for concurrent execution of different
operations. Table 4.1 summarizes the constructs of IndiGolog used in this
thesis.
In the first line of the table, a stands for a situation calculus action term whereas,
in the second line, φ stands for a formula over situation calculus predicates,
including fluents, which are evaluated in the current situation when the
IndiGolog program execution reaches φ.
The constructs listed include some nondeterministic ones: (δ1 | δ2),
which nondeterministically chooses between programs δ1 and δ2;
π x. δ, which nondeterministically picks a binding for the variable x and
performs the program δ for this binding of x; and δ∗, which performs δ zero or
more times. π x1, . . . , xn. δ is an abbreviation for π x1. · · · π xn. δ.
The constructs if φ then δ1 else δ2 and while φ do δ are the synchronized
versions of the usual if-then-else and while-loop. They are synchronized in
the sense that testing the condition φ does not involve a transition per se:
the evaluation of the condition and the first action of the branch chosen are
executed as an atomic unit. So these constructs behave in a similar way to
the test-and-set atomic instructions used to build semaphores in concurrent
programming.
We also have constructs for concurrent programming. In particular,
(δ1 ‖ δ2) expresses the concurrent execution (interpreted as interleaving) of
the programs δ1 and δ2. Observe that a program may become blocked when
it reaches a primitive action whose preconditions are false or a wait action φ?
whose condition φ is false. Then, execution of (δ1 ‖ δ2) may continue provided
another program executes next. Another concurrent programming construct is
(δ1 ⟩⟩ δ2), where δ1 has higher priority than δ2, and δ2 may only execute when
δ1 is done or blocked. Finally, an interrupt ⟨φ → δ⟩ has a trigger condition φ
and a body δ. If the interrupt gets control from higher priority processes and
the condition φ is true, the interrupt triggers and the body is executed. Once
the body completes execution, the interrupt may trigger again. ⟨~x : φ → δ⟩
is an abbreviation for ⟨∃~x. φ → π~x. δ⟩.
Finally, the search operator Σ(δ) is used to specify that lookahead should
be performed over the (nondeterministic) program δ to ensure that nondeterministic choices are resolved in a way that guarantees its successful completion.
Formally, two predicates are introduced to specify program transitions:

• Trans(δ′, s′, δ″, s″), given a program δ′ and a situation s′, returns (i) a
new situation s″ resulting from executing a single step of δ′, and (ii) δ″,
which is the remaining program to be executed.

• Final(δ′, s′) returns true when the program δ′ can be considered successfully
completed in situation s′.
The predicate Trans for programs without procedures is characterized by
the following set of axioms:

1. Empty program:
   Trans(nil, s, δ′, s′) ⇔ false.

2. Primitive actions:
   Trans(a, s, δ′, s′) ⇔ Poss(a[s], s) ∧ δ′ = nil ∧ s′ = do(a[s], s).
3. Test/wait actions:
   Trans(φ?, s, δ′, s′) ⇔ φ[s] ∧ δ′ = nil ∧ s′ = s.

4. Sequence:
   Trans(δ1 ; δ2, s, δ′, s′) ⇔
   ∃γ. δ′ = (γ; δ2) ∧ Trans(δ1, s, γ, s′) ∨ Final(δ1, s) ∧ Trans(δ2, s, δ′, s′).

5. Nondeterministic branch:
   Trans(δ1 | δ2, s, δ′, s′) ⇔ Trans(δ1, s, δ′, s′) ∨ Trans(δ2, s, δ′, s′).

6. Nondeterministic choice of argument:
   Trans(π v. δ, s, δ′, s′) ⇔ ∃x. Trans(δ[v/x], s, δ′, s′).

7. Nondeterministic iteration:
   Trans(δ∗, s, δ′, s′) ⇔ ∃γ. (δ′ = γ; δ∗) ∧ Trans(δ, s, γ, s′).

8. Synchronized conditional:
   Trans(if φ then δ1 else δ2 endIf, s, δ′, s′) ⇔
   φ[s] ∧ Trans(δ1, s, δ′, s′) ∨ ¬φ[s] ∧ Trans(δ2, s, δ′, s′).

9. Synchronized loop:
   Trans(while φ do δ endWhile, s, δ′, s′) ⇔
   ∃γ. (δ′ = γ; while φ do δ) ∧ φ[s] ∧ Trans(δ, s, γ, s′).

10. Concurrent execution:
    Trans(δ1 ‖ δ2, s, δ′, s′) ⇔
    ∃γ. δ′ = (γ ‖ δ2) ∧ Trans(δ1, s, γ, s′) ∨ ∃γ. δ′ = (δ1 ‖ γ) ∧ Trans(δ2, s, γ, s′).

11. Prioritized concurrency:
    Trans(δ1 ⟩⟩ δ2, s, δ′, s′) ⇔
    ∃γ. δ′ = (γ ⟩⟩ δ2) ∧ Trans(δ1, s, γ, s′) ∨
    ∃γ. δ′ = (δ1 ⟩⟩ γ) ∧ Trans(δ2, s, γ, s′) ∧ ¬∃ζ, s″. Trans(δ1, s, ζ, s″).
12. Concurrent iteration:
    Trans(δ‖, s, δ′, s′) ⇔ ∃γ. δ′ = (γ ‖ δ‖) ∧ Trans(δ, s, γ, s′).
By using Trans and Final we can define a predicate Do(δ, s, s′) which,
given a program δ and a starting situation s, holds for all possible situations
s′ that result from executing δ starting from s, such that s′ is final
with respect to the program δ′ remaining to execute. Formally:

Do(δ, s, s′) ⇔ ∃δ′. Trans∗(δ, s, δ′, s′) ∧ Final(δ′, s′)

where Trans∗ is the reflexive and transitive closure of Trans.
Notice that there may be more than one resulting situation s′, since IndiGolog
programs can be non-deterministic (e.g., due to concurrency).
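To make the transition semantics concrete, the following is a minimal, illustrative interpreter for a small fragment of the above axioms (empty program, primitive actions, tests, sequence and nondeterministic branch). The tuple encoding of programs, the always-true poss and the situation-as-action-list representation are simplifying assumptions of this sketch, not the thesis machinery:

```python
NIL = ("nil",)

def poss(a, s):
    """Poss(a, s): action preconditions; in this toy domain every
    action is always executable."""
    return True

def holds(phi, s):
    """phi[s]: evaluate a test, here any Python predicate over the
    situation (the list of actions performed so far)."""
    return phi(s)

def final(delta, s):
    """Final(delta, s): the program may be considered completed."""
    kind = delta[0]
    if kind == "nil":
        return True
    if kind == "seq":
        return final(delta[1], s) and final(delta[2], s)
    if kind == "choice":
        return final(delta[1], s) or final(delta[2], s)
    return False

def trans(delta, s):
    """Yield every (delta', s') such that Trans(delta, s, delta', s')."""
    kind = delta[0]
    if kind == "act":                        # axiom 2: primitive action
        if poss(delta[1], s):
            yield NIL, s + [delta[1]]
    elif kind == "test":                     # axiom 3: test/wait
        if holds(delta[1], s):
            yield NIL, s
    elif kind == "seq":                      # axiom 4: sequence
        for gamma, s1 in trans(delta[1], s):
            yield ("seq", gamma, delta[2]), s1
        if final(delta[1], s):
            yield from trans(delta[2], s)
    elif kind == "choice":                   # axiom 5: nondet branch
        yield from trans(delta[1], s)
        yield from trans(delta[2], s)
    # axiom 1 (nil): no transitions at all

def do_all(delta, s):
    """Do(delta, s, s'): collect every final situation s' reachable
    by repeated transitions from (delta, s)."""
    results = [s] if final(delta, s) else []
    for d1, s1 in trans(delta, s):
        results.extend(do_all(d1, s1))
    return results
```

For the program a ; (b | c) starting in the empty situation, do_all returns the two final situations [a, b] and [a, c], reflecting the two ways the nondeterministic branch can be resolved.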
To cope with the impossibility of backtracking actions executed in the real
world, IndiGolog incorporates a new programming construct, namely the search
operator. Let δ be any IndiGolog program, which provides different alternative
executable actions. When the interpreter encounters the program Σ(δ), before
choosing among the alternative executable actions of δ, it performs reasoning in
order to decide on a step that still allows the rest of δ to terminate successfully.
More precisely, according to [30], the semantics of the search operator is:

Trans(Σ(δ), s, Σ(δ′), s′) ⇔ Trans(δ, s, δ′, s′) ∧ ∃s∗. Do(δ′, s′, s∗).

If δ is the entire program under consideration, Σ(δ) emulates complete offline
execution.
Finally, our adaptation procedure will make use of regression (see [4]
and [119]). Let ϕ(do([a1, . . . , an], s)) be a SitCalc formula with situation
argument do([a1, . . . , an], s). Then, Rs(ϕ(do([a1, . . . , an], s))) is the formula,
with situation argument s, which denotes the facts/properties that must hold
before executing a1, . . . , an in situation s for ϕ(do([a1, . . . , an], s)) to hold (i.e.,
the weakest preconditions for obtaining ϕ). To compute the regressed formula
Rs(ϕ(do([a1, . . . , an], s))) from ϕ(do([a1, . . . , an], s)), one iteratively replaces
every occurrence of a fluent with the right-hand side of its successor state
axiom (Formula 4.1) until every atomic formula has the situation argument s.
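As a toy illustration of this replacement process, consider a fluent free(a, s) with a successor state axiom of the shape used later in this thesis (a is free after t iff no Assign(a, ·) just occurred and it was already free, or a Release(a, ·) just occurred). One-step regression then amounts to the following sketch (the string-based encoding of residual conditions is an assumption of this example):

```python
def regress_free(a, t):
    """Regress free(a, do(t, s)) one step through its successor
    state axiom, returning the resulting condition on s."""
    act = t[0]
    if act == "Assign" and t[1] == a:
        return "False"            # just assigned: not free, regardless of s
    if act == "Release" and t[1] == a:
        return "True"             # just released: free, regardless of s
    return f"free({a}, s)"        # other actions do not affect the fluent

def regress_free_seq(a, actions):
    """Regress free(a, do([t1, ..., tn], s)) down to situation s by
    iterating the one-step replacement from the last action backwards."""
    for t in reversed(actions):
        step = regress_free(a, t)
        if step in ("True", "False"):
            return step           # truth value fully determined
    return f"free({a}, s)"        # value carries over from s
```

For instance, regressing through [Assign(a1, x), Release(a1, x)] yields True (the last relevant action released the service), whereas through [Release(a1, x), Assign(a1, x)] it yields False.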
4.2 Execution Monitoring
The general framework is based on execution monitoring, formally represented
in SitCalc [83, 31]. After each action, the PMS has to align the internal world
representation (i.e., the virtual reality) with the external one (i.e., the physical
reality), since they could differ due to unforeseen events.

Figure 4.1: Execution Monitoring
When using IndiGolog for process management, tasks are considered as
predefined sequences of actions (see later) and processes as IndiGolog programs.
Before a process starts to be executed, the PMS takes the initial context
from the real environment as the initial situation, together with the program (i.e.,
the process) δ0 to be carried out. The initial situation S0 is described by first-order
logic predicates. At each execution step, the PMS, which has complete
knowledge of the internal world (i.e., its virtual reality), assigns a task to a
service. The only assignable tasks are those whose preconditions are
fulfilled. A service can collect from the PMS the data required in
order to execute the task. When a service finishes executing the task, it alerts
the PMS of its completion.
The execution of the PMS can be interrupted by the monitor when a
misalignment between the virtual and the physical reality is sensed. When
this happens, the monitor adapts the program to deal with such a discrepancy.
Figure 4.1 illustrates this execution monitoring. The kind of monitor
described here is a specialised version of the one proposed by Soutchanski
et al. [31]. At each step, the PMS advances the process δ in the situation s by
executing an action, resulting in a new situation s′ with the process δ′ remaining
to be executed. The state¹ is represented as first-order formulas defined on
situations; the current state corresponds to the boolean values of these formulas
evaluated on the current situation.
Both the situation s′ and the process δ′ are given as input to the monitor.
The monitor collects data from the environment through sensors (here, a sensor
is any software or hardware component able to retrieve contextual information).
If a deviation is sensed between the virtual reality as represented by
s′ and the physical reality as s″, the PMS internally generates a discrepancy
~e = (e1, e2, . . . , en), which is a sequence of actions called exogenous events, such
that s″ = do(~e, s′).²
Notice that the process δ′ may fail to be correctly executed (i.e., by
assigning all tasks as required) in s″. If so, the monitor adapts the process
by generating a new process δ″ that pursues at least each of δ′'s goals and is
executable in s″. At this point, the PMS is resumed and the execution is
continued from δ″ in s″.
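The monitor/adapt cycle just described can be rendered as a deliberately simplified, runnable sketch. Here the process is a plain task list, the state a set of boolean conditions, and adaptation a trivial repair that re-establishes a broken precondition; all names and the repair strategy are illustrative assumptions, not the actual SmartPM technique:

```python
# Each task: (name, precondition or None, effect).
PROCESS = [
    ("compile_questionnaire", None, "questionnaire_done"),
    ("take_photos", "camera_available", "photos_taken"),
    ("send_data", "photos_taken", "data_sent"),
]

def adapt(pending, blocked):
    """Catch block: prepend a repair task re-establishing the
    missing precondition of the blocked task."""
    pre = blocked[1]
    return [("recover_" + pre, None, pre)] + pending

def run_with_monitoring(process, state, exogenous_at):
    """Execute tasks; after each step apply the exogenous events
    sensed by the monitor and repair the process when blocked."""
    pending, trace, step = list(process), [], 0
    while pending:
        task = pending[0]
        if task[1] is not None and not state.get(task[1], False):
            pending = adapt(pending, task)   # virtual != physical: adapt
            continue
        pending.pop(0)
        state[task[2]] = True                # expected effect of the task
        trace.append(task[0])
        step += 1
        for cond, value in exogenous_at.get(step, []):
            state[cond] = value              # exogenous event: align realities
    return trace
```

With an exogenous event making the camera unavailable after the first step, the run yields the trace compile_questionnaire, recover_camera_available, take_photos, send_data: the original process is resumed after the automatically inserted repair.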
We end this section by introducing our running example, stemming from
the WORKPAD project described in Chapter 2.
Example 4.1. The example is meant to code a possible process for managing
the aftermath of an earthquake: a team is sent to the affected area to make an
assessment, which comprises taking some valuable photos, compiling a questionnaire
and sending all these data to the headquarters. Here we assume that
it is already known which buildings have to be assessed, namely buildings A, B
and C. The team is equipped with PDAs on which some software services are
installed, and members communicate with each other through a MANET.
For each building, an actor compiles a questionnaire by using a certain
software service, that is, a specific application installed on some actor devices.
Compiling questionnaires can be done anywhere: no movement is required.
Then, another actor/service has to be sent to the specific building to
collect some pictures (this, conversely, requires movement). Finally, according
to the information filled in the questionnaire, a third actor/service evaluates the
effectiveness of the collected pictures. In order to evaluate the pictures properly, a
certain minimum number of pictures is required as input: if fewer are provided,
the evaluation cannot be done. Once evaluated, if the pictures are judged as not
effective, the task of taking new pictures is scheduled again (as well as the
evaluation of the new pictures). When these steps have been performed for the
three buildings A, B and C, the collected data (questionnaires and pictures) are
sent to the headquarters.
¹Here we refer as state to both the tasks' state (e.g., performable, running, terminated, etc.) and the process variables on which task firing and process routing are defined.
²Note that the action sequence ~e might not be the one that really occurred.
[Figure: three parallel branches, one per building A, B and C; each branch comprises Compile Questionnaire, Move to destination, Take photos and Evaluate photos, with the photo tasks repeated while ¬evaluationOK(·) holds; once evaluationOK(·) holds for every building, Send data to headquarter follows.]

Figure 4.2: A possible process to be carried on in disaster management scenarios
Coordination and data exchange require MANET nodes to be continually
connected to each other. But this is not guaranteed in a MANET: the environment
is highly dynamic, since nodes move in the affected area to carry
out the assigned tasks. Movements may cause disconnections and, thus,
unavailability of nodes and, consequently, of the provided services.
Therefore, processes should be adapted not simply by assigning the tasks in progress
to other services, but also by considering the possible recovery of services.
4.3 Process Formalisation in Situation Calculus
Next we detail the general framework proposed above by using Situation Calculus and IndiGolog. We use some domain-independent predicates to denote
the various objects of interest in the framework:
• service(a): a is a service
• task(x): x is a task
• capability(b): b is a capability
• provide(a, b): the service a provides the capability b
• require(x, b): the task x requires the capability b
Every task execution is a sequence of four PMS actions: (i) the assignment of
the task to a service, resulting in the service being no longer free;
(ii) the notification to the service to start executing the task; then, the service
carries out the task and, after receiving the service's notification of the
task conclusion, (iii) the PMS acknowledges the successful task termination;
finally, (iv) the PMS releases the service, which becomes free again. We
formalise these four actions as follows:
• Assign(a, x): task x is assigned to a service a
• Start(a, x, p): service a is allowed to start the execution of task x. The
input provided is p.
• AckTaskCompletion(a, x): service a has successfully concluded the execution of x.
• Release(a, x): the service a is released with respect to task x.
In addition, services can execute two actions:
• readyToStartTask(a, x): service a declares to be ready to start performing
task x
• finishedTask(a, x, q): service a declares to have completed the execution
of task x, returning output q.
The terms p and q denote arbitrary sets of input/output, which depend on
the specific task. Special constant ∅ denotes empty input or output.
The interleaving of actions performed by the PMS and the services is as follows.
After the assignment of a certain task x through Assign(a, x), when the service a is
ready to start executing, it executes action readyToStartTask(a, x). At this
stage, the PMS executes action Start(a, x, p), after which a starts executing task x.
When a completes task x, it executes the action finishedTask(a, x, q).
Specifically, we envision that actions finishedTask(·) are those in charge of changing
the properties of the world as a result of executing tasks. When x is completed, the PMS
is allowed at any moment to execute sequentially AckTaskCompletion(a, x)
and Release(a, x). The program coding the process will be executed by only
one actor, specifically the PMS; therefore, actions readyToStartTask(·) and
finishedTask(·) are considered as external and, hence, are not coded in the
program itself.
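The admissible interleaving just described amounts to a small life-cycle automaton per task. The following sketch encodes it directly (the life-cycle state names are illustrative assumptions; the action names follow the text):

```python
# Admissible next actions for a single task, per life-cycle state.
LIFECYCLE = {
    "idle":         {"Assign": "assigned"},
    "assigned":     {"readyToStartTask": "enabled"},
    "enabled":      {"Start": "running"},
    "running":      {"finishedTask": "completed"},
    "completed":    {"AckTaskCompletion": "acknowledged"},
    "acknowledged": {"Release": "idle"},
}

def fire(state, action):
    """Advance the task life-cycle; reject inadmissible actions
    (e.g., Start before readyToStartTask)."""
    if action not in LIFECYCLE[state]:
        raise ValueError(f"{action} not allowed in state {state}")
    return LIFECYCLE[state][action]
```

Firing Assign, readyToStartTask, Start, finishedTask, AckTaskCompletion and Release in sequence brings the task back to idle (and the service back to free), whereas, say, Start immediately after Assign is rejected.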
For each specific domain, we have several fluents representing the properties
of situations. Some of them are modelled independently of the domain,
whereas others, the majority, are defined according to the domain. The
domain-independent ones can always be formulated as defined in this
chapter. Among the domain-independent ones, we have the fluent free(a, s),
denoting the fact that the service a is free, i.e., no task has been assigned to it,
in the situation s. The corresponding successor state axiom is as follows:
free(a, do(t, s)) ⇔
    (∀x. t ≠ Assign(a, x) ∧ free(a, s)) ∨
    (¬free(a, s) ∧ ∃x. t = Release(a, x))                    (4.2)
This says that a service a is considered free in the current situation if and only
if a was free in the previous situation and no task has just been assigned to
it, or a was not free and it has just been released. There exists also the
domain-independent fluent enabled(x, a, s), which represents whether service
a has notified that it is ready to execute a certain task x, so as to enable it. The
corresponding successor-state axiom is:
enabled(x, a, do(t, s)) ⇔
    (enabled(x, a, s) ∧ ∀q. t ≠ finishedTask(a, x, q)) ∨
    (¬enabled(x, a, s) ∧ t = readyToStartTask(a, x))                    (4.3)
This says that enabled(x, a, s) holds in the current situation if and only if it
held in the previous one and no action finishedTask(a, x, q) has been performed,
or it was false in the previous situation and readyToStartTask(a, x)
has been executed. This fluent enforces the constraint that the PMS
can execute Start(a, x, p) only after a has performed readyToStartTask(a, x), and that
it can execute AckTaskCompletion(a, x) only after finishedTask(a, x, q). This can be
represented by two pre-conditions on actions Start(·) and AckTaskCompletion(·):

∀p. Poss(Start(a, x, p), s) ⇔ enabled(x, a, s)
Poss(AckTaskCompletion(a, x), s) ⇔ ¬enabled(x, a, s)                    (4.4)

provided that AckTaskCompletion(a, x) never comes before Start(a, x, p).
Furthermore, we introduce a domain-independent fluent started(a, x, p, s)
that holds if and only if an action Start(a, x, p) has been executed but the
dual AckTaskCompletion(a, x) has not yet been:

started(a, x, p, do(t, s)) ⇔
    (started(a, x, p, s) ∧ t ≠ Stop(a, x)) ∨
    (¬∃p′. started(a, x, p′, s) ∧ t = Start(a, x, p))                    (4.5)
In addition, we make use, in every specific domain, of a predicate
available(a, s) which denotes whether a service a is available in situation s
for task assignment. However, available is domain-dependent and, hence,
needs to be defined specifically for every domain. Knowing whether a service
is available is very important for the PMS when it has to perform assignments.
Indeed, a task x is assigned to the best service a that is available and provides
every capability required by x. The fact that a certain service a is free does
not imply that it can be assigned to tasks (e.g., in the example described above it
has to be free as well as indirectly connected to the coordinator).
The definition of available(·) must enforce the following condition:

∀a, s. available(a, s) ⇒ free(a, s)                    (4.6)

We do not give explicit pre-conditions for tasks: we assume tasks can
always be executed. We assume that, given a task, if some conditions do not
hold, then the outcomes of that task are not as expected (in other terms, it
fails).
We illustrate such notions on our running example.
Example 4.1 (cont.). We formalize the scenario in Example 4.1:
• at(a, loc, s) is true if service a is located at coordinate loc =
⟨loc_x, loc_y, loc_z⟩ in situation s. In the starting situation S0, for each
service a_i, we have at(a_i, loc_i, S0), where location loc_i is obtained through
GPS sensors.
• evaluationOK(loc, s) is true if the photos taken at location loc are judged as having a
good quality, with evaluationOK(loc, S0) = false for each location loc.
• infoSent(s) is true in situation s if the information concerning injured
people has been successfully forwarded to the headquarters. There holds infoSent(S0) = false.
• photoBuild(loc, n, s) is true if n photos have been taken in location loc.
In the starting situation S0, photoBuild(loc, 0, S0) holds for all locations loc.
Before giving the successor-state axioms for the above fluents, we define some
abbreviations:
• available(a, s): states that a service a is available if it is connected to
the coordinator device (denoted by Coord) and is free.
• connected(w, z, s): true if in situation s the services w and z
are connected through possibly multi-hop paths.
• neigh(w, z, s): holds if the services w and z are in radio range in
situation s.
Their definitions are as follows:

neigh(w0, w1, s) ≝ at(w0, p0, s) ∧ at(w1, p1, s) ∧ ‖p0 − p1‖ < rrange

connected(w0, w1, s) ≝ neigh(w0, w1, s)
    ∨ (∃w2. neigh(w0, w2, s) ∧ neigh(w2, w1, s))
    ∨ (∃w2, w3. neigh(w0, w2, s) ∧ neigh(w2, w3, s) ∧ neigh(w3, w1, s))
    ∨ . . .
    ∨ (∃w2, w3, . . . , wn. neigh(w0, w2, s) ∧ neigh(w2, w3, s) ∧ . . . ∧ neigh(wn, w1, s))

available(w, s) ≝ free(w, s) ∧ connected(w, Coord, s)
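Operationally, connected is reachability in the graph induced by neigh, which can be checked by breadth-first search. The sketch below is illustrative only (2-D positions for brevity; the RRANGE value, coordinates and function names are assumptions, not from the thesis):

```python
from collections import deque

RRANGE = 10.0  # illustrative radio range

def neigh(pos, w0, w1):
    """w0 and w1 are in radio range: Euclidean distance below RRANGE."""
    (x0, y0), (x1, y1) = pos[w0], pos[w1]
    return ((x0 - x1) ** 2 + (y0 - y1) ** 2) ** 0.5 < RRANGE

def connected(pos, w0, w1):
    """Multi-hop reachability: BFS over the neigh graph."""
    frontier, seen = deque([w0]), {w0}
    while frontier:
        w = frontier.popleft()
        if w == w1:
            return True
        for z in pos:
            if z not in seen and neigh(pos, w, z):
                seen.add(z)
                frontier.append(z)
    return False

def available(pos, free, w, coord="Coord"):
    """available(w, s) = free(w, s) and connected(w, Coord, s)."""
    return free[w] and connected(pos, w, coord)
```

With, say, a chain Coord–a1–a2 of hops each shorter than the radio range, a2 is available even though it is out of direct range of the coordinator, while an isolated free node is not.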
The successor state axioms for this domain are:
at(a, loc, do(t, s)) ⇔
    (at(a, loc, s) ∧ ∀loc′. t ≠ finishedTask(a, Go, loc′))
    ∨ (¬at(a, loc, s) ∧ t = finishedTask(a, Go, loc) ∧ started(a, Go, loc, s))

evaluationOK(loc, do(t, s)) ⇔ evaluationOK(loc, s)
    ∨ (∃a, n. t = finishedTask(a, Evaluate, ⟨loc, OK⟩)
        ∧ photoBuild(loc, n, s) ∧ ∃p. started(a, Evaluate, p, s) ∧ n ≥ threshold)

infoSent(do(t, s)) ⇔ infoSent(s)
    ∨ (∃a. t = finishedTask(a, SendToHeadquarter, OK)
        ∧ ∃p. started(a, SendToHeadquarter, p, s))

photoBuild(loc, n, do(t, s)) ⇔
    (∃a, m, o. photoBuild(loc, m, s) ∧ t = finishedTask(a, TakePhoto, ⟨loc, o⟩)
        ∧ n = m + o ∧ at(a, loc, s) ∧ ∃p. started(a, TakePhoto, p, s))
    ∨ (∃a, o. photoBuild(loc, n, s) ∧ t = finishedTask(a, TakePhoto, ⟨loc, o⟩)
        ∧ ¬at(a, loc, s) ∧ ∃p. started(a, TakePhoto, p, s))
    ∨ (photoBuild(loc, n, s) ∧ ∀a, o. t ≠ finishedTask(a, TakePhoto, ⟨loc, o⟩))
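The photoBuild axiom can be read as a one-step progression rule: a finishedTask(a, TakePhoto, ⟨loc, o⟩) event adds o photos only when the executor is actually at loc. An illustrative sketch (function and variable names are hypothetical, not from the thesis):

```python
def progress_photo_build(photo_build, at, started, event):
    """One-step progression of photoBuild under a finishedTask event.

    photo_build: dict location -> number of valuable photos taken there
    at:          dict service -> current location
    started:     set of services with a started TakePhoto task
    event:       (service, task, (loc, o)) for finishedTask(service, task, <loc, o>)
    """
    a, task, (loc, o) = event
    if task == "TakePhoto" and a in started:
        if at.get(a) == loc:
            # executor is on site: the o new photos count as valuable
            photo_build[loc] = photo_build.get(loc, 0) + o
        # otherwise the photos are not considered valuable: no change
    return photo_build
```

Feeding the same completion event with the executor at the right and at the wrong location shows the conditional effect: only the first increments the counter.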
It is worth noting that all fluents which denote world properties of interest
are changed by finishedTask, as already mentioned. Moreover, the value of the fluent
photoBuild(loc, n, s) is updated by the execution of task TakePhoto only if the executor
is at location loc; otherwise, the photos taken are not considered valuable.
Even if that is not formally a pre-condition of the task (the task can be
executed in any case), it is in fact a condition that has to hold in order for the
task to produce the expected outcome.
4.4 Monitoring Formalisation
Next we formalize how the monitor works. Intuitively, the monitor takes the
current program δ′ and the current situation s′ from the PMS's virtual reality
and, analyzing the physical reality through sensors, introduces fake actions in order
to obtain a new situation s″ which aligns the virtual reality of the PMS with
the sensed information. Then, it analyzes whether δ′ can still be executed in s″
and, if not, it adapts δ′ by generating a new correctly executable program δ″.
Specifically, the monitor's work can be abstractly defined as follows (we do not
model how the situation s″ is generated from the sensed information):
Monitor(δ′, s′, s″, δ″) ⇔
    (Relevant(δ′, s′, s″) ∧ Recovery(δ′, s′, s″, δ″))
    ∨ (¬Relevant(δ′, s′, s″) ∧ δ″ = δ′)        (4.7)
where: (i) Relevant(δ′, s′, s″) states whether the change from situation s′ to s″ is such that δ′ can no longer be correctly executed; and
(ii) Recovery(δ′, s′, s″, δ″) is intended to hold whenever the program δ′, originally meant to be executed in situation s′, is adapted to δ″ in order to be executed
in situation s″.
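Definition (4.7) is essentially a case split, which can be rendered directly. The sketch below is illustrative; relevant and recover are hypothetical callables standing in for the two predicates:

```python
def monitor(delta, s_virtual, s_sensed, relevant, recover):
    """Mirror of (4.7): adapt the program only if the discrepancy matters.

    relevant(delta, s_virtual, s_sensed) -> bool: can delta no longer run?
    recover(delta, s_virtual, s_sensed)  -> an adapted, executable program
    """
    if relevant(delta, s_virtual, s_sensed):
        return recover(delta, s_virtual, s_sensed)
    return delta  # discrepancy is irrelevant: keep the current program
```

With a trivial state-inequality test as relevant, an aligned sensed state leaves the program untouched, while a misaligned one triggers the recovery branch.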
Formally, Relevant is defined as follows:

Relevant(δ′, s′, s″) ⇔ ¬SameConfig(δ′, s′, δ′, s″)

where SameConfig(δ′, s′, δ″, s″) is true if executing δ′ in s′ is "equivalent" to
executing δ″ in s″ (see later for further details).
In this general framework we do not give a definition for
SameConfig(δ′, s′, δ″, s″). However, we consider any definition of
SameConfig to be correct if it denotes a bisimulation [99].
Definition 4.1. A predicate SameConfig(δ′, s′, δ″, s″) is correct if for every
δ′, s′, δ″, s″:
1. Final(δ′, s′) ⇔ Final(δ″, s″)
Main()
 1 (EvalTake(LocA) ∥ EvalTake(LocB) ∥ EvalTake(LocC));
 2 π.a0 [available(a0) ∧ ∀c. require(c, SendByGPRS) ⇒ provide(a0, c)];
 3     Assign(a0, SendByGPRS);
 4     Start(a0, SendByGPRS, ∅);
 5     AckTaskCompletion(a0, SendByGPRS);
 6     Release(a0, SendByGPRS);

EvalTake(Loc)
 1 π.a1 [available(a1) ∧ ∀c. require(c, CompileQuest) ⇒ provide(a1, c)];
 2     Assign(a1, CompileQuest);
 3     Start(a1, CompileQuest, Loc);
 4     AckTaskCompletion(a1, CompileQuest);
 5     Release(a1, CompileQuest);
 6 while ¬evaluationOK(Loc)
 7 do
 8     π.a2 [available(a2) ∧ ∀c. require(c, Go) ⇒ provide(a2, c)];
 9         Assign(a2, Go);
10         Start(a2, Go, Loc);
11         AckTaskCompletion(a2, Go);
12         Start(a2, TakePhoto, Loc);
13         AckTaskCompletion(a2, TakePhoto);
14         Release(a2, TakePhoto);
15     π.a3 [available(a3) ∧ ∀c. require(c, EvaluatePhoto) ⇒ provide(a3, c)];
16         Assign(a3, EvaluatePhoto);
17         Start(a3, EvaluatePhoto, Loc);
18         AckTaskCompletion(a3, EvaluatePhoto);
19         Release(a3, EvaluatePhoto);

Figure 4.3: The IndiGolog program corresponding to the process in Figure 4.2
2. ∀a, δ̄′. Trans(δ′, s′, δ̄′, do(a, s′)) ⇒
        ∃δ̄″. Trans(δ″, s″, δ̄″, do(a, s″)) ∧ SameConfig(δ̄′, do(a, s′), δ̄″, do(a, s″))
3. ∀a, δ̄″. Trans(δ″, s″, δ̄″, do(a, s″)) ⇒
        ∃δ̄′. Trans(δ′, s′, δ̄′, do(a, s′)) ∧ SameConfig(δ̄″, do(a, s″), δ̄′, do(a, s′))
Intuitively, a predicate SameConfig(δ′, s′, δ″, s″) is said to be correct if δ′
and δ″ are either both terminable or neither is. Furthermore, for each action a performable by δ′ in situation s′, δ″ in situation s″ has to enable
the performance of the same action (and vice versa). Moreover, the resulting
configurations (δ̄′, do(a, s′)) and (δ̄″, do(a, s″)) must still satisfy SameConfig.
The use of the bisimulation criterion to state when a predicate
SameConfig(···) is correct derives from the notion of equivalence introduced
in [64]. When comparing the execution of two formally different business processes, the internal states of the processes may be ignored, because what really
matters is the process behavior that can be observed. This view reflects the
way a PMS works: indeed, what is of interest is the set of tasks that the
PMS offers to its environment, in response to the inputs that the environment
provides.
Next we turn our attention to the procedure to adapt the process, formalized by Recovery(δ′, s′, s″, δ″). Formally, it is defined as follows:

Recovery(δ′, s′, s″, δ″) ⇔
    ∃δa, δb. δ″ = δa; δb ∧ Deterministic(δa) ∧
    Do(δa, s″, sb) ∧ SameConfig(δ′, s′, δb, sb)        (4.8)

where Deterministic(δ) holds, in general, if δ uses neither the concurrency
constructs nor non-deterministic choices.
Recovery determines a process δ″ consisting of a deterministic δa (i.e., a
program not using the concurrency construct) and an arbitrary program δb.
The aim of δa is to lead from the situation s″ in which adaptation is needed
to a new situation sb where SameConfig(δ′, s′, δb, sb) is true.
Notice that during the actual recovery phase δa we disallow concurrency
because we need full control over the execution of each service in order to get
to a recovered state. Then the actual recovered program δb can again allow
for concurrency.
4.5 A Concrete Technique for Recovery
In the previous sections we have provided a general description of how adaptation can be defined and performed. Here we choose a specific technique that
is actually feasible in practice. Our main step is to adopt a specific definition
for SameConfig, here denoted as SameConfig, namely:

SameConfig(δ′, s′, δ″, s″) ⇔ SameState(s′, s″) ∧ δ′ = δ″        (4.9)

In other words, SameConfig states that ⟨δ′, s′⟩ and ⟨δ″, s″⟩ are the same
configuration if (i) all fluents have the same truth values in both s′ and s″
(SameState),³ and (ii) δ″ is actually δ′.
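With states represented finitely as assignments of truth values to ground fluents, SameState and the concrete SameConfig of (4.9) reduce to plain equality tests. An illustrative sketch (representation is an assumption, not the thesis's):

```python
def same_state(s1, s2):
    """SameState: every ground fluent has the same truth value.

    States are dicts mapping ground fluent names to truth values.
    """
    return s1 == s2

def same_config(delta1, s1, delta2, s2):
    """Concrete SameConfig (4.9): same state and syntactically equal program."""
    return same_state(s1, s2) and delta1 == delta2
```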
The following shows that SameConfig is indeed correct.
Theorem 4.1. SameConfig(δ′, s′, δ″, s″) is correct.
Proof. We show that SameConfig is a bisimulation. Indeed:
• Since SameState(s′, s″) requires all fluents to have the same values in both
s′ and s″, we have that Final(δ, s′) ⇔ Final(δ, s″).
• Since SameState(s′, s″) requires all fluents to have the same values in both
s′ and s″, it follows that the PMS is allowed, for the same process δ′,
to assign the same tasks both in s′ and in s″; moreover, for each
action a and situations s′ and s″ s.t. SameState(s′, s″), we have that
SameState(do(a, s′), do(a, s″)) holds. As a result, for each a and δ̄ such
that Trans(δ′, s′, δ̄, do(a, s′)), we have that Trans(δ′, s″, δ̄, do(a, s″))
and SameConfig(δ̄, do(a, s′), δ̄, do(a, s″)). Similarly for the other direction.
Hence, the thesis holds.
Next, let us denote by LinearProgram(δ) a program constituted only by
sequences of actions, and let us define Recovery as:

Recovery(δ′, s′, s″, δ″) ⇔
    ∃δa, δb. δ″ = δa; δb ∧ LinearProgram(δa) ∧
    Do(δa, s″, sb) ∧ SameConfig(δ′, s′, δb, sb)        (4.10)

The next theorem shows that we can adopt definition (4.10) as the definition of
Recovery without loss of generality.
Theorem 4.2. For every process δ′ and situations s′ and s″, there exists a δ″ satisfying definition (4.8) if and only if there exists a δ″
satisfying definition (4.10), where the latter uses the concrete SameConfig of (4.9) as
SameConfig.
³ Observe that SameState can actually be defined as a first-order formula over the fluents,
as the conjunction of F(s′) ⇔ F(s″) for each fluent F.
Proof. Observe that the only difference between the two definitions is that
in one case we allow only linear programs (i.e., sequences of actions) as
δa, while in the second case we also allow deterministic ones, which may include
if-then-else, while loops, procedures, etc.
(⇒) Trivial, as linear programs are deterministic programs.
(⇐) Let us consider the recovery process δ″ = δa; δb where δa is an arbitrary
deterministic program. Then, by the definition of Recovery, there exists a (unique)
situation s such that Do(δa, s″, s). Now, that s has the form s =
do(an, do(an−1, . . . , do(a2, do(a1, s″)) . . .)). Let us consider the linear program
p = (a1; a2; . . . ; an). Obviously we have Do(p, s″, s). Hence the process δ″ =
p; δb is a recovery process according to definition (4.10).
The nice feature of definition (4.10) is that it asks to search for a linear program that achieves a certain goal, namely reaching a situation sb such that SameState(s′, sb). Moreover,
restricting to sequential programs obtained by planning with no concurrency
does not prevent any recoverable process from being adapted.
In sum, we have reduced the synthesis of a recovery program to a classical planning problem in AI [52]. As a result, we can draw on the well-developed
planning literature for our aim. In particular, if the services and the input
and output parameters are finite, then the recovery can be reduced to propositional planning, which is known to be decidable in general (and for which very
well performing software tools exist).
Theorem 4.3. Assume a domain in which services and input and output
parameters are finite. Then, given a process δ′ and situations s′ and s″, it is
decidable to compute a recovery process δ″ such that Recovery(δ′, s′, s″, δ″)
holds.
Proof. In domains in which services and input and output parameters are
finite, the actions and fluents instantiated with all possible parameters are also
finite. Hence we can phrase the domain as a propositional one, and the thesis
follows from the decidability of propositional planning [52].
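Under this finiteness assumption, the synthesis of the linear recovery program δa can be sketched as breadth-first search in the propositional state space: look for a shortest action sequence leading from the sensed situation to a state satisfying SameState with the pre-discrepancy one. The STRIPS-style action encoding and all names below are illustrative, not the thesis's:

```python
from collections import deque

def plan_recovery(s_sensed, s_target, actions):
    """BFS for a shortest linear program delta_a such that executing it in
    s_sensed reaches a state equal to s_target (the SameState goal).

    States are sets of holding ground fluents; each action is a tuple
    (name, preconditions, add_set, del_set).
    """
    start, goal = frozenset(s_sensed), frozenset(s_target)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, prefix = frontier.popleft()
        if state == goal:
            return prefix            # the linear recovery program delta_a
        for name, pre, add, delete in actions:
            if pre <= state:         # action executable in this state
                nxt = (state - delete) | add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, prefix + [name]))
    return None                      # no recovery program exists
```

For instance, with three toy actions assigning, starting and releasing a photo task, the search returns the obvious three-step linear program restoring the lost photoOK fluent.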
Example 4.1 (cont.). In the running example, let us consider two cases
of discrepancies causing significant deviations.
Case 1. The process is between lines 10 and 11 in the execution of the procedure invocation EvalTake(LocA). A certain node a2 is assigned to tasks Go
and TakePhoto. Suddenly, an appropriate sensor predicts that a2 will soon move
out of range and, hence, disconnect from the coordinator device.⁴ The
sensor generates and executes the action finishedTask(a2, Go, RealPosition),
where RealPosition is the position where the node is going to disconnect. After
this action, in the resulting situation s′ the fluent at(a2, RealPosition, s′) holds
accordingly. The monitor infers that the exogenous event causes a significant
deviation (i.e., connected(a2, Coord) does not hold and the process cannot be
completed). Hence, it uses a planner to build a recovery program pursuing the
goal connected(a2, Coord) ∧ at(a2, LocA) ∧ φ,⁵ where formula φ denotes the conjunction of all fluents in situation-suppressed form
(holding fluents appear affirmed, non-holding fluents negated). The planner will
very likely build a recovery program similar to the following:

⁴ Section 5.4 describes a Bayesian approach for predicting disconnections before
they actually happen. Such an approach has also been implemented in SmartPM.
δa = [
    Assign(a3, Go);
    Start(a3, Go, RealPosition);
    AckTaskCompletion(a3, Go);
    Release(a3, Go);
    AckTaskCompletion(a2, Go);
    Start(a2, Go, LocA);
    AckTaskCompletion(a2, Go);
]
where a3 is a free team member that has been judged as the best one to go after
a2. Consequently, the program after the deviation is δ′ = δa; δb, where δb is the
original one from line 12.
Case 2. The process is currently executing at some point among lines 14–17 of procedure EvalTake(LocA). At this point, the number of photos taken
is greater than the constant threshold. For some reason, some of those photos are suddenly lost (e.g., the files have been corrupted); hence, fluent
photoBuild(LocA, val, s) holds with val < threshold. The monitor senses a
significant deviation and, hence, plans a proper recovery program pursuing
the goal photoBuild(LocA, threshold, s) ∧ φ:
δa = [
    π.a5 [available(a5) ∧ ∀c. require(c, TakePhoto) ⇒ provide(a5, c)];
    Assign(a5, TakePhoto);
    Start(a5, TakePhoto, LocA);
    AckTaskCompletion(a5, TakePhoto);
    Release(a5, TakePhoto);
]
⁵ Observe that if the positions are discretised, so as to become finite, this recovery can be
achieved by a propositional planner.
The example has shown that the proposed approach is not based on the
idea of capturing expected exceptions. Other approaches rely on rules defining the behavior to adopt when special events that cause deviations are triggered.
Here we simply model (a subset of) the running environment and the actions'
effects, without defining how to manage the adaptation. Modeling the environment, even in detail, is feasible, whereas modeling all possible exceptions is
often impossible.
4.6 Summary
This chapter has presented the formal foundation of a general approach, based
on execution monitoring, for automatic process adaptation in dynamic scenarios. Such an approach (i) is practical, relying on well-established planning
techniques, and (ii) does not require the definition of the adaptation strategy
in the process itself (as most current approaches do). We have also given
the basic concepts of situation calculus and IndiGolog, which are extensively
used throughout this thesis.
The approach proposed in this chapter has been formally proven to be
correct and complete, and we have shown its application to a significant example stemming from a real scenario, specifically emergency management.
This example will be used in the next chapter as the running example when
discussing the concrete implementation based on the IndiGolog interpreter
developed by the Cognitive Robotics Group of the University of Toronto.
Chapter 5
The SmartPM System
This chapter is devoted to describing SmartPM, the concrete implementation
of the framework described in Chapter 4. For this aim, we used the IndiGolog
platform developed by the University of Toronto in collaboration with the Agent
Group of RMIT University, Melbourne.
Specifically, Section 5.1 overviews the IndiGolog platform used for
SmartPM, whereas Section 5.2 shows the concrete choices we made in order to
tailor the theoretical framework to a concrete implementation. The concrete
implementation of SmartPM encountered two main groups of issues.
Firstly, the IndiGolog platform was targeted at agent and robot programming; hence, using it for process management, which is not a close
application field, has been quite difficult. For example, we needed the inclusion of the construct atomic to define a sequence of actions that has to be
executed as a whole and cannot be interleaved with actions of concurrently executing sequences. Such a construct makes little sense in the field of
robots, whereas it is quite important in process management in order to introduce the concept of transaction. In fact, the development has been carried out
in tight collaboration with the platform's conceivers and developers, who very kindly
made some changes to meet our requirements.
Secondly, the theoretical framework did not consider the features and limitations actually present in the platform. For instance, the theoretical framework assumed that the process could be stopped and restructured by placing
the recovery beforehand. In practice, the platform does not allow changing
the program that encodes the process once it has started. In order to overcome this limitation, we resorted to interrupts at different priorities (see
Section 5.2).
Nowadays, in many pervasive scenarios, such as emergency management or
health care, it is not feasible to assume that the area or the house (or whatever
else) is equipped with access points providing Wi-Fi networks. In order to allow
devices, operators and services to communicate, it is necessary to quickly deploy
a wireless network, relying on no fixed infrastructure, for the time the communication is needed. As already mentioned in the Introduction, a Mobile Ad hoc Network
(manet) is a P2P network of mobile nodes capable of communicating with
each other without an underlying infrastructure. Nodes can communicate with
their own neighbors (i.e., nodes in radio range) directly by wireless links. Non-neighbor nodes can communicate as well, by using other intermediate nodes
as relays that forward packets toward destinations. Therefore, manets seem
to be appropriate in pervasive scenarios, since they can also operate where the
presence of access points is not guaranteed, as in emergency management [91].
Sections 5.3 and 5.4 show two interesting research efforts carried out in order to
apply SmartPM concretely to pervasive scenarios. The former describes a network layer that enables devices and services to communicate in manet
settings. The latter describes the development and testing of an algorithm
that raises alerts when mobile devices are going out of range of the
others and, hence, their installed services are becoming unavailable. These signals
represent exogenous events to be caught by the PMS, which should build a
recovery plan trying to avoid the service unavailability.
In order to test the effectiveness of SmartPM and of the techniques supporting
its usage in manet scenarios, the best solution would be on-field
tests. But these would require many people moving around in large areas, and
the repeatability of the experiments would be compromised. In such cases, it is
better to emulate: during emulation, some software or hardware pieces are
not real, whereas others are exactly the ones of actual systems. The "nice"
feature is that software systems are not aware of working on layers
that are partially or totally emulated. Therefore, the software is not changed
to fit the emulated environment; it can be used in real settings with few
or no changes. Section 5.5 describes octopus, a specific emulator to test the
SmartPM PMS on manets together with the aforementioned components.
5.1 The IndiGolog Platform
This section describes the IndiGolog-based platform that we have used to implement the framework described in Chapter 4.¹ Part of this section is a
summary of the work published in [29], by kind agreement with its authors.
The agent platform described here is a logic-programming implementation of IndiGolog that allows the incremental execution of high-level
Golog-like programs. This implementation of IndiGolog is modular and easily
extensible so as to deal with any external platform, as long as the suitable
interfacing modules are programmed (see below).

¹ Available at http://sourceforge.net/projects/indigolog/.
Although most of the code is written in vanilla Prolog, the overall architecture is written in the well-known open source SWI-Prolog² [144]. SWI-Prolog provides flexible mechanisms for interfacing with other programming languages such as Java or C, allows the development of multi-threaded applications, and provides support for socket communication and constraints.
Generally speaking, the IndiGolog implementation provides an incremental interpreter of high-level programs as well as a framework for dealing with
the real execution of these programs on concrete platforms or devices. This
amounts to handling the real execution of actions on concrete devices (e.g.,
a real robot platform), the collection of sensing outcome information (e.g.,
retrieving some sensor's output), and the detection of exogenous events happening in the world. To that end, the architecture is modularly divided into six
parts, namely: (i) the top-level main cycle; (ii) the language semantics; (iii)
the temporal projector; (iv) the environment manager; (v) the set of device
managers; and finally (vi) the domain application. The first four modules are
completely domain-independent, whereas the last two are designed for specific
domain(s). The architecture is depicted in Figure 5.1.
5.1.1 The top-level main cycle and language semantics
The IndiGolog platform codes the sense-think-act loop well known in the agent
community [76]:
1. check for exogenous events that have occurred;
2. calculate the next program step; and
3. if the step involves an action, execute the action.
While executing actions, the platform keeps an updated history, i.e., the
sequence of actions performed so far.
The main predicate of the main cycle is indigo/2; a goal of the form
indigo(E,H) states that the high-level program E is to be executed online at
history H.
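The sense-think-act loop can be sketched schematically as follows (an illustrative Python rendering, not the platform's actual Prolog code; the env object bundles hypothetical trans/final/execute hooks):

```python
def indigo(program, history, env):
    """Schematic sense-think-act cycle of an IndiGolog-style interpreter."""
    while True:
        # 1. assimilate exogenous events occurred since the last step
        for event in env.pending_exogenous():
            history.append(event)
        # 2. compute the next program step
        if env.final(program, history):
            return history                    # program has terminated
        step = env.trans(program, history)    # single-step transition
        if step is None:
            raise RuntimeError("program blocked: no transition possible")
        program, action = step
        # 3. if the step involves an action, execute it and record it
        #    together with its sensing result
        if action is not None:
            history.append((action, env.execute(action)))
```

With a stub environment that simply pops actions off a list, the loop terminates with the history of executed actions and their sensing results.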
The first thing the main cycle does is to assimilate all exogenous events that
have occurred since the last execution step. After all exogenous actions have
been assimilated and the history progressed as needed, the main cycle goes
on to actually execute the high-level program E. First, if the current program
can terminate in the current history, then the top-level goal
indigo/2 succeeds. Otherwise, the interpreter checks whether the program
can evolve a single step, relying on predicate trans/4 (explained below).
If the program evolves without executing any action, then the history remains
unchanged and we continue to execute the remaining program from the same
history. If, however, the step involves performing an action, then this action is
executed and incorporated into the current history, together with its sensing
result (if any), before continuing the execution of the remaining program.

² http://www.swi-prolog.org/

Figure 5.1: The IndiGolog implementation architecture. Links with a circle
ending represent goals posted to the circled module (as from [29])
As mentioned above, the top-level loop relies on two central predicates,
namely, final/2 and trans/4. These predicates implement the relations Trans
and Final, giving the single-step semantics for each of the constructs in the
language. It is convenient, however, to use an implementation of these predicates defined over histories instead of situations. Indeed, the clauses of the
IndiGolog interpreter never mention situations explicitly: they always assume
to work on the current one. So, for example, these are the corresponding
clauses for sequence (represented as a list), tests, nondeterministic choice of
programs, and primitive actions:
final([E|L],H) :- final(E,H), final(L,H).
trans([E|L],H,E1,H1) :- final(E,H), trans(L,H,E1,H1).
trans([E|L],H,[E1|L],H1) :- trans(E,H,E1,H1).
final(ndet(E1,E2),H) :- final(E1,H) ; final(E2,H).
trans(ndet(E1,E2),H,E,H1) :- trans(E1,H,E,H1).
trans(ndet(E1,E2),H,E,H1) :- trans(E2,H,E,H1).
trans(?(P),H,[],H) :- eval(P,H,true).
trans(E,H,[],[E|H]) :- action(E), poss(E,P), eval(P,H,true).
/* Obs: no final/2 clauses for action and test programs */
These Prolog clauses are almost directly "lifted" from the corresponding axioms for Trans and Final. Predicates action/1 and poss/2 specify the actions of the domain and their corresponding precondition axioms; both are
defined in the domain axiomatization (see below). More importantly, eval/3
is used to check the truth of a condition at a certain history, and is provided
by the temporal projector, described next.
A naive implementation of the search operator would deliberate from
scratch at every point of its incremental execution. It is clear, however, that
one can do better than that, caching the successful plan obtained and
avoiding planning in most cases:
final(search(E),H) :- final(E,H).
trans(search(E),H,path(E1,L),H1) :-
    trans(E,H,E1,H1), findpath(E1,H1,L).

/* findpath(E,H,L): solve (E,H) and store the path in list L */
/* L = list of configurations (Ei,Hi) expected along the path */
findpath(E,H,[(E,H)]) :- final(E,H).
findpath(E,H,[(E,H)|L]) :- trans(E,H,E1,H1), findpath(E1,H1,L).
So, when a search block is solved, the whole solution path found is stored as
the sequence of configurations that are expected. If the actual configurations
match, then steps are performed without any reasoning (first final/2 and
trans/4 clauses for program path(E,L)). On the other hand, if the actual
configuration does not match the one expected next, for example, because
an exogenous action occurred and the history thus changed, re-planning is
performed to look for an alternative path (code not shown).
5.1.2 The temporal projector
The temporal projector is in charge of maintaining the agent’s beliefs about the
world and evaluating a formula relative to a history. The projector module
provides an implementation of predicate eval/3: goal eval(F,H,B) states
that formula F has truth value B, usually true or false, at history H.
Predicate eval/3 is used to define trans/4 and final/2, as the legal evolutions of high-level programs may often depend on what things are believed
true or false.
We assume then that users provide definitions for each of the following
predicates for fluent f , action a, sensing result r, formula w, and arbitrary
value v:
fun_fluent(f): f is a functional fluent;
rel_fluent(f): f is a relational fluent;
prim_action(a): a is a ground action;
init(f,v): v is the value for fluent f in the starting situation;
poss(a,w): it is possible to execute action a provided formula w is known to
be true;
causes_val(a,f,v,w): action a sets fluent f to value v provided formula w is
known to be true.
Formulas are represented in Prolog using the obvious names for the logical
operators and with all situations suppressed; histories are represented by lists
of elements of the form o(a, r), where a represents an action and r a sensing result. We will
not go over how formulas are recursively evaluated; we just note that
kTrue(w,h) is the main, top-level predicate, which
tests whether formula w holds at history h. Finally, the interface of the module is
defined as follows:
eval(F,H,true) :- kTrue(F,H).
eval(F,H,false) :- kTrue(neg(F),H).
5.1.3 The environment manager and the device managers
Because the architecture is meant to be used with concrete agent/robotic platforms, as well as with software/simulation environments, the online execution
of IndiGolog programs must be linked with the external world. To that end,
the environment manager (EM) provides a complete interface with all the
external devices, platforms, and real-world environments that the application
needs to interact with.
In turn, each external device or platform that is expected to interact with
the application (e.g., a robot, a software module, or even a user interface)
is assumed to have a corresponding device manager, a piece of software that
is able to talk to the actual device, instruct it to execute actions, and
gather information and events from it. The device manager understands the
"hardware" of the corresponding device and provides a high-level interface to
the EM: the execution of actions (e.g., assign,
start, etc.), the retrieval of sensing outcomes for actions, and the notification
of exogenous events (e.g., disconnect as well as finishedTask).
Because actual devices are independent of the IndiGolog application and
may be in remote locations, device managers are meant to run in different
processes and, possibly, on different machines; they then communicate with
the EM via TCP/IP sockets. The EM, in contrast, is part of the IndiGolog
agent architecture and is tightly coupled with the main cycle. Still, since the
EM needs to be open to the external world regardless of any computation
happening in the main cycle, the EM and the main cycle run in different (but
interacting) threads, though in the same process and Prolog run-time engine.³
So, in a nutshell, the EM is responsible for executing actions in the real
world and gathering information from it, in the form of sensing outcomes and
exogenous events, by communicating with the different device managers. More
concretely, given a domain high-level action (e.g., assign(WrkList, Srvc)),
the EM is in charge of: (i) deciding which actual "device" should execute
the action; (ii) ordering its execution by the device via its corresponding
device manager; and finally (iii) collecting the corresponding sensing outcome.
To realize the execution of actions, the EM provides an implementation of
exec/2 to the top-level main cycle: exec(A,S) orders the execution of action
A, returning S as its sensing outcome.
When the system starts, the EM starts up all device managers required by
the application and sets up communications channels to them using TCP/IP
3
SWIProlog provides a clean and efficient way of programming multi-threaded Prolog
applications.
72
CHAPTER 5. THE SMARTPM SYSTEM
stream sockets. Recall that each real-world device or environment has to have
a corresponding device manager that understands it. After this initialization
process, the EM enters a passive mode in which it asynchronously listens
for messages arriving from the various device managers. This passive mode
allows the top-level main cycle to execute without interruption until
a message arrives from some device manager. In general, a message can be
an exogenous event, the sensing outcome of some recently executed action, or
a system message (e.g., a device being closed unexpectedly). The incoming
message is read and handled in an appropriate way and, in some cases,
the top-level main cycle is notified of the event that occurred.
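The message-handling behaviour described above can be sketched as follows. This is a minimal Python caricature, not the actual SmartPM code (the thesis implementation is in Prolog/SWI-Prolog); all class and message names here are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical message types mirroring those the EM can receive.
@dataclass
class ExogenousEvent:
    action: str

@dataclass
class SensingOutcome:
    action: str
    outcome: str

@dataclass
class SystemMessage:
    text: str

class ExecutionManagerSketch:
    """Caricature of the EM's passive mode: read each incoming message,
    handle it, and decide whether the top-level main cycle must be notified."""

    def __init__(self):
        self.pending_outcomes = {}   # action -> sensing outcome
        self.notifications = []      # events the main cycle must see

    def handle(self, msg):
        if isinstance(msg, ExogenousEvent):
            # Exogenous actions are always forwarded to the main cycle.
            self.notifications.append(("exogenous", msg.action))
        elif isinstance(msg, SensingOutcome):
            # Sensing outcomes complete a pending exec/2 call.
            self.pending_outcomes[msg.action] = msg.outcome
        elif isinstance(msg, SystemMessage):
            # e.g., a device manager closed unexpectedly.
            self.notifications.append(("system", msg.text))

em = ExecutionManagerSketch()
em.handle(SensingOutcome("takePhoto", "ok"))
em.handle(ExogenousEvent("disconnect(4)"))
print(em.pending_outcomes["takePhoto"])   # -> ok
print(em.notifications[0][0])             # -> exogenous
```

In the real architecture the three branches correspond to the three kinds of messages listed above, and the transport is a TCP/IP socket per device manager rather than direct method calls.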
5.1.4 The domain application
From the user's perspective, probably the most relevant aspect of the architecture is the specification of the domain application. Any domain application
must provide:
1. An axiomatization of the dynamics of the world. Such an axiomatization
depends on the temporal projector to be used.
2. One or more high-level agent programs that will dictate the different
agent behaviors available. In general, these will be IndiGolog programs.
3. All the necessary execution information to run the application in the
external world. This amounts to specifying which external devices the
application relies on (e.g., the device manager for the ER1 robot), and
how high-level actions are actually executed on these devices (that is, by
which device each high-level action is to be executed). Information on
how to translate high-level symbolic actions and sensing results into the
device managers’ low-level representations, and vice-versa, could also be
provided.
5.2 The SmartPM Engine
According to the framework defined in Chapter 4, the PMS interrupts the
execution of processes when a misalignment between the virtual and the physical
reality is sensed. When this happens, the monitor adapts the program to
deal with such discrepancies. This section describes how the adaptation
framework has been concretely implemented in SmartPM.
Figure 5.2 shows conceptually how SmartPM has been integrated into the
IndiGolog interpreter.
At the beginning, we envision a responsible person designing the process
specification through a graphical tool, namely SPIDE (Figure 5.3 shows a
Figure 5.2: Architecture of the PMS.
Figure 5.3: The SPIDE Tool
screen shot), which generates a compliant XML file [98]. Specifically, SPIDE is
meant to generate the XML specification file, which should contain a formal
domain theory as well as the process schema and the action conditions. SPIDE
tailors the approach proposed in [81] to SmartPM: it allows designers to define
specific templates with a finite number of open options. When an instance needs to
be created, an operator chooses the proper template from a repository and
closes the open points, thus transforming the abstract template into a concrete
process specification.
The XML-to-IndiGolog Parser component translates a SPIDE XML specification into three conceptual parts:
Domain Program. The IndiGolog program corresponding to the designed
process. It also includes some helper procedures to handle task
executions, the interaction with the external services, and other features.
Domain Axioms. It comprises the action theory: the set of fluents modeling
the world properties of interest, the set of available tasks, and the
successor-state axioms which describe how the actions applied to tasks change
the fluents. Some parts of the axiomatization are, in fact, independent of the
domain and hence remain unchanged when passing from one domain to
another. On the contrary, other axioms are modeled according to the
domain and describe how domain-dependent fluents change as a result of
task executions.
Execution Monitor. This part is always generated in the same way and
does not take the specific domain into account.
Specifically, more details on the first two parts are given in Section 5.2.1,
whereas Section 5.2.2 describes how the monitoring/recovery mechanism has been coded in SmartPM using IndiGolog.
Once the program has been translated into the Domain Program and Axioms, the
Communication Manager (CM) starts up all device managers, which are
basically drivers enabling the PMS to communicate with the services and
sensors installed on devices. The PMS holds one device manager for each device
hosting services. After this initialization process, the CM activates the IndiGolog
Engine, which is in charge of executing IndiGolog programs by realising the
main cycle described in Section 5.1.1. Then, the CM enters a passive mode
where it listens for messages arriving from the devices through the
device managers. In general, a message can be an exogenous event harvested
by a sensor installed on a given device, or a message notifying
the beginning or the completion of a certain task.
The Communication Manager can be invoked by the IndiGolog Engine
whenever it produces an action for execution. The IndiGolog Engine relies
on two further modules named Transition System and Temporal Projector.
The former is used to compute the evolution of IndiGolog programs according
to the statements' semantics, whereas the latter is in charge of holding the
current situation throughout the execution, making it possible to evaluate
fluent values.
On the one side, the Execution Monitor makes use of the CM, which notifies
it of the occurrence of exogenous events; on the other side, it relies
on the Temporal Projector to get the updated values of fluents.
5.2.1 Coding processes by the IndiGolog interpreter
This sub-section describes how processes can be concretely coded as
IndiGolog programs using the interpreter described in Section 5.1. Interested
readers may look at Example 5.1, which shows the most significant parts
of the interpreter code.4
The process requires a model definition for the predicates that are defined
in Section 4.3: service(a), task(x), capability(b), provide(a, b), require(x, b).
In addition, we introduce the predicate identifiers(i), which defines the valid
identifiers for tasks. Indeed, the process specification may comprise a given
task more than once; of course, different instances of the same task have to
be distinguished, as they are different pieces of work.
Example 5.1. The following is the code of the IndiGolog interpreter giving a
definition of the aforementioned predicates for the running example. Specifically, the example assumes the team to be composed of five services, all human, uniquely identified by a number. Predicate domain(N,X) is
made available by the IndiGolog interpreter itself; it holds if
element N belongs to list X.
/* Available services */
services([1,2,3,4,5]).
service(Srvc) :- domain(Srvc,services).
/* Tasks defined in the process specification */
tasks([TakePhoto,EvaluatePhoto,CompileQuest,Go,SendByGPRS]).
task(Task) :- domain(Task,tasks).
/* Capabilities relevant for the process of interest*/
capabilities([camera,compile,gprs,evaluation]).
capability(B) :- domain(B,capabilities).
/* The list of identifiers that may be used
to distinguish different instances of the same task*/
task_identifiers([id_1,id_2,id_3,id_4,id_5,id_6,id_7,id_8,id_9,
id_10,id_11,id_12,id_13,id_14,id_15,id_16,id_17,id_18,id_19,
id_20]).
id(D) :- domain(D,task_identifiers).
/* The capabilities required for each task */
required(TakePhoto,camera).
required(EvaluatePhoto,evaluation).
required(CompileQuest,compile).
required(SendByGPRS,gprs).
4 Appendix A lists all the code of the interpreter.
/* The capabilities provided by each service */
provide(1,gprs).
provide(1,evaluation).
provide(2,compile).
provide(2,evaluation).
provide(2,camera).
provide(3,compile).
provide(4,evaluation).
provide(4,camera).
provide(5,compile).
Tasks with their identifiers and inputs are packaged into elements
workitem(Task, Id, Input) of predicate listelem(·). Work-item
elements can be grouped into lists identified by elements worklist(·). The following is the corresponding Prolog code:
worklist([]).
worklist([ELEM | TAIL]) :- worklist(TAIL),listelem(ELEM).
Indeed, actions assign(·) and release(·) take as input elements worklist(·).
In fact, this implementation assigns one worklist(·) to one proper service that
is capable of executing all tasks in the list. Assigning lists of tasks to
services, rather than single tasks, is motivated by the fact that we may want
to constrain multiple tasks to be executed by the same service.
Example 5.1 (cont.). The example shows the definition of the different
types of valid work items and their input parameters. Specifically, the first
definition of listelem below gives the definition of work items of tasks Go,
CompileQuest, EvaluatePhoto and TakePhoto; the second gives the definition
of work items of SendByGPRS. The first group relies on the definition of
predicate location, which represents the possible locations in the geographic area
of interest.
/* Definition of predicate location(...) identifying locations
in the geographic area of interest */
gridsize(10).
gridindex(V) :- gridsize(S), get_integer(0,V,S).
location(loc(I,J)) :- gridindex(I), gridindex(J).
/* member(ELEM,LIST) holds if ELEM is contained in LIST */
member(ELEM,[HEAD|_]) :- ELEM=HEAD.
member(ELEM,[_|TAIL]) :- member(ELEM,TAIL).
/* Definition of predicate listelem(workitem(Task,Id,I)).
It identifies a task Task with id Id and input I */
listelem(workitem(Task,Id,I)) :- id(Id), location(I),
member(Task,[Go,CompileQuest,EvaluatePhoto,TakePhoto]).
listelem(workitem(SendByGPRS,Id,input)) :- id(Id).
According to the framework of Chapter 4, there exist two classes of fluents,
domain-dependent and domain-independent. The domain-independent fluents are enabled and free, as defined in the framework of Chapter 4, as well as
assigned(LWrk, Srvc), which is not part of the theoretical framework and has
been introduced for implementation reasons (see below in this section).
Fluent assigned(·) holds if a certain work list LWrk is assigned to a service
Srvc as a result of the execution of action assign(LWrk, Srvc). On the basis of
some of these fluents we can define the four PMS actions, named
Primary Actions in the terminology of the IndiGolog interpreter: assign, start,
ackTaskCompletion and release. The domain-dependent fluents can be represented in any form, relational or functional, and their successor-state axioms
can be as complex as the domain needs.
Example 5.1 (cont.). For the sake of brevity, we show below only
the definitions of fluents assigned(·) and enabled(·) and their successor-state
axioms. As for the actions, assign and release can be executed in any
case, whereas start(Task, Id, Srvc, I) can be executed only if a certain work
list LWrk is assigned to Srvc and there exists an element workitem(Task,Id,I)
in LWrk. Moreover, Task has to be enabled for Srvc, which means that Srvc
has previously executed action readyToStart(Task, Id, Srvc). The IndiGolog
interpreter defines two procedures, and(F1, F2) and or(F1, F2). The first
holds if formulas F1 and F2 both hold in the current situation; the
second if at least one of F1 and F2 holds. F1 and F2 are formulas that
may be conjunctions or disjunctions of sub-formulas, which may include fluents,
procedures, generic predicates, etc.
/* Indicates that list LWrk of workitems has been assigned
to service Srvc */
rel_fluent(assigned(LWrk,Srvc)) :- worklist(LWrk),
service(Srvc).
/* assigned(LWrk,Srvc) holds after action assign(LWrk,Srvc) */
causes_val(assign(LWrk,Srvc),assigned(LWrk,Srvc),true,true).
/* assigned(LWrk,Srvc) no longer holds after action
release(LWrk,Srvc) */
causes_val(release(LWrk,Srvc),assigned(LWrk,Srvc),false,true).
/* Indicates that task Task with id Id is enabled for
service Srvc */
rel_fluent(enabled(Task,Id,Srvc))
:- task(Task), service(Srvc), id(Id).
/* enabled(Task,Id,Srvc) holds after service Srvc calls
readyToStart(Task,Id,Srvc) */
causes_val(readyToStart(Task,Id,Srvc),enabled(Task,Id,Srvc),true,true).
/* enabled(Task,Id,Srvc) holds no longer after service Srvc
calls exogenous action finishedTask(Task,Id,Srvc,V)*/
causes_val(finishedTask(Task,Id,Srvc,_),
enabled(Task,Id,Srvc),false,true).
/* ACTIONS and PRECONDITIONS (INDEPENDENT OF THE DOMAIN) */
prim_action(assign(LWrk,Srvc)) :- worklist(LWrk),service(Srvc).
poss(assign(LWrk,Srvc), true).
prim_action(ackTaskCompletion(Task,Id,Srvc))
:- task(Task), service(Srvc), id(Id).
poss(ackTaskCompletion(Task,Id,Srvc), neg(enabled(Task,Id,Srvc))).
prim_action(start(Task,Id,Srvc,I))
:- listelem(workitem(Task,Id,I)), service(Srvc).
poss(start(Task,Id,Srvc,I), and(enabled(Task,Id,Srvc),
and(assigned(LWrk,Srvc),
member(workitem(Task,Id,I),LWrk))
)).
prim_action(release(LWrk,Srvc)) :- worklist(LWrk),service(Srvc).
poss(release(LWrk,Srvc), true).
Below we show some of the fluents that have been defined for the running example; specifically, fluents at(Srvc) and evaluationOK(Loc).
Careful readers may note that at is defined here as a functional fluent, which
returns locations, rather than as a relational fluent. In addition, we show the
abbreviation hasConnection(Srvc), which holds if Srvc is connected to
Service 1 through a possibly multi-hop path. Indeed, Service 1 is supposed to be
deployed on the device that hosts the SmartPM engine. The abbreviation
makes use of the IndiGolog construct some(n, F(n)), which holds if there
exists a value n that makes formula F(n) true.
/* at(Srvc) indicates the position of service Srvc */
fun_fluent(at(Srvc)) :- service(Srvc).
causes_val(finishedTask(Task,Id,Srvc,V),at(Srvc),loc(I,J),
and(Task=Go,V=loc(I,J))).
rel_fluent(evaluationOK(Loc)) :- location(Loc).
causes_val(finishedTask(Task,Id,Srvc,V),
evaluationOK(loc(I,J)), true,
and(Task=EvaluatePhoto,
and(V=(loc(I,J),OK),
and(photoBuild(loc(I,J),N),
N>3)))).
proc(hasConnection(Srvc),hasConnectionHelper(Srvc,[Srvc])).
proc(hasConnectionHelper(Srvc,M),
or(neigh(Srvc,1),
some(n,
and(service(n),
and(neg(member(n,M)),
and(neigh(n,Srvc),
hasConnectionHelper(n,[n|M]))))))).
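The multi-hop connectivity test encoded by hasConnection above is, in essence, graph reachability towards Service 1. The following is a minimal sketch of that idea in Python (the thesis code is the Prolog above; the `neigh` adjacency map and function names here are illustrative assumptions).

```python
from collections import deque

def has_connection(srvc, neigh, hub=1):
    """True if service `srvc` can reach the hub service (Service 1 in the
    running example) through a possibly multi-hop path of `neigh` links."""
    visited = {srvc}
    frontier = deque([srvc])
    while frontier:
        n = frontier.popleft()
        if n == hub or hub in neigh.get(n, ()):
            return True
        for m in neigh.get(n, ()):
            if m not in visited:
                visited.add(m)
                frontier.append(m)
    return False

# 5 -- 3 -- 2 -- 1 is a multi-hop path, while service 4 is isolated.
neigh = {5: {3}, 3: {5, 2}, 2: {3, 1}, 1: {2}, 4: set()}
print(has_connection(5, neigh))  # -> True
print(has_connection(4, neigh))  # -> False
```

The visited set plays the same role as the accumulator list M in hasConnectionHelper: it prevents the search from looping through services already considered.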
The realisation of the execution cycle of a work list (i.e., a list of work
items) is based on procedure isPickable(WrkList, Srvc). It holds if WrkList
is a list of proper work items and Srvc is capable of performing every task defined
in every work item in such a list (i.e., Srvc provides all of the capabilities
required).
In order to add a certain work list WrkList to the process specification,
designers should use procedure manageTask(WrkList), which takes care of
(i) assigning all tasks in work list WrkList to one proper service, (ii) performing start(·) and ackTaskCompletion(·), waiting for readyToStart(·) and
finishedTask(·) from services, as well as (iii) releasing services when all tasks
in the work list have been executed.
Example 5.1 (cont.). Procedure manageTask(WrkList) is internally composed of three sub-procedures. First, it calls manageAssignment(WrkList),
which picks a certain Srvc and performs assign(WrkList, Srvc). Then procedure
manageExecution(WrkList) is invoked, which executes actions start(Task, Id, Srvc, I) and ackTaskCompletion(Task, Id, Srvc)
one by one for each work item workitem(Task, Id, I) in list WrkList. Finally, the last sub-procedure is manageTermination(WrkList), which makes
the picked service Srvc free again by using the PMS action release. It is worth noting the use of the IndiGolog construct atomic([a1; . . . ; an]) to provide an
atomic execution of an action sequence a1, . . . , an. Here atomicity is intended
in the sense that all of these actions are performed sequentially and any other
procedure is blocked until the whole sequence has been executed. For instance, in procedure
manageAssignment the atomic construct is used to prevent the same
service from being picked by different concurrent executions of procedure manageAssignment;
otherwise, obvious inconsistencies would arise.
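The role of atomic(·) here is the classic test-and-set problem. A minimal Python sketch of the same idea, with a lock standing in for the IndiGolog construct (all names are illustrative, not SmartPM identifiers):

```python
import threading

free = {1: True, 2: True}          # service -> currently free?
assignments = {}                    # work-list id -> service
pick_lock = threading.Lock()

def manage_assignment(wrklist_id, can_execute):
    """Pick a free, capable service and assign the work list to it.
    The lock plays the role of IndiGolog's atomic(...): the test of
    `free` and the assignment happen as one indivisible step, so two
    concurrent assignments can never pick the same service."""
    with pick_lock:
        for srvc, is_free in free.items():
            if is_free and can_execute(srvc):
                free[srvc] = False
                assignments[wrklist_id] = srvc
                return srvc
    return None  # no pickable service: caller must wait and retry

s1 = manage_assignment("wl1", lambda s: True)
s2 = manage_assignment("wl2", lambda s: True)
print(s1, s2)          # -> 1 2  (two different services)
```

Without the lock, two interleaved calls could both observe the same service as free and assign it twice — exactly the inconsistency the atomic construct prevents.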
Procedure isExecutable uses the IndiGolog construct
findall(elem, formula, set), which works as follows: it takes all instances of elem that make formula true and puts them into set.
elem and formula are related by a shared term name; that is, formula
has to contain a non-ground term named elem. That said, in procedure
isExecutable(Task, Srvc), term A denotes the set of all capabilities required
by task Task, whereas C denotes all capabilities provided by service Srvc.
When can Srvc execute Task? When the set A of the capabilities required by
Task is a subset of C, the capabilities provided by Srvc.
proc(isPickable(WrkList,Srvc),
or(WrkList=[],
and(free(Srvc),
and(WrkList=[A|TAIL],
and(listelem(A),
and(A=workitem(Task,_Id,_I),
and(isExecutable(Task,Srvc),
isPickable(TAIL,Srvc))))))
)
).
proc(isExecutable(Task,Srvc),
and(findall(Capability,required(Task,Capability),A),
and(findall(Capability,provide(Srvc,Capability),C),
subset(A,C)))).
/* PROCEDURES FOR HANDLING THE TASK LIFE CYCLES */
proc(manageAssignment(WrkList),
[atomic([pi(Srvc,[?(isPickable(WrkList,Srvc)),
assign(WrkList,Srvc)])])]).
proc(manageExecution(WrkList),
pi(Srvc,[?(assigned(WrkList,Srvc)=true),
manageExecutionHelper(WrkList,Srvc)])).
proc(manageExecutionHelper([],Srvc),[]).
proc(manageExecutionHelper([workitem(Task,Id,I)|TAIL],Srvc),
[start(Task,Id,Srvc,I), ackTaskCompletion(Task,Id,Srvc),
manageExecutionHelper(TAIL,Srvc)]).
proc(manageTermination(WrkList),
[atomic([pi(n,[?(assigned(WrkList,n)=true),
release(WrkList,n)])])]).
proc(manageTask(WrkList),
[manageAssignment(WrkList),
manageExecution(WrkList),
manageTermination(WrkList)]).
Finally, once the framework is properly configured, the program that codes a
process turns out to be quite simple. For the running example it is the
following:
proc(branch(Loc),
while(neg(evaluationOK(Loc)),
[
manageTask([workitem(CompileQuest,id_1,Loc)]),
manageTask([workitem(Go,id_1,Loc),
workitem(TakePhoto,id_2,Loc)]),
manageTask([workitem(EvaluatePhoto,id_1,Loc)])
]
)
).
proc(process,
[rrobin([branch(loc(2,2)),branch(loc(3,5)),branch(loc(4,4))]),
manageTask([workitem(SendByGPRS,id_29,input)])
]).
The next sub-section describes how adaptability is realized in this
implementation.
5.2.2 Coding the adaptation framework in IndiGolog
Figure 5.4 shows how the actual implementation of the adaptation framework
is coded by the IndiGolog interpreter.
In the remainder of this section, we call exogenous events the unexpected exogenous actions executed by the environment. Service actions
readyToStart(·) and finishedTask(·) are not unexpected but, rather, "good"
expected actions which change the fluents so as to achieve the process goals.
Specifically, the framework implementation relies on three additional
domain-independent fluents:
finished(s). In the starting situation it is false. It is turned to true when
the process has been carried out. Indeed, before finishing the execution, the
process itself executes the action finish. The corresponding successor-state axiom is, hence, the following:
finished(do(t, s)) ⇔ finished(s) ∨ t = finish.
adapting(s). In the starting situation it is false. It is turned to true when a
recovery plan starts being built, and is turned back to false when the recovery plan has been found and executed. In order to set and unset this fluent
there exist two actions, adaptStart and adaptFinish. The successor-state
axiom is, hence, as follows:

adapting(do(t, s)) ⇔ (adapting(s) ∧ t ≠ adaptFinish) ∨ (¬adapting(s) ∧ t = adaptStart)
exogenous(s). In the starting situation it is false. It is turned to true when
any exogenous action occurs. Action resetExo, when executed, restores the
fluent to value false.
Main()
1 ⟨(¬finished ∧ exogenous) → Monitor()⟩;
2 ⟨¬finished → (Process(); finish)⟩;
3 ⟨¬finished → (wait)⟩;

Monitor()
1 if (Relevant())
2   then Adapt();
3 resetExo;

Adapt()
1 adaptStart;
2 (AdaptingProgram(); adaptFinish)
3 ∥
4 (while (adapting) do wait());

AdaptingProgram()
1 Σ(SearchProgram,
2   assumptions(
3     [⟨Assign(workitem(Task, Id, Input), Srvc), readyToStart(Task, Id, Srvc)⟩,
4      ⟨Start(Task, Id, Srvc, Input), finishedTask(Task, Id, Srvc, Input)⟩]
5   )
6 )

SearchProgram()
1 (π(Task, Input, Srvc);
2   isPickable(workitem(Task, Id_Adapt, Input), Srvc)?;
3   assign([workitem(Task, Id_Adapt, Input)], Srvc);
4   start(Task, Id_Adapt, Srvc, Input);
5   ackTaskCompletion(Task, Id_Adapt, Srvc);
6   release([workitem(Task, Id_Adapt, Input)], Srvc)
7 )*;
8 (GoalReached)?;

Figure 5.4: The process adaptation procedure represented using the IndiGolog
formalism
Main Procedure
The main procedure of the whole IndiGolog program is Main, which involves
three interrupts running at different priorities. All these interrupts are guarded
by fluent finished(·): when it holds, the process execution has completed
successfully, and the interrupts cannot fire anymore. The
first, highest-priority interrupt fires when an exogenous event has occurred (i.e.,
condition exogenous is true); in that case, the Monitor procedure is executed. If no
exogenous event has been sensed, the second interrupt triggers and the execution of the
actual process is attempted. If the process cannot progress either, the third
is activated, which consists just in waiting. The fact that the process cannot
be carried on does not necessarily mean that the process is stuck forever. For
instance, the process may be unable to progress because a certain task cannot be assigned
to any qualified member (i.e., the pick is unable to find any member providing
all capabilities required by that task), as all of them are currently involved in
the performance of other tasks. If we did not add the third interrupt, IndiGolog would consider the program as failing whenever the process is unable to progress.
The monitoring/repairing procedure
The Monitor procedure checks, through procedure Relevant, whether the
exogenous event has been relevant, i.e., whether some fluents have changed their value
as a consequence. If so, the Adapt procedure is launched, which will build the
recovery program/process. Whether the changes are relevant or not,
the procedure concludes by executing action resetExo, which turns fluent
exogenous(·) to false.
Let us describe how procedure Relevant works. The IndiGolog interpreter
used in this realization always evaluates situation-suppressed fluents, where
the situation is intended to be the current one. Therefore, there is
no way to access past situations in order to check whether the application of
the exogenous event has changed some fluents. In the light of this, for each
fluent F(~x) defined in the action theory D, we give the definition of another
fluent Fprev(~x) that keeps the value of F in the previous situation:

∀a, v. Fprev(~x, do(a, s)) = v ⇔ F(~x, s) = v

When an exogenous event occurs, before applying the corresponding action
on the fluents, we copy the value of each fluent F to Fprev. Then, we apply the changes to every fluent as a consequence of the action and, finally, we
check for changes through procedure Relevant. At a higher level, procedure
Relevant should be modeled in second-order logic as follows (using the
situation-suppressed form for fluents):

⋀_{F ∈ D} ∀~x. domainF(~x) ⇒ ∃a. (F(~x) = a ∧ Fprev(~x) = a)

where domainF(~x) holds if ~x is an appropriate input for F.
Of course, being based on Prolog, the IndiGolog interpreter does not accept
formulas in second-order logic. The only solution is to enumerate explicitly all
fluents (say n), to connect them by and operators, and to define predicates
domainFi(~x):

φ ≝ (∀~x. domainF1(~x) ⇒ ∃a. F1(~x) = a ∧ F1prev(~x) = a)
  ∧ (∀~x. domainF2(~x) ⇒ ∃a. F2(~x) = a ∧ F2prev(~x) = a)
  ∧ . . .
  ∧ (∀~x. domainFn(~x) ⇒ ∃a. Fn(~x) = a ∧ Fnprev(~x) = a)          (5.1)
If Equation 5.1 does not hold, the exogenous event has caused a relevant deviation. Please note that the numbers of fluents and of appropriate inputs
are finite; hence, the approach of enumerating all of them is practically
realizable.
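The snapshot-and-compare mechanism behind Relevant can be sketched as follows. This is a minimal Python re-expression of the idea under the finite-domain assumption above (the actual implementation is the Prolog code in Appendix A; names here are illustrative).

```python
def snapshot(fluents):
    """Copy each fluent F to its F_prev counterpart before applying
    the exogenous action (a shallow copy suffices for scalar values)."""
    return dict(fluents)

def relevant(fluents, fluents_prev):
    """Finite-domain stand-in for Equation 5.1: the exogenous event is
    relevant iff some fluent instance changed its value."""
    return any(fluents[k] != fluents_prev[k] for k in fluents)

# Fluent instances keyed by (name, arguments), as in the running example.
fluents = {("at", 4): (2, 2), ("evaluationOK", (3, 5)): False}
prev = snapshot(fluents)
fluents[("at", 4)] = (9, 9)        # effect of an exogenous event
print(relevant(fluents, prev))     # -> True
```

Enumerating all (fluent, input) pairs is exactly what makes the second-order conjunction of Equation 5.1 expressible as a finite first-order (and, ultimately, Prolog) formula.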
Procedure Adapt invokes AdaptingProgram in order to build and execute the recovery program and, at the same time, remains waiting until the
recovery program has been completely executed.
Procedure AdaptingProgram builds the recovery plan as
follows. Let φ be the formula representing the state that has to be restored
(i.e., the formula in Equation 5.1 instantiated on the action theory of the
current process domain). Theorem 4.2 guarantees that if there exists some
recovery program for a certain deviation, then there also exists a linear one.
Therefore, we can focus on searching for linear programs. Specifically, the
linear recovery program can be abstracted as follows:

δ̃rec = (π a. a)*; (φ)?
The program above states that the operation of non-deterministically picking
an action a and executing it is iterated a non-deterministic number of
times; finally, the condition φ is checked. When executing δ̃rec, the
non-determinism has to be resolved by choosing the number of iterations
as well as the action a to be picked at each cycle. If we use the IndiGolog
search operator and execute Σ δ̃rec, the interpreter will use its mechanism
for off-line lookahead to resolve the non-deterministic choices in such a way that
the whole program can terminate. Therefore, the following program is exactly
the recovery plan:

δrec = Σ[(π a. a)*; (φ)?]          (5.2)
In the practical implementation, program δ̃rec corresponds to procedure
SearchProgram, where formula φ is named GoalReached. In addition,
we have restricted the search space in the light of the fact that we already
know there is a specific pattern in the sequence of actions required for
the whole execution of a single task. In this way, the search directly discards
action sequences that do not respect such a pattern, without evaluating them.
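The lookahead performed by Σ over δ̃rec amounts to a blind search over ever-longer linear action sequences until one satisfies φ. A toy Python sketch of that search, under heavily simplified assumptions (deterministic actions modeled as state-transforming functions; the real engine uses the IndiGolog search operator over the action theory):

```python
from itertools import product

def plan(state, actions, goal, max_len=4):
    """Iterative-deepening blind search for a linear action sequence
    reaching `goal`, mimicking Sigma[(pi a. a)*; (phi)?]. `actions`
    maps an action name to a state-transforming function."""
    if goal(state):
        return []                       # zero iterations suffice
    for length in range(1, max_len + 1):
        for seq in product(actions, repeat=length):
            s = state
            for a in seq:               # simulate the sequence off-line
                s = actions[a](s)
            if goal(s):
                return list(seq)        # first plan found
    return None                         # no plan within the bound

# Toy domain: move a service one cell at a time until it reaches (3, 6).
actions = {
    "right": lambda p: (p[0] + 1, p[1]),
    "up":    lambda p: (p[0], p[1] + 1),
}
print(plan((2, 5), actions, lambda p: p == (3, 6)))  # -> ['right', 'up']
```

The pattern restriction mentioned above corresponds to pruning, at generation time, any sequence that does not follow the assign/start/ackTaskCompletion/release shape of SearchProgram, instead of simulating it first.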
Recalling that fluents are not changed by PMS actions, if we considered only such actions, the recovery plan meant to achieve goal GoalReached
would fail, as no PMS action can change fluents. Therefore, the search operator
should try to find the recovery plan on the basis of some assumptions [121].
Specifically for SmartPM, there are two assumptions: the first is that the
action readyToStart(Task, Id, Srvc) performed by a certain service Srvc is
expected to follow the PMS action Assign(workitem(Task, Id, Input), Srvc);
the second concerns the PMS action Start(Task, Id, Srvc, Input), which is
assumed to be followed by finishedTask(Task, Id, Srvc, Input).
Once these assumptions are specified, the search operator considers that, for instance, Start(·) may contribute to the achievement of a certain goal φ, given
that a proper Start(·) is going to be followed by a corresponding
finishedTask(·), and the latter action is able to change fluents. What
happens if the assumptions are not respected? For example, action
finishedTask does not follow Start, or it returns parameter values different from
those assumed. In those cases, the recovery plan is considered failed, and a
new recovery plan is searched for by applying Equation 5.2 again on
the new values of the fluents; if found, it is executed. That is, we do not
repair the recovery plan; instead, we create a new plan achieving φ starting
from the current situation, discarding the previous one.
The code of the implementation is available in Appendix A; specifically the
features for monitoring and repairing are coded between lines 222 and 345.
Some screen shots. We close the explanation of the SmartPM
engine by showing some screen shots of the PMS. Figure 5.5 depicts the main
window of SmartPM showing the log of all actions (both exogenous and of
the PMS). Specifically, Figure 5.5(a) shows all the actions performed by the PMS
and service 5, ranging from assign to ackTaskCompletion. In the window, it
is easy to see rows starting with =======> EXOGENOUS EVENT
that represent the actions executed by service 5, which are considered by
the PMS as exogenous events, though "good" ones. Figure 5.5(b) shows
the logging of the system behaviour as a result of exogenous event
disconnect(4,loc(9,9)). This exogenous event, a "bad" one, is launched
to notify that service 4, which is expected to perform a task, is predicted to move to location (9,9), where it would be disconnected. This is a significant deviation of
(a) The actions executed for performing task Go for location (5,5)
(b) The recovery planning for handling exogenous event disconnect of service 4
Figure 5.5: The main window of the IndiGolog interpreter used by SmartPM.
Figure 5.6: The proof-of-concept Work-list Handler for SmartPM (subfigures
a, b and c) and a Work-list Handler that we developed at an earlier stage and
are willing to integrate with SmartPM (subfigure d)
the physical reality from the virtual one, and it requires the PMS to adapt the
process by building a recovery plan. The final plan consists in moving a free
service, namely service 1, to a specific location, (3,6), so as to make
sure that service 4 will stay connected when reaching destination location loc(9,9). Although the example may seem trivial, it shows the power of
the SmartPM approach: no designer specified how to deal with disconnections;
in this case, the disconnection has been handled easily, since there is a service
not occupied performing other tasks. Had this not been the case, the PMS
would automatically have chosen a different strategy.
Figures 5.6(a)-(c) depict the proof-of-concept Work-list Handler, which
emulates the graphical user interface for PDAs supporting the distribution
of tasks to human services. We developed it for the sake of testing the actual
functioning of the SmartPM PMS. We believe it is also worth showing
in Figure 5.6(d) the Work-list Handler of ROME4EU, a previous attempt to
deal with unexpected deviations (see [7]). It does not perform any reasoning
able to detect discrepancies and, hence, it is not able to recover from
unexpected contingencies (it uses a pre-planned recovery approach). Nevertheless, it is valuable as it runs entirely on PDAs, where many other
PMSs do not. We plan to integrate the ROME4EU work-list handler into
SmartPM so as to provide a tool for task distribution to human operators
equipped with PDAs.
5.2.3 Final discussion
This section has been devoted to describing how SmartPM has been implemented
using the IndiGolog platform developed by the University of Toronto. Processes are coded as IndiGolog programs. We have shown in Section 5.2.1 the
feasibility of the approach of coding processes in IndiGolog. We have also underlined the program parts that may remain unchanged when passing from one
process domain to another and those which have to be defined case by case.
SPIDE is the graphical tool that allows designers to define abstract process templates graphically and, upon instantiation, create their concrete specifications. SPIDE specifications are exported as XML files, which include
the information needed to generate the required IndiGolog programs and the whole
domain theory.
Furthermore, thanks to the use of IndiGolog, SmartPM makes it possible
to represent the adaptation features directly as IndiGolog procedures. The
adaptation is based on the IndiGolog search operator, which relies on a quite
inefficient planning mechanism implemented in Prolog. Therefore, the current
implementation should be considered a proof of concept rather than a final
implementation.5
5 That is the main motivation why we do not provide here any testing results for judging
The next step, which we are currently working on, is to overcome the intrinsic
planning inefficiency of Prolog by making use of efficient state-of-the-art planners to build the recovery program/process. As also claimed in [29], this step
should be theoretically and practically feasible.
Some authors have already considered the problem of integrating Golog-like programs with planners, which are mostly compliant with PDDL [92, 50]. PDDL is an action-centred language, inspired by the well-known STRIPS formulation of planning problems. In addition to STRIPS, PDDL allows one to express a type structure for the objects in a domain, typing the parameters that appear in actions and constraining the types of arguments to predicates. At its core it is a simple standardisation of the syntax for expressing this familiar semantics of actions, using pre- and post-conditions to describe the applicability and effects of actions. Fritz et al. [5] develop an approach for compiling Golog-like task specifications together with the associated domain definition into a PDDL 2.1 planning problem that can be solved by any PDDL 2.1-compliant planner. Baier et al. [4] describe techniques for compiling Golog programs that include sensing actions into domain descriptions that can be handled by operator-based planners. Fritz et al. [51] show how ConGolog concurrent programs together with the associated domain specification can be compiled into an ordinary situation calculus basic action theory; moreover, they show how the specification can be compiled into PDDL under some assumptions.
As far as the client is concerned, Figure 5.6 has shown the current version of the work-list handler, just a proof of concept for the sake of testing the SmartPM engine. As future development, we envision two types of work-list handler: a full-fledged version for ultra-mobile devices and a “compact” version for PDAs. First steps have already been taken in these directions. The version for ultra-mobile devices has been operationalized for a different PMS (see Section 7.2). The same holds for the PDA version, which has been developed during this thesis in the ROME4EU Process Management System, a previous valuable attempt to deal with unexpected deviations (see [7]).
In conclusion, we underline once more that the approach proposed, of which this section has shown an implementation, is not another way to catch pre-planned exceptions. Other approaches rely on rules to define the behavior when special events are triggered. Here we simply model (a subset of) the running environment and the actions’ effects, without considering any possible exceptional events. We argue that, in most cases, modeling the environment, even in detail, is easier than modeling all possible exceptions.
5.3 The Network Protocol
This section describes an implementation of a manet layer for PDAs and PCs that allows multi-hop communication. Indeed, current operating systems allow one to add devices to mobile ad-hoc networks (i.e., mobile networks without access points), but two devices that are not in radio range cannot communicate. By implementing multi-hop communication features, devices that are not in radio range can exchange data packets using intermediate nodes as relays. Passing node by node, the packets reach the appropriate receivers, conceptually in the same way as packets flow through the public world-wide Internet to arrive at servers (and vice versa back at clients).
We intend to use SmartPM in an emergency management scenario where services communicate with the PMS through manets. Therefore, we decided to implement a concrete multi-hop manet layer, starting from a pre-existing implementation by the U.S. Naval Research Lab. We extended it in order to make it work on the latest generation of PDAs and low-profile devices.
In order to verify the actual feasibility of data packet exchange in manets, we performed emulation by using octopus so as to let PDAs really exchange packets. An important concern is that, since during testing all nodes were in the same laboratory room, the interference among nodes was significantly higher than if those nodes were placed in a real area. Nevertheless, we discovered and proved a relationship between laboratory and on-the-spot results, thus being able to derive on-the-spot performance levels from those obtained in the laboratory.
Section 5.3.1 compares relevant work and describes some technical aspects of the U.S. Naval Research Lab implementation from which we started. Section 5.3.2 shows how the tests have been conducted and the results obtained. Section 5.3.3 gives some final remarks which influenced the use of SmartPM in manet scenarios.
5.3.1 Protocols and implementations
The purpose of this section is to give an overview of the protocols and actually available implementations providing multi-hop delivery features in manets, pointing out pros and cons.
Routing protocols for manets can be divided into (i) topology-based and (ii) position-based. A position-based routing protocol needs information about the current physical position of a node, which can be acquired through “localization services” (e.g., a GPS), which very recently are becoming easily available on PDAs (e.g., [75, 6]). Topology-based protocols use information about the existing links between node pairs. These protocols can be classified by the “time of route calculation”: (i) proactive, (ii) reactive and (iii) hybrid.
A proactive approach to manet routing seeks to constantly maintain an updated topology knowledge, known to every node. This results in a constant overhead of routing traffic, but no initial delay in communication. Example protocols are OLSR and DSDV [107].
Reactive protocols seek to set up routes on demand. If a node wants to initiate a communication with another node to which it has no route, the routing protocol will try to establish such a route upon request. DSR [71], AODV⁶ and DYMO⁷ are all reactive protocols. Finally, hybrid protocols use both proactive and reactive approaches, as ZRP⁸ does.
Among these routing protocols, some implementations exist, mainly for laptops, and only a few of them work on PDAs. Protocols that require special equipment on board devices or on the field, such as position-based protocols, were discarded in our study because we aim at using off-the-shelf devices and at operating with no existing infrastructure (e.g., in emergency management). Moreover, we notice that reactive protocols in general have worse performance than proactive ones in terms of reactiveness to changes in the topology. Conversely, proactive protocols require more bandwidth [106].
A working implementation of AODV is WINAODV [143]; DYMO is the most recent project and hence is still in the standardization stage; an implementation for PDAs does not exist yet. Three working OLSR implementations are available. OLSRD⁹ has a strong development community and can be extended through plug-ins. The “OLSRD for Windows 2000 and PocketPC” implementation¹⁰ is a porting of the laptop OLSR version to mobile devices. But these two projects, designed for older Windows CE versions, seem not to work properly on the latest Windows CE version (Windows Mobile 6). The NRL (US Naval Research Lab) implementation¹¹ offers QoS functionalities, appears to be a mature project and works on Unix/Windows/WinCE. Although it seems not to work on the latest version of Windows Mobile-based PDAs, it proved to be a good starting point to extend with some features.
NRLOLSR is a research-oriented OLSR implementation, evolved from OLSR draft version 3. It is written in C++ according to an object-oriented paradigm, and built on top of the NRL Protolib library¹² to guarantee system portability.
⁶ http://www.faqs.org/rfcs/rfc3561.html
⁷ http://tools.ietf.org/html/draft-ietf-manet-dymo-02
⁸ http://www.tools.ietf.org/id/draft-ietf-manet-zone-zrp-04.txt
⁹ http://www.olsr.org
¹⁰ http://www.grc.upv.es/calafate/olsr/olsr.htm
¹¹ http://cs.itd.nrl.navy.mil/work/olsr/index.php
¹² http://cs.itd.nrl.navy.mil/work/protolib/
Figure 5.7: MAC interference among a chain of nodes. The solid-line circle denotes a node’s valid transmission range. The dotted-line circle denotes a node’s interference range. Node 1’s transmission will corrupt node 4’s transmissions to node 3.
Protolib works with Linux, Windows, WinCE, OpenZaurus, ns-2 and Opnet; it can also work with IPv6. It provides a system-independent interface, so NRLOLSR does not make any direct system calls to the device operating system. Timers, socket calls, route table management and address handling are all managed through Protolib calls. To work with WinCE, Protolib uses the RawEther component to handle raw messages at low level and get access to the network interface cards.
The core OLSR code is used for all supported systems. Porting NRLOLSR to a new system only requires re-defining existing Protolib function calls. NRLOLSR has non-standard command line options for research purposes, such as “shortest path first route calculations”, fuzzy and slowdown options, etc. Moreover, it uses a link-local multicast address instead of broadcast by default.
5.3.2 Testing Manets
One of the most significant tests described later concerns the throughput in a chain of n links, where the first node wants to communicate with the last. In this chain, every node is placed at the maximum coverage distance from the previous and the next node in the chain, as in Figure 5.7. In the shared air medium, any 802.11x-compliant device cannot receive and/or send data in the presence of interference caused by another device which is already transmitting. From other studies (e.g., [84]) we know that every node is able to communicate only with the previous and the next node, whereas it can also interfere with any other node located at a distance less than or equal to twice the maximum coverage distance. Therefore, if many devices are within twice the radio range, only one of them will be able to transmit data at once.
In our tests for the chain throughput, all devices are in the same laboratory room, which means they are in a medium-sharing context. The chain topology is just emulated by octopus. Of course, having all devices in the laboratory, the level of interference is much higher than on the field; hence, the throughput decreases significantly. We have devised a way to compute a theoretical on-field throughput for a chain from the results obtained in the laboratory.
Let Q_field(n) be the throughput in a real field for a chain of n links (i.e., n+1 nodes). We want to define a method to compute it starting from the laboratory-measured throughput Q_lab(n). Here, we aim at finding a function Conv(n) such that:

Q_field(n) = Conv(n) · Q_lab(n)    (5.3)

in order to derive on-field performance. We rely on the following assumptions:
1. The first node in the chain wishes to communicate with the last one (e.g., by sending a file). The message is split into several packets, which pass one by one through all intermediate nodes in the chain.
2. Time is divided into slots. At the beginning of each slot every node, except the last one, tries to forward a packet to the following node in the chain; slot by slot, the packet arrives at the last node.
3. Communications happen on the TCP/IP stack. Hence, every node that has not delivered a packet has to transmit it again.
4. The laboratory throughput is Q_lab(n) = α/n^β, for some values of α and β. This assumption is realistic as it complies with several theoretical works, such as [84, 57].
We have proved the following statement:

Statement. Let us consider a chain formed by (n+1) nodes connected through n links. On the basis of the assumptions above, it holds¹³:

Conv(n) = (⌊n/3⌋ + 1)^(β/2)    (5.4)

¹³ ⌊·⌋ denotes the truncation to the closest lower integer.
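The statement can be checked mechanically. The following Python sketch (purely illustrative; the thesis tooling is not written in Python) computes Conv(n) and applies Equation 5.3 to turn a laboratory measurement into an on-field estimate:

```python
from math import floor

def conv(n: int, beta: float) -> float:
    """Conversion factor of Equation 5.4: Conv(n) = (floor(n/3) + 1)^(beta/2)."""
    return (floor(n / 3) + 1) ** (beta / 2)

def q_field(n: int, q_lab: float, beta: float) -> float:
    """Equation 5.3: estimate the on-field throughput from the laboratory one."""
    return conv(n, beta) * q_lab
```

Note that for n ≤ 2 the factor is 1, so laboratory and on-field throughput coincide, in agreement with the case analysis in the proof that follows.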
Proof. From the first assumption, we can say that, if the i-th node succeeds in transmitting, then the (i − 1)-th, (i − 2)-th, (i + 1)-th and (i + 2)-th cannot.
Let us name the following events: (i) D_n, the event of delivering a packet in a chain of n links, and (ii) S_n^i, the event of delivering at the i-th attempt.
Let us also name T_{i,n} the probabilistic event of delivering a packet in a network of n links (i.e., n + 1 nodes) after i retransmissions¹⁴.
For all n, the probability of delivering after one attempt is the same as the probability of delivering a packet: P(T_{1,n}) = P(D_n). Conversely, the probability P(T_{2,n}) is equal to the probability of not delivering at the first attempt, P(¬S_n^1), and of delivering at the second attempt, P(S_n^2):

P(T_{2,n}) = P(S_n^2 ∩ ¬S_n^1) = P(S_n^2) · P(¬S_n^1 | S_n^2)    (5.5)

Since, for all i, the events S_n^i are independent and P(S_n^i) = P(D_n), Equation 5.5 becomes:

P(T_{2,n}) = P(S_n^2) · P(¬S_n^1) = P(D_n) · (1 − P(D_n))
In general, the probability of delivering a packet to the destination node after i retransmissions is:

P(T_{i,n}) = P(S_n^i) · P(¬S_n^{i−1}) · … · P(¬S_n^1) = P(D_n) · (1 − P(D_n))^{i−1}    (5.6)
We can compute the average number of retransmissions, according to Equation 5.6, as follows:

T̄_n = Σ_{i=1}^{∞} i · P(T_{i,n}) = Σ_{i=1}^{∞} i · P(D_n) · (1 − P(D_n))^{i−1} = 1/P(D_n)    (5.7)

In a laboratory, all nodes are in the same radio range. Therefore, independently of the number of nodes,

P(D_n^lab) = 1/n    (5.8)
On the field, we have to distinguish on the basis of the number of links. Up to 2 links (i.e., 3 nodes), all nodes interfere and, hence, just one node out of 2 or 3 can deliver a packet in a time slot. So, P(D_1^field) = 1 and P(D_2^field) = 1/2. For n = 3, 4, 5 links, two nodes succeed: P(D_n^field) = 2/n. For n = 6, 7, 8 links, there are 3 nodes delivering: P(D_n^field) = 3/n. Hence, in general we can state:

P(D_n^field) = (⌊n/3⌋ + 1)/n    (5.9)

¹⁴ Please note this is different from S_n^i, since T_{i,n} implies that delivery did not succeed up to the (i − 1)-th attempt.
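The case analysis above and the closed form of Equation 5.9 can be cross-checked with a few lines of Python (illustrative only):

```python
from math import floor

def p_deliver_field(n: int) -> float:
    """Equation 5.9: per-slot delivery probability on the field for n links."""
    return (floor(n / 3) + 1) / n

def p_deliver_lab(n: int) -> float:
    """Equation 5.8: in the laboratory all nodes share the medium."""
    return 1 / n
```

For instance, p_deliver_field(1) = 1, p_deliver_field(2) = 1/2 and p_deliver_field(7) = 3/7, exactly as in the case analysis.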
By applying Equations 5.8 and 5.9 to Equation 5.7, we derive the number of retransmissions needed for delivering a packet:

T̄_lab(n) = n        T̄_field(n) = n/(⌊n/3⌋ + 1)    (5.10)

Fixing the number of packets to be delivered, we can define a function f that expresses the throughput as a function of the number of sent packets. If we have a chain of n links and we want to deliver a single packet from the first to the last node in the chain, then we altogether have to send a number of packets equal to the number n of links times the expected value for each link, T̄_n. Therefore:

Q_lab(n) = f(T̄_lab(n) · n) = f(n²)
Q_field(n) = f(T̄_field(n) · n) = f(n²/(⌊n/3⌋ + 1))    (5.11)

From our laboratory experiments described in Section 5.3.2, as well as from other theoretical results [84], we can state f(n²) = α/n^β. By considering it together with Equations 5.11, the following holds:
Q_field(n)/Q_lab(n) = f(n²/(⌊n/3⌋ + 1)) / f(n²)  ⇒  Q_field(n) = Q_lab(n) · (⌊n/3⌋ + 1)^(β/2)    (5.12)
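Assumption 3 makes the number of attempts per link a geometric random variable, which is exactly what Equation 5.7 exploits. A quick Monte Carlo sanity check (an illustrative sketch, not part of the thesis experiments):

```python
import random

def avg_retransmissions(p: float, trials: int = 100_000, seed: int = 42) -> float:
    """Simulate per-slot delivery with probability p and return the mean
    number of attempts per packet; Equation 5.7 predicts 1/p."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        attempts = 1
        while rng.random() >= p:  # failed slot: retransmit (assumption 3)
            attempts += 1
        total += attempts
    return total / trials
```

With p = 0.25 the simulated mean comes out close to 4, i.e., 1/p.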
The test-bed and experiments
The test-bed devices are all off-the-shelf, certified for the 802.11b standard. Specifically, we used one HP iPAQ 5550 (CPU 400 MHz) running PocketPC 2003/WinCE 4.2 and three ASUS P527 (CPU 200 MHz) equipped with Windows Mobile 6.0/WinCE 5.0. These are complemented by 4 PDAs emulated through the PDA emulator of MS Visual Studio .NET. Such emulated PDAs run on ordinary laptops and their guaranteed performance levels are slightly lower than those of the ASUS PDAs. Therefore, in every test, manets are composed only of PDAs.
We built the ad-hoc network with 802.11b, and we connected all the devices with encryption and the RTS/CTS ability turned off. One more workstation (equipped with a wireless card) runs the octopus emulator and plays the role of gateway: devices are supposed to send any packet to the target destination; but actually every packet is captured by octopus, which decides whether or not to forward it to the destination by considering whether or not the sender and the destination node are neighbors in the virtual map it keeps. Each device runs the NRLOLSR protocol implementation specific to its operating system (WinCE or Windows XP).
We investigated three kinds of tests: the performance of the chain topology; some tuning related to the protocol; some tests with moving devices.
Figure 5.8: Test results for a manet chain in the laboratory, and estimated
on-the-spot results
Performance of the chain topology. The aim of this test is to get the maximum transfer rate on a chain. To obtain the measurements, an application for Windows CE was built (using the .NET Compact Framework 2.0), which transfers a file from the first to the last node on top of TCP/IP, reporting the elapsed time.
All the devices use the routing protocol with the default settings and HELLO_INTERVAL set to 0.5 seconds. octopus emulates the chain topology and grabs all broadcast packets. When a node wants to communicate with another node, it sends packets directly to it if it is in its neighborhood; otherwise it sends them along the routing path. Both real and emulated devices were used; each reported value is the mean value of five test runs.
Figure 5.8 shows the throughput outcomes. The blue curve tracks the laboratory results; as stated in assumption 4, we found through interpolation that the curve follows the trend Q_lab(n) = α/n^β with α = 385 and β = 1.21. The green curve is the maximum theoretical throughput computed by Equation 5.4. We believe the actual throughput we can trust when developing applications lies between the green and the blue curve.
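The interpolation mentioned above amounts to a least-squares fit in log-log space, since Q_lab(n) = α/n^β becomes linear after taking logarithms. A hedged Python sketch (the actual fitting tool used in the thesis is not specified, and the data here are synthetic):

```python
from math import log, exp

def fit_power_law(ns, qs):
    """Fit Q = alpha / n**beta by linear least squares on (log n, log Q)."""
    xs = [log(n) for n in ns]
    ys = [log(q) for q in qs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return exp(my - slope * mx), -slope  # (alpha, beta)
```

Feeding it the measured laboratory throughputs would return estimates of α and β such as the 385 and 1.21 reported above.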
Tuning of the protocol. NRLOLSR has a lot of parameters that can be changed, but only a few of them have a strong impact on the protocol effectiveness. We focus on HELLO_INTERVAL, which is the most important value because it influences the reactivity to topology changes. We tested how increasing or decreasing this parameter affects the topology knowledge and, hence, the reactivity of the network. As every mobility pattern can be stepwise considered a crossing of chains of nodes, we investigated a single chain, considering it as a “building block”.
The scenario is as shown in Figure 5.7: the nodes in the chain are fixed and not moving; each node knows only its two neighbors; at time t node 1 enters the range of node 2; we compute the time elapsed between t and the first application message sent by node 6 and received by node 1. To do this, a client/server application that continuously sends UDP messages from the first node to the last node was built; this, indeed, introduces a small delay that can be ignored.
This interval is referred to as FPT (First Packet Time) and it can be broken down as follows:

FPT = 2 · chain_time + build_route_time    (5.13)

where chain_time is the time used by the packet to travel along the whole chain and come back, and build_route_time is the fraction of time that is necessary for the head node to build the new routing table and choose the correct path for the packet. To catch the exact time, in this test the head node and the entering node are laptops instead of PDAs, so that it is easy to use network sniffer software (which is not available on PDAs). Again, the mobility emulation is provided by the octopus machine.
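In practice Equation 5.13 is used the other way around: FPT and chain_time are measured, and build_route_time is derived. Trivially, with illustrative numbers (not measurements from this test):

```python
def build_route_time(fpt: float, chain_time: float) -> float:
    """Invert Equation 5.13: FPT = 2 * chain_time + build_route_time."""
    return fpt - 2 * chain_time
```

For instance, an FPT of 3.0 s with a measured chain_time of 0.5 s leaves 2.0 s attributable to route build-up.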
Figure 5.9 shows the trend of FPT with different values of HELLO_INTERVAL. Each reported value is the mean value of eight runs. The curve decreases linearly except at the last point, where the interval is set to 0.1 seconds. For intervals of less than 0.5 seconds the FPT increases. A minimum around 0.5 s is due to the inability of devices to follow the network load. The value of the minimum depends upon the CPU, the RAM and, in general, upon the hardware configuration of the PDA: more powerful devices should return a smaller minimum.
All these values have to be considered for one single traffic flow; so, in real scenarios where the traffic is very high and there are multiple flows, it is important to choose an interval value that allows fast topology reactivity and does not overload the devices too much.
Figure 5.9: Time elapsed to establish a direct communication in a chain of five nodes

Tests with moving devices. This kind of test aims to determine whether or not the NRLOLSR implementation is suitable for a real environment where nodes are often moving. Indeed, in a real field it is important not to break the communication upon movements of nodes. If a team member is transmitting information to another team member, and the node topology changes, all data must be delivered successfully, provided that the sender and the receiver are connected at all times through a multi-hop path, possibly changing over time.
In order to emulate a setting of moving devices, we investigated three topologies, shown in Figure 5.10, where the dashed line shows the trajectory followed by a moving device. Such topologies are designed so that (i) the moving node is always connected to at least another node, and (ii) each node is connected in some way to at least another one, i.e., there are no disconnected nodes (no partitions in the manet).
A WinCE application is used that continually sends 1000-byte-long TCP/IP packets between node S and node D. We tested every topology five times and every run was 300 seconds long.
Outcomes turned out to be quite good for every topology: during every run all data packets were correctly delivered to the destination. We experienced only some delays when the topologies were changing due to a node movement. Indeed, while a new path is set up, data transmission incurs 100% losses, since the old path cannot be used for delivering. At application level, we are using reliable TCP and, hence, packet delivery is delayed since every single packet has to be transmitted again and again until the new path is built up.
Figure 5.10: Dynamic topologies for testing TCP/IP disconnections

TCP defines a timeout for retries; if a packet cannot be delivered within a certain amount of time, an error is returned at application level and no more attempts are made.
In order not to incur TCP timeouts, the node motion speed is crucial: if nodes are moving too fast, topologies change too frequently and, hence, the protocol is not reactive enough to keep the routes updated. In the tested topologies, we have discovered that the maximum speed such that TCP timers never expire is around 18 m/s (65 km/h).
5.3.3 Final Remarks
The results depicted in Figure 5.8 allow us to carefully take into account the throughput that a manet of real devices can nowadays support. Surely, on the basis of the previous discussions, any configuration of a manet will present a performance that lies in the area between the two lines, one being the possible worst case and the other the possible best case. We have shown that for more than 5 devices we have a throughput of about 50 Kbytes/sec. As a matter of fact, the data exchanged between services and the PMS engine are quite limited in size and compatible with such a limited bandwidth. The fact that SmartPM itself works in manet scenarios does not mean that the integrated services do. Services to be integrated should be conceived and developed so as to limit the bandwidth they require.
5.4 Disconnection Prediction in Manets
This section illustrates a technique to predict disconnections of devices in Mobile Ad-hoc Networks before they actually occur.
When working on the spot, team members move in the affected area to carry out the tasks assigned to services. If using manets, movements may cause possible disconnections and, hence, unavailability of nodes and, finally, unavailability of the provided services. The SmartPM adaptation should be able to realize when devices are disconnecting and enact an appropriate recovery plan to avoid losing such devices and the services they provide. This section shows a specific sensor that is able to predict disconnections before they actually occur. Indeed, once a device disconnects, it gets out of control and, hence, SmartPM cannot generate appropriate recovery plans that involve actions for such devices, with the result of reducing the effectiveness of such plans.
Figure 5.11 shows where the disconnection predictor is located within the overall SmartPM architecture. The prediction is done by a central entity called Disconnection Prediction Coordinator, currently implemented in C#. When the disconnection of a given device a is predicted, the coordinator informs the corresponding Prediction Manager, which is physically located inside the SmartPM architecture. This manager generates, for each service s installed on a, an exogenous action/event disconnect(s, loc). Parameter loc is a location pair (x, y) identifying the location where a (and all its services) is predicted to move once disconnected. Finally, the Communication Manager notifies the IndiGolog engine of the occurred event.
Figure 5.11: The architecture of the disconnection predictor in SmartPM.

Our predictive technique is based on a few assumptions:
1. Each device is equipped with specific hardware that allows it to know its distance from the surrounding connected (i.e., within radio range) devices. This is not a very strong assumption, as either devices are equipped with GPS or specific techniques and methods (e.g., TDOA - time difference of arrival, SNR - signal-to-noise ratio, the Cricket compass, etc.) are easily available. Kusy et al. [79] present a precise technique to track multiple wireless nodes simultaneously. It relies on measuring the position of tracked mobile nodes through radio interferometry. This is guaranteed to reduce the error significantly with respect to GPS. Nevertheless, Hadaller et al. [58] have devised techniques to mitigate the error when computing node positions through GPS. Indeed, they performed experiments where the error was reduced to 3 meters when nodes are not moving and to 20 meters when nodes move at 80 km/h.
2. There are no landmarks (i.e., static devices with GPS) in the manet; we are indeed interested in very dynamic manets, where the availability of landmarks cannot be assumed.
3. At start-up, all devices are connected (i.e., for each device there is a path - possibly multi-hop - to any other device). The reader should note that we are not requiring that each device is within the radio range of (i.e., has a one-hop connection to) any other device (tight connection); we require only a loose connection, which can be guaranteed by appropriate routing protocols, such as the implementation described in Section 5.3.
4. A specific device in the manet, referred to as coordinator, is in charge of centrally predicting disconnections. As all devices can communicate at start-up and the ultimate goal of our work is to maintain such connections through predictions, it is possible to collect all the information from all devices centrally. The coordinator may coincide with the node hosting the SmartPM core engine, but it may be any other node in the same network.
The predictive technique is essentially as follows: at a given time instant t_i, the coordinator device collects all distance information from the other devices (by assumptions (1) and (3)); on the basis of such information, the coordinator builds a probable connection graph, that is, the probable graph at the next time instant t_{i+1}, in which the predicted connected devices are highlighted. On the basis of such a prediction, the coordinator layer will take appropriate actions (which are not further considered in the remainder of this section).
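As an illustration of the coordinator step only, the sketch below linearly extrapolates each pair's distance to t_{i+1} and keeps the edges predicted to stay within radio range; both the extrapolation rule and the RADIO_RANGE threshold are our simplifying assumptions here, not the Bayesian model introduced in Section 5.4.2:

```python
RADIO_RANGE = 100.0  # meters; illustrative threshold, not a thesis parameter

def probable_connection_graph(prev_dist, curr_dist):
    """prev_dist, curr_dist: dicts mapping a device pair (a, b) to its distance
    at t_{i-1} and t_i. Returns the pairs predicted to be connected at t_{i+1}."""
    edges = set()
    for pair, d_now in curr_dist.items():
        d_prev = prev_dist.get(pair, d_now)
        d_next = d_now + (d_now - d_prev)  # naive linear extrapolation
        if d_next <= RADIO_RANGE:
            edges.add(pair)
    return edges
```

A pair drifting apart (e.g., from 90 m to 99 m in one step) is thus dropped from the predicted graph before the link actually breaks.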
The remainder of this section starts by evaluating the state of the art of mobility prediction. Then, we go deeper into the technique we propose.
5.4.1 Related Work
Much research on mobility prediction has been carried out (and is still in progress), above all for cellular phone systems [2, 85]. These approaches are based on Markov models, which predict the mobile user’s future location on the basis of her current and past locations. The aim is to predict whether a mobile user is leaving the current cell (crossing the cell boundaries) and which new cell she is entering. Such information is then used for channel reservation in the new cell. Anticipating the reservation should lower the probability of a call being dropped during handoff¹⁵ due to the absence of a free channel for the call in the new cell.
The main differences with respect to our approach are related to the different scenarios: manets versus mobile phone networks. Indeed, a peculiarity of manets is their higher mobility, compared with phone networks. In manets, links between pairs of devices disappear very frequently. That does not happen in phone cells, which are very big: leaving a cell and entering a new one is rare compared with how often manet links fall down.
We use a centralized approach as in cellular networks, where a coordinator collects information to allow prediction. The difference is that our approach takes into account the knowledge of all distances among all users. Indeed, we do not have any base station; therefore, we do not have to predict just the distance of each mobile device from it. We are interested in the distance from any device to any other.
In the literature, several approaches predict the state of connectivity of manet nodes. The most common approaches assume that some of the nodes are aware of their location through GPS systems in order to study node motions and predict disconnections. In [103] the authors perform positioning in the network using range measurements and angle-of-arrival measurements. But their method requires a fraction of the nodes to disseminate their location information so that other nodes can triangulate their position. In [116] the probability that a connection will be continuously available during a period of time is computed only if at least one node knows its position and its speed through GPS. Our approach is more generic as it does not require any specific location technique: any hardware allowing to know node distances is fine.

¹⁵ In cellular telecommunications, the term handoff refers to the process of transferring an ongoing call or data session from one channel connected to a core network or cell to another.
In [137] manets are considered as a combination of clusters of nodes, and the authors study the impact (i.e., the performance) of two well-defined mobility prediction schemes on the temporal stability of such clusters; unlike our approach, the authors use pre-existing predictive models, while the novelty of our approach consists in the formalization of a new model based on Bayesian filtering techniques. In [45] neighbor prediction in manets is enacted through a suitable particle filter that uses the information inside the routing table of each node. The routing table is continuously updated by the underlying manet protocol. The first drawback is that it can operate only with those protocols that work by updating routing tables. Since it is based only on routing table updates, it predicts how long pairs of nodes are going to be connected on the basis of how long they have been connected in the past. It does not consider whether pairs of nodes are moving closer or drifting apart, nor the node motion speed. Our approach also takes such information into account, making the prediction more accurate.
Fox et al. [49] address the issue of robot location estimation. For each position p_i and each robot r_j, the technique gives the probability of r_j being in p_i. This approach cannot be easily used to compute when nodes are going to disconnect.
5.4.2 The Technique Proposed
Bayesian Filters

Bayes filters [13] probabilistically estimate/predict the current state of the system from noisy observations. Bayes filters represent the state at time t by a random variable Θ_t. At each point in time, a probability distribution Bel_t(θ) over Θ_t, called belief, represents the uncertainty. Bayes filters aim to sequentially estimate such beliefs over the state space conditioned on all information contained in the sensor data. To illustrate, let us assume that the sensor data consist of a sequence of time-indexed sensor observations z_1, z_2, …, z_n. The belief Bel_t(θ) is then defined by the posterior density over the random variable Θ_t conditioned on all sensor data available at time t:

Bel_t(θ) = p(θ | z_1, z_2, …, z_t)    (5.14)
106
CHAPTER 5. THE SMARTPM SYSTEM
Generally speaking, the complexity of computing such a posterior density
grows exponentially over time, because the number of observations increases
over time; to make the computation tractable, the following two assumptions
are necessary:

1. The system dynamics is Markovian, i.e., the observations are statistically independent;

2. The devices are the only subjects capable of changing the environment.

On the basis of the above two assumptions, the equation at a time instant
t can be expressed as the combination of a prediction factor Bel_{t-1}(θ) (the
equation at the previous time instant) and an update factor that updates
the prediction factor on the basis of the observations at time
instant t.
In our approach, the random variable Θt belongs to [0, 1] and we use the
Beta(α, β) function as belief distribution to model the behavior of the system,
according to the following equation:

Bel_t(θ) = Beta(α_t, β_t)   (5.15)
The beta distribution is a family of continuous probability distributions
defined on the interval [0, 1] and parameterized by two positive shape parameters. The
probability density function of the beta distribution is:

Beta(α,β)(x) = x^(α−1) (1 − x)^(β−1) / ∫_0^1 u^(α−1) (1 − u)^(β−1) du

While the mean and the variance have closed-form expressions, the cumulative distribution function can only be computed through numerical analysis.
Mean and variance are defined as follows:

E(X) = α / (α + β)
Var(X) = αβ / ((α + β)^2 (α + β + 1))
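As a quick numerical sanity check of the closed-form mean and variance above, the following Python snippet (a sketch, not part of the thesis' C# implementation) compares them with a midpoint-rule integration of the beta density for the sample parameters α = 3, β = 5:

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density; the normalization constant is B(a, b) = Γ(a)Γ(b)/Γ(a+b)."""
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / norm

def numeric_mean_var(a, b, steps=20000):
    """Approximate E(X) and Var(X) by midpoint-rule integration on [0, 1]."""
    h = 1.0 / steps
    m1 = m2 = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        w = beta_pdf(x, a, b) * h
        m1 += x * w          # contribution to E(X)
        m2 += x * x * w      # contribution to E(X^2)
    return m1, m2 - m1 * m1

a, b = 3.0, 5.0
mean_cf = a / (a + b)                             # closed-form mean
var_cf = a * b / ((a + b) ** 2 * (a + b + 1))     # closed-form variance
mean_num, var_num = numeric_mean_var(a, b)
assert abs(mean_cf - mean_num) < 1e-6
assert abs(var_cf - var_num) < 1e-6
```

Note that for α = β = 1 the density is uniform on [0, 1], which is exactly the prior used below when a node joins the manet.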
In Bayesian filtering, the values α and β represent the state of the system and
vary according to the following equations:

α_{t+1} = α_t + z_t
β_{t+1} = β_t + (1 − z_t)   (5.16)
In our approach, the observation z_t represents the variation of the relative
distance between nodes (i, j), normalized with respect to the radio range, in the
time period [t−1, t]. It is used to update the two parameters α and β of the
Beta function according to Equation 5.16. The evaluated Beta(α, β) function
predicts the value of θ_{t+1}^{(i,j)}, estimating the relative distance that will be covered
by the nodes (i, j) in the next time period [t, t+1].
5.4. DISCONNECTION PREDICTION IN MANETS
107
timer: a timer expiring every T seconds.
iBuffer[x,y]: a square matrix storing the distance between each couple of nodes x and y.
bayesianBuffer[x,y]: a square matrix storing a triple (α, β, distance) for each couple of nodes x and y.

upon delivering by node i of tuple(i, j, dist)
1   iBuffer[i, j] ← dist

upon expiring of timer()
1   localBuffer ← iBuffer
2   /* empty intermediate buffer */
3   for (i, j) ∈ iBuffer
4       do iBuffer[i, j] ← RADIO_RANGE
5   for (i, j) ∈ localBuffer
6       do if localBuffer[i, j] = RADIO_RANGE
7             then observation ← 1
8             else observation ← (localBuffer[i, j] − bayesianBuffer[i, j].distance)/RADIO_RANGE
9                  observation ← (observation + 1)/2
10        bayesianBuffer[i, j].distance ← localBuffer[i, j]
11        bayesianBuffer[i, j].alpha ← u · bayesianBuffer[i, j].alpha + observation
12        bayesianBuffer[i, j].beta ← u · bayesianBuffer[i, j].beta + (1 − observation)
Figure 5.12: Pseudo-codes of the Bayesian algorithm for predicting node distances.
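The two event handlers of Figure 5.12 can be sketched in Python as follows (an illustrative sketch only: the thesis implementation is in C#, and RADIO_RANGE, the aging factor U and the dictionary-based buffers are assumptions of this example; a new couple of nodes starts from the uniform prior α = β = 1):

```python
RADIO_RANGE = 100.0   # meters (assumed value)
U = 0.5               # aging factor u of Equation 5.17 (assumed value)

# iBuffer: distances reported since the last tick;
# bayesianBuffer: (alpha, beta, last observed distance) per couple of nodes.
i_buffer = {}
bayesian_buffer = {}

def on_tuple_delivered(i, j, dist):
    """Handler for a (i, j, dist) tuple arriving from peer i."""
    i_buffer[(i, j)] = dist

def on_timer_expired():
    """Per-cycle Bayesian update, following the pseudo-code of Figure 5.12."""
    local_buffer = dict(i_buffer)
    for key in local_buffer:           # empty the intermediate buffer;
        i_buffer[key] = RADIO_RANGE    # RADIO_RANGE marks "no report this cycle"
    for key, dist in local_buffer.items():
        # unseen couples start from the uniform prior Beta(1, 1)
        alpha, beta, last_dist = bayesian_buffer.get(key, (1.0, 1.0, dist))
        if dist == RADIO_RANGE:        # distance missing in this cycle
            observation = 1.0
        else:
            observation = (dist - last_dist) / RADIO_RANGE   # in [-1, 1]
            observation = (observation + 1.0) / 2.0          # mapped to [0, 1]
        bayesian_buffer[key] = (U * alpha + observation,
                                U * beta + (1.0 - observation),
                                dist)
```

The `(observation + 1)/2` mapping corresponds to the normalization of Equation 5.19 below.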
Prediction of distances
Our approach relies on clock cycles of period T. The pseudo-code
for the coordinator is described in Figure 5.12. We assume the iBuffer data
structure to be stored only at the Team Leader and accessed only by local threads
in a synchronized way. For each ordered couple (i, j) of nodes, in the n-th
cycle, the monitor stores two float parameters, α_n^{(i,j)} and β_n^{(i,j)}, and the last
observed distance d_{n−1}^{(i,j)}.
Let us assume a node k joins the manet during the m-th clock cycle.
Then, for each manet node j we initialize α_m^{(k,j)} = β_m^{(k,j)} = 1. In this way
we get the uniform distribution on [0, 1] and, hence, every distance d_{m+1}^{(k,j)} gets the
same probability.
At each time period T, each generic node i sends a set of tuples (i, j, d_j)
to the coordinator, where j is the unique name of a neighboring node and d_j is
the distance to j. The coordinator continuously collects such tuples (i, j, d_j)
coming from the nodes in an intermediate buffer. We make no assumption about
clock synchronization: every node collects and sends information to the Team
Leader according to its own clock, which is in general shifted with respect to
those of the other nodes.
The monitor performs prediction according to the same clock period T: at the
beginning of the generic n-th clock cycle, upon timer expiration, it copies the tuples
(i, j, d_n^j) from the intermediate buffer to another one and then empties the
former to make room for updated values. In that clock cycle, for each collected tuple (i, j, d^j), the monitor updates the parameters through a Bayesian
filter as follows:
α_{n+1}^{(i,j)} = u · α_n^{(i,j)} + o_n^{(i,j)}
β_{n+1}^{(i,j)} = u · β_n^{(i,j)} + (1 − o_n^{(i,j)})   (5.17)
(i,j)
where on is an observation and u ∈ [0, 1] is a constant value. Constant
u aims for permitting old observations to age. As new observations arrive,
the previous ones get less and less relevance. Indeed, old observations do not
capture the updated status of manet connectivity and motion.
The value of the observation can be computed from the relative distance variation between i and j, scaled by the radio range:

Δdr_n^{(i,j)} = (d_n^{(i,j)} − d_{n−1}^{(i,j)}) / radio_range   (5.18)

where radio_range is the maximum distance at which two nodes can communicate with each other.
The distance d_n^{(i,j)} may be missing in cycle n: either i and j are not within
radio range, or the packets sent by i to the Team Leader are lost or delivered
late.
It is straightforward to prove that Δdr_n^{(i,j)} ranges in the interval [−1, 1]. This
range is not suitable for a Bayesian filter, since observations should lie between
0 and 1. So we map the value of Equation 5.18 into the range [0, 1]
as follows¹⁶:

o_n^{(i,j)} = (Δdr_n^{(i,j)} + 1) / 2   if d_n and d_{n−1} are available
o_n^{(i,j)} = 1                         if d_n is unavailable
o_n^{(i,j)} = 1/2                       if d_n is available but d_{n−1} is not
   (5.19)
In sum, our Bayesian approach estimates the variation of the future distance between every couple of nodes, normalized in the [0, 1] range. Values
greater than 0.5 mean the nodes are drifting apart, smaller values that they are moving closer.
If the value equals 0.5, node i is estimated not to be moving with respect to j.
The parameters α and β are the inputs of the Beta distribution Beta(α, β),
whose expectation θ_{n+1}^{(i,j)} = E(Beta(α_{n+1}^{(i,j)}, β_{n+1}^{(i,j)})) is the variation of the distance between i and j, in radio-range percentage, estimated at the
beginning of the (n + 1)-th clock cycle.
¹⁶If a node has entered in this cycle we assume o_n^{(i,j)} = 0.5, i.e., it is not moving.
At this stage we can estimate the distance between nodes i and j at the
beginning of the (n + 1)-th clock cycle. That can be done from Equation 5.19 by
replacing the observation term o_n^{(i,j)} with the estimated value θ_{n+1}^{(i,j)}. Hence:

d̃_{n+1}^{(i,j)} = d_n^{(i,j)} + Δd̃_n^{(i,j)} = d_n^{(i,j)} + (2 θ_{n+1}^{(i,j)} − 1) · radio_range   (5.20)
It should hold that d_n^{(i,j)} = d_n^{(j,i)} and, thus, d̃_{n+1}^{(i,j)} = d̃_{n+1}^{(j,i)}. But we have
to consider that, in general, d̃_{n+1}^{(i,j)} ≠ d̃_{n+1}^{(j,i)}: indeed, the distance reported by i for
the couple (i, j) can differ from the one reported by j for the same couple. This is because distances
are collected at the beginning of clock cycles, which can be shifted; hence,
the information can differ, as it is collected at different moments.
Therefore, the estimated distance d̃_{n+1}^{i,j} is computed by considering both d̃_{n+1}^{(i,j)}
and d̃_{n+1}^{(j,i)}, through different weights:
d̃_{n+1}^{i,j} = rel_{n+1}^{(i,j)} · d̃_{n+1}^{(i,j)} + rel_{n+1}^{(j,i)} · d̃_{n+1}^{(j,i)}

where rel_{n+1}^{(i,j)} is a factor for the estimation reliability; it is inversely
proportional to σ_{n+1}^{(i,j)} = sqrt( Var(Beta(α_{n+1}^{(i,j)}, β_{n+1}^{(i,j)})) ):

rel_{n+1}^{(i,j)} = (1/σ_{n+1}^{(i,j)}) / (1/σ_{n+1}^{(i,j)} + 1/σ_{n+1}^{(j,i)}) = σ_{n+1}^{(j,i)} / (σ_{n+1}^{(i,j)} + σ_{n+1}^{(j,i)})

rel_{n+1}^{(j,i)} = (1/σ_{n+1}^{(j,i)}) / (1/σ_{n+1}^{(i,j)} + 1/σ_{n+1}^{(j,i)}) = σ_{n+1}^{(i,j)} / (σ_{n+1}^{(i,j)} + σ_{n+1}^{(j,i)})
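The weighted combination above can be sketched as follows (a Python sketch under the same assumptions as before; the beta mean and variance are the closed-form expressions given earlier in this section):

```python
import math

def beta_var(a, b):
    """Variance of Beta(a, b)."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

def estimate_distance(d_ij, a_ij, b_ij, d_ji, a_ji, b_ji, radio_range=100.0):
    """Combine the two directional estimates of Equation 5.20 with
    reliability weights inversely proportional to the beta standard deviations."""
    theta_ij = a_ij / (a_ij + b_ij)          # expected normalized variation, i's view
    theta_ji = a_ji / (a_ji + b_ji)          # expected normalized variation, j's view
    est_ij = d_ij + (2 * theta_ij - 1) * radio_range
    est_ji = d_ji + (2 * theta_ji - 1) * radio_range
    sigma_ij = math.sqrt(beta_var(a_ij, b_ij))
    sigma_ji = math.sqrt(beta_var(a_ji, b_ji))
    rel_ij = sigma_ji / (sigma_ij + sigma_ji)   # weight of the (i, j) estimate
    rel_ji = sigma_ij / (sigma_ij + sigma_ji)   # weight of the (j, i) estimate
    return rel_ij * est_ij + rel_ji * est_ji
```

When both directions carry the uniform prior and the same last distance, the two weights are equal and the estimate reduces to that distance.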
Connected Components Computation
Disconnection prediction depends on a parameter γ, which stands for the
fraction of the radio range for which the predictive technique does not signal a
disconnection anomaly¹⁷. Let P(disc_{n+1}^{(i,j)}) = P(d̃_{n+1}^{(i,j)} ≥ γ · radio_range); two
nodes i and j are predicted to disconnect if and only if

rel_{n+1}^{(i,j)} · P(disc_{n+1}^{(i,j)}) + rel_{n+1}^{(j,i)} · P(disc_{n+1}^{(j,i)}) > 1/2   (5.21)

i.e., two nodes i and j are estimated to be disconnecting if it is more probable that their
distance is greater than γ · radio_range than smaller
than such a value. We can tune conservativeness by lowering γ (i.e.,
the fraction of the radio range in which disconnections are not predicted).

¹⁷As an example, in IEEE 802.11 with a 100-meter radio range, γ equal to 0.7 means
that at a communication distance of 70 meters the prediction algorithm signals a probable
disconnection.

If we consider Equation 5.20, then:
P(disc_{n+1}^{(i,j)}) = P( | d_n^{(i,j)} / radio_range + (2 θ^{(i,j)} − 1) | ≥ γ )
               = P( θ^{(i,j)} ≥ (1 + γ)/2 − d_n^{(i,j)} / (2 · radio_range) )   (5.22)

where the last term in Equation 5.22 is directly computable from the estimated
beta distribution:

P(θ^{(i,j)} > k) = ∫_k^1 Beta(α^{(i,j)}, β^{(i,j)})(x) dx
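Since the beta CDF has no closed form, this tail probability can be obtained by numerical integration, e.g. (a Python sketch; the step count is an arbitrary accuracy/speed trade-off):

```python
import math

def prob_theta_greater(k, a, b, steps=20000):
    """P(theta > k) for theta ~ Beta(a, b), by midpoint-rule integration of
    the density over [k, 1] (the beta CDF has no closed form)."""
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    h = (1.0 - k) / steps
    total = 0.0
    for i in range(steps):
        x = k + (i + 0.5) * h
        total += x ** (a - 1) * (1 - x) ** (b - 1) / norm * h
    return total
```

For the disconnection test, k would be set to (1 + γ)/2 − d_n^{(i,j)} / (2 · radio_range), and the two directional probabilities combined as in Equation 5.21.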
Once the algorithm has predicted which links will exist at the next cycle, we can easily compute the connected components (i.e., the sets of nodes that are predicted to
be connected). Afterwards, on the basis of the connected components, disconnection anomalies are identified by the monitor. Connected components are
computable through “The Mobile Gambler's Ruin” algorithm below, where an
edge between a couple of nodes exists in the connection graph if Equation 5.21
is false.
Note that an error could be introduced by the techniques for evaluating communication distances: as our model is based on a Markov chain made of communication distances between devices, and since the measured distances could
include an approximation error with respect to the real communication distances,
this error could affect our model. Let us assume that for every S_{(i,j)}^t there is
an average error ΔS introduced by the real measurement. By observing that
our model is linear, it follows that ΔS is spread over all the measures but
does not depend on t, so Sp_{(i,j)}^{(t+1)} is actually Sp_{(i,j)}^{(t+1)} ± ΔS. The exact value
of ΔS depends on which technique is used for distance evaluation but, as it
is typically small compared to Sp_{(i,j)}^{(t+1)}, our prediction
model is only partially affected by this error.
The “Mobile Gambler's Ruin” (MGR) algorithm is derived from the
Markov chain model of the well-known gambler's ruin problem [47, 62]. This
study of device movements and the consequent distance prediction is
based on Markov chains, because the success of a prediction depends only on
the events of the previous time frame. Instead of using a Markovian process in
the time domain, we focus on the spatial domain and build a
matrix similar to the one presented in the original gambler's ruin
model, but with other elements.

Let us consider a square matrix of |E| × |E| elements, where |E| = m
is the total number of mobile devices in the manet. We
build M = (m_ij) as an m × m symmetric matrix, in which m_ij = 1 if
Equation 5.21 is false and m_ij = 0 otherwise¹⁸.

¹⁸The matrix is of course symmetric, since it always holds that m_ij = m_ji.
FUNCTION MGR()
1   numcomps ← 0
2   Comps ← new Array of integer[m]
3   for i ← 0 to (m − 1)
4       do if Comps[i] = 0
5             then numcomps ← numcomps + 1
6                  Comps[i] ← numcomps
7                  CCDFSG(M, i, numcomps, Comps[])
8   return Comps[]

SUB CCDFSG(M, i, numcomps, Comps[])
1   for j ← 0 to (m − 1)
2       do if Comps[j] = 0 and M[i, j] = 1
3             then Comps[j] ← numcomps
4                  CCDFSG(M, j, numcomps, Comps[])

FUNCTION TEST CONNECTION(i, j, Comps[])
1   if Comps[i] = Comps[j]
2       then TEST ← true
3       else TEST ← false
4   return TEST
Figure 5.13: Pseudo-Code of the MGR algorithm.
Every diagonal element m_ii = 1, since P(disc_{n+1}^{(i,i)}) = 0. This follows
by definition: the distance of a mobile device from itself is always
equal to 0.
The matrix M = (m_ij) can be considered as the adjacency matrix of
an (undirected) graph whose nodes are the devices and where an arc exists
between two nodes if they are foreseen as direct neighbors.
The strategy of the MGR algorithm, described in Figure 5.13, is to
find the connected components of the graph (using the CCDFSG procedure)
and then, given two devices ei and ej, to verify whether they belong to the
same connected component (the TEST CONNECTION function); if so,
ei and ej will still communicate in the next time period; otherwise, they will
lose their connection within the next time period. Using this strategy, after
building the matrix M = (m_ij), we can verify which devices are connected,
directly (i.e., one hop) or indirectly (i.e., multi-hop), and thus decide when
disconnection management techniques should be activated in order to keep the
involved devices connected. The aim of such techniques should
be to obtain a unique connected component in the graph.
The MGR algorithm computes the connected components starting from
the matrix that represents the graph. The output of the MGR function
is the Comps array, in which the i-th element is an integer value
identifying the connected component the i-th device belongs to. For example, if we
have a set of devices E = {e1, ..., em} forming a graph with k connected
components, we will have an output vector of this shape:

( 0 0 ... 1 ... 2 ... k − 1 )   (5.23)

Thus, for two different devices ei and ej, we only have to test, using the
TEST CONNECTION function, whether they have the same value in the vector
(5.23); this gives us confidence about the probability of their being still connected in the next time period.

Figure 5.14: The components of the actual implementation: on the coordinator device, the BayesianPredServer module (with its classes DistanceServer, Buffer, PredictiveTimer, BayesianBuffer and BayesianTuple) and the PMSManager, connected to the Process Management System; on each generic peer, the it.uniroma1.dis.Octopus and BayesianPredClient modules, communicating with the coordinator through TCP/IP sockets.
5.4.3 Technical Details
We implemented the Bayesian algorithm on actual devices. We coded it in MS
Visual C# .NET, as this enables writing applications once and deploying them
on any device for which a .NET framework exists (PCs and PDAs included).
In this section, we describe the technical details of the packages and classes
implementing the Bayesian algorithm.

We can identify two sides in the implementation, as described in Figure 5.14:
the code running on the coordinator device, which realizes the prediction,
and that running on the generic peers, which send information about their neighbors to the
coordinator.

The code of the generic peers is conceptually easy. It is basically composed of
two modules:
it.uniroma1.dis.Octopus. We tested our algorithm through the octopus virtual
environment described in Section 5.5. octopus is intended to emulate
small manets and holds a virtual map of the area where nodes are arranged. This module queries octopus to learn the node's
neighbors and their distances.
BayesianPredClient. This module internally includes two timers. The first
timer has a clock period T, the same as defined in Figure 5.12. At
each clock period, it gets information about the neighbors (who they are and how
far away they are) by using the it.uniroma1.dis.Octopus module, arranges
this information in a proper packet, and sends it to the coordinator. Upon expiration of
the second timer, the client sends a command to octopus to change the position of the node to which this device is mapped.
This timer, too, uses the it.uniroma1.dis.Octopus module.
The core of the coordinator predictor is the BayesianPredServer module. It is
worth breaking it down into its composing classes:
DistanceServer. This module implements a TCP/IP server to retrieve the
neighboring information from the peers (sent by them through the
BayesianPredClient module). At the same time, it stores the retrieved information
in the intermediate buffer, which is implemented by the Buffer module.
It corresponds to the event handler for upon delivering of a tuple from a
peer, as defined in Figure 5.12.
Buffer. It implements the intermediate buffer, written by the DistanceServer module and read/emptied by PredictiveTimer. This
module guarantees synchronized access.
PredictiveTimer. This is a timer that fires every T seconds. It implements the event upon expiring of timer as defined in Figure 5.12. Consistently with the pseudo-code, it accesses the Buffer module to get
new information from the other peers, as well as the BayesianBuffer module; the latter stores the information needed to compute Equations 5.17
and 5.19 for each couple of nodes. This module also uses the
it.uniroma1.dis.Octopus module: indeed, the Team Leader is a node itself
and can incur disconnections; therefore, it has to ask octopus for its neighbors
and predict the distances to every other node.
BayesianBuffer, BayesianTuple. The BayesianBuffer class handles and
stores the triples (α^{(i,j)}, β^{(i,j)}, d^{(i,j)}), each one represented by a
BayesianTuple object.
A second module, named PMSManager, composes the coordinator architecture. It is in charge of communicating with the device manager of the SmartPM engine, informing it when disconnections are predicted
by the BayesianPredServer module, specifically by the PredictiveTimer class.
The device manager, in its turn, generates an appropriate exogenous action disconnect(·) to inform the IndiGolog engine about the disconnection.
5.4.4 Experiments
We conclude the section on the Bayesian algorithm for disconnection prediction
by reporting the results of some experiments performed to verify
the accuracy of the predictions. This testing does not involve the SmartPM
adaptive PMS.
In order to test the implementation, we used emulation through octopus (see
Section 5.5). This makes it possible to test the feasibility of an actual implementation
beyond the theoretical soundness of the approach. octopus keeps a map of
virtual areas, which users can design and display through a GUI. The GUI enables
users to place virtual nodes in that map and bind each one to a different
real device. Furthermore, users can add obstacles possibly existing in a real
scenario: ruins, walls and buildings.
The test-bed consists of nine machines (PCs and PDAs). In addition to
these, there is a further PC that hosts the octopus virtual environment. Each
of the nine machines is bound to a different virtual node of octopus’ virtual
map.
We set the testing virtual map to 400 × 300 meters and the communication radio range to 100 meters. Initially, nodes are placed in the
virtual map at random so as to form one connected component.
Afterwards, every S seconds, each node chooses a point (X, Y) in the map
and begins heading towards it at a speed of V m/s. Both S and V are Gaussian
random variables: mean and variance are set, respectively, to 450 and 40
seconds for S, and to 3 and 1.5 m/s for V. The couple (X, Y) is chosen uniformly at random in the virtual map. Of course, the devices used in the tests do not
actually move: nodes move only in the virtual map. For this purpose, devices
send specific commands to an octopus socket instructing node
motions.
The first set of experiments was intended to verify the percentage error
obtained for different values of the clock period T. The error
is defined as the gap between the estimated distance d̃_n computed at the (n − 1)-th clock
cycle and the actual measure d_n at the n-th clock cycle, scaled with
respect to the radio range.
Figure 5.15(a) shows the outcome for clock periods of 15,
20, 30, and 45 seconds. We set the parameter u of Equation 5.17 to
0.5 and performed ten tests per clock period, each 30 minutes
long. The results show, as expected, that the error percentage grows as the
clock period increases. Probably the most reasonable value for real scenarios
(a) The smallest and largest measured error in percentage, for different clock periods (15, 20, 30 and 45 seconds).

(b) The measured error in percentage, for different weights u (0 to 0.9) of past observations.

Figure 5.15: Experiment results of the disconnection prediction module.
is 30–45 seconds (smaller values are not practically feasible, since the manet
would probably be overloaded by “distance” messages). Consider the
largest clock period we tested: the error ranges between 24.34% and 26.8%
(i.e., roughly 25 meters).
Afterwards, in a second set of tests, we fixed the clock period to 30 seconds
and tested u equal to 0.01, 0.05, 0.1, 0.2, . . . , 0.8. We even tripled the frequency
with which nodes start moving. The outcomes are depicted in Figure 5.15(b),
where the x-axis corresponds to u values and the y-axis to the error percentage. The
trend is parabolic: the minimum is obtained for u = 0.3, where the error is
17.44%, and the maximum for u = 0.8, where the error is 21.54%. Small
values of u mean that the past is scarcely considered, whereas large values
mean the past is strongly taken into account. This matches our expectation:
we get the best results for intermediate values, that is, when the past is considered neither too little nor too
much.
SmartPM and other possible applications can rely on such
predictions. Indeed, with a polling time of 30 seconds, we obtained errors
around 18% for u = 0.3. If the range is supposed to be 100 meters, the mean
error is around 18 meters. Hence, if we set γ = 0.75 (i.e., disconnections
are predicted when nodes are more than 75 meters apart), we would be
confident to predict every actual disconnection. That means no disconnection goes
unhandled, although the coordination layer (distributed or centralized, local or
global) will be alerted about some false positives, enacting recovery actions
to handle disconnections that are not real.
5.5 The OCTOPUS Virtual Environment
This section describes an emulator, namely octopus, that we developed with the purpose of testing SmartPM in pervasive scenarios based on
manets. Indeed, when developing any software system (including SmartPM),
one needs to study alternatives for the design and implementation of software modules, analyze possible trade-offs, and verify whether specific protocols,
algorithms and applications actually work. There are three ways to
perform analysis and tests: (i) simulation, (ii) emulation and (iii) field tests.
Clearly field tests would be the preferred solution, but they require many people moving around in large areas, and repeatability of the experiments would be compromised. Simulation and emulation allow performing
several experiments in a cheaper and more manageable fashion than field tests.
Simulators and emulators (i.e., hardware and/or software components enabling
simulation or emulation) do not exclude each other. Simulation can be used
at an earlier stage: it enables testing algorithms and evaluating their performance before starting an actual implementation on real devices. Simulators, such
as NS2¹⁹ [112], GlomoSim [147] or OMNeT++ [126], support several kinds
of hardware through appropriate software modules (such as different device
types, like PDAs or smartphones, or networks, like Ethernet or WLAN 802.11).
Even if the application code written on top of simulators can be quickly written and its performance easily evaluated, such code must be thrown away and
rewritten when developers want to migrate to real devices.
The emulator approach is quite different: during emulation, some software or hardware pieces are not real, whereas others are exactly those of
actual systems. All emulators (for instance, MS Virtual PC or the PDA emulator
in MS Visual Studio .NET) share the same idea: software systems are unaware of running on layers that are partially or totally emulated.
On the other hand, performance can be worse: operating systems running on Microsoft Virtual PC are slower than on a real PC with the same
characteristics. Still, software running on emulators can be deployed on
actual systems with very few or no changes.
On the basis of these considerations, we developed octopus, a complete
emulation environment for manets²⁰. Our emulator is intended to emulate
small-scale manets (10-20 nodes). Instead of making the whole manet stack
virtual, which would require duplicating a large amount of code, we decided to emulate only the physical MAC layer, leaving the rest of the stack
untouched. octopus keeps a map of virtual areas that users can display and
design through a GUI. The GUI enables users to place virtual nodes in that map
and bind each one to a different real device. Further, users can add obstacles
possibly existing in a real scenario: ruins, walls, buildings.

The result is that real devices are unaware of octopus: they believe they
send packets to destinations. Actually, packets are captured by octopus,
which plays the role of a gateway. The emulator analyzes the sender and the
receiver and takes into account the distance of the corresponding virtual nodes,
the probability of losses, as well as obstacles screening direct view²¹. On the
basis of this information, it decides whether to deliver the packet to the
receiver.
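The delivery decision described above can be sketched as follows (a Python sketch; the rectangular obstacle model, the sampling-based line-of-sight check and the loss-probability parameter are illustrative assumptions, not octopus' actual internals):

```python
import math
import random

def line_of_sight(p1, p2, obstacles):
    """True if the segment p1-p2 crosses no obstacle. Obstacles are modeled
    as axis-aligned rectangles (x_min, y_min, x_max, y_max); we sample
    points along the segment (a coarse but sufficient check for a sketch)."""
    for t in (i / 100.0 for i in range(101)):
        x = p1[0] + t * (p2[0] - p1[0])
        y = p1[1] + t * (p2[1] - p1[1])
        for (xmin, ymin, xmax, ymax) in obstacles:
            if xmin <= x <= xmax and ymin <= y <= ymax:
                return False
    return True

def deliver(sender_pos, receiver_pos, obstacles, radio_range=100.0, loss_prob=0.0):
    """Decide whether an emulated packet reaches the receiver: the virtual
    nodes must be within radio range, in direct sight, and the packet must
    not be dropped by a random loss."""
    dist = math.dist(sender_pos, receiver_pos)
    if dist > radio_range:
        return False
    if not line_of_sight(sender_pos, receiver_pos, obstacles):
        return False
    return random.random() >= loss_prob
```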
The advantage of octopus is that, at any moment, programmers can
remove it and perform field manet tests without any change. The aim
here is to present octopus and its novel features. Later we investigate existing
solutions along several comparison dimensions, specifically:
Minimal initial effort. The amount of time necessary to learn and start using the emulator. Several emulators require writing complex scripts to
model channels in detail. We are interested in algorithms for the application layer (not the network one), whose performance is only
slightly affected by channel and network parameters.

¹⁹NS2 enables both simulation and emulation. Here, we refer to NS2's simulation features.
²⁰octopus can be downloaded at: http://www.dis.uniroma1.it/∼deleoni/Octopus.
²¹We assume that, whenever two nodes are not directly visible, every packet sent from one node
to the other is dropped.
Portability. This feature has a twofold meaning: on the one hand, it means
code can be ported to non-emulated environments with few or no changes;
on the other hand, we refer to portability as the ability to use, during emulation, several platforms, such as PCs with Linux or
Windows and PDAs with Windows CE or PalmOS.
Handling of Obstacles. The virtual map held by the emulator should allow users to insert obstacles representing walls, ruins, buildings. Virtual
nodes should move in the map by passing around such obstacles without going over
them. Movements should be as realistic as possible, according
to well-known patterns.
Run-time Event Support. During experiments, the destinations of the nodes
must be definable at run-time, according to the behavior of client
applications. Essentially, movements cannot be defined in a batch way;
conversely, during emulation, nodes have to interactively inform the
emulator about movements towards given destinations.
To the best of our knowledge, octopus is the first manet emulator enabling clients
to interactively influence changes in the topology, upon the firing of events that
were not defined before the beginning of the emulation. Other emulators
require specifying in batch mode, i.e., before the emulation
starts, which events fire and when.
In addition, octopus allows including any kind of device (even PDAs or
smartphones) and application, whereas other approaches support only some
platforms or applications coded in specific languages. Finally, octopus supports and handles possible obstacles, packet losses and enhanced movement
models, like Voronoi [69].

Please note that, though octopus was built for testing SmartPM, its applicability is broader and comprises all software systems that developers are
willing to test on manets without having to write code that is thrown
away after the experiments.
5.5.1 Related Work
There exist several mobile emulators in the literature, even if they do not
provide the features we need for our purposes.
Emulator     | Initial effort | Code needs changes? | Platform        | Obstacle handling | Run-time support
Patched NS2  | High           | No                  | Linux           | Yes (Little)      | No
MobiEmu      | Low            | No                  | Linux           | No                | No
MNE          | Medium         | Yes                 | Linux           | No                | No
MobiNet      | Medium         | No                  | All             | No                | No
EMWIN        | Low            | No                  | All             | No                | No
NEMAN        | Low            | No                  | Linux           | No (but planned)  | No (but planned)
JEMU         | Low            | Yes                 | All (only Java) | No                | No

Table 5.1: Summary of the features provided by some manet emulators
NS2 [112] on its own enables emulating only wired networks. However,
Magdeburg University has developed an NS2 patch to perform wireless emulation [90]. This patched NS2 version can emulate an arbitrary IEEE 802.11
network by connecting real devices to the emulator machine. This solution
actually enables building applications as if the emulator were not present and
switching from a real to an emulated environment without any change.
However, it has some drawbacks: (i) client hosts have to be Linux-based and,
thus, Windows-based computers or PDAs cannot be used; (ii) complex
TCL scripts must be written to set up all emulated aspects of wireless links.
Such a detailed manet configuration makes sense when one wants to emulate
lower-layer protocols and several physical parameters matter. But when we want to test application software (whose
performance and correctness is only slightly affected by such parameters), we
would like to easily configure emulated manets through a GUI so as to minimize
the initial effort. Moreover, (iii) NS2, even patched, does not allow placing
obstacles on the map. At most, one can define some Voronoi paths for
node movements to get a similar result, assuming the paths go around obstacles.
However, we want two virtual nodes to be unable to communicate with each
other if they are not in direct sight (e.g., a building is located between them);
this is not possible with NS2. Finally, (iv) possible events during emulation are
decided at batch time in TCL scripts, so clients cannot effect any change in
the node topology.
Other emulators, such as MobiEmu [148], MNE [88], MobiNet [89], EMWIN
[149] and NEMAN [115], show similar problems. EMWIN is one of the few
emulators supporting any kind of device. It works in a distributed fashion:
so-called emulator nodes are real machines physically attached to a fast
Ethernet switch. Emulator nodes can be installed on any platform, PCs
or PDAs. Every emulator node represents a sort of virtual hub to which up to
8 Virtual Mobile Nodes (VMNs) can be connected. Therefore, EMWIN can
emulate any platform (PDAs included), even if it handles no obstacles,
nor does it allow inserting new events at run-time.
JEmu [48] replaces, for each client, the lowest layer of the communication
stack with an emulated one. The emulated stack sends packets to the JEmu
server, which decides, taking into account certain information (e.g., distance,
collision, etc.), whether the actual destination can receive them (obstacles
are not handled). If so, the emulator forwards these packets to the JEmu client
of the receiver. JEmu is totally written in Java, so it works only with Java
software. Furthermore, applications need many changes to be emulated by JEmu.
Table 5.1 summarizes the features we are interested in for octopus. In this table, “Patched NS2” refers to NS2 enhanced with Magdeburg’s patch; its “Little” obstacle support means that people may define Voronoi paths for node movements, assuming the paths to go around obstacles. The NEMAN entries marked as “planned” are the features that the authors will implement in future releases: specifically, they plan to handle obstacles and to enable applications to influence the link topology at run-time.
As the table shows, no emulator allows applications to modify the node topology at run-time. All emulators are based on the same usage pattern: at design-time, possibly through a GUI, users set up the scenario and a virtual map, bind virtual wireless nodes to real devices, and define the moments when events fire, such as reaching a given position. Afterwards, the applications to be emulated are run on the devices. When such a preparation phase finishes, the emulation starts and events fire according to the specified schedule. We want, instead, to enable the firing of events that were unforeseen during the arrangement of the emulation scenario. In the “real world” events, such as movements, are caused by users, who interact with the applications on board of the devices. In general, and especially when testing novel prototypes of application software on top of manets, the applications on the devices may influence the link topology and the node motion (e.g., devices move while executing tasks). Therefore, batch emulations might be completely useless. Moreover, obstacles are not handled by the other emulators. We think that these aspects are important to make emulations realistic, so we introduced such novel features in octopus.
5.5.2 Functionalities and Models
octopus provides functionalities to emulate a wireless local area network through an intuitive and user-friendly graphical interface. The main features provided by octopus are described as follows:
Integrated graphical scenario editor. The emulation scenario setup is fully managed through a GUI, and there is no need to know or use any scripting language at all. This choice has been made to allow average users, even with only basic networking knowledge, to focus mainly on the experimental aspects.
5.5. THE OCTOPUS VIRTUAL ENVIRONMENT

Real time node mobility management. In our target experiments, the destinations that nodes want to reach have to be defined at run-time, according to the behavior of the client applications. Essentially, movements cannot
be defined in a batch way before emulation starts; on the contrary, during
emulations, nodes have to somehow inform the emulator about their movements towards a given destination.22 This feature is implemented as a TCP server listening for special “movement” commands sent by the software on board of the devices. We know this breaks our constraint that software on the devices must not be modified when the emulator is removed; anyway, the changes, if any, are extremely limited: basically, they consist in “commenting out” the invocations to octopus. In this case we could not avoid violating the constraint: since those events are generated at run-time by the software on the devices, only such software can send those commands. However, if this feature is not needed, the software on the devices actually does not have to be modified when the emulator is removed.
Packet losses. The emulation system supports user-defined packet loss policies, described by a customizable range-based function p_d(r): p_d(r) = k means that the probability that a packet sent by a node is delivered to a node r meters away is k. octopus also supports a more advanced modelling of packet losses based on Ricean fading, which is also more compliant with the real behaviour of wireless communication channels. Section 5.5.2 gives more details.
Obstacle-aware mobility model. Two movement models are available in octopus: Way-Point and Voronoi [69]. The former assumes that nodes move straight along the line joining the starting point and the destination. The latter is more realistic and also takes into account possible obstacles along the path; the devised algorithm is based on the Voronoi plane tessellation model. Section 5.5.2 gives more details about this algorithm.
Broadcast address emulation support. In some algorithms, we may want peers to broadcast a message to every peer in radio range. Since devices are connected through a real LAN23, we cannot use the normal broadcast IP address (i.e., x.x.x.255), as it would send the packet to all the peers in the network without considering the routing table. We want to broadcast only to the virtual neighbors. This issue is resolved by adopting a customizable virtual broadcast address instead of the usual one.
In the following, some details of octopus are given.
22 This makes sense when the behavior of the client applications is controlled by humans.
23 octopus and the other actual devices have to be deployed in the same LAN in order for octopus to be able to reach the other devices.
Voronoi Mobility Model
In order to achieve a realistic mobility management, the nodes living in the emulated environment move avoiding obstacles. As a matter of fact, humans follow predefined paths, such as roads and sidewalks, to reach a place: emulated environments should show similar behaviors. octopus allows polygonal obstacles to be defined on the virtual map, and it generates the graph of all the possible path segments that do not cross them. The algorithm we have devised derives from the original Voronoi algorithm, which assumes a given set P of points in the plane and builds some special lines. The Voronoi lines describe closed polygons in the plane; each polygon includes exactly one point p_i ∈ P and contains all the points that are closer to p_i than to any other p_j ∈ P.
Since obstacles are polygons and not simply points, a generalization is
needed:
1. Generate a “sampled” version of every obstacle by sampling every side
of every obstacle and replacing each one with a sequence of points. The
sampling rate can be defined by users.
2. Generate the Voronoi diagram by considering the points generated at step 1.
3. Remove segments crossing one or more obstacles. That means all segments having at least one of the two vertices inside an obstacle are
removed.
The octopus Voronoi diagram is computed as the dual of the Delaunay triangulation [55], since this yields a lower implementation complexity. Each segment generated by the Voronoi algorithm represents a possible part of the path that nodes are forced to follow in order to move without crossing an obstacle.
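The generalization above can be sketched in a few lines of Python. This is only an illustrative sketch of ours, not the actual octopus implementation (which is written in Java): `sample_obstacle` corresponds to step 1, while `prune_segments` corresponds to step 3, using a standard ray-casting point-in-polygon test; all the names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def sample_obstacle(polygon: List[Point], step: float) -> List[Point]:
    """Step 1: replace every side of a polygonal obstacle with a
    sequence of points placed every `step` units along the side."""
    points = []
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        length = ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
        samples = max(1, int(length / step))
        for k in range(samples):
            t = k / samples
            points.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return points

def inside(p: Point, polygon: List[Point]) -> bool:
    """Ray-casting point-in-polygon test (used in step 3)."""
    x, y = p
    result = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            result = not result
    return result

def prune_segments(segments, obstacles):
    """Step 3: drop every Voronoi segment having at least one of its
    two vertices inside an obstacle."""
    return [(a, b) for (a, b) in segments
            if not any(inside(a, o) or inside(b, o) for o in obstacles)]
```

In a full implementation, the segments passed to `prune_segments` would come from the Voronoi diagram computed (as in octopus) via the dual Delaunay triangulation.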
Packet Loss Models
In order to model the packet losses due to the unreliability of the physical channel, octopus comes with two channel models: the first relies on the definition of a customized function; the second is based on Ricean fading.
Customized packet loss function. The first model concerns the possibility for advanced users to define their own loss function p_d(r). The loss function tells what the probability of delivering a packet is when the potential receiver is r meters away from the sender. For instance, users can model perfect reliability by defining p_d(r) = 1 ∀r ∈ [0, r_range], where r_range is the radio range of the specific transmission technology, e.g., 100 meters for IEEE 802.11b/g and 10 meters for Bluetooth.
Since obstacles are present in the virtual area, we assume radio signals do not pass through obstacles; this means that each packet sent by a node to another is surely dropped if the two nodes are not in direct sight. In the real world, a wireless device may estimate its distance to the others by signal-to-noise ratio (SNR) techniques: the higher the physical distance, the higher the noise in the communication channel and, hence, the lower the SNR. However, that gives only an approximate “communication distance” between two peers: this method does not give the exact physical distance because of other factors, such as thin obstacles between the devices or other interferences, which can increase the noise. So, the communication distance d_c^em and the real distance d_r^em may differ. It is too difficult (and perhaps even impossible) to emulate the physical factors affecting the communication distance. Therefore, octopus defines the communication distance between two nodes a and b as follows:

    d_c^em(a, b) = d_r^em(a, b)   if the nodes are in direct sight
    d_c^em(a, b) = ∞              if at least one obstacle divides a and b

The probability of delivering a packet is given by evaluating the user-defined loss function on d_c^em. So, the probability of delivering to a node b a packet sent by a node a is:

    p_{a,b} = p_d(d_c^em(a, b)) ∈ [0, 1]

When a actually wants to send a packet to b, octopus computes p_{a,b}. Then, it generates a random value x ∈ [0, 1] from a uniform distribution. Finally, octopus follows the rule “if x ≤ p_{a,b} then deliver else drop” to decide whether the packet has to be delivered or dropped.
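The deliver-or-drop rule can be rendered as a small Python sketch. It is purely illustrative (the names are ours, and a full implementation would derive line-of-sight from the obstacle map rather than take it as a flag):

```python
import math
import random

def communication_distance(pos_a, pos_b, in_sight: bool) -> float:
    """d_c^em(a, b): the real distance if the nodes are in direct sight,
    infinity if at least one obstacle divides them."""
    if not in_sight:
        return math.inf
    return math.dist(pos_a, pos_b)

def should_deliver(pos_a, pos_b, in_sight, p_d, rng=random.random) -> bool:
    """Draw x uniformly in [0, 1] and deliver iff x <= p_{a,b}."""
    p_ab = p_d(communication_distance(pos_a, pos_b, in_sight))
    return rng() <= p_ab

# An example loss function: perfect reliability within an 802.11b/g-like
# 100-meter radio range (p_d(r) = 1 for r in [0, 100], 0 beyond).
def p_d(r: float) -> float:
    return 1.0 if r <= 100.0 else 0.0
```

Note that p_d(∞) evaluates to 0 here, so a packet between nodes separated by an obstacle is always dropped, as required.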
Ricean Fading. A second way octopus features to model packet losses is based on Ricean fading, which is extensively used to model wireless channels. Ricean fading is a stochastic model for the radio propagation anomaly caused by the partial cancellation of a radio signal by itself [104]. These anomalies are generated by small changes of the elements in the environment where the wireless signal propagates (e.g., objects changing their position, people moving in the area, doors or windows opening or closing). In such situations, the signal arrives at the receiver along different paths at different points in time, and the different “versions” interfere with each other. Here we do not detail how the channel has been modeled to take Ricean fading into account. It is only worth noting that the reduction of the signal strength (and, consequently, the probability of packet losses) when the distance is r is characterized by the Ricean distribution:

    f(r) = (r / σ²) · exp(−(r² + ν²) / (2σ²)) · I₀(rν / σ²)

where I₀ is the modified Bessel function of the first kind with order zero, and ν and σ are two parameters that depend on some aspects of the channel of interest.
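For illustration, the distribution can be evaluated in plain Python (a sketch of ours, not part of octopus; I₀ is computed via its power series):

```python
import math

def bessel_i0(x: float, terms: int = 40) -> float:
    """Modified Bessel function of the first kind, order zero, via its
    power series: I0(x) = sum_k ((x/2)^(2k) / (k!)^2)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total

def rice_pdf(r: float, nu: float, sigma: float) -> float:
    """f(r) = (r / sigma^2) * exp(-(r^2 + nu^2) / (2 sigma^2)) * I0(r nu / sigma^2)."""
    if r < 0:
        return 0.0
    s2 = sigma * sigma
    return (r / s2) * math.exp(-(r * r + nu * nu) / (2.0 * s2)) * bessel_i0(r * nu / s2)
```

Being a probability density, f integrates to 1 over [0, ∞) for any choice of ν and σ.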
Figure 5.16: An OCTOPUS screenshot.
Figure 5.17: The OCTOPUS’ class diagram
5.5.3 The OCTOPUS Architecture
octopus is completely written in Java; in particular, it has been tested both
on Windows and on Linux. The octopus architecture relies basically on four
modules:
Environment Manager. It is the core module: the behavior of octopus depends on its settings. Users can set up several parameters, such as the area size, node positions, radio ranges and obstacles. It also computes the Voronoi graph. This module is queried by the Gateway module to learn whether a packet has to be delivered or not.
Gateway. octopus plays the role of a gateway: this module intercepts all the packets sent by the nodes involved in the emulation and addresses every network issue. It decides whether to forward a packet by taking into account the distance information coming from the Environment Manager. The Gateway module implements the packet dropping policy described in Section 5.5.2.
Server. This module implements the TCP server, listening on port 8888. The server is intended to receive commands from applications about events (like movements) to trigger, and to reply to queries coming from clients: for instance, a client can ask who its neighbors are or what its distance from them is. The communication protocol is a trivial textual protocol. We have also realized a C# module masking the socket accesses behind an easy API.
GUI. In order to minimize the effort needed to set up the initial scenario and bind the virtual nodes to the actual devices, octopus is provided with a Graphical User Interface. It enables every configuration step to be performed in a friendly fashion, without requiring users to learn any special scripting language. At design-time, users can insert nodes, obstacles and buildings in the virtual area by “point-and-click”, as in any drawing software. The GUI allows users to load/save scenarios and settings from/to XML files, so that scenarios do not have to be set up from scratch every time. At run-time, the GUI shows the exact position of the virtual nodes on the map. Figure 5.16 depicts an octopus screenshot: the right panel shows the virtual area, whereas the top part is used at design-time to configure scenarios (nodes, positions, etc.). The left panel describes the node mappings and other information, also allowing users to change the position of nodes by manually firing some events. The gray rectangles and lines represent, respectively, obstacles and Voronoi lines, which nodes follow during their motions. If a proper option is active (as it is in the figure), the GUI shows the virtual neighbor nodes by a blue line connecting each couple of nodes in radio range. Another
option makes the GUI draw a circle centered on every node to show the radio range.
Figure 5.17 shows the classes composing octopus and classifies them with respect to the modules described above:

Environmental Manager. The Octopus class is a singleton (i.e., at most one instance may exist) and derives from the SimulationEnvironment class. SimulationEnvironment describes the physical environment to be emulated and also manages the mobility aspects through the VoronoiGraph class. The SimulationEnvironment class contains a list of MobileNode, Location and Obstacle instances in order to have a complete description of the environment. Since the Delaunay triangulation is the dual of the Voronoi diagram but computationally more efficient, a DelaunayTriangulation class is used by the VoronoiGraph class. The DijkstraPathFinder class is used by VoronoiGraph to compute a path from a source point to a destination.
Gateway. The network level is managed by the GatewayManager class, which uses the JPCap library24 in order to capture and forward LAN packets. To evaluate whether a packet has to be delivered or lost, the GatewayManager is supported by the FunctionManager, which makes use of the JEP library in order to parse the user-defined loss function.
Server & GUI. The octopus TCP/IP server is multi-threaded and implemented by the OctopusServer class. It is multi-threaded as it manages multiple connections at the same time: each connection is handled by a different OctopusClientThread object.
5.6 Summary
This chapter has presented the SmartPM system, i.e., a Process Management System featuring automatic adaptation based on execution monitoring. SmartPM has been built on top of the IndiGolog interpreter developed by the University of Toronto and RMIT University, Melbourne. Section 5.1 has given an overview of the interpreter platform and of how it can be used for specifying IndiGolog programs. After that, the SmartPM engine has been introduced in detail, describing how processes can be concretely coded in IndiGolog (Section 5.2). The programs describing processes are ideally composed of a part that is mostly static and does not depend on the process, and of a second part that codes the specific process. The static part codes execution monitoring and planning; it is worth highlighting that even monitoring and planning
24 JPCap Web site – http://netresearch.ics.uci.edu/kfujii/jpcap/doc
are directly representable (and, in fact, practically represented) as IndiGolog procedures. This makes IndiGolog programs self-contained as regards process execution and adaptability. The strength of this chapter is that every aspect theoretically described in the previous chapter has been concretely implemented and tested. Finally, we have complemented SmartPM with some external modules to enable its use for emergency management. Specifically, we have presented a technique based on Bayesian filtering for detecting one particular type of change in the execution environment, namely the disconnection of the devices of rescue operators (Section 5.4.2). We have also provided SmartPM with a network protocol, discussing conceptual and technical aspects of the network traffic (Section 5.3). Finally, this chapter has described octopus, a manet emulator that we have used for testing the disconnection sensor (Section 5.5). Nevertheless, octopus can be useful for experimentation in a variety of application domains, i.e., all those domains in which one wants to test a concrete implementation of algorithms for manets and check their practical feasibility.
Chapter 6
Adaptation of Independent and Concurrent Process Branches
This chapter aims at improving upon what is described in Chapter 4. Indeed, we propose a novel adaptation technique that is more efficient, being able to exploit concurrent branches.

In the framework described in Chapter 4, whenever a process δ needs to be adapted, it is blocked and a recovery program consisting of a sequence of actions h = [a_1, a_2, ..., a_n] is placed before it, so as to obtain a new process δ′ = [h; δ]. The original process may consist of different concurrently running branches δ = δ_a ∥ δ_b ∥ · · · ∥ δ_n, and in case of adaptation all of them are temporarily interrupted. Thus, all branches can resume their execution only after the whole recovery sequence has been executed. Although one knows which branches cannot progress, the framework in Chapter 4 cannot adapt such branches individually. Indeed, it is not known whether the different branches act upon the same variables/conditions, and adapting branches one by one could change some variables/conditions and, hence, “break” other branches, which would then be unable to progress.
Therefore, we refine here that approach by automatically identifying whether concurrent branches are independent (i.e., they neither work on the same variables nor affect the same conditions). If they are independent, we can automatically synthesize a recovery process h for δ that affects only the branch of interest (say δ_a), without having to block the other branches:

    δ′ = [h; δ_a] ∥ δ_b ∥ · · · ∥ δ_n

In order to apply the proposed technique, some additional effort is required of process designers with respect to the technique of Chapter 4. Indeed, the technique is made possible by annotating processes in a “declarative” way.
We assume that the process designer can annotate actions/sequences with the goals they are intended to achieve. On the basis of such declared goals, the independence among branches can be verified and, later, a recovery process that affects only the branch of interest, without side-effects on the others, can be synthesized.

The framework described here is an extension of the paper [33]. Section 6.1 gives the overall idea of the adaptation approach, pointing out the general framework. Section 6.2 presents the sound and complete technique for adapting “broken” processes. Section 6.3 outlines an example stemming from emergency management scenarios, showing the use of the proposed technique.

Unlike the approach of Chapter 4, we have not yet developed a prototype that exploits the technique proposed here.
6.1 General Framework
The general framework we introduce here is derived from that of Chapter 4. Like the previous one, this framework considers processes as IndiGolog programs, and conditions are expressed in the SitCalc. The actions that compose processes are of four types: Assign(a, x), Start(a, x, p), AckTaskCompletion(a, x) and Release(a, x). Services can execute two actions, readyToStart(a, x) and finishedTask(a, x, q). The parameters a, x, p and q identify, respectively, services, tasks, inputs and outputs. The actions performed by the PMS and by the services work and are interleaved in the same way as described in Section 4.3. The only difference here is that all the assignments of tasks belonging to parallel branches must be done before entering the branches themselves, and releases can happen only after completing all the parallel branches. We explain later the reason for these constraints.

These actions act on some domain-independent fluents, specifically free(·) and enable(·), whose definitions are exactly the same as in Equations 4.2 and 4.5. As in the approach of Chapter 4, there exist other fluents that denote significant domain properties, whose values are modified by the service action finishedTask(·).
Similarly to Chapter 4, the PMS advances the process δ in the situation s by executing an action, resulting in a new situation s′ with the process δ′ remaining to be executed. The state is represented by fluents, which are defined on situations.

The process execution is continuously monitored to detect any deviation between the physical reality and the virtual reality. The PMS collects data from the environment through sensors (here a sensor is any software or hardware component enabling the retrieval of contextual information). If a deviation is sensed between the virtual reality, as represented by s′, and the physical reality s′_e, the PMS internally generates a discrepancy ~e = (e_1, e_2, ..., e_n), which is a sequence of actions, called exogenous events, such that s′_e = do(~e, s′).
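The situation-as-action-history view can be made concrete with a minimal Python sketch (ours, purely illustrative of the SitCalc notation used above):

```python
from typing import Iterable, Tuple

# In the SitCalc, a situation is the history of actions executed from the
# initial situation S0; do(a, s) yields the situation reached by doing a in s.
Situation = Tuple[str, ...]

S0: Situation = ()

def do(action: str, s: Situation) -> Situation:
    return s + (action,)

def do_seq(actions: Iterable[str], s: Situation) -> Situation:
    for a in actions:
        s = do(a, s)
    return s

# A discrepancy ~e = (e1, ..., en) is a sequence of exogenous events: the
# sensed physical reality s'_e is obtained by applying them to the virtual
# reality s', i.e., s'_e = do(~e, s').
def apply_discrepancy(e: Iterable[str], s_prime: Situation) -> Situation:
    return do_seq(e, s_prime)
```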
Let us consider the case in which the remaining process δ to be executed is composed of n parallel sub-processes running concurrently: δ = [~p_1 ∥ · · · ∥ ~p_n], where every sub-process ~p_i = [a_{1,i}, ..., a_{m,i}] is a sequence of simple actions.1
The process designers are assumed to have associated, before the process enactment, every sub-process ~p_i with the goal G_i that ~p_i is meant to achieve. In addition, the concurrent sub-processes are also annotated with an invariant condition C, expressed in the SitCalc.2 The independence of these sub-processes is defined assuming this condition C, which must hold in every situation. Checking for independence is a key point of the adaptation technique proposed in this work (see the next section).

When a divergence between the virtual and the physical reality is sensed because of exogenous events, one or more concurrent processes may be broken (i.e., they no longer achieve the associated goals). For each broken branch ~p_i, the recovery procedure generates a handler ~h_i, which is an action sequence that, when placed before ~p_i, allows ~p′_i = [~h_i; ~p_i] to reach goal G_i while remaining independent of every parallel branch ~p_j (with j ≠ i) with respect to the invariant C.
6.2 The adaptation technique
This section describes the approach we use to adapt a process composed of
concurrent sequential sub-processes. We first give the formal foundations of
our adaptation technique, presenting the results that the “monitor and repair”
cycle relies upon. Then, we describe the “monitor and repair” cycle, and
discuss the conditions under which the technique is sound and complete.
6.2.1 Formalization
In order to capture formally the concept of independence among processes, we
introduce some preliminary definitions.
Definition 6.1. A ground action a preserves the achievement of goal G by a sequence of ground actions [a_1, ..., a_m] under condition C if executing a at any point during [a_1, ..., a_m] does not affect any of the conditions that are required for the goal G to be achieved by [a_1, ..., a_m]. Moreover, executing a preserves
1 If this assumption does not hold, the approach in Chapter 4 is still usable and we do not propose any improvement.
2 Goals G_i and invariant conditions C are given as arbitrary SitCalc formulas that take a situation as their last argument.
C in any situation. Formally:

    PreserveAch(a, G, [a_1, ..., a_m], C) ≝ ∀s. C(s) ⇒
        C(do(a, s)) ∧
        (G(do([a_1, ..., a_m], s)) ⇒ G(do([a, a_1, ..., a_m], s))) ∧
        (G(do([a_2, ..., a_m], s)) ⇒ G(do([a, a_2, ..., a_m], s))) ∧
        ...
        (G(do(a_m, s)) ⇒ G(do([a, a_m], s))) ∧
        (G(s) ⇒ G(do(a, s))).
We then extend the notion above to the case of action sequences:

Definition 6.2. A sequence of ground actions [a_1, ..., a_m] preserves the achievement of goal G by a sequence of ground actions ~p under condition C if every action in [a_1, ..., a_m] preserves the achievement of goal G by ~p under condition C. Formally:

    PreserveAch([a_1, ..., a_m], G, ~p, C) ≝ ⋀_{i: 1 ≤ i ≤ m} PreserveAch(a_i, G, ~p, C).
Given this, we can then define a notion of independence of processes.

Definition 6.3. A set of (sequential) processes ~p_1, ..., ~p_n, where each ~p_i achieves goal G_i, are independent with respect to goals G_1, ..., G_n under condition C if, for all i and all j ≠ i, ~p_j preserves the achievement of goal G_i by ~p_i under condition C. Formally:

    IndepProcess([~p_1, ..., ~p_n], [G_1, ..., G_n], C) ≝
        ⋀_{i: 1 ≤ i ≤ n} ⋀_{j: 1 ≤ j ≤ n ∧ j ≠ i} PreserveAch(~p_j, G_i, ~p_i, C).
Basically, Definition 6.3 looks at the independence of each and every pair of concurrent (sub-)processes. If we assume that every process is composed of m actions, checking this independence is polynomial in the number of actions and of concurrent processes. Specifically, it requires

    n(n − 1)/2 × m² = O(m² × n²)        (6.1)

checks of PreserveAch(·) as in Definition 6.1 (one for each pair of actions in the concurrent processes).
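The pairwise check of Definition 6.3 can be sketched as follows (illustrative Python of ours; `preserves` stands for an oracle deciding PreserveAch for a single action, which in practice requires reasoning over the domain theory D):

```python
def preserve_ach_seq(seq, goal, target, cond, preserves):
    """Definition 6.2: a sequence preserves the achievement of `goal` by
    `target` iff every single action in the sequence does."""
    return all(preserves(a, goal, target, cond) for a in seq)

def indep_process(processes, goals, cond, preserves):
    """Definition 6.3: check every ordered pair (i, j) with j != i.
    Each oracle call, per Definition 6.1, inspects the suffixes of the
    target process, which yields the O(m^2 x n^2) bound of Equation 6.1."""
    n = len(processes)
    return all(
        preserve_ach_seq(processes[j], goals[i], processes[i], cond, preserves)
        for i in range(n)
        for j in range(n)
        if j != i
    )
```

For instance, with two branches where an action of the second falsifies a condition needed by the first branch's goal, the oracle reports a violation and the branches are declared non-independent.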
Firstly, we show that if the concurrent sub-processes are independent and
some of them progress, then the parts of them that remain to be executed will
always remain independent:
Theorem 6.1. For each i ≤ n and for all suffixes ~p′_i of ~p_i:

    D ⊨ IndepProcess([~p_1, ..., ~p_n], [G_1, ..., G_n], C) ⇒
        IndepProcess([~p′_1, ..., ~p′_n], [G_1, ..., G_n], C).
Proof. By Definition 6.3, IndepProcess([~p_1, ..., ~p_n], [G_1, ..., G_n], C) holds iff for all i ∈ [1, n] and all j ∈ [1, n] \ {i}:

    PreserveAch(~p_j, G_i, ~p_i, C).

Let us fix arbitrary values i ∈ [1, n] and j ∈ [1, n] \ {i}, and let ~p_j = [a_{1,j}, a_{2,j}, ..., a_{m,j}]. By Definition 6.2:

    PreserveAch(~p_j, G_i, ~p_i, C) ⇔ ⋀_{k: 1 ≤ k ≤ m} PreserveAch(a_{k,j}, G_i, ~p_i, C)        (6.2)

Let us fix ~p′_j = [a_{t,j}, ..., a_{m,j}], an arbitrary suffix of ~p_j. By Equation 6.2:

    ⋀_{k: t ≤ k ≤ m} PreserveAch(a_{k,j}, G_i, ~p_i, C)

and, consequently:

    PreserveAch(~p′_j, G_i, ~p_i, C).

Moreover, if PreserveAch(a, G_i, ~p_i, C) holds for an action a, then PreserveAch(a, G_i, ~p′_i, C) holds for every suffix ~p′_i of ~p_i as well, since the conjunction in Definition 6.1 covers all the suffixes of ~p_i. Hence, after fixing an arbitrary suffix ~p′_i of ~p_i and repeating the steps above, the following holds:

    PreserveAch(~p′_j, G_i, ~p′_i, C).

The values i and j, as well as the process suffixes, were chosen arbitrarily. Therefore, for all suffixes ~p′_i of each ~p_i and all suffixes ~p′_j of each ~p_j with j ≠ i, the following holds:

    PreserveAch(~p′_j, G_i, ~p′_i, C).

Hence, the thesis is proven. ∎
Next, we show that if n processes ~p_1, ..., ~p_n achieve their respective goals G_1, ..., G_n and are independent according to Definition 6.3 with respect to a certain condition C, then any interleaving of the execution of the processes’ actions will achieve each process’s goal, and condition C will continue to hold. Let D be the current domain theory, i.e., the set of all the fluents and of all the actions acting on those fluents, as well as of all the action pre-conditions. Then:
Theorem 6.2.

    D ⊨ IndepProcess([~p_1, ..., ~p_n], [G_1, ..., G_n], C) ⇒
        ∀s. [G_1(do(~p_1, s)) ∧ ... ∧ G_n(do(~p_n, s)) ∧ C(s) ⇒
            [∀s′. Do([~p_1 ∥ ... ∥ ~p_n], s, s′) ⇒
                G_1(s′) ∧ ... ∧ G_n(s′) ∧ C(s′)]].
Proof. By induction on the total length of all the processes. Let |~p_i| be the length of process ~p_i, i.e., the number of actions of ~p_i. If all processes are empty sequences, then the result trivially follows (base case). Assume the result holds if the total length of all the processes is k; we show that it must also hold for k + 1 (induction step).

Assume C(s), G_1(do(~p_1, s)) ∧ ... ∧ G_n(do(~p_n, s)) and Do([~p_1 ∥ ... ∥ ~p_n], s, s′), where the processes are such that Σ_{i=1}^{n} |~p_i| = k + 1.

Let us assume branch ~p_i = [a_{1,i}, a_{2,i}, ..., a_{m,i}] evolves by executing action a_{1,i}. Let ~p′_i = [a_{2,i}, ..., a_{m,i}] be what is left of ~p_i after executing a_{1,i}, and let s_1 = do(a_{1,i}, s).

By applying Theorem 6.1:

    IndepProcess([~p_1, ..., ~p_n], [G_1, ..., G_n], C) ⇒
        IndepProcess([~p_1, ..., ~p_{i−1}, ~p′_i, ~p_{i+1}, ..., ~p_n], [G_1, ..., G_n], C).

Since IndepProcess([~p_1, ..., ~p_n], [G_1, ..., G_n], C) holds, then for all j ≠ i:

    PreserveAch(~p_i, G_j, ~p_j, C)

and, in particular:

    PreserveAch(a_{1,i}, G_j, ~p_j, C).

Therefore, by Definition 6.1:

    G_1(do(~p_1, s_1)) ∧ ... ∧ G_{i−1}(do(~p_{i−1}, s_1)) ∧ G_i(do(~p′_i, s_1)) ∧
        G_{i+1}(do(~p_{i+1}, s_1)) ∧ ... ∧ G_n(do(~p_n, s_1)) ∧ C(s_1).

In order to complete the proof, it remains to prove that

    IndepProcess([~p_1, ..., ~p_{i−1}, ~p′_i, ~p_{i+1}, ..., ~p_n], [G_1, ..., G_n], C) ∧
        G_1(do(~p_1, s_1)) ∧ ... ∧ G_{i−1}(do(~p_{i−1}, s_1)) ∧ G_i(do(~p′_i, s_1)) ∧
        G_{i+1}(do(~p_{i+1}, s_1)) ∧ ... ∧ G_n(do(~p_n, s_1)) ∧ C(s_1) ⇒
        [∀s′. Do([~p_1 ∥ ... ∥ ~p_{i−1} ∥ ~p′_i ∥ ~p_{i+1} ∥ ... ∥ ~p_n], s_1, s′) ⇒
            G_1(s′) ∧ ... ∧ G_n(s′) ∧ C(s′)].

And that holds by the induction hypothesis, as

    |~p_1| + ... + |~p_{i−1}| + |~p′_i| + |~p_{i+1}| + ... + |~p_n| = k. ∎
Next, building on the previous results, we show that such an “independent handler” ~h can be used for handling a discrepancy ~e that breaks a process ~p′_i, while allowing all the other processes to execute concurrently and achieve their respective goals:

Theorem 6.3. Let ~p′_i be the process broken by a discrepancy ~e.

    D ⊨ ∀s̃, ~e. IndepProcess([~p′_1, ..., ~p′_{i−1}, [~h; ~p′_i], ~p′_{i+1}, ..., ~p′_n],
            [G_1, ..., G_{i−1}, G_i, G_{i+1}, ..., G_n], C) ∧
        G_1(do(~p′_1, do(~e, s̃))) ∧ ... ∧ G_{i−1}(do(~p′_{i−1}, do(~e, s̃))) ∧
        G_{i+1}(do(~p′_{i+1}, do(~e, s̃))) ∧ ... ∧ G_n(do(~p′_n, do(~e, s̃))) ∧
        G_i(do([~h; ~p′_i], do(~e, s̃))) ∧ C(do(~e, s̃)) ⇒
        [∀s′. Do([~p′_1 ∥ ... ∥ ~p′_{i−1} ∥ [~h; ~p′_i] ∥ ~p′_{i+1} ∥ ... ∥ ~p′_n],
                do(~e, s̃), s′) ⇒
            G_1(s′) ∧ ... ∧ G_{i−1}(s′) ∧ G_i(s′) ∧ G_{i+1}(s′) ∧ ... ∧ G_n(s′) ∧ C(s′)].

Proof. It derives trivially from applying Theorem 6.2 where:

• ~p_i ≝ [~h; ~p′_i];
• ~p_k ≝ ~p′_k for all k: 1 ≤ k ≤ n ∧ k ≠ i;
• s ≝ do(~e, s̃). ∎
Finally, we show that adding an action sequence ~h for handling a discrepancy ~e that breaks a process ~p_i preserves process independence. This holds provided that ~h is built so as to be independent of every sub-process different from ~p_i with respect to condition C. Let R(G_i, do(~p_i)) be the situation-suppressed expression for the regression R^s(G_i(do(~p_i, s))).

Theorem 6.4. Let ~p_i be the process broken by a discrepancy ~e.

    D ⊨ IndepProcess([~p_1, ..., ~p_{i−1}, ~p_i, ~p_{i+1}, ..., ~p_n],
            [G_1, ..., G_{i−1}, G_i, G_{i+1}, ..., G_n], C) ∧
        (⋀_{j: 1 ≤ j ≤ n ∧ j ≠ i} PreserveAch(~h, G_j, ~p_j, C) ∧
            PreserveAch(~p_j, R^s(G_i(do(~p_i, s))), ~h, C)) ⇒
        IndepProcess([~p_1, ..., ~p_{i−1}, [~h; ~p_i], ~p_{i+1}, ..., ~p_n],
            [G_1, ..., G_{i−1}, G_i, G_{i+1}, ..., G_n], C).
Proof. Let us denote ~h = [h1 , . . . , hm ] and p~i = [a1,i , . . . , al,i ].
Let us fix an arbitrary process p~j ≠ p~i , without loss of generality. By hypothesis,

    ∧_{k:1≤k≤m} PreserveAch(hk , Gj , p~j , C)    (6.3)

as well as

    ∧_{k:1≤k≤l} PreserveAch(ak,i , Gj , p~j , C)    (6.4)

From Equations 6.3 and 6.4, by Definition 6.2 it results that:

    PreserveAch([~h; p~i ], Gj , p~j , C)    (6.5)
Let se be the situation after the occurrence of discrepancy ~e. Handler ~h is built such that:

    ∃s̃. Rs̃ (Gi (do(p~i , s̃))) ∧ do(~h, se ) = s̃ ∧ Gi (do(p~i , s̃))

In the light of this:

    PreserveAch(p~j , R(Gi , do(p~i )), ~h, C) ⇒ PreserveAch(p~j , Gi , ~h, C)

Therefore, since PreserveAch(p~j , Gi , p~i , C) holds by the hypothesis of process independence, the following holds:

    PreserveAch(p~j , Gi , [~h; p~i ], C)    (6.6)
Since Equations 6.5 and 6.6 are true for every process pj and by the hypothesis of independence of all processes, the thesis holds.
6.2.2 Monitoring-Repairing Technique
On the basis of the results in the previous section, we propose in Figure 6.1 an algorithm for adaptation. This algorithm, which is meant to run inside the PMS, relies on two arrays giving information about the status of the n concurrently running processes: whether each has completed and, in case of completion, whether successfully or not. Initially, every element of both arrays is set to false.

Routine monitor relies on every process pi sending a message to the PMS when either it terminates successfully (message successfullyCompleted(i)) or an exception is sensed such that pi can no longer terminate successfully.³

³ That is, D ⊭ Rsnow (Gi (do(pi , snow ))), where snow is the current situation after sensing a discrepancy.
6.2. THE ADAPTATION TECHNIQUE
completed[]: array of n elements
succeeded[]: array of n elements

initially()
1  for i ← 1 to n
2    do completed[i] ← false
3       succeeded[i] ← false

SUB monitor([p1 , . . . , pn ], [G1 , . . . , Gn ], C, si )
1  if (¬IndepProcess([p1 , . . . , pn ], [G1 , . . . , Gn ], C))
2    then throw exception
3  while (∃i. ¬completed[i])
4    do m ← waitForMessage()
5       if m = successfullyCompleted(i)
6         then completed[i] ← true
7              succeeded[i] ← true
8       if m = exception(ie , se )
9         then h ← buildHandler(ie , se , [p1 , . . . , pn ], [G1 , . . . , Gn ], C)
10             if h = fail
11               then completed[ie ] ← true
12                    throw exception
13               else
14                    pie ← [h; pie ]
15                    start(pie )

FUNCTION buildHandler(ie , se , [p1 , . . . , pn ], [G1 , . . . , Gn ], C)
1  for i ← 1 to n
2    do pi ← remains(pi , se )
3  h ← planByRegres(R(Gie , do(p~ie )),
4         se , [p1 , . . . , pn ]/pie , [G1 , . . . , Gn ]/Gie , C)

FUNCTION planByRegres(Goalh , si , [p1 , . . . , pn−1 ], [G1 , . . . , Gn−1 ], C)
1  if D |= Goalh (si )
2    then return nil
3    else a ← chooseAction(Goalh , si , [p1 , . . . , pn−1 ],
4                          [G1 , . . . , Gn−1 ], C)
5         if a = nil
6           then
7             return fail
8           else
9             h′ ← planByRegres(Rs (Goalh (do(a, s))),
10                   si , [p1 , . . . , pn−1 ], [G1 , . . . , Gn−1 ], C)
11            if h′ = fail
12              then return fail
13              else return [h′ ; a]

FUNCTION chooseAction(Goalh , si , [p1 , . . . , pn−1 ], [G1 , . . . , Gn−1 ], C)
1  choose an action a s.t. {∃s. (¬Goalh (s) ∧ Goalh (do(a, s)))} ∧
2        ∀i. 1 ≤ i ≤ (n − 1) ⇒ PreserveAch(a, Gi , pi , C)
3              ∧ PreserveAch(pi , Goalh , a, C)
4  return a  // nil if there exists no selectable action

Figure 6.1: Pseudo-code for the adaptation technique.
(message exception(ie , se ), where ie is the “broken” process and se is the resulting situation after the discrepancy occurrence). We assume that the situation representing the current state of the real world is known and that we have complete knowledge of the fluents in that situation. Moreover, we assume that in every situation we can access the fluent values of every past situation.⁴ Finally, we assume that every process pi shrinks as its tasks are executed: pi always denotes the part of the process that still needs to be executed, the parts already executed being cut out.
The routine is applicable if all processes are independent of each other. Therefore, before starting its monitoring and repairing, it checks whether the process-independence assumption holds (lines 1-2). If not, it throws an exception, assuming that in this case an alternative and more intrusive approach would be used [40].
Later on, in the “monitor and repair” cycle, we listen for arriving messages (line 4). If the message concerns the successful completion of a sub-process, then the arrays are updated accordingly (lines 5-7). Otherwise, the message is about a sub-process pie that has been broken by a discrepancy. pie is implicitly halted and we call function buildHandler to search for an adaptation handler h. If such a handler h is found, it is prefixed to the broken process pie , which becomes (h; pie ) (line 14). Finally, the adapted process is started again (line 15).
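The monitor-and-repair cycle of Figure 6.1 can be sketched in Python as follows. This is a minimal sketch, not the SmartPM implementation: the message queue and the `indep_process`, `build_handler` and `start` callbacks are hypothetical stand-ins for the PMS infrastructure.

```python
from queue import Queue

def monitor(processes, goals, invariant, situation, inbox: Queue,
            indep_process, build_handler, start):
    """Monitor-and-repair cycle: wait for completion/exception messages
    and, on an exception, prefix the broken branch with a handler."""
    n = len(processes)
    completed = [False] * n
    succeeded = [False] * n
    if not indep_process(processes, goals, invariant):
        raise RuntimeError("processes are not independent")
    while not all(completed):
        kind, i, *rest = inbox.get()          # blocking wait for a message
        if kind == "successfullyCompleted":
            completed[i] = True
            succeeded[i] = True
        elif kind == "exception":
            (s_e,) = rest                     # situation after the discrepancy
            h = build_handler(i, s_e, processes, goals, invariant)
            if h is None:                     # no repairing handler exists
                completed[i] = True
                raise RuntimeError(f"branch {i} is unrepairable")
            processes[i] = h + processes[i]   # prefix the handler to the branch
            start(i, processes[i])            # resume the adapted branch
    return succeeded
```

Note that only the broken branch is touched: the other branches never see a message and keep running undisturbed, which is exactly the point of the independence assumption.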
How does the buildHandler function synthesize this handler? Lines 1-2 update all processes pi so that they represent the subparts that remain to be executed. Then, the function invokes a regression planner (line 3) [111, 52], which searches for a plan backwards from the goal description. Specifically, the regression planner tries to generate a sequential plan that, starting from the current situation se , arrives at some situation sh such that pie can be executed again and achieve Gie , i.e., Rsh (Gie (do(pie , sh ))).

The regression planning procedure planByRegres recursively and incrementally builds a plan,⁵ checking that every selected action is independent of each pj (with j ≠ ie ) with respect to the invariant condition C. Indeed, Theorems 6.3 and 6.4 ensure that if the handler only includes actions that are independent of each pj (with j ≠ ie ), then, for all possible interleavings, process (h; pie ) will achieve its goal Gie and every other process pj with j ≠ ie will continue to achieve its goal Gj .

Observe that Theorem 6.1 ensures that if processes were originally independent with respect to their goals and no exceptions are raised, they remain independent as they evolve.
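The recursion of planByRegres can be illustrated on a toy propositional domain. The sketch below, under stated assumptions, replaces the nondeterministic chooseAction with backtracking over candidate actions; `regress`, `holds` and `preserve_ach` are hypothetical callbacks supplied by the domain, and the depth bound is an illustrative termination guard.

```python
def plan_by_regres(goal, s0, others, other_goals, invariant,
                   actions, regress, holds, preserve_ach, depth=8):
    """Goal-regression planning in the style of planByRegres (Figure 6.1).
    regress(goal, a) returns the condition that must hold BEFORE a so
    that goal holds after it (None if a cannot establish the goal);
    holds(goal, s) evaluates a goal in a state."""
    if holds(goal, s0):
        return []                                   # nil: goal already achieved
    if depth == 0:
        return None                                 # fail: search bound exceeded
    for a in actions:
        rg = regress(goal, a)
        if rg is None:                              # a is irrelevant to the goal
            continue
        # a must neither disturb, nor be disturbed by, any other branch
        if not all(preserve_ach(a, g, p, invariant) and
                   preserve_ach(p, goal, a, invariant)
                   for p, g in zip(others, other_goals)):
            continue
        sub = plan_by_regres(rg, s0, others, other_goals, invariant,
                             actions, regress, holds, preserve_ach, depth - 1)
        if sub is not None:
            return sub + [a]                        # [h'; a]
    return None                                     # fail
```

As in the pseudocode, the plan is assembled back-to-front: the last action chosen is the one that finally establishes the goal, and the recursion works on its regressed precondition.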
⁴ This could be done by logging and storing them in a repository.
⁵ Here we assume that plans are returned in the form of IndiGolog programs.

The technique proposed in this chapter is proved to be sound and complete if exactly one branch is broken and needs to be recovered.
Conversely, if a discrepancy breaks several processes, say k, we repair them one by one up to the k-th: the i-th is repaired once the first i − 1 have already been repaired. Of course, this approach is greedy: the i-th branch is repaired without considering the next k − i branches still to be recovered and, in addition, the repair order is arbitrary. For instance, there might be different ways to repair a given i-th branch; some of them would make it impossible to repair one of the next k − i branches, whereas other choices would not. Since the technique does not take the next branches into account, all these choices look equivalent to it, and, hence, it might make one after which some branches are no longer repairable. Likewise, a certain repair order may succeed where others fail. That is why the technique is only sound for multiple breaks: it may fail to find a repairing plan even when one exists.
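The greedy repair strategy can be sketched as a simple loop: each handler is synthesized against the branches repaired so far, in an arbitrary (and possibly unlucky) order. This is a sketch, with `build_handler` as a hypothetical stand-in for the buildHandler routine of Figure 6.1.

```python
def repair_all(broken, s_e, processes, goals, invariant, build_handler):
    """Greedy sequential repair of k broken branches: fix them one by one,
    each handler being synthesized against the already-repaired prefixes.
    Sound but not complete for multiple breaks: an unlucky repair order
    may make a later branch unrepairable even though some order succeeds."""
    processes = list(processes)
    for i in broken:                       # arbitrary (greedy) repair order
        h = build_handler(i, s_e, processes, goals, invariant)
        if h is None:
            return None                    # this order admits no repair
        processes[i] = h + processes[i]    # (h_i ; p_i)
    return processes
```

A complete variant would have to backtrack over repair orders (and over alternative handlers per branch), which is precisely what the greedy loop gives up.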
More formally, let ~e be a discrepancy which breaks k processes, namely p~1 , p~2 , . . . , p~k , and let s̄ be the situation before the discrepancy occurred. Then:

Theorem 6.5 (Soundness). If the algorithm in Figure 6.1 produces handlers:

~h1 = buildHandler(1, do(~e, s̄), [p~1 , p~2 , . . . , p~n ], [G1 , . . . , Gn ], C)
~h2 = buildHandler(2, do(~e, s̄), [(~h1 ; p~1 ), p~2 , . . . , p~n ], [G1 , . . . , Gn ], C)
. . .
~hk = buildHandler(k, do(~e, s̄), [(~h1 ; p~1 ), (~h2 ; p~2 ), . . . , (~hk−1 ; p~k−1 ), p~k , . . . , p~n ],
        [G1 , . . . , Gn ], C)

and

    IndepProcess([p~1 , . . . , p~n ], [G1 , . . . , Gn ], C) ∧
    G1 (do(p~1 , s̄)) ∧ . . . ∧ Gn (do(p~n , s̄))

then

    ∀s′. Do([(~h1 ; p~1 ) ∥ . . . ∥ (~hk ; p~k ) ∥ p~k+1 ∥ . . . ∥ p~n ],
        do(~e, s̄), s′) ⇒ G1 (s′) ∧ . . . ∧ Gn (s′) ∧ C(s′)

and

    IndepProcess([(~h1 ; p~1 ), (~h2 ; p~2 ), . . . , (~hk ; p~k ),
        p~k+1 , . . . , p~n ], [G1 , . . . , Gk , Gk+1 , . . . , Gn ], C).
Proof. The adaptation algorithm first repairs process p~1 , since we assume it is enqueued first. Let ~h1 be the handler produced by routine planByRegres to repair it. Handler ~h1 is built such that

    ∧_{i:2≤i≤n} ( PreserveAch(~h1 , Gi , p~i , C)
        ∧ PreserveAch(p~i , R(G1 , do(p~1 )), ~h1 , C) )
Since IndepProcess([p~1 , . . . , p~n ], [G1 , . . . , Gn ], C) holds by hypothesis, by Theorem 6.4 it follows that:

    IndepProcess([(~h1 ; p~1 ), p~2 , . . . , p~n ], [G1 , . . . , Gn ], C)    (6.7)

Let ~h2 be the handler produced by routine planByRegres to repair p~2 . Handler ~h2 is built such that

    PreserveAch(~h2 , G1 , (~h1 ; p~1 ), C) ∧
    PreserveAch((~h1 ; p~1 ), R(G2 , do(p~2 )), ~h2 , C) ∧
    ∧_{i:3≤i≤n} ( PreserveAch(~h2 , Gi , p~i , C)
        ∧ PreserveAch(p~i , R(G2 , do(p~2 )), ~h2 , C) )

From Equation 6.7 and by Theorem 6.4, it follows that:

    IndepProcess([(~h1 ; p~1 ), (~h2 ; p~2 ), p~3 , . . . , p~n ], [G1 , . . . , Gn ], C).

Therefore, after repairing p~k , we obtain:

    IndepProcess([(~h1 ; p~1 ), (~h2 ; p~2 ), . . . , (~hk ; p~k ),
        p~k+1 , . . . , p~n ], [G1 , . . . , Gk , Gk+1 , . . . , Gn ], C).    (6.8)

For all i ∈ [1, k], ~hi has been built such that do(~hi , do(~e, s̄)) takes to a situation s̃ where Rs̃ (Gi (do(p~i , s̃))) holds and, hence, Gi (do([~hi ; p~i ], do(~e, s̄))). From this result and from Equation 6.8, Theorem 6.2 proves:

    ∀s′. Do([(~h1 ; p~1 ) ∥ . . . ∥ (~hk ; p~k ) ∥ p~k+1 ∥ . . . ∥ p~n ],
        do(~e, s̄), s′) ⇒ G1 (s′) ∧ . . . ∧ Gn (s′) ∧ C(s′)
Theorem 6.6 (Completeness). If

    ∃~h. ∀s′. Do([p~1 ∥ . . . ∥ p~i−1 ∥ (~h; p~i ) ∥ p~i+1 ∥ . . . ∥ p~n ],
        do(~e, s̄), s′) ⇒ G1 (s′) ∧ . . . ∧ Gn (s′) ∧ C(s′)

and

    IndepProcess([p~1 , . . . , p~i−1 , (~h; p~i ), p~i+1 , . . . , p~n ],
        [G1 , . . . , Gn ], C)

then buildHandler returns a repairing handler:

    ~h = buildHandler(i, do(~e, s̄), [p~1 , . . . , p~n ], [G1 , . . . , Gn ], C)

such that

    ∀s′. Do([p~1 ∥ . . . ∥ p~i−1 ∥ (~h; p~i ) ∥ p~i+1 ∥ . . . ∥ p~n ],
        do(~e, s̄), s′) ⇒ G1 (s′) ∧ . . . ∧ Gn (s′) ∧ C(s′)    (6.9)

and

    IndepProcess([p~1 , . . . , (~h; p~i ), . . . , p~n ], [G1 , . . . , Gi , . . . , Gn ], C)    (6.10)
Proof. Let p~i be the process broken by a discrepancy ~e, and let se = do(~e, s̄) be the resulting situation.

If there exist handlers meeting the requirements of the hypothesis, the invocation of procedure buildHandler returns one of them, namely ~h:

    ~h = buildHandler(i, do(~e, s̄), [p~1 , . . . , p~n ], [G1 , . . . , Gn ], C)

if and only if planByRegres returns ~h:

    ~h = planByRegres(R(Gi , do(p~i )),
        se , [p~1 , . . . , p~n ]/p~i , [G1 , . . . , Gn ]/Gi , C)

Therefore, we turn to prove that if any handler ~h exists that achieves goal Goalh while hypotheses 6.9 and 6.10 hold, then it can be returned by planByRegres. We prove this by induction on the length of ~h.

There may be several such plans h~1 , . . . , h~n ; let us fix an arbitrary one, namely ~h, whose length is k:

    ~h = [h1 , . . . , hk ]

and let us denote Goalh = R(Gi , do(p~i )).

If k = 0 (base step), then Goalh already holds and, hence, planByRegres returns nil (line 2 of the procedure).

Otherwise, let us assume by induction hypothesis that the plan ~h′ = [h1 , . . . , hk−1 ], being (k − 1)-long, can be returned by invoking:

    ~h′ = planByRegres(Rs (Goalh (do(hk , s))),
        se , [p~1 , . . . , p~n−1 ], [G1 , . . . , Gn−1 ], C)

In fact, letting s̃e = do(~h′ , se ), Goalh (do(hk , s̃e )) holds and, hence, the regression Rs (Goalh (do(hk , s))) is true in s̃e .
In order that ~h be the returned handler [~h′ ; a], it remains to prove that a can coincide with hk . Please note that, given that the regression Rs (Goalh (do(hk , s))) holds in situation s̃e = do(~h′ , se ), the following must also hold:

    ¬Goalh (s̃e )    (6.11)

In addition, since handler ~h is such that Goalh (do(~h, se )) is true, hk has to be the action turning formula Goalh (·) to true:

    Goalh (do(hk , s̃e ))    (6.12)

Moreover, since ~h is one of the handlers meeting the hypothesis

    IndepProcess([p~1 , . . . , p~i−1 , (~h; p~i ), p~i+1 , . . . , p~n ], [G1 , . . . , Gn ], C)

then

    ∀j. (j ≠ i) ⇒ PreserveAch(hk , Gj , p~j , C) ∧ PreserveAch(p~j , Goalh , hk , C)    (6.13)

Therefore, action hk meets the constraints in Equations 6.11, 6.12 and 6.13, according to which procedure chooseAction picks an action a. In the light of this, hk is indeed an action returnable by that procedure.
Note that the above procedure becomes easily realizable in practice if the PMS works in a finite domain (e.g., using discretized positions based on actual GPS positions) and propositional logic is sufficient. In fact, one can use an off-the-shelf regression planner such as those mentioned in [111, 52].
6.3 An Example from Emergency Management
In this section, we discuss an example of adaptation in a process concerning emergency management. A team is sent to the affected area. Actors are equipped with PDAs, which are used to carry out process tasks and communicate with each other through a Mobile Ad hoc Network (manet).

A possible process for coping with the aftermath of an earthquake is depicted in Figure 6.2. Some actors are assessing the area for dangerous partially-collapsed buildings. Meanwhile, others are giving first aid to injured people, sending information about the required ambulances, and filling in a questionnaire about the injured people, as required by the headquarters. The corresponding IndiGolog program is depicted in Figure 6.3.

In the activity diagram in Figure 6.2, we have labeled every task with the fluents (in situation-suppressed form) that hold after successful task execution.
It is worth noting that, as already stated in Section 6.1, all assignments in the IndiGolog program in Figure 6.3 are made together before the parallel branches are started. The reason stems from the fact that we want to make all branches independent. Indeed, if the assignments were done inside the branches themselves, then the assignment made in a branch might depend on assignments made in some parallel branches.
For the sake of simplicity, we detail only those fluents that are used later in the example (see later in this section for the successor state axioms):

proxy(w, y, s) is true if, in situation s, service y can work as proxy for service w. In the starting situation S0 , for every pair of services w, y we have proxy(w, y, S0 ) = false, denoting that no proxy has yet been chosen for w.

at(w, p, s) is true if service w is located at coordinate p = ⟨px , py , pz ⟩ in situation s. In the starting situation S0 , for each service wi , we have at(wi , pi , S0 ), where location pi is obtained through GPS sensors.

infoSent(d, s) is true in situation s if the information concerning injured people at destination d has been successfully forwarded to the headquarters. For all destinations d, infoSent(d, S0 ) = false.

evaluationOK(s) is true if the photos taken are judged as having a good quality, with evaluationOK(S0 ) = false.

assisted(z, s) is true if the injured people in area z have been supported through first-aid medical assistance. For all z, assisted(z, S0 ) = false.
We assume that the process designers have defined the following goals for the three concurrent sub-processes (as required by the framework):

    G1 (s) =def Q1Compiled(A, s) ∧ evaluationOK(s)

    G2 (s) =def assisted(A, s)

    G3 (s) =def Q2Compiled(A, s) ∧ infoSent(A, s)

In addition, in this example we use the invariant condition C(s) = true for all situations s, meaning that no extra assumption is used to show process independence.
Before formally specifying the aforementioned fluents, we define some abbreviations:

available(w, s): states that a service w is available if it is connected to the coordinator device (denoted by Coord) and is free.

connected(w, z, s): true if in situation s the services w and z are connected through possibly multi-hop paths.

neigh(w, z, s): holds if the services w and z are in radio range in situation s.

Their definitions coincide with those given in the example of Chapter 4 and, hence, we do not duplicate them here.
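Although the full definitions are in Chapter 4, the multi-hop reachability behind connected is worth making concrete: it is the transitive closure of neigh, i.e., plain graph reachability. A minimal sketch, assuming the radio-range relation in the current situation is given as an adjacency map (the names are illustrative):

```python
from collections import deque

def connected(w, z, neigh):
    """connected(w, z): w and z are linked through a possibly multi-hop
    path in the radio-range relation neigh (a dict node -> set of
    neighbours), computed by breadth-first search."""
    seen, frontier = {w}, deque([w])
    while frontier:
        u = frontier.popleft()
        if u == z:
            return True
        for v in neigh.get(u, ()):
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return False
```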
The successor state axioms for the aforementioned fluents are as follows:
at(w, loc, do(t, s)) ⇔
    (at(w, loc, s) ∧ ∀loc′ . t ≠ finishedTask(w, Go, loc′ ))
    ∨ (¬at(w, loc, s) ∧ t = finishedTask(w, Go, loc) ∧ started(w, Go, loc, s))

proxy(w, y, do(x, s)) ⇔
    (proxy(w, y, s) ∧ ∀q. x ≠ finishedTask(w, FindProxy, q))
    ∨ (x = finishedTask(w, FindProxy, ∅)
       ∧ ∃p. started(w, FindProxy, p, s) ∧ isClosestAvail(w, y, s))

infoSent(loc, do(t, s)) ⇔ infoSent(loc, s)
    ∨ (∃w. t = finishedTask(w, SendToHeadquarter, ⟨loc, OK⟩)
       ∧ at(w, loc, s) ∧ ∃y. (proxy(w, y, s) ∧ neigh(w, y, s))
       ∧ ∃p. started(w, SendToHeadquarter, p, s))

evaluationOK(do(t, s)) ⇔ evaluationOK(s)
    ∨ (∃w. t = finishedTask(w, Evaluate, OK) ∧ ∃p. started(w, Evaluate, p, s)
       ∧ photoTaken(s) ∧ n ≥ threshold)

assisted(z, do(x, s)) ⇔ assisted(z, s)
    ∨ (¬assisted(z, s) ∧ ∃w. x = finishedTask(w, FirstAid, ∅)
       ∧ ∃p. started(w, FirstAid, p, s) ∧ at(w, z, s))
In the above, we use the abbreviation isClosestAvail(w, y, s), which holds if y is the geographically closest service to w that is able to act as proxy; if there is no available proxy in w's radio range, y = nil:

isClosestAvail(w, y, s) =def
    (available(y) ∧ at(w, pw , s) ∧ at(y, py , s) ∧ provide(y, proxy) ∧ neigh(w, y, s)
     ∧ ∀z. (z ≠ y ∧ available(z) ∧ provide(z, proxy) ⇒ ‖pz − pw ‖ > ‖py − pw ‖))
    ∨ (y = nil ∧ ¬∃z. (available(z) ∧ provide(z, proxy) ∧ neigh(w, z, s)))
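Operationally, isClosestAvail is a minimization over the candidate proxies in radio range. A minimal Python sketch, assuming positions, availability and capability checks are supplied as plain functions (all names here are illustrative, not part of the formal framework):

```python
import math

def closest_avail_proxy(w, services, pos, available, provides, neigh):
    """isClosestAvail, operationally: the geographically closest service
    to w, in radio range, that is available and provides the proxy
    capability; None (i.e., nil) if no such service exists."""
    candidates = [y for y in services
                  if y != w and available(y) and provides(y, "proxy")
                  and y in neigh.get(w, ())]
    if not candidates:
        return None
    # pick the candidate minimizing the Euclidean distance to w
    return min(candidates, key=lambda y: math.dist(pos[y], pos[w]))
```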
Automatic Adaptation: an example. To show how our automatic adaptation technique is meant to work, let us consider an example of a discrepancy and a handler plan to cope with it.

Firstly, let us consider the case where the process execution reaches line 6 of procedure ReportAssistanceInjured. At this stage, a proxy has been found, namely wpr , and the information about injured people is about to be sent to the headquarters to request a sufficient number of ambulances to take them to a hospital. Let s̄ be the current situation. Of course, wpr is selected to be in radio range of actor w4 : neigh(w4 , wpr , s̄).

Let us now assume that wpr moves away, for whatever reason, to a position p′ . This corresponds to a discrepancy

~e = [ Assign(wpr , Go);
      Start(wpr , Go, p′ );
      AckTaskCompletion(wpr , Go);
      Release(wpr , Go) ]

After the internal execution of the discrepancy, the new current situation is se = do(~e, s̄), where neigh(w4 , wpr , se ) = false. Therefore, action finishedTask(wpr , InformInjured()) does not make infoSent(A) become true, as it was supposed to.

Since the sub-processes EvalTake, AssistInjured and ReportAssistanceInjured are independent, the latter, which is affected by the discrepancy, can be repaired without having to stop the other processes.
The goal given to the regression planner is Goalh = photoTaken() ∧ infoSent(A) ∧ Q1Compiled(A) in situation-suppressed form. The planner possibly returns the following plan, which achieves Goalh while preserving independence:

Start(wpr , Go, A);
AckTaskCompletion(wpr , Go);
Start(w4 , InformInjured, A);
AckTaskCompletion(w4 , InformInjured);

Adaptation is performed by inserting this plan after line 5 of procedure ReportAssistanceInjured, ensuring that the sub-process can achieve its goal without interfering with the other sub-processes.
6.4 A Summary
In this chapter we have proposed a sound and complete technique for adapting sequential processes running concurrently. Under the assumption of independence of the different processes, this technique improves the one proposed in Chapter 4, while adopting the same general framework based on AI planning techniques.

In the previous approach, whenever a process needs to be adapted, the different concurrently running branches are all interrupted, and a sequence of actions h = [a1 , a2 , . . . , an ] is placed before them. Therefore, all of the branches can only resume after the execution of the whole sequence. The adaptation technique proposed here works by identifying whether concurrent branches are independent (i.e., neither working on the same variables nor affecting some conditions). If they are, it can synthesize a recovery process that affects only the branch of interest, without having to block the other branches. Concurrency is a key characteristic of business processes, so the independence of different branches is likely to yield benefits in practice.

The proposed technique is made possible by annotating processes in a “declarative” way. We assume that the process designer can annotate actions/sequences with the goals they are intended to achieve; on the basis of such declared goals, independence among branches can be verified, and then a recovery process which affects only the branch of interest, without side effects on the others, is synthesized.
Figure 6.2: A possible process to be carried out in disaster management scenarios
Main()
1  π.w0 [available(w0 ) ∧ ∀c. require(c, QuestBuildings) ⇒ provide(w0 , c)];
2    Assign(w0 , QuestBuildings);
3  π.w1 [available(w1 ) ∧ ∀c. require(c, TakePhoto) ⇒ provide(w1 , c)];
4    Assign(w1 , TakePhoto);
5  π.w2 [available(w2 ) ∧ ∀c. require(c, EvalPhoto) ⇒ provide(w2 , c)];
6    Assign(w2 , EvalPhoto);
7  π.w3 [available(w3 ) ∧ ∀c. require(c, FirstAid) ⇒ provide(w3 , c)];
8    Assign(w3 , FirstAid);
9  π.w4 [available(w4 ) ∧ ∀c. require(c, InformInjured) ⇒ provide(w4 , c)];
10   Assign(w4 , InformInjured);
11 π.w5 [available(w5 ) ∧ ∀c. require(c, InjuredQuest) ⇒ provide(w5 , c)];
12   Assign(w5 , InjuredQuest);
13 (EvalTake(w0 , w1 , w2 , Loc) ∥ AssistInjured(w3 , Loc) ∥
14   ReportAssistanceInjured(w4 , w5 , Loc));
15 Release(w0 , QuestBuildings);
16 Release(w1 , TakePhoto);
17 Release(w2 , EvalPhoto);
18 Release(w3 , FirstAid);
19 Release(w4 , InformInjured);
20 Release(w5 , InjuredQuest);
21 π.w6 [available(w6 ) ∧ ∀c. require(c, SendByGPRS) ⇒ provide(w6 , c)];
22   Assign(w6 , SendByGPRS);
23   Start(w6 , SendByGPRS, ∅);
24   AckTaskCompletion(w6 , SendByGPRS);
25   Release(w6 , SendByGPRS);

EvalTake(w0 , w1 , w2 , Loc)
1  Start(w0 , QuestBuildings, Loc);
2  AckTaskCompletion(w0 , QuestBuildings);
3  Start(w1 , Go, Loc);
4  AckTaskCompletion(w1 , Go);
5  Start(w1 , TakePhoto, Loc);
6  AckTaskCompletion(w1 , TakePhoto);
7  Start(w2 , Evaluate, Loc);
8  AckTaskCompletion(w2 , Evaluate);

AssistInjured(w3 , Loc)
1  Start(w3 , Go, Loc);
2  AckTaskCompletion(w3 , Go);
3  Start(w3 , FirstAid, Loc);
4  AckTaskCompletion(w3 , FirstAid);

ReportAssistanceInjured(w4 , w5 , Loc)
1  Start(w4 , Go, Loc);
2  AckTaskCompletion(w4 , Go);
3  Start(w4 , FindProxy, Loc);
4  AckTaskCompletion(w4 , FindProxy);
5  Start(w4 , InformInjured, Loc);
6  AckTaskCompletion(w4 , InformInjured);
7  Start(w5 , Go, Loc);
8  AckTaskCompletion(w5 , Go);
9  Start(w5 , InjuredQuest, Loc);
10 AckTaskCompletion(w5 , InjuredQuest);

Figure 6.3: The IndiGolog program corresponding to the process in Figure 6.2
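The pick operator π.w [available(w) ∧ ∀c. require(c, Task) ⇒ provide(w, c)] used throughout Main() can be read operationally as a capability-matching selection. A minimal sketch, assuming the predicates are supplied as plain functions (all names here are illustrative):

```python
def pick_service(task, services, available, requires, provides):
    """The pick operator pi.w[available(w) and forall c. require(c, task)
    => provide(w, c)]: choose any available service offering every
    capability the task requires; None if no service qualifies."""
    for w in services:
        if available(w) and all(provides(w, c) for c in requires(task)):
            return w
    return None
```

In IndiGolog, π is a genuinely nondeterministic choice that the interpreter may backtrack over; the loop above simply commits to the first qualifying service.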
Chapter 7
Some Covered Related Topics
In the previous chapters we have discussed a technique to deal with the issue of automatically adapting process instances when some exogenous events produce discrepancies that make it impossible for them to be completed. We have also described a concrete implementation and, hence, shown the feasibility of the approach.

The approach implemented so far can be summarized at a high level as in Figure 7.1(a): the PMS engine of SmartPM is deployed on a certain device, usually an ultra-mobile laptop (such as the Asus Eee PC¹), and other devices exist on which some services, human-controlled and/or automatic, are installed.
We envision moving from a centralized approach to a distributed one, where the workflow (i.e., process) execution is not orchestrated by a single node, but all devices contribute in a distributed fashion to carrying out the workflow. Indeed, generally speaking, devices may not be powerful enough, and they might not be continuously connected to such a central orchestrator. The different local PMSs coordinate through the appropriate exchange of messages, conveying synchronization information and the outputs of the actions performed by the services. The technique described successfully applies the “Roman Model” for Service Composition [18] to workflow management. Section 7.1 is meant to be a first step towards the conceptualization of decentralized orchestrators. In the proposed approach, we do not need the fine granularity of explicitly specifying the PMS actions assign, start, ackTaskCompletion, release. Therefore, we model workflows as finite-state automata, and sequences of PMS actions as transitions (i.e., arcs) of the automata. At the same time, the distributed approach of Section 7.1 aims at finding a solution for the challenging issue of synthesizing the schema of a workflow according to the available services. Typically, as widely demonstrated in WORKPAD as well as in many

¹ http://eeepc.asus.com/
(a) The centralized approach — (b) The distributed approach
Figure 7.1: The centralized vs. distributed approach to process management (some arrows not shown to preserve the figure readability)
other research projects², generic workflows for pervasive scenarios are designed a priori; then, just before a team is dropped off in the operation field, they need to be customized on the basis of the currently available services offered by the mobile devices of the operators actually composing the team.
Both in the centralized and in the distributed approach, some services are human-based, in the sense that the work performed by such services is done by humans with the support of a so-called work-list handler. So far, we have developed only a proof-of-concept implementation for the sake of testing the SmartPM features. In the near future, we aim at providing SmartPM with full-fledged work-list handlers. We envision two types of work-list handler: a version for ultra-mobile PCs and a lighter version for PDAs, “compact” but providing fewer features. First steps have already been taken in these directions.

As for the PDA version, during this thesis we have already implemented a version for the ROME4EU Process Management System, a previous valuable attempt to deal with unexpected deviations (see [7]). A screenshot of the system is depicted in Figure 5.6(d). We plan to port it to the SmartPM system.
As for the more powerful version for ultra-mobile PCs, Section 7.2 illustrates a possible version. It refers to a new visual tool that can aid users in selecting the “right” task among a potentially large number of tasks offered to them. Indeed, in many scenarios, many different processes need to be carried out at the same time by the same organisation. Therefore, participants can be confronted with a great number of processes and, hence, of tasks among which to pick the next one to work on. Many tools present assigned tasks as a mere list, without giving any contextual information helpful for such a choice. At the moment, the tool works in concert with the YAWL Process Management System, but the framework is applicable to any PMS in general, and to SmartPM in particular.
7.1 Automatic Workflow Composition
This section proposes a novel technique, sound, complete and terminating, able to automatically synthesize such distributed orchestrators, given (i) the target generic workflow to be carried out, in the form of a finite transition system, and (ii) the set of behaviorally-constrained services, again in the form of (nondeterministic) finite transition systems. This technique deals with the problem of synthesizing the distributed orchestrator in the presence of services with constrained behaviors.

² Cf. SHARE (http://www.share-project.org), EGERIS (http://www.egeris.org), ORCHESTRA (http://www.eu-orchestra.org), FORMIDABLE (http://www.formidable-project.org)
This issue has some similarities with that of automatically synthesizing composite services starting from available ones [93, 95, 80, 128, 65, 9]. In particular, [9] considers the issue of automatic composition in the case in which the available services are behaviorally constrained, and [11] the case in which the available services are behaviorally constrained and the results of the invoked actions cannot be foreseen, but are only observable afterwards. All the previous approaches consider the case in which the synthesized orchestrator is centralized.

On the other hand, the issue of distributed orchestration has been considered in the context of Web service technologies [8, 94, 25], but with emphasis on the required run-time architectures. Our work can exploit such results, even if they need to be cast into the mobile scenario (in which service providers are less powerful).
The remainder of the section is organized as follows. In Section 7.1.1, the general framework is presented. Section 7.1.2 presents a complete example, in which a target workflow, possible available services and the automatically synthesized orchestrators are shown. Section 7.1.3 presents the proposed technique, and finally Section 7.1.4 presents some discussion and future work.
7.1.1 Conceptual Architecture
As previously introduced, we consider scenarios in which a team consists of different operators, each equipped with a PDA or similar hand-held device running specific applications. The interplay of (i) software functionalities running on the device and (ii) human activities to be carried out by the corresponding operator is regarded as a service; suitably composed and orchestrated, these services form the workflow that the team needs to carry out. Such a workflow is enacted, at run-time, by the PMS/orchestrator (a.k.a. workflow management system).
The service behavior is modeled by the possible sequences of actions. Such sequences can be nondeterministic; indeed, nondeterminism stems naturally when modeling services in which the effect of each action on the state of the service cannot be foreseen. Consider, as an example, a service that allows taking photos of a disaster area; after invoking the operation, the service can be in a state photo OK (if the overall quality is appropriate), or in a different state photo bad (if the operator has taken a wrong photo, the light was not optimal, etc.). Note that the orchestrator of a nondeterministic service can invoke an operation but cannot control its result. In other words, the behavior of the service is partially controllable, and the orchestrator needs to cope with such partial controllability. Note also that if the orchestrator observes the state in which the service is after an operation, then it can understand which transition, among those nondeterministically possible in the previous state, has been undertaken by the service. We assume that the orchestrator can indeed observe the states of the available services and take advantage of this in choosing how to continue executing the workflow.
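A behaviorally-constrained service of this kind can be sketched as a nondeterministic finite transition system: the orchestrator chooses the action, the environment chooses the successor state, and the resulting state is fully observable. The sketch below, including the photo service and its state names, is illustrative rather than the formal model of Section 7.1.3.

```python
import random

class Service:
    """A behaviorally-constrained service as a nondeterministic finite
    transition system: delta maps (state, action) to the SET of possible
    successor states; the environment, not the orchestrator, picks one."""
    def __init__(self, delta, s0):
        self.delta, self.state = delta, s0

    def can_do(self, action):
        return (self.state, action) in self.delta

    def do(self, action, rng=random):
        succ = self.delta[(self.state, action)]
        self.state = rng.choice(sorted(succ))   # uncontrollable outcome...
        return self.state                       # ...but fully observable

# The photo service of the running example: TakePhoto may end in
# photo_ok or photo_bad; from photo_bad the operator may try again.
photo = Service({("idle", "TakePhoto"): {"photo_ok", "photo_bad"},
                 ("photo_bad", "TakePhoto"): {"photo_ok", "photo_bad"}},
                "idle")
```

Observing the reached state after each invocation is precisely what allows an orchestrator to branch on the outcome, e.g., to repeat TakePhoto while the service is in photo_bad.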
The workflow is specified on the basis of a set of available actions (i.e., those potentially available) and a blackboard, i.e., a conceptual shared memory in which the services provide information about the output of an action (cf. complete observability with respect to the orchestrator). Such a workflow is specified a priori (i.e., it encodes predefined procedures to be used by the team, e.g., in emergency management), without knowing which services are effectively available for its enactment.

The issue is then how to compose (i.e., realize) such a workflow by suitably orchestrating the available services. In the proposed scenario, such a composition is useful when a team leader, before arriving on the operation field, needs to derive the orchestration by observing (i) the available devices and operators constituting the team (i.e., the available services) and (ii) the target workflow the team is in charge of.
At run-time (i.e., when the team is effectively on the operation field), the
orchestrator coordinates the different services in order to enact the workflow.
As a matter of fact, the orchestrator is distributed, i.e., there is no single coordination device hosting it; rather, each device hosts a local orchestrator, which is in charge of invoking the services residing on that device. The various local orchestrators have to communicate with each other in order to agree on the right sequence of services to be called.
The communications among orchestrators, and between each local orchestrator and its services, are carried out through an appropriate middleware, which offers message broadcasting and a possible realization of the blackboard [102]. From an implementation point of view, the blackboard is also realized in a distributed fashion.
7.1.2 A Case Study
Let us consider a scenario where a disastrous event (e.g., an earthquake) occurs. The scenario is very similar to those already introduced in other chapters: after first assistance is given to the people involved, a civil protection team is sent to the affected area. Team members, equipped with mobile devices, need to document damage directly on a situation map so that follow-up activities (e.g., reconstruction jobs) can be scheduled. Specifically, their work is supposed to focus on three buildings A, B and C. For each building, a report has to be prepared. Each report should contain: (i) a preliminary questionnaire giving a description of the building and (ii) some photos of the building conditions. Filling in questionnaires does not require staying very close
to the buildings, whereas taking photos does.
We suppose the team to be composed of three mobile services MS1, MS2, MS3, whose capabilities include compiling questionnaires and taking/evaluating building pictures, plus a repository service RS, which is able to forward the documents (questionnaires and pictures) produced by the mobile units to a remote storage in a central hall. Services can read and write some shared boolean variables, namely {qA, qB, qC, pA, pB, pC, available}, held in a blackboard.
Each service has its own capabilities and limitations, basically depending on technological, geographical and historical reasons – e.g., a team member who visited building A in the past makes his/her unit able to compile questionnaire A; a unit close to building B can move there, and so on. Mobile services are described by state-transition diagrams in which nondeterministic transitions are allowed. The diagrams of Figures 7.2(a) – 7.2(d) describe, respectively, units MS1 – MS3 and RS. An edge outgoing from a state s is labeled by a triple E[C]/A, where both [C] and A are optional, with the following semantics: when the service is in state s, if the set of events E occurs and condition C holds, then (i) the service changes state according to the edge and (ii) action A is executed. In this context, a set of events represents a set of requests assigned to the service, which can be satisfied only if the condition (or guard) holds. Actions correspond to writing messages on the blackboard, while the actual fulfillment of requests is implicitly assumed whenever a state transition takes place. In other words, each set of events represents a request for some tasks, which are actually performed during the transition, provided the respective condition holds; moreover, blackboard writes may be performed.
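The E[C]/A transition semantics just described can be rendered as a small executable sketch. The following Python fragment is ours, not from the thesis: all identifiers (Transition, step, the encoding of MS1's edges) are illustrative, and the guard/write representation is one possible choice among many.

```python
# A minimal sketch of the E[C]/A semantics: a transition fires when the
# requested set of events equals E and the guard C holds on the blackboard;
# firing may update the blackboard (the action A).
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

Blackboard = Dict[str, bool]

@dataclass
class Transition:
    source: str
    events: FrozenSet[str]                  # the set E of requested tasks
    guard: Callable[[Blackboard], bool]     # the optional condition [C]
    writes: Dict[str, bool]                 # the optional action A (blackboard writes)
    target: str

def step(transitions: List[Transition], state: str,
         requested: FrozenSet[str], bb: Blackboard) -> List[Tuple[str, Blackboard]]:
    """Return every possible outcome (successor state, new blackboard).
    More than one outcome means the request is nondeterministic: the
    orchestrator can request the tasks but cannot pick the result."""
    outcomes = []
    for t in transitions:
        if t.source == state and t.events == requested and t.guard(bb):
            new_bb = dict(bb)
            new_bb.update(t.writes)
            outcomes.append((t.target, new_bb))
    return outcomes

# Fragment of MS1: from S1, {take_pA} may lead to S2 (photo too dark) or
# S3 (photo fine) -- two outgoing edges, no guard, no write.
ms1 = [
    Transition("S1", frozenset({"take_pA"}), lambda bb: True, {}, "S2"),
    Transition("S1", frozenset({"take_pA"}), lambda bb: True, {}, "S3"),
    Transition("S3", frozenset({"write_pA"}), lambda bb: bb["available"],
               {"pA": True}, "S0"),
]

outcomes = step(ms1, "S1", frozenset({"take_pA"}), {"available": True, "pA": False})
# Two possible successors: the take_pA request is nondeterministic.
```

Note how the guarded write_pA edge only fires when the [available] condition holds, mirroring the discussion of state S3 below.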
For instance, consider Figure 7.2(a). Initially (state S0), MS1 is able to serve the requests: {compile qB} (compile the questionnaire about building B), {read pC} (get a photo of building C from the repository), {move A} (move to, or possibly around, building A) and {req space} (ask the remote storage to free some space). In all such cases, neither conditions nor actions are defined, meaning that, e.g., {move A} simply requires the unit to reach, i.e., actually move to, building A, independently of any condition and without writing anything on the blackboard. After building A is reached (S1), a photo can be taken ({take pA}). A request for this yields a nondeterministic transition, due to the presence of two different outgoing edges labeled with the same event and non-mutually-exclusive conditions (indeed, no condition is defined at all). Note that, besides possibly leading to different states (S2 or S3), a nondeterministic transition may, in general, give rise to different blackboard writes, as happens, e.g., if a request for {eval pC} is assigned when the service is in state S5. State S2 is reached when, due to lack of light, the photo comes out too dark; then, only photo modification ({modify pA}, which makes it lighter) is allowed. On the other hand, state S3 (the photo
[Figure 7.2: Mobile services – state-transition diagrams of (a) mobile service MS1, (b) mobile service MS2, (c) mobile service MS3 and (d) repository service RS, with edges labeled by triples E[C]/A.]
[Figure 7.3: The target workflow – a deterministic state-transition diagram over states S0–S8, whose edges are labeled by sets of guarded tasks [C]/T.]
is quite fine) also gives the possibility of asking the repository for additional space while photo modification is being performed ({modify pA, req space}). In such a case, {available=T} is written on the blackboard, which announces that some space is available in the repository and, thus, additional data can be stored there. Moreover, state S3 allows for serving a {write pA} request, which has the effect of writing the taken photo into the remote storage. Such a task can be successfully completed only if there is available space, as required by condition [available], and, in such a case, it is to be followed by action {pA=T}, which announces the availability, in the storage, of a picture of building A. Now, consider the request for {read pC} outgoing from state S0. Such a task gets a photo of building C, if any, from the remote storage, and forces a service transition to state S5. Then, {eval pC} can be requested with the aim of checking whether or not the photo captures relevant aspects of building C, and consequently accepting or rejecting it. Recall that the photo might not be in the storage; if so, a {pC=F} write is performed. Otherwise, either {pC=T} or {pC=F} can be written on the blackboard, depending on whether the picture is accepted or not. Finally, we complete the description of the service by observing that task {write qB} can be requested in order to write a filled questionnaire in the remote storage, assuming it is small enough to be written without satisfying any additional space condition.
The semantics of the other actions, e.g., write qA, is straightforward; consequently, the diagrams of units MS2, MS3 and RS can be interpreted similarly. RS is a service representing an interface between the mobile units and the communication channel used for sending data to the remote storage. In fact, task forward must be performed by RS whenever a mobile unit is asked to write some data (e.g., write pC or write qB). Forwarding means receiving data from the mobile services and writing it to the remote storage. For security reasons, only mobile services are trusted systems that can ask the storage to free space (req space) and can access the storage for reading (e.g., read pC), while sending data can be performed only by the repository service.
After each forwarding, the storage may become full. This is why the forward task is nondeterministic and may yield either an {available=T} or an {available=F} write on the blackboard. On the other hand, a mobile service performing a {req space} guarantees that the remote storage will free some space; consequently, it is deterministic and yields an {available=T} write on the blackboard. Finally, RS is allowed to send the remote storage a commit message, which asks the storage to compress the last received data and consequently makes the files no longer available for reading.
The goal of the team is to collect both questionnaires and photos for all buildings. Figure 7.3 shows a graphical representation of the desired workflow where, initially, (i) all services are assumed to be in state S0 and (ii) the blackboard state is {qA=qB=qC=pA=pB=pC=F, available=T}. Edges outgoing from each state are labeled by sets of pairs [C]/T, with the following semantics: if, in the current state, condition (guard) C holds, then task T must be assigned to some service. Hence, each state transition may require, in general, the execution of a set of tasks. Observe that the target workflow is deterministic, that is, no two guards appearing inside different sets labeling different edges outgoing from the same state can be true at the same time.
Intuitively, after all questionnaires have been filled and one photo per building has been taken, the target workflow requires services to iterate between states S3–S8 until a good photo of each building has been sent to the remote storage. Then, the team must be ready to perform the operation again. In order to guarantee that pictures actually capture relevant aspects of the buildings, a sort of peer-review strategy is adopted, i.e., each photo a unit writes in the remote storage must be read, evaluated and approved/rejected by a second unit. Both approval and rejection are publicly announced by writing a proper message on the blackboard (indeed, writing {pC=T} or {pC=F} is sufficient). When all documents have been sent (questionnaires are not subject to the review process), a commit message is sent to the remote storage and the team can start a new iteration.
Finally, Figures 7.4 and 7.5 present a solution to the distributed composition problem, consisting of a set of local orchestrators which, upon execution, coordinate the services in order to realize the target workflow of Figure 7.3. Recall that each mobile service is attached to a local orchestrator, which is able both to assign tasks to the service itself and to broadcast messages. In order to accomplish their task, that is, realizing workflow transitions
by properly assigning a subset of the workflow requests to their respective services, the local orchestrators need to access, for each transition: (i) the set of workflow requests and (ii) the whole set of messages sent by the other orchestrators. For this reason, both workflow requests and orchestrator messages are broadcasted. Each orchestrator transition is labeled by a pair I/O, which means: if, in the current state, I occurs, then perform O, where I = ⟨A, M, s⟩ and O = ⟨A′, M′⟩, with the following semantics: A is the set of tasks the workflow requests, M is the set of (broadcasted) messages the orchestrator received (including its own messages), s is the state reached by the attached service after the tasks assigned by the orchestrator (A′, see below) have been performed, A′ ⊆ A is the subset of actions the local orchestrator assigns to the attached service, and M′ is the set of messages the orchestrator broadcasts after the service has performed A′. The notation has been compacted by introducing some shortcuts for set representation. In detail, (i) “. . .” stands for “any set of elements”: for instance, in the transition between states S0 and S1 of the local orchestrator for MS1 (Figure 7.4(a)), the set {. . . commit} represents any set (of tasks) containing commit; (ii) an element with the prefix “-” stands for “anything but the element, possibly nothing”: for instance, in the first (from top) transition between states S4 and S5 of Figure 7.4(a), the set {. . . modify pA, -req space} stands for “any set (of tasks) including modify pA and not including req space”.
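The compact set notation used in the orchestrator labels amounts to a simple membership pattern, which can be sketched as follows (the function name and representation are ours, for illustration only):

```python
def matches(includes, excludes, tasks):
    """Match a requested task set against the compact notation of
    Figures 7.4-7.5: every element in `includes` must be present
    ("... x"), and no element of `excludes` may appear ("-x")."""
    return set(includes) <= set(tasks) and not (set(excludes) & set(tasks))

# "{... modify_pA, -req_space}": any task set containing modify_pA
# but not req_space.
assert matches({"modify_pA"}, {"req_space"}, {"modify_pA", "take_pB"})
assert not matches({"modify_pA"}, {"req_space"}, {"modify_pA", "req_space"})
```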
Observe that the local orchestrators are deterministic, that is, at each state, no ambiguity holds on which transition, if any, has to be selected. In general, this is due to the presence of messages, which are useful for selecting which tasks are to be assigned to each service. As an example, observe that the third and fourth transitions of Figure 7.4(a) can be performed when the same set of tasks ({. . . req space, modify pA}) is requested by the workflow. The choice of which one is to be assigned to the attached service depends on the messages the orchestrator received, which somehow represent the other services’ current capabilities. So, in state S4, when the set of requested tasks includes both req space and modify pA: (i) if the received messages include m13 (that is, the message the local orchestrator for MS1 sends when the service reaches state S3 from S1), then the orchestrator assigns tasks {modify pA, req space} to the service; (ii) otherwise, the set of assigned tasks is {modify pA} and, consequently, there will be another local orchestrator assigning a set of tasks including req space to its respective service, basically depending on the messages it received.
The orchestrators for MS2 and MS3 are roughly similar. The only noticeable difference is in the transition between states S4 and S5, where the local orchestrator for MS3 assigns the same action modify pB to the attached service, independently of the other actions to be assigned. Indeed, the orchestrators for MS1 and MS2 make this assignment dependent on the actions that are to be assigned to the other services.
[Figure 7.4: Local orchestrators for the services of Figure 7.2 (to be continued) – (a) local orchestrator for MS1; (b) local orchestrator for MS2. Transitions are labeled by pairs ⟨A, M, s⟩ / ⟨A′, M′⟩.]
[Figure 7.5: Local orchestrators for the services of Figure 7.2 (continued) and target workflow of Figure 7.3 – (a) local orchestrator for MS3; (b) local orchestrator for RS.]
7.1.3 The Proposed Technique
The formal setting. A Workflow Specification Kit (WfSK) K = (A, V) consists of a finite set of actions A and a finite set of variables V, also called blackboard, which can assume only a finite set of values. Actions have known (but not modeled here) effects on the real world, while they do not directly change the blackboard.
Using a WfSK K, one can define workflows over K. Formally, a workflow W over K is defined as a tuple W = (S, s0, G, δW, F), where:

• S is a finite set of workflow states;

• s0 ∈ S is the single initial state;

• G is a set of guards, i.e., formulas whose atoms are equalities (interpreted in the obvious way) involving variables and values;

• δW ⊆ S × G × (2^A − {∅}) × S is the workflow transition relation: (s, g, A, s′) ∈ δW denotes that in the state s, if the guard g is true in the current blackboard state, then the set of (concurrent) actions A ⊆ A is executed and the workflow changes state to s′; we insist that such a transition relation is actually deterministic: for no two distinct transitions (s, g1, A1, s1) and (s, g2, A2, s2) in δW we have that g1(γ) = g2(γ) = true, where γ is the current blackboard state;

• finally, F ⊆ S is the set of states of the workflow that are final, that is, the states in which the workflow can stop executing.

In other words, a workflow is a finite-state program whose atomic instructions are sets of actions of A (more precisely, invocations of actions), and that branches on conditions to be evaluated on the current state of the blackboard V.
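The determinism condition on δW can be checked mechanically over a boolean blackboard: no two distinct transitions out of the same state may have guards that are both true on some blackboard state. A brute-force sketch (all identifiers are ours, and the transition encoding is an assumption, not the thesis's formalism):

```python
from itertools import product

# A workflow transition is encoded as (source, guard, actions, target),
# where the guard is a predicate over boolean blackboard variables.
def is_deterministic(transitions, variables):
    """Brute-force check of the determinism condition on delta_W:
    no two distinct transitions out of the same state may have guards
    that both evaluate to true on some blackboard state gamma."""
    for values in product([False, True], repeat=len(variables)):
        gamma = dict(zip(variables, values))
        for i, (s1, g1, _, _) in enumerate(transitions):
            for (s2, g2, _, _) in transitions[i + 1:]:
                if s1 == s2 and g1(gamma) and g2(gamma):
                    return False
    return True

# Two transitions out of S0 guarded by qA and its negation: deterministic.
ok = is_deterministic(
    [("S0", lambda g: not g["qA"], {"compile_qA"}, "S1"),
     ("S0", lambda g: g["qA"], {"commit"}, "S2")],
    ["qA"])
```

The exhaustive enumeration is exponential in the number of variables, which is acceptable here only because the blackboard is finite and small.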
What characterizes our setting, however, is that actions in the WfSK do not have a direct implementation, but instead are realized through the available services. In other words, action executions are not independent of one another, but are constrained by the services that include them. A service is essentially a program for a client (actually the orchestrator, as we have seen). Such a program, however, leaves the selection of the set of actions to perform next to the client itself (actually the orchestrator). More precisely, at each step the program presents to the client (orchestrator) a choice of available sets of (concurrent) actions; the client (orchestrator) selects one of such sets; the actions in the selected set are executed concurrently; and so on.
Formally, a service S is a tuple S = (S, s0, G, C, δS, F) where:

• S is a finite set of states;

• s0 ∈ S is the single initial state;

• G is a set of guards, as described for workflows;

• C is a set of partial variable assignments for V, used to update the state of the blackboard;

• δS ⊆ S × G × (2^A − {∅}) × C × S is the service transition relation, where (s, g, A, c, s′) ∈ δS denotes that in the state s, if the guard g is true in the current blackboard state and the execution of the set of actions A ⊆ A is requested, then the blackboard state is updated according to c and the service changes state to s′;

• finally, F ⊆ S is the set of states that can be considered final, that is, the states in which the service can stop executing, but does not necessarily have to.
Observe that, in general, services are nondeterministic, in the sense that they may allow more than one transition with the same set A of actions and compatible guards evaluating to the same truth value.³ As a result, when the client (orchestrator) instructs a service to execute a given set of actions, it cannot be certain of which choices it will have later on, since that depends on which transition is actually executed – nondeterministic services are only partially controllable.
To each service we associate a local orchestrator. A local orchestrator is a module that can be (externally) attached to a service in order to control its operation. It has the ability to activate/resume its controlled service by instructing it to execute a set of actions. Also, the orchestrator has the ability to broadcast messages from a given set M after observing how the attached service evolved w.r.t. the delegated set of actions, and to access, at every step, all messages broadcasted by the other local orchestrators. Notice that a local orchestrator is not even aware of the existence of the other services: all it can do is access their broadcasted messages. Lastly, the orchestrator has full observability of the blackboard state.
A (message-extended) service history h⁺_S for a given service S = (S, s0, G, C, δS, F), starting in a blackboard state γ0, is any finite sequence of the form (s⁰, γ⁰, M⁰) · A¹ · (s¹, γ¹, M¹) · · · (s^(ℓ−1), γ^(ℓ−1), M^(ℓ−1)) · A^ℓ · (s^ℓ, γ^ℓ, M^ℓ), for some ℓ ≥ 0, such that, for all 0 ≤ k ≤ ℓ and 0 ≤ j ≤ ℓ − 1:

• s⁰ = s0;

• γ⁰ = γ0;

• A^k ⊆ A;

• (s^j, g, A^(j+1), c, s^(j+1)) ∈ δS with g(γ^j) = true and c(γ^j) = γ^(j+1), that is, service S can evolve from its current state s^j to state s^(j+1) while updating the blackboard state from γ^j to γ^(j+1) according to what is specified in c;

• M⁰ = ∅ and M^k ⊆ M, for all k ∈ {0, . . . , ℓ}.

³ Note that this kind of nondeterminism is of a devilish nature – the actual choice is out of the client's (orchestrator's) control.
The set H⁺_S denotes the set of all service histories for S.

Formally, a local orchestrator O = (P, B) for a service S is a pair of functions of the following form:

P : H⁺_S × 2^A → 2^A;
B : H⁺_S × 2^A × S → 2^M.

Function P states which actions A′ ⊆ A to delegate to the attached service at local service history h⁺_S when the actions A were requested. Function B states which messages, if any, are to be broadcasted under the same circumstances, given that the attached service has just moved to state s after executing the actions A′. We attach one local orchestrator Oi to each available service Si. In general, local orchestrators can have infinitely many states.

A distributed orchestrator is a set X = (O1, . . . , On) of local orchestrators, one for each available service Si.

We call device the pair D = (S, O) constituted by a service S and its local orchestrator O.

A workflow mobile environment (WfME) is constituted by a finite set of devices E = (D1, . . . , Dn) defined over the same WfSK K.
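The pair O = (P, B) can be pictured as a tiny Python interface (a sketch under our own naming assumptions; the history representation is simplified to a list of (state, blackboard, messages) steps):

```python
from typing import Callable, FrozenSet, List, Tuple

State = str
Tasks = FrozenSet[str]
Msgs = FrozenSet[str]
# One step of a message-extended history: (state, blackboard, messages seen).
HistoryStep = Tuple[State, dict, Msgs]
History = List[HistoryStep]

class LocalOrchestrator:
    """Pair O = (P, B): P picks the subset of requested actions to delegate
    to the attached service; B picks the messages to broadcast after
    observing which state the service reached."""

    def __init__(self,
                 P: Callable[[History, Tasks], Tasks],
                 B: Callable[[History, Tasks, State], Msgs]):
        self.P = P
        self.B = B

# A trivial (and in general non-realizing) orchestrator that delegates
# everything it is asked and broadcasts the reached state as a message.
greedy = LocalOrchestrator(
    P=lambda h, A: A,
    B=lambda h, A, s: frozenset({f"reached:{s}"}),
)

hist: History = [("S0", {"available": True}, frozenset())]
delegated = greedy.P(hist, frozenset({"take_pA"}))
msgs = greedy.B(hist, delegated, "S2")
```

The point of the sketch is only the signatures: both functions depend on the whole local history, which is why, in general, orchestrators need not be finite-state.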
Local Orchestrator Synthesis. The problem we are interested in is the following: given n services S1, . . . , Sn over a WfSK K = (A, V), an initial blackboard state γ0, and a workflow W over K, synthesize a distributed orchestrator, i.e., a team of n local orchestrators, such that the workflow is realized by concurrently running all services under the control of their respective local orchestrators.

More precisely, let S1, . . . , Sn be the n services, each with Si = (Si, si0, Gi, Ci, δi, Fi), let γ0 be the initial state of the blackboard, and let W = (SW, sW0, GW, δW, FW) be the workflow to be realized.
We start by observing that the workflow (being deterministic) is completely characterized by its set of traces, that is, by the set of infinite action sequences that are faithful to its transitions, and of finite sequences that in addition lead to a final state. More formally, a trace for W is a sequence of pairs (g, A), where g ∈ G is a guard over V and A ⊆ A is a non-empty set of actions, of the form t = (g¹, A¹) · (g², A²) · · · such that there exists an execution history⁴ for W, (s⁰, γ⁰) · A¹ · (s¹, γ¹) · · ·, where g^i(γ^(i−1)) = true for all i ≥ 1. If the trace t = (g¹, A¹) · · · (g^ℓ, A^ℓ) is finite, then there exists a finite execution history (s⁰, γ⁰) · · · (s^ℓ, γ^ℓ) with s^ℓ ∈ FW.

⁴ Analogous to the execution histories defined for services, except that they do not include messages.
Now, given a trace t = (g¹, A¹) · (g², A²) · · · of the workflow W, we say that a distributed orchestrator X = (O1, . . . , On) realizes the trace t iff, for all ℓ and for all “system histories” h^ℓ ∈ H^ℓ_(t,X) (formally defined below) with g^(ℓ+1)(γ^ℓ) = true in the last configuration of h^ℓ, we have that Ext_(t,X)(h^ℓ, A^(ℓ+1)) is nonempty, where Ext_(t,X)(h, A) is the set of (|h| + 1)-length system histories of the form h · [A1, . . . , An] · (s1^(|h|+1), . . . , sn^(|h|+1), γ^(|h|+1), M^(|h|+1)) such that:

• (s1^(|h|), . . . , sn^(|h|), γ^(|h|), M^(|h|)) is the last configuration in h;

• A = A1 ∪ · · · ∪ An, that is, the requested set of actions A is fulfilled by putting together all the actions executed by every service;

• Pi(h|i, A) = Ai for all i ∈ {1, . . . , n}, that is, the local orchestrator Oi instructed service Si to execute the actions Ai;

• (si^(|h|), gi, Ai, ci, si^(|h|+1)) ∈ δi with gi(γ^(|h|)) = true, that is, service Si can evolve from its current state si^(|h|) to state si^(|h|+1) w.r.t. the (current) variable assignment γ^(|h|);

• γ^(|h|+1) ∈ C(γ^(|h|)), where C = {c1, . . . , cn} is the set of the partial variable assignments ci due to each of the services, and C(γ^(|h|)) is the set of blackboard states obtained from γ^(|h|) by applying each of c1, . . . , cn in every possible order;

• M^(|h|+1) = B1(h|1, A, s1^(|h|+1)) ∪ · · · ∪ Bn(h|n, A, sn^(|h|+1)), that is, the set of broadcasted messages is the union of all messages broadcasted by each local orchestrator.

The set H^k_(t,X) of all histories that implement the first k actions of trace t as prescribed by X is defined as follows:

• H⁰_(t,X) = {(s10, . . . , sn0, γ0, ∅)};

• H^(k+1)_(t,X) = ∪_(h^k ∈ H^k_(t,X)) Ext_(t,X)(h^k, A^(k+1)), for k ≥ 0.

In addition, if a trace is finite, ends after m actions, and all along its guards are satisfied, we require that all histories in H^m_(t,X) end with all services in a final state. Finally, we say that a distributed orchestrator X = (O1, . . . , On) realizes the workflow W if it realizes all its traces.
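The condition γ^(|h|+1) ∈ C(γ^(|h|)) deserves a concrete illustration: since the services write their partial assignments concurrently, the next blackboard state is any state obtainable by applying the writes in some interleaving order. A small sketch (function name and representation are ours):

```python
from itertools import permutations

def possible_blackboards(gamma, partial_assignments):
    """C(gamma): the set of blackboard states obtained by applying the
    services' partial variable assignments in every possible order.
    Each assignment is a dict variable -> value; later writes win."""
    results = set()
    for order in permutations(partial_assignments):
        state = dict(gamma)
        for c in order:
            state.update(c)
        # freeze so states can be collected in a set
        results.add(frozenset(state.items()))
    return results

# Two services writing the same variable: the order matters, so the
# resulting blackboard state is only partially determined.
out = possible_blackboards({"pA": False}, [{"pA": True}, {"pA": False}])
```

When the assignments touch disjoint variables, all orders collapse to a single outcome, which is why conflicts on shared variables are the interesting case for the orchestrators to manage.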
In order to understand the above definitions, let us observe that, intuitively, the team of local orchestrators realizes a trace if, as long as the guards in the trace are satisfied, they can globally perform all the actions prescribed by the trace (each of the local orchestrators instructs its service to do some of them). In order to do so, each local orchestrator can use the history of its service together with the (global) messages that have been broadcasted so far. In some sense, implicitly through such messages, each local orchestrator gets information on the other services’ local histories in order to take the right decision. Furthermore, at each step, each local orchestrator broadcasts messages, which will be used in the next step by all local orchestrators to choose how to proceed.
Our technical results make use of some outcomes given in [122], which can be summarised by the following theorem.

Theorem 7.1. There exists a sound, complete and terminating procedure for computing a distributed orchestrator X = (O1, . . . , On) that realizes a workflow W over a WfSK K relative to services S1, . . . , Sn over K and blackboard state γ0. Moreover, each local orchestrator Oi returned by such a procedure is finite-state and requires a finite number of messages (more precisely, message types).

Observe that the problem itself places no finiteness limitation on the number of states of the local orchestrators, nor on the number of messages to be exchanged; the theorem, therefore, does not lose generality.
The synthesis procedure is based on the general techniques proposed in [10, 11, 32], which reduce the problem to the satisfiability of a Propositional Dynamic Logic formula [61] whose models roughly correspond to orchestrators.⁵ From a realization point of view, such a procedure can be implemented through the same basic algorithms behind the success of the description-logics-based reasoning systems used for OWL⁶, such as FaCT⁷, Racer⁸ and Pellet⁹; hence its applicability appears to be quite promising. The reader should note that the technique is not exploited at run-time, but before the execution of the services and the local orchestrators effectively happens; therefore, the requirements of mobile scenarios are not violated (e.g., to give a concrete example, it can be run on a laptop in the jeep taking the team to the operation field).
7.1.4 Final remarks
This section has studied the workflow composition problem within a general distributed setting; the solutions proposed here are therefore applicable to a wide range of contexts, e.g., nomadic teams in emergency management, in which we have multiple independent agents and a centralized solution is not conceivable.

⁵ There are also works that study alternatives based on model-checking techniques.
⁶ http://www.omg.org/uml/
⁷ http://www.cs.man.ac.uk/~horrocks/FaCT/
⁸ http://www.sts.tu-harburg.de/~r.f.moeller/racer/
⁹ http://www.mindswap.org/2003/pellet/
We plan to concretely implement this approach in the context of the research project WORKPAD, widely introduced in Section 2, as well as in another project, SM4All [21]. SM4All is investigating an innovative platform for collaborating smart embedded services in pervasive and person-centric environments, through the use of semantic techniques and workflow composition.
In conclusion, the kinds of problems we dealt with are special forms of reactive process synthesis [109, 110]. It is well known that, in general, distributed solutions are much harder to obtain than centralized ones [110, 78]. This has not hampered our approach, since we allow for equipping local orchestrators with autonomous message-exchange capabilities, even if such capabilities are not present in the services that they control.
7.2 Visual Support for Work Assignment in Process Management Systems
This section describes a novel work-list handler that is able to support process participants in choosing the next work item to work on. The work-list handler component takes care of work distribution and authorisation issues by assigning work items to appropriate participants. Typically, it uses a so-called “pull mechanism”, i.e., work is offered to all resources that qualify, and the first resource to select the work item will be the only one executing it. To allow users to “pull the right work items in the right order”, basic information is provided, e.g., task name, due date, etc. However, given that the work list is the main interface of the PMS with its users, it seems important to provide support that goes beyond a sorted list of items. If work items are selected by less qualified users than necessary, or if users select items in a non-optimal order, then the performance of the overall process is hampered.
Assume a situation where multiple resources have overlapping roles and authorisations, and where there are times when work is piling up (i.e., any normal business). In such a situation the questions listed below are relevant.
• “What is the most urgent work item I can perform?”
• “What work item is, geographically speaking, closest to me?”
• “Is there another resource that can perform this work item that is closer
to it than me?”
• “Is it critical that I handle this work item or are there others that can
also do this?”
• “How are the work items divided over the different departments?”
To our knowledge, commercial as well as open-source PMSs present work lists simply as a list of work items, each with a short textual description. Some products sort the work items in a work list using a priority scheme specified at design time and not updated at run time. To support the user in a better way and assist her in answering the above questions, we use diagrams. A diagram can be a geographical diagram (e.g., the map of a university's campus), but other diagrams can be used as well, e.g., process schemas, organisational diagrams, Gantt charts, etc. Work items are visualised as dots on diagrams. By not fixing the type of diagram, but allowing this choice to be configurable, different types of relationships can be shown, thus providing a deeper insight into the context of the work to be performed.
Work items are shown on diagrams. Moreover, for some diagrams resources can be shown as well, e.g., the geographical position of a user. Besides the "diagram metaphor" we also use the "distance metaphor". Seen from the viewpoint of the user, some work items are close while others are farther away. This distance may be geographic, e.g., a field service engineer may be far away from a malfunctioning printer at the other side of the campus. However, many other distance metrics are possible. For example, one can support metrics capturing familiarity with certain types of work, levels of urgency, and organisational distance. It should be noted that the choice of metric is orthogonal to the choice of diagram, thus providing a high degree of flexibility in context visualisation. Resources could, for example, opt to see a geographical map where work items, whose position is calculated through a function supplied at design time, display their level of urgency.
This section proposes different types of diagrams and distance metrics. Moreover, the framework has been implemented and integrated in YAWL.10 YAWL is an open-source workflow system based on the so-called workflow patterns. However, the framework and its implementation are set up in such a way that they can easily be combined with other PMSs.
The section is structured as follows. Section 7.2.1 discusses the state of the art in work-list visualisation in PMSs, whereas Section 7.2.2 provides a detailed overview of the general framework. Section 7.2.5 focusses on the implementation of the framework and highlights some design choices in relation to user and system interfaces. In Section 7.2.9 the framework is illustrated through a case study. Section 7.2.10 summarises the contributions and outlines avenues of future work aimed at improving the operationalisation of the framework.
10 www.yawlfoundation.org
7.2.1 Related Work
Little work has been conducted in the field of work list visualisation. Visualisation techniques in the area of PMS have predominantly been used to aid in the
understanding of process schemas and their run time behaviour, e.g. through
simulation [60] or process mining [135]. Although the value of business process
visualisation is acknowledged, both in the literature [15, 86, 124, 145] and in
the industry, little work has been done in the context of visualising work items.
The aforementioned body of work does not provide specific support for context-dependent work item selection. This is, however, addressed in the work by Brown and Paik [17], whose basic idea is close to our proposal. Images can be defined as diagrams, and mappings can be specified between work items and these diagrams. Work items are visualised through the use of intuitive icons, and the colour of work items changes according to their state. However, the approach chosen does not work well in real-life scenarios where many work items may have the same position (especially in coarse-grained diagrams), as icons with the same position are placed alongside each other. This may lead to a situation where a diagram is completely obscured by its work items. In our approach, these items are coalesced into a single dot whose size is proportional to their number. By gradually zooming in on such a dot, the individual work items become visible again. In addition, in [17] there is no concept similar to our distance notion, which is an ingredient that can provide significant assistance with work item selection to resources. Finally, the work of Brown and Paik does not take the visualisation of the positions of resources into account.
Also related is the work presented in [77], where proximity of work items
is considered without discussing their visualization.
Most PMSs present work lists as a simple enumeration of their work items, their textual descriptions, and possibly information about their priority and/or their deadlines. This holds both for open-source products, e.g. jBPM11 and Together Workflow12, and for commercial systems, such as SAP Netweaver13 and Flower14. An exception is TIBCO's iProcess Suite15, which provides a richer
type of work list handler that partially addresses the problem of supporting
resources with work item selection. Figure 7.6 depicts a screen shot of the
work list handler. In the bottom left corner a resource’s work list is shown,
and above this the lengths of the work lists of other resources is shown. By
clicking on a work item, a resource can see it on a Google Map positioned
11 jBPM web site - http://www.jboss.com/products/jbpm
12 Together Workflow web site - http://www.together.at/together/prod/tws/
13 Netweaver web site - http://www.sap.com/usa/platform/netweaver
14 Flower web site - http://global.pallas-athena.com/products/bpmflower_product/
15 iProcess Suite web site - http://www.tibco.com/software/business_process_management/
Figure 7.6: TIBCO’s iProcess Client
where it should be executed. The iProcess Suite also supports a kind of look-ahead in the form of a list of "predicted" work items and their start times. One can also learn about projected deadline expirations and exception flows. This is achieved through the use of expected durations specified at design time for the various tasks. Our visualisation framework is more accurate, as it can take the actual execution times of work items of a task into account, through the use of log files, when considering predictions for new work items of that task. Basically, the iProcess Suite provides support for some specific views (geographical position, deadline expiration), but these are isolated from each other. Our approach allows these views (and others) to be combined (e.g. a geographical view where deadlines are also visualised), thus enabling the use of views that may prove useful in certain contexts. Our approach also generalises over the type of diagram and goes beyond support for a single diagram (a geographical map) as in the iProcess Suite.
7.2.2 The General Framework
The proposed visualisation framework is based on a two-layer approach: (1)
diagrams and (2) the visualisation of work items based on a distance notion.
A work item is represented as a dot positioned along certain coordinates on a
background diagram. A diagram is meant to capture a particular perspective
of the context of the process. Since a work item can be associated with several
perspectives, it can be visualised in several diagrams (at different positions).
Diagrams can be designed as needed. When the use of a certain diagram
• Context view: the physical environment where tasks are going to be performed. Diagram and mapping: a real geographical diagram (e.g., a Google map). Work items are placed where they should be performed and resources are placed where they are located.

• Context view: the process schema of the case that work items belong to. Diagram and mapping: the process schema itself is the diagram, and work items are placed on top of the tasks that they are an instance of.

• Context view: deadline expiration of work items. Diagram and mapping: the diagram is a time-line whose origin is the current time. Work items are placed on the time-line at the latest moment at which they can start without their deadline expiring.

• Context view: the organisation that is in charge of carrying out the process. Diagram and mapping: the diagram is an organisational chart. Work items are associated with the role required for their execution. Resources are also shown, based on their organisational position.

• Context view: the materials that are needed for carrying out work items. Diagram and mapping: the diagram is a multidimensional graph whose axes are the materials needed for work item execution. Assume that materials A and B are associated with axes x and y respectively; then a work item is placed at coordinates (x, y) if it needs a quantity x of material A and a quantity y of material B.

• Context view: costs versus benefits in executing work items. Diagram and mapping: the axes represent "Revenue" (the amount of money received for the performance of work items) and "Cost" (the expense of their execution). A work item is placed at coordinates (x, y) if the revenue of its execution is x and its cost is y. In this case one is best off executing work items close to the x axis and far from the origin.

Table 7.1: Examples of diagrams and mappings.
is envisaged, the relationship between work items and their position on the
diagram should be specified through a function determined at design time.
Table 7.1 gives some examples of context views and the corresponding work
item mapping.
Several active “views” can be supported whereby users can switch from
one view to another. Resources can (optionally) see their own position on the
diagram and work items are coloured according to the value of the applicable
distance metric. Additionally, it may be helpful to show executing work items
as well as the position of other resources. Naturally, these visualisations are
governed by the authorisations that are in place.
Our framework assumes a generic lifecycle model as described in [120], which is slightly more elaborate than the SmartPM one. First, a work item is created, indicating that it is ready for distribution. The item is then offered to appropriate resources. A resource can commit to the execution of the item, after which it moves to the allocated state. The start of its execution leads it to the next state, started, after which it can successfully complete, it can be suspended (and subsequently resumed), or it can fail altogether. During run time a workflow engine (in our case the YAWL engine) informs the framework about the lifecycle states of work items.
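The lifecycle just described can be sketched as a small state machine. The state names follow the text; the transition table and function names below are illustrative assumptions, not part of the YAWL implementation:

```python
from enum import Enum

class State(Enum):
    CREATED = "created"
    OFFERED = "offered"
    ALLOCATED = "allocated"
    STARTED = "started"
    SUSPENDED = "suspended"
    COMPLETED = "completed"
    FAILED = "failed"

# Transition table for the generic lifecycle of [120] as described above
# (illustrative; the actual transitions are driven by the YAWL engine).
TRANSITIONS = {
    State.CREATED:   {State.OFFERED},
    State.OFFERED:   {State.ALLOCATED},
    State.ALLOCATED: {State.STARTED},
    State.STARTED:   {State.COMPLETED, State.SUSPENDED, State.FAILED},
    State.SUSPENDED: {State.STARTED},   # a suspended item can be resumed
}

def can_move(src: State, dst: State) -> bool:
    """Check whether the lifecycle allows moving from src to dst."""
    return dst in TRANSITIONS.get(src, set())
```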
7.2. VISUAL SUPPORT FOR WORK ASSIGNMENT IN PMS
7.2.3
171
Fundamentals
In this section the various notions used in our framework, e.g. work item and
resource, are defined formally.
Definition 7.1 (Work item). A work item w is a tuple (c, t, i, y, e, l),
where:
• c is the identifier of the case that w belongs to.
• t is the identifier of the task of which w is an instance.
• i is a unique instance number.
• y is the timestamp capturing when w moved to the “offered” state.
• e is the (optional) deadline of w.
• l represents the (optional) GPS coordinates where w should be executed.
Dimensions y and l may be undefined if the work item w is not yet offered or no specific execution location exists, respectively. The e value concerns timers, which may be defined in YAWL processes. A process region may be associated with a timer; when the timer expires, the work items that are part of the region are cancelled. Note that a work item can potentially be part of more than one cancellation region, and that this has implications for the definition of e. In such a case, the latest possible completion time with respect to these cancellation regions is assumed.
Definition 7.2 (Resource). A resource r is a pair (j, l), where:
• j is the identifier of the resource.
• l represents the (optional) GPS coordinates where the resource is currently located.
The notation w_x is used to denote the projection on dimension x of work item w, while r_y denotes the projection on dimension y of resource r. For example, w_t yields the task of which work item w is an instance.
Work items w' and w'' are considered to be siblings iff w'_t = w''_t. The set Coordinates consists of all possible coordinates. Elements of this set will be used to identify various positions on a given map.
Definition 7.3 (Position function). Let W and R be the sets of work items and resources, and let M be the set of available maps. For each available map m ∈ M, there exists a partial function position_m : W ∪ R ↛ Coordinates which returns the current coordinates of work items and available resources on map m.
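A minimal encoding of Definitions 7.1 and 7.2 may look as follows; the field names mirror the tuples in the text, while the concrete types are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Coordinates = Tuple[float, float]  # e.g. (latitude, longitude)

@dataclass(frozen=True)
class WorkItem:
    c: str                            # case identifier
    t: str                            # task identifier
    i: int                            # unique instance number
    y: Optional[float] = None         # timestamp of entering the "offered" state
    e: Optional[float] = None         # optional deadline
    l: Optional[Coordinates] = None   # optional GPS coordinates for execution

@dataclass(frozen=True)
class Resource:
    j: str                            # resource identifier
    l: Optional[Coordinates] = None   # optional current GPS position

def are_siblings(w1: WorkItem, w2: WorkItem) -> bool:
    """w' and w'' are siblings iff they are instances of the same task."""
    return w1.t == w2.t
```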
• distance_Familiarity(w, r): How familiar resource r is with performing work item w. This can be measured through the number of sibling work items the resource has already performed.

• distance_Geo_Distance(w, r): How close resource r is to work item w compared to the closest resource that was offered w. For the closest resource this distance is 1. In case w does not have a specific GPS location where it should be executed, this metric returns 1 for all resources.

• distance_Popularity(w, r): The ratio of logged-in resources having been offered w to all logged-in resources. This metric is independent of the resource r making the request.

• distance_Urgency(w, r): The ratio between the current timestamp and the latest timestamp at which work item w can start without being likely to expire. The latter timestamp is obtained as the difference between w_e, the latest timestamp by which w has to be finished without expiring, and w's estimated duration. This estimate is based on past executions of sibling work items of w by r.

• distance_Past_Execution(w, r): How familiar resource r is with work item w compared to the familiarity of all other resources that w has been offered to. More information about this metric is provided in the text.

Table 7.2: Distance metrics currently provided by the implementation.
For a map m ∈ M, the function position_m may be partial, since some elements of W and/or R may not have an associated position. Consider, for example, the case where a work item can be performed at any geographical location, or where it does not really make sense to associate a resource with a position on a certain map. As the various attributes of work items and resources may vary over time, it is important to see the class of functions position_m as time-dependent.
To formalise the notion of distance metric, a distance function is defined for each metric that yields the distance between a work item and a resource according to that metric.
Definition 7.4 (Distance function). Let W and R be the sets of work items and resources, and let D be the set of available distance metrics. For each distance metric d ∈ D, there exists a function distance_d : W × R → [0, 1] that returns a number in the range [0, 1] capturing the distance between work item w ∈ W and resource r ∈ R with respect to metric d.16
Given a certain metric d and a resource r, the next work item that r should perform is a work item w for which the value distance_d(w, r) is the closest to 1 among all offered work items.
7.2.4 Available Metrics
In Table 7.2 a number of general-purpose distance metrics are informally explained. These are all provided with the current implementation. Later in this section, we formalise these metrics. Let R denote the set of resources currently logged in. In order to make the explanations easier, some auxiliary functions are introduced first.

16 Please note that the value 1 represents the minimum distance, while 0 is the maximum.
past_execution(w, r) yields the weighted mean of the past execution times of the last h work items performed by r among the siblings of w. In this context, the past execution time of a work item w' is defined as the duration that elapsed between its assignment to r and its successful completion. Let time_i(w, r) be the execution time of the i-th most recent work item among w's siblings performed by r; then:

    past_execution(w, r) = ( Σ_{i=1}^{j(w,r,h)} α^(i−1) · time_i(w, r) ) / ( Σ_{i=1}^{j(w,r,h)} α^(i−1) )        (7.1)

where the constant α ∈ [0, 1], and the value j(w,r,h) is the minimum between a given constant h and the number of sibling work items of w performed by r. Both h and α have to be tuned through testing. If j(w,r,h) is equal to zero, past_execution(w, r) is assumed to take an arbitrarily large value.17 The intuition behind this definition is that more recent executions should be given more consideration, and hence weighted more, as they better reflect resources gaining experience in the execution of instances of a certain task.
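Equation 7.1 can be read as an exponentially weighted mean over at most the h most recent sibling executions. A sketch, with illustrative default values for α and h:

```python
import math

def past_execution(times, alpha=0.5, h=5):
    """Weighted mean of Equation 7.1. `times` lists the execution times of
    w's siblings performed by r, most recent first; the weight alpha**(i-1)
    favours recent executions. Returns an arbitrarily large value when r
    has never executed a sibling of w (i.e., j(w,r,h) = 0)."""
    recent = times[:h]                  # j(w,r,h) = min(h, len(times))
    if not recent:
        return math.inf
    # enumerate() starts at i = 0, which corresponds to the exponent i - 1
    num = sum(alpha ** i * t for i, t in enumerate(recent))
    den = sum(alpha ** i for i in range(len(recent)))
    return num / den
```

With times [10, 20] and α = 0.5 this yields (10 + 0.5 · 20)/1.5 ≈ 13.3, i.e., a value pulled towards the most recent execution time.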
Res(w) returns all currently logged-in resources that have been offered w:
Res(w) = {r ∈ R | w is offered to r}.
best_past_execution(w) denotes the smallest value of past_execution(w, r') computed among all logged-in resources r' qualified for w. Specifically:

    best_past_execution(w) = min_{r' ∈ Res(w)} past_execution(w, r')
best_Distance(w) returns the minimum geographic distance between a given work item w and all qualified resources:

    best_Distance(w) = min_{r' ∈ Res(w)} ||w_l − r'_l||

where ||w_l − r'_l|| stands for the Euclidean distance between the GPS coordinates where w should be executed and the GPS location of resource r'. The function best_Distance(w) is not total, since w_l may be undefined for certain work items w.

17 Technically, we set it to the maximum floating-point value.
Using these auxiliary functions, the following metrics can be defined:

1. Familiarity. How familiar resource r is with performing work item w. This can be measured through the number of sibling work items the resource has already performed:

    distance_Familiarity(w, r) = 0                                                 if best_past_execution(w) → ∞
    distance_Familiarity(w, r) = best_past_execution(w) / past_execution(w, r)     otherwise

The value best_past_execution(w) tends to infinity if nobody has ever executed work items of task w_t. Otherwise, if someone has executed work item siblings of w but r has not, then past_execution(w, r) → ∞ and, hence, distance_Familiarity(w, r) → 0.
2. Popularity. The ratio of logged-in resources having been offered w to all logged-in resources. This metric is independent of the resource r making the request. The intuition is that if many resources can perform w, then w is quite distant from every resource: even if a resource does not pick w for execution, it is likely that someone else will. Therefore:

    distance_Popularity(w, r) = 1 − |Res(w)| / |R|

If every resource can perform w, then the distance value is 0; if only few resources can perform w, then the value is near 1.
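A direct transcription of the popularity formula; representing Res(w) and R as Python sets is an assumption for illustration:

```python
def distance_popularity(offered_to, logged_in):
    """distance_Popularity(w, r) = 1 - |Res(w)| / |R|.

    offered_to: the set Res(w) of logged-in resources offered w.
    logged_in:  the set R of all currently logged-in resources."""
    return 1.0 - len(offered_to) / len(logged_in)
```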
3. Urgency. The ratio between the current timestamp and the latest timestamp at which work item w can start without being likely to expire. This second timestamp is obtained from w_e, the latest timestamp by which w has to be finished without expiring, and from w's estimated duration. This estimate relies on the past execution by r of w's sibling work items. Specifically:

    distance_Urgency(w, r) = t_now / (w_e − past_execution(w, r))     if w_e is defined
    distance_Urgency(w, r) = 0                                        if w_e is undefined

where t_now stands for the current timestamp. If r has never performed work items of the same task w_t, then past_execution(w, r) → ∞ and, hence, distance_Urgency(w, r) → 0.
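The urgency case distinction can be sketched as follows; representing timestamps as plain numbers and treating an arbitrarily large past_execution value as infinity are assumptions of this sketch:

```python
import math

def distance_urgency(t_now, e, past_exec):
    """t_now / (w_e - past_execution(w, r)) when a deadline e is defined,
    0 otherwise. When past_exec is arbitrarily large (r never executed a
    sibling of w), the value tends to 0, as in the running text."""
    if e is None:
        return 0.0          # no deadline: never urgent
    if math.isinf(past_exec):
        return 0.0          # r has no execution history for this task
    return t_now / (e - past_exec)
```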
4. Relative Geographic Distance. How close resource r is to work item w compared to the closest resource that was offered w. For the closest resource this distance is 1. In case w does not have a specific GPS location where it should be executed, this metric returns 1 for all resources. Its definition is:

    distance_Relative_Geo(w, r) = best_Distance(w) / ||w_l − r_l||     if best_Distance(w) > 0
    distance_Relative_Geo(w, r) = 0                                    if best_Distance(w) = 0
    distance_Relative_Geo(w, r) = 1                                    if best_Distance(w) is undefined
5. Relative Past Execution. This metric combines the familiarity of a resource with a certain work item and the familiarity of the other resources that are able to execute that work item:

    distance_Relative_Past_Execution(w, r) = ( 1 / past_execution(w, r) ) / ( Σ_{r' ∈ Res(w)} 1 / past_execution(w, r') )        (7.2)
Let us give an informal explanation. First observe that if exactly one resource
r exists capable of performing work item w, then the equation yields one.
If n resources are available and they roughly have the same familiarity with
performing work item w, then for each of them the distance will be about
1/n. It is clear then that as n increases in value, the value of the distance
metric approaches zero. If on the other hand many resources exist that are
significantly more effective in performing w than a certain resource r, then the
value of the denominator increases even more and the value of the metric for
w and r will be closer to zero.
For instance, let us suppose that at time t̂ there are n resources capable of performing w. Let us assume that, on average, one of them, namely r1, is such that past_execution(w, r1) = d̃. Moreover, let us also assume that the other resources required twice this amount of time on average in the past, i.e., for each resource ri (with i > 1), past_execution(w, ri) = 2d̃.
In such a situation, the distance metric value for r1 is as follows:

    distance(w, r1, Relative Past Execution)
        = ( 1 / past_execution(w, r1) ) / ( 1 / past_execution(w, r1) + Σ_{i=2}^{n} 1 / past_execution(w, ri) )
        = ( 1/d̃ ) / ( 1/d̃ + Σ_{i=2}^{n} 1/(2d̃) )
        = 1 / ( 1 + (n−1)/2 ) = 2 / (n+1)

This value is greater than 1/n if n > 1 (i.e., there are at least two resources that may perform w). If n = 1, the obtained value is exactly 1.
Conversely, the value for any other resource ri (with i > 1) is as follows:

    distance(w, ri, Relative Past Execution)
        = ( 1 / past_execution(w, ri) ) / ( 1 / past_execution(w, r1) + Σ_{i=2}^{n} 1 / past_execution(w, ri) )
        = ( 1/(2d̃) ) / ( 1/d̃ + Σ_{i=2}^{n} 1/(2d̃) )
        = (1/2) / ( 1 + (n−1)/2 ) = 1 / (n+1)

For all n > 1, this value is smaller than 2/(n+1), the metric value for r1.
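The worked example can be checked numerically; the helper below is a direct transcription of Equation 7.2 (the dictionary representation of past execution times is an assumption):

```python
def distance_rel_past_exec(r, offered_to, past):
    """Equation 7.2: the inverse of r's past execution time, normalised by
    the sum of the inverses over all resources that were offered w."""
    return (1.0 / past[r]) / sum(1.0 / past[r2] for r2 in offered_to)

# The example above: r1 needs d on average, the other n-1 resources need 2d.
n, d = 5, 10.0
past = {"r1": d}
past.update({f"r{i}": 2 * d for i in range(2, n + 1)})
res = set(past)

assert abs(distance_rel_past_exec("r1", res, past) - 2 / (n + 1)) < 1e-9
assert abs(distance_rel_past_exec("r2", res, past) - 1 / (n + 1)) < 1e-9
```

For n = 5, r1 obtains 2/6 ≈ 0.33 while every other resource obtains 1/6 ≈ 0.17, matching the derivation.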
Work-item ageing. Some of the metrics above suffer from the fact that their values do not change over time. Therefore, if some work items have a small value with respect to those metrics, it is likely that there will always be other work items with a greater value for those metrics. If resources behave "fairly", always picking the work items that provide more benefit for the organisation, some work items could remain on a work list for a very long time, or even indefinitely.
Therefore, we devised a technique for ageing the work items that occur on work lists, in such a way that they eventually become the least distant work item. Let d be any metric and let χ_{t_en} = distance_d(w, r) be the distance value when w becomes enabled, where w and r are, respectively, a work item and a resource. The distance value with respect to metric d at time t_en + t ages as follows:

    χ_{t_en + t} = 1 − (1 − χ_{t_en}) · e^(−α·t)        (7.3)

If t = 0, then χ_{t_en + t} = χ_{t_en}, and if t → ∞ (i.e., time t increases indefinitely), then χ_{t_en + t} → 1. Please note that if α = 0, then work items do not age. The greater the value of α, the more quickly Equation 7.3 approaches 1 as t increases; vice versa, smaller values of α make the growth of Equation 7.3 with t slower.
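Equation 7.3 in code form; the parameter values used in the test are illustrative:

```python
import math

def aged_distance(chi_en, t, alpha):
    """chi_{t_en + t} = 1 - (1 - chi_{t_en}) * e^(-alpha * t).

    The value starts at chi_en and decays towards 1, so a long-ignored
    work item eventually becomes the least distant one; alpha = 0
    disables ageing entirely."""
    return 1.0 - (1.0 - chi_en) * math.exp(-alpha * t)
```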
7.2.5 Implementation
The general framework described in the previous sections has been operationalised through the development of a component that can be plugged into the YAWL system. Section 7.2.6 gives an overview of YAWL18, an open-source PMS developed by the Queensland University of Technology, Brisbane (Australia), in cooperation with the Technical University of Eindhoven, The Netherlands.

18 http://www.yawlfoundation.org
Section 7.2.7 illustrates some of the visualisation features provided by the
implementation, whereas Section 7.2.8 focusses on how the component fits
within the YAWL architecture.
7.2.6 The YAWL system
The YAWL environment is an open-source PMS based on the workflow patterns [120, 134] and using a service-oriented architecture. The YAWL engine and all other services (work-list handler, web-service broker, exception handler, etc.) communicate through XML messages.
YAWL offers the following distinctive features:
• YAWL offers comprehensive support for the control-flow patterns. It is the most powerful process specification language for capturing control-flow dependencies.
• The data perspective in YAWL is captured through the use of XML
Schema, XPath and XQuery.
• YAWL offers comprehensive support for the resource patterns. It is the
most powerful process specification language for capturing resourcing
requirements.
• YAWL has a proper formal foundation. This makes its specifications
unambiguous and automated verification becomes possible (YAWL offers
two distinct approaches to verification, one based on Reset nets, the
other based on transition invariants through the WofYAWL editor plugin).
• YAWL has been developed independently of any commercial interests. It simply aims to be the most powerful language for process specification.
• Despite its expressiveness, YAWL needs few constructs compared with other languages, such as BPMN.
• YAWL offers unique support for exception handling, both for exceptions that were and for those that were not anticipated at design time.
• YAWL offers unique support for dynamic workflow through the Worklets approach. Workflows can thus evolve over time to meet new and changing requirements.
• YAWL aims to be straightforward to deploy. It offers a number of automatic installers and an intuitive graphical design environment.
178
CHAPTER 7. SOME COVERED RELATED TOPICS
• Through the BPMN2YAWL component, BPMN models can be mapped
to the YAWL environment for execution.
• The Declare component (released through declare.sf.net) provides unique support for specifying workflows in terms of constraints. This approach can be combined with the Worklet approach, thus providing very powerful flexibility support.
• YAWL’s architecture is service-oriented, and hence one can replace existing components with one’s own or extend the environment with newly developed components.
• The YAWL environment supports the automated generation of forms. This is particularly useful for rapid prototyping purposes.
• Automated tasks in YAWL can be mapped to Web Services or to Java
programs.
• Through the C-YAWL approach, a theory has been developed for the configuration of YAWL models.19
• Simulation support is offered through a link with the ProM environment.20 Through this environment it is also possible to conduct post-execution analysis of YAWL processes (e.g. in order to identify bottlenecks).
The YAWL work-list handler is developed as a web application. Its graphical interface uses different tabs to show the various queues (e.g., started work items); see Figure 7.7. The visualisation framework can be accessed through a newly introduced tab and is implemented as a Java Applet.
7.2.7 The User Interface
The position and distance functions represent orthogonal concepts that require
joint visualisation for every map. The position function for a map determines
where work items and resources will be placed as dots, while the distance
function will determine the colour of work items. Conceptually, work item information and resource information are split and represented in different layers. Users can choose which layers they wish to see and, in case they choose both layers, which of them should overlay the other.
19 http://www.processconfiguration.com
20 http://www.processmining.org
Figure 7.7: The YAWL work-list handler
Work-item Layer. Distances can be mapped to colours for work items through a function colour : [0, 1] → C which associates every metric value with a colour in the set C. In our implementation, colours range from white to red, with intermediate shades of yellow and orange. When a resource sees a red work item, this could for example indicate that the item is very urgent, that it is one of those most familiar to this resource, or that it is the closest work item in terms of its geographical position. While the colour of a work item can depend on the resource viewing it, it can also depend on the state of the lifecycle the item is in. Special colours are used to represent the various states of the work item lifecycle, and Table 7.3 provides an overview: the various rows correspond to the various states and their visualisation. Resources can filter work items depending on their state. This is achieved through the provision of a checkbox for each of the states of Table 7.3; several checkboxes can be ticked. There is an additional checkbox which allows resources to see work items that they cannot execute but are authorised to see.
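One way to realise the function colour : [0, 1] → C is a piecewise interpolation from white through yellow and orange to red. The concrete RGB scheme below is an illustrative assumption, not the one implemented in the applet:

```python
def colour(v):
    """Map a distance value in [0, 1] to an RGB triple: white (v = 0)
    through yellow and orange to red (v = 1)."""
    v = max(0.0, min(1.0, v))        # clamp out-of-range inputs
    if v <= 0.5:
        # white -> yellow: fade out the blue channel
        return (255, 255, round(255 * (1 - 2 * v)))
    # yellow -> red: fade out the green channel (passing through orange)
    return (255, round(255 * (2 - 2 * v)), 0)
```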
Resources may be offered work items whose positions are the same or very close. In such cases their visualisations may overlap, and the work items are grouped into a so-called "joint dot". The diameter of a joint dot is proportional to the number of work items involved. More precisely, the diameter D of a joint dot is determined by D = d(1 + lg n), where d is the standard diameter of a normal dot and n is the number of work items involved. Note that we use a logarithmic (lg) scaling for the relative size of a composite dot.
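The scaling rule D = d(1 + lg n) in code form, assuming lg denotes the base-2 logarithm:

```python
import math

def joint_dot_diameter(d, n):
    """Diameter of a joint dot grouping n work items; the logarithmic
    term keeps even large groups visually compact (n = 1 gives a
    normal dot of diameter d)."""
    return d * (1 + math.log2(n))
```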
Work item state                           Colour scheme used in the work-list handler
Created                                   Work item is not shown.
Offered to single/multiple resource(s)    The colour is determined by the distance to the resource with respect to the chosen metric; it ranges from white through various shades of yellow and orange to red.
Allocated to a single resource            Purple.
Started                                   Black.
Suspended                                 The same as for offered.
Failed                                    Grey.
Completed                                 Work item is not shown.

Table 7.3: Visualisation of a work item depending on its state in the life cycle.
Combining several work items in a single dot raises the question of how the distance of this dot is determined. Four options are offered for defining the distance of a joint dot: one can take (a) the maximum of all the distances of the work items involved, (b) their minimum, (c) their median, or (d) their mean. When a resource clicks on a joint dot, all work items involved are enumerated in a list, coloured according to their value with respect to the distance metric chosen.
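The four aggregation options can be sketched as a simple dispatch table (names are illustrative):

```python
from statistics import mean, median

# The four options (a)-(d) for the distance value of a joint dot.
AGGREGATORS = {"max": max, "min": min, "median": median, "mean": mean}

def joint_dot_distance(distances, option="max"):
    """Aggregate the distances of the work items grouped in a joint dot."""
    return AGGREGATORS[option](distances)
```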
Resource Layer. When a resource clicks on a work item the positions of
the other resources to whom this work item is offered are shown. Naturally
this is governed by authorisation privileges and by the availability of location
information for resources for the map involved.
Resource visualisation can be customised so that a resource can choose
to see a) only herself, b) all resources, or c) all resources that can perform a
certain work item. The latter option supports the case where a resource clicks
on a work item and wishes to see the locations of the other resources that can
do this work item.
7.2.8
Architectural Considerations
Figure 7.8 shows the overall architecture of the visualisation framework and
the connections with other YAWL components. Specifically, the visualisation
framework comprises:
The Visualisation Applet is the client-side applet that allows resources to
access the visualisation framework and it resides as a separate tab in the
work-list handler.
The Visualisation Designer is used by special administrators in order to
define and update maps as well as to specify the position of work items
Figure 7.8: Position of the visualisation components in the YAWL architecture.
on defined maps. Designers can define positions as fixed or as variable
through the use of XQuery. In the latter case, an XQuery expression is
defined that refers to case variables. This expression is evaluated at run
time when required.
Services is the collective name for modules providing information used to
depict maps and to place work items (e.g. URLs to locate map images,
work item positions on various maps).
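The distinction between fixed positions and XQuery-defined positions made in the Visualisation Designer can be sketched as follows; the dictionary layout and the use of Python expressions in place of XQuery are purely illustrative assumptions:

```python
def resolve_position(spec, case_variables):
    """Return the (x, y) position of a task on a map.

    A static specification carries plain coordinates; a dynamic one carries
    one expression per coordinate, (re-)evaluated over the case variables
    whenever the diagram is refreshed.
    """
    if spec["kind"] == "static":
        return (spec["x"], spec["y"])
    env = dict(case_variables)
    return (eval(spec["x_expr"], {}, env), eval(spec["y_expr"], {}, env))
```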
The YAWL engine is at the heart of the YAWL environment. It determines
which work items are enabled and can thus be offered for execution and it
handles the data that is involved. While the YAWL engine offers a number
of external interfaces, for the visualisation component interfaces B and E are
relevant. Interface B is used, for example, by the work list handler to learn
about work items that need to be offered for execution. This interface can
also be used for starting new cases. Interface E provides an abstraction mechanism to access log information, and can thus e.g. be used to learn about past
executions of siblings of a work item. In particular one can learn how long a
certain work item remained in a certain state.
The work list handler is used by resources to access their “to-do” list. The
standard version of the work list handler provides queues containing work
items in a specific state. This component provides interface G which allows
other components to access information about the relationships between work
items and resources: for example, which resources have been offered a certain
work item, or which work items are in a certain state. Naturally this component
is vital to the Visualisation Applet.
In addition to interface G, the Visualisation Applet also connects to the
Services modules through the following interfaces:
The Position Interface provides information about maps and the positioning of work items on these maps. Specifically, it returns an XQuery over
the YAWL net variables that the Visualisation Applet has to compute.
The work list handler needs to be consulted to retrieve the current values
of these variables.
The Metric Interface provides information about available metrics and
their values for specific work item - resource combinations.
The Resource Interface is used to update and retrieve information concerning positions of active resources on maps.
The visualisation framework was integrated into the standard work list handler
of YAWL through the addition of a JSP (Java Server Page).
All of the services of the visualisation framework share a repository, referred to as Visualisation Repository in Figure 7.8, which stores, among other things, XQueries to compute positioning information, resource locations on various
maps, and names and URLs of maps. Services periodically retrieve log data
through Interface E in order to compute distance metric values for offered work
items. For instance, to compute the metric Relative Past Execution (Equation 7.2) for a certain resource, one can see from Equation 7.1 that information
is required about the h past executions of sibling work items performed by that
resource.
To conclude this section, we would like to stress that the approach and
implementation are highly generic, i.e., it is relatively easy to embed the visualisation framework in another PAIS.
Interface Details. The modules collectively named Services are
implemented as Tomcat web applications. Specifically, each interface is implemented as a web application and its methods are provided as servlets, which
take inputs and return outputs as XML documents.
Figure 7.9 summarizes the methods offered by all implemented interfaces.
Although they are actually servlets and their parameters XML documents, we conceptualise them as methods of classes in an object-oriented programming language.
Interface Metric provides two methods: 1) getMetrics(), which returns
the list of names of all available metrics, and 2) getDistance(), which takes
a work item identifier and a metric name as input and returns the value of
that metric for that work item.
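For illustration, the kind of XML documents such servlets might exchange can be handled as sketched below; the element names are assumptions, not the actual schema used by the implementation:

```python
from xml.etree import ElementTree

def parse_metrics(xml_text):
    """Parse a getMetrics() response into a list of metric names."""
    root = ElementTree.fromstring(xml_text)
    return [m.text for m in root.findall("metric")]

def parse_distance(xml_text):
    """Parse a getDistance() response into a numeric metric value."""
    return float(ElementTree.fromstring(xml_text).text)
```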
Figure 7.9: Details of the interfaces provided.
Interface Resource provides two methods to get and set the position of a resource with respect to a specified map.
Finally, interface Position allows one to request information about all available maps through method getMaps(). In particular, it returns an array of
Map objects, each defining two properties: 1) the map name and 2) the
URL where the map can be found. Method getResourcePosition() takes
a resource identifier and a map as input, and returns the coordinates
of that resource on the specified map. This information is mostly what
resources themselves provide through method setResourceCoordinate() of
interface Resource. Method getWorkitemPosition() of interface Position is
very similar but operates on work items instead of resources.
For modularity reasons, none of the interfaces accesses the Visualisation Repository database directly. A dedicated interface, the Visualisation Repository Interface, exists solely for the purpose of masking the interaction with the database. As the various methods are sufficiently self-explanatory, we do not provide further details.
The only thing worth mentioning is that getLastPastExecutions returns
the durations of the last h sibling work items offered within the last limitDays
days. This method is required for computing function pastExecution. In order
to return the h most recent executions, the method needs to obtain all work
items and then sort them in descending order by the timestamp at which they
moved to the offered state (i.e., work item dimension y). Finally, the method
considers the first h work items in the sorted list. We foresee an initial
filtering step that discards all work items offered more than limitDays
days ago. Without this filtering, the sorting operation could be
computationally expensive, as it could involve thousands of work items; the
filtering is thus meant to reduce the size of the set to be sorted.
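The filter-then-sort strategy can be sketched as follows (the representation of a work item as an (offered-timestamp, duration) pair is an illustrative assumption):

```python
from datetime import datetime, timedelta

def last_past_executions(work_items, h, limit_days, now):
    """Durations of the h most recently offered sibling work items,
    restricted to those offered within the last limit_days days."""
    cutoff = now - timedelta(days=limit_days)
    # Filter first, so that the sort operates on a much smaller set.
    recent = [(offered, duration) for offered, duration in work_items
              if offered >= cutoff]
    # Sort descending by the timestamp of the move to the offered state.
    recent.sort(key=lambda item: item[0], reverse=True)
    return [duration for _, duration in recent[:h]]
```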
7.2.9
Example: Emergency Management
In this section we illustrate a number of features of the visualisation framework by considering a potential scenario from emergency management. This scenario stems from a user requirement analysis conducted in the
context of WORKPAD [23]. Teams are sent to an area to make an assessment
of the aftermath of an earthquake. Team members are equipped with a laptop
and their work is coordinated through the use of a PMS.
The main process of the workflow for assessing buildings is named Disaster
Management. The first task Assess the affected area represents a quick on-the-spot inspection to determine damage to buildings, monuments and objects.
For each object identified as worthy of further examination an instance of the
sub-process Assess every sensible object (of which we do not show the actual
decomposition for space reasons) is started as part of which a questionnaire
is filled in and photos are taken. This can be an iterative process as an evaluation is conducted to determine whether the questionnaire requires further
refinement or more photos need to be taken. After these assessments have
finished, the task Send data to the headquarters can start which involves the
collection of all questionnaires and photos and their subsequent dispatch to
headquarters. This information is used to determine whether these objects
are in imminent danger of collapsing and if so, whether this can be prevented
and how that can be achieved. Depending on this outcome a decision is made
to destroy the object or to try and restore it.
For the purposes of illustrating our framework we assume that an earthquake has occurred in the city of Brisbane. Hence a number of cases are
started by instantiating the Disaster Management workflow described above.
Each case deals with the activities of an inspection team in a specific
zone. Figure 7.10 shows three diagrams. In each diagram, the dots refer to
work items. Figure 7.10(a) shows the main process of the Disaster Management workflow, including eight work items. Dots for work items which are
instances of the tasks Assess the affected area and Send data to the headquarters are placed on top of these tasks in this figure. Figure 7.10(b) shows the
decomposition of Assess every sensible object. Here also eight work items are
shown. No resources are shown in these diagrams. Note that the left-hand
side shows a list of work items that are not on the diagram. For example,
(a) Disaster Management process diagram showing 4+4=8 work items
(b) Assess every sensible object sub-net also showing 8 work items
(c) Example of a timeline diagram for showing 11 work items
Figure 7.10: Examples of Process and Timeline Diagrams for Disaster Management
the eight work items shown in the diagram in Figure 7.10(a) appear in the list
of “other work items” in Figure 7.10(b).
Figure 7.10(a) uses the urgency distance metric to highlight urgent cases
while Figure 7.10(b) uses the familiarity metric to highlight cases closer to the
user in terms of earlier experiences.
As another illustration consider Figure 7.10(c) where work items are positioned according to their deadlines. This can be an important view in the
context of disaster management where saving minutes may save lives. In the
diagram shown, the x-axis represents the time remaining before a work item
expires, while the y-axis represents the case number of the case the work item
belongs to. A work item is placed at location (100 + 2·x̃, 10 + 4·ỹ) on that
diagram if x̃ minutes are remaining to the deadline of the work item and its
case number is ỹ. In this example, work items are coloured in accordance with
the popularity distance metric.
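The placement rule for the timeline diagram can be sketched as:

```python
def timeline_position(minutes_left, case_number):
    """Place a work item with the given minutes to its deadline (x-axis)
    and case number (y-axis) on the timeline diagram."""
    return (100 + 2 * minutes_left, 10 + 4 * case_number)
```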
Figures 7.11 and 7.12 show some screenshots of a geographical map of the
city of Brisbane. On these diagrams, work items are placed at the location
where they should be executed. If their locations are so close that their corresponding dots overlap, a larger dot (i.e., a joint-dot) is used to represent the
work items involved and the number inside corresponds to the number of these
items. The green triangle is a representation of the resource whose work list is
visualised here. Work items for tasks Assess the affected area and Send data
to the headquarters are not shown on the diagram as they can be performed
anywhere. In this example, dots are coloured according to the familiarity distance metric. A dot that is selected as focus obtains a blue colour and further
information about the corresponding work item is shown at the bottom of the
screen (as is the case for work item Take Photos 4 in Figure 7.11(b)).
One can click on a dot and see the positions of other resources that have
been offered the corresponding work item. For example, by clicking on the
dot representing the work item Take photo 4, other resources, represented by
triangles, are shown (see Figure 7.11(b)). As for work items, overlapping triangles representing resources are combined. For example, the larger triangle
shown in Figure 7.11(b) represents two resources.
Figure 7.12(a) shows the screen shot after clicking on the joint triangle.
A resource can thus see the list of resources associated with this triangle. By
selecting one of the resources shown in the list, the work items offered to
that resource can be seen. The colour of these work items is determined by
their value for the chosen distance metric. A zooming feature is also provided.
Figure 7.12(b) shows the result of zooming in a bit further on the diagram
of Figure 7.12(a). As can be seen, no dots nor any triangles are overlapping
anymore.
This run-time behaviour stems from a number of design steps performed, through the Visualisation Designer tool, by the people responsible for designing the work-list visualisation.
(a) Diagram showing the geographic locations of work items and resources: the
triangle represents the resource and most work items are shown as single dots
except for the two work items that are clustered into a single dot labeled “2”
(b) Information about the selected dot (blue dot) is shown and also other
resources are shown
Figure 7.11: Examples of Geographic Diagrams for Disaster Management.
(a) When a triangle is selected, the corresponding resources and offered work
items are shown
(b) When zooming in, clustered work items and resources are separated
Figure 7.12: Further examples of Geographic Diagrams for Disaster Management.
(a) Assess the affected area sub-net also showing 8 work items
(b) Disaster Management process diagram showing 4+4=8 work items
Figure 7.13: Use of the Visualisation Designer to specify task positions
on diagrams.
As already mentioned, this tool allows administrators to add and remove diagrams valuable for
participants, as well as to specify the position of tasks on such diagrams. Figure 7.13(b) shows an example of how to specify task positions dynamically. The
responsible person opens the YAWL process specification for which she wishes
to specify the position of tasks. As a result, a new window opens
(the Task List window in Figure 7.13(b)), which lists all tasks existing in
the specification. At this point, users can drag and drop tasks onto the defined
maps; in this way, they specify static positions for tasks. Users can
define dynamic positions through a specific window (the Insert Position for Task
XYZ window in Figure 7.13(b)), which allows them to specify XQueries defining the X and Y components of the point where the corresponding task should
be positioned. These XQueries are defined over the process instance variables
and are computed at run time. Figure 7.13(b) also depicts an example of defining
the position of tasks on a process diagram: specifically, the user is dragging
and dropping the desired (static) position for task Assess the affected area.
7.2.10
Final Remarks
We have proposed a general visualisation framework that can aid users in selecting the
“right” work item among a potentially large number of work items offered to
them. The framework uses the “diagram metaphor” to show the locations of
work items and resources. The “distance metaphor” is used to show which
work items are “close” (e.g., urgent, similar to earlier work items, or geographically close). Both concepts are orthogonal and this provides a great deal of
flexibility when it comes to presenting work to people. For example, one can
choose a geographical map to display work items and resources and use a distance metric capturing urgency. The proposed framework was operationalised
as a component of the YAWL environment. By using well-defined interfaces
the component is generic so that in principle it could be exploited by other
PMSs as well, provided that they are sufficiently “open” and provide
the required interface methods. The component is also highly configurable,
e.g., it allows resources to choose how distances should be computed for dots
representing a number of work items and provides customizable support for
determining which resources should be visible.
Future work in this area may go in three directions:
1. Connecting this framework and its implementation to SmartPM. The
current implementation works in concert with YAWL and a porting to
SmartPM is planned. Although that would require SmartPM to provide
all information needed, the framework is independent of any specific
Process Management System.
2. Connecting the current framework to geographical information systems
and process mining tools like ProM [135].
3. Exploiting the fact that geographical information systems store data based on locations and that process mining can extract data from event logs and visualise it
on diagrams; e.g., it is possible to make a “movie” showing the evolution
of work items based on historic data.
7.3
A summary
This chapter has introduced some research topics related to process management in pervasive scenarios. The first deals with the problem of synthesizing a process schema according to the available services and of distributing the
orchestration among all of them. The second touches on the topic of supporting
process participants in choosing the next task to work on among the several
that may be offered to them. The support for this second topic is fully available in a workflow
product, specifically YAWL.
Chapter 8
Conclusion
This thesis work has been directed to process management for highly
dynamic and pervasive scenarios. Examples of pervasive scenarios include
emergency management, health care or home automation. These scenarios
are characterised by processes that are as complex as the traditional ones
of business domains (e.g., loans, insurance). Therefore, the use of Process
Management Systems is indicated and very helpful.
Unfortunately, most existing PMSs are intended for business scenarios
and are not completely appropriate for the settings in which we are interested.
Indeed, pervasive scenarios are turbulent and subject to a higher frequency of
unexpected contingencies than business settings, where the environment is mostly static and shows a foreseeable behaviour.
Therefore, PMSs for pervasive scenarios should provide a very high degree
of operational flexibility/adaptability. In this thesis, we have given a new
definition of adaptability, suitable for our purposes, in terms of a gap. Adaptability
is the ability of the system to reduce the gap between the virtual reality, i.e., the
model of the reality used to deliberate, and the physical reality, i.e., the real-world
state with the actual values of conditions and outcomes. When the gap is so
significant that the executing process cannot be carried out, the PMS should
be able to build a proper recovery plan that reduces the gap so as to
allow the process to complete.
This thesis work proposes some techniques and frameworks to devise a
general recovery method able to handle any kind of exogenous event, including those that were unforeseen. In doing so, we encountered the main
challenges in two directions: (i) conceiving an execution monitor able to determine when exogenous events occur and when they prevent running
processes from terminating successfully; (ii) devising a recovery planner able to
build a plan that allows the original process to terminate successfully. To this
aim, we have “borrowed” techniques from AI, such as Situation Calculus and IndiGolog, as well as Execution Monitoring in agent and robot programming. We
have applied these techniques to a different field, which required a significant
effort to conceptualize and formalize.
In order to show the feasibility of these techniques, we have conceived and
developed a proof-of-concept implementation called SmartPM by using an
available IndiGolog platform. In order to make it usable in many pervasive
scenarios, such as emergency management, SmartPM needs to work in settings based on Mobile Ad-hoc Networks (MANETs). To make that
possible, we had to do some research work on topics related to mobile networking. Specifically, we developed a MANET layer to enable multi-hop
communication, and we conceived and developed a specific technique to
predict device disconnections before they actually occur (so as to be able to
recover in time).
The next step, on which we are currently working, is to overcome the intrinsic planning inefficiency of Prolog by making use of efficient state-of-the-art
planners.
A number of future research directions arise from this thesis,
and we have explained them in detail throughout the thesis itself. Here we
summarize the most relevant ones:
1. Working on integrating SmartPM with state-of-the-art planners in order to
overcome the intrinsic planning inefficiency of Prolog. This step is
anything but easy and a lot of research is still ongoing. The most
challenging issue is to convert Action Theories and IndiGolog programs into
a form that can be given as input to planners (e.g., converting them to PDDL).
2. Operationalizing the more efficient approach described in Chapter 6
and integrating it with the framework currently implemented. The idea
is that SmartPM should be able to understand, process
by process, when the more efficient approach is applicable and, when it is not,
fall back on the currently implemented approach.
3. Providing SmartPM with full-fledged work-list handlers to facilitate the
distribution of tasks to human participants. We envision two types of
work-list handler: a version for ultra-mobile devices and a lighter version
for PDAs, more “compact” but providing fewer features. First steps have
already been taken in these directions. The version for ultra-mobile devices has
been operationalized for a different PMS (see Section 7.2). The same
holds for the PDA version, which has been developed during
this thesis for the ROME4EU Process Management System, a previous
valuable attempt to deal with unexpected deviations (see [7]).
4. Working on moving from the central SmartPM engine to a distributed approach, where every device contributes to the coordination of the process
execution. A first evaluation has already been done from a theoretical
perspective (see Section 7.1); it remains to concretely develop it and to fit it
with the adaptability approach of SmartPM.
Appendix A
The IndiGolog Code of the
Running Example
This appendix lists the code of the running example shown and discussed in
Chapters 4 and 5.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILE: aPMS/pms.pl
%
% AUTHOR : Massimiliano de Leoni, Andrea Marrella,
%          Sebastian Sardina, Stefano Valentini
% TESTED : SWI Prolog 5.id_.1id_ http://www.swi-prolog.org
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

:- dynamic controller/1.

/* SOME DOMAIN-INDEPENDENT PREDICATES TO DENOTE THE VARIOUS OBJECTS OF
   INTEREST IN THE FRAMEWORK */

/* Available services */
services([1,2,3,4,5]).
service(Srvc) :- domain(Srvc,services).

/* Tasks defined in the process specification */
tasks([TakePhoto,EvaluatePhoto,CompileQuest,Go,SendByGPRS]).
task(Task) :- domain(Task,tasks).

/* Capabilities relevant for the process of interest */
capabilities([camera,compile,gprs,evaluation]).
capability(B) :- domain(B,capabilities).

/* The list of identifiers that may be used to distinguish
   different instances of the same task */
task_identifiers([id_1,id_2,id_3,id_4,id_5,id_6,id_7,id_8,id_9,id_10,
                  id_11,id_12,id_13]).
id(Id) :- domain(Id,task_identifiers).
/* The capabilities required for each task */
required(TakePhoto,camera).
required(EvaluatePhoto,evaluation).
required(CompileQuest,compile).
required(SendByGPRS,gprs).

/* The capabilities provided by each service */
provide(1,gprs).
provide(1,evaluation).
provide(2,compile).
provide(2,evaluation).
provide(2,camera).
provide(3,compile).
provide(4,evaluation).
provide(4,camera).
provide(5,compile).

/* There is nothing to do caching on
   (required because cache 1 is static) */
cache(_) :- fail.

/* Definition of predicate loc(i,j)
   identifying the current location of a service */
gridsize(10).
gridindex(V) :-
    gridsize(S),
    get_integer(0,V,S).
location(loc(I,J)) :- gridindex(I), gridindex(J).

/* The definition of integer numbers */
number(Srvc,M) :- get_integer(0,Srvc,M).

/* square(X,Y): Y is the square of X */
square(X,Y) :- Y is X * X.

/* member(ELEM,LIST): returns true if ELEM is contained in LIST */
member(ELEM,[HEAD|_]) :- ELEM=HEAD.
member(ELEM,[_|TAIL]) :- member(ELEM,TAIL).

listEqual(L1,L2) :- subset(L1,L2),subset(L2,L1).
/* Definition of predicate workitem(Task,Id,I).
   It identifies a task Task with id Id and input I */
listelem(workitem(Task,Id,I)) :- id(Id), location(I),
    member(Task,[Go,CompileQuest,EvaluatePhoto,TakePhoto]).
listelem(workitem(SendByGPRS,Id,input)) :- id(Id).

worklist([]).
worklist([ELEM | TAIL]) :- worklist(TAIL),listelem(ELEM).

/* DOMAIN-INDEPENDENT FLUENTS */

/* Basically, there has to be some definition for predicates
   causes_true and causes_false, at least one for each.
   We have added the following dummy code: */
causes_true(_,_,_) :- false.
causes_false(_,_,_) :- false.

/* Indicates that list LWrk of workitems has been assigned
   to service Srvc */
rel_fluent(assigned(LWrk,Srvc)) :- worklist(LWrk),service(Srvc).

/* assigned(LWrk,Srvc) holds after action assign(LWrk,Srvc) */
causes_val(assign(LWrk,Srvc),assigned(LWrk,Srvc),true,true).

/* assigned(LWrk,Srvc) holds no longer after action
   release(LWrk,Srvc) */
causes_val(release(LWrk,Srvc),assigned(LWrk,Srvc),false,true).

/* Indicates that task Task with id Id has been begun
   by service Srvc */
rel_fluent(enabled(Task,Id,Srvc)) :- task(Task), service(Srvc), id(Id).

/* enabled(Task,Id,Srvc) becomes true if the service Srvc calls
   the exogenous action readyToStart(Task,Id,Srvc), indicating the
   starting of the task Task with id Id */
causes_val(readyToStart(Task,Id,Srvc),enabled(Task,Id,Srvc),true,true).

/* enabled(Task,Id,Srvc) holds no longer after service Srvc calls
   exogenous action finishedTask(Task,Id,Srvc,V) */
causes_val(finishedTask(Task,Id,Srvc,_),
    enabled(Task,Id,Srvc),false,true).

/* free(Srvc) indicates that service Srvc
   has no task currently assigned */
rel_fluent(free(Srvc)) :- service(Srvc).

/* free(Srvc) holds after action release(LWrk,Srvc) */
causes_val(release(_X,Srvc),free(Srvc),true,true).
/* free(Srvc) holds no longer after action assign(LWrk,Srvc) */
causes_val(assign(_LWrk,Srvc),free(Srvc),false,true).

/* ACTIONS and PRECONDITIONS */

prim_action(assign(LWrk,Srvc)) :- worklist(LWrk),service(Srvc).
poss(assign(LWrk,Srvc), true).

prim_action(ackTaskCompletion(Task,Id,Srvc)) :-
    task(Task), service(Srvc), id(Id).
poss(ackTaskCompletion(Task,Id,Srvc), neg(enabled(Task,Id,Srvc))).

prim_action(start(Task,Id,Srvc,I)) :-
    listelem(workitem(Task,Id,I)), service(Srvc).
poss(start(Task,Id,Srvc,I), and(enabled(Task,Id,Srvc),
    and(assigned(LWrk,Srvc),
        member(workitem(Task,Id,I),LWrk)))).

prim_action(release(LWrk,Srvc)) :- worklist(LWrk),service(Srvc).
poss(release(LWrk,Srvc), true).

/* DOMAIN-DEPENDENT FLUENTS */

/* at(Srvc) indicates that service Srvc is in position P */
fun_fluent(at(Srvc)) :- service(Srvc).
causes_val(finishedTask(Task,Id,Srvc,V),at(Srvc),
    loc(I,J),and(Task=Go,V=loc(I,J))).

rel_fluent(evaluationOK(Loc)) :- location(Loc).
causes_val(finishedTask(Task,Id,Srvc,V),evaluationOK(loc(I,J)), true,
    and(Task=EvaluatePhoto,
    and(V=(loc(I,J),OK),
    and(photoBuild(loc(I,J),N),
        N>3)))).

fun_fluent(photoBuild(Loc)) :- location(Loc).
causes_val(finishedTask(Task,Id,Srvc,V),photoBuild(Loc),N,
    and(Task=TakePhoto,
    and(V=(loc(I,J),Nadd),
    and(Nold=photoBuild(Loc),
        N is Nold+Nadd)))).

rel_fluent(infoSent()).
causes_val(finishedTask(Task,Id,Srvc,V),infoSent, true,
    and(Task=SendByGPRS,V=OK)).
proc(hasConnection(Srvc),hasConnectionHelper(Srvc,[Srvc])).

proc(hasConnectionHelper(Srvc,M),
    or(neigh(Srvc,1),
       some(n,
          and(service(n),
          and(neg(member(n,M)),
          and(neigh(n,Srvc),
              hasConnectionHelper(n,[n|M]))))))).

proc(neigh(Srvc1,Srvc2),
    some(x1,
    some(x2,
    some(y1,
    some(y2,
    some(k1,
    some(k2,
       and(at(Srvc1)=loc(x1,y1),
       and(at(Srvc2)=loc(x2,y2),
       and(square(x1-x2,k1),
       and(square(y1-y2,k2),sqrt(k1+k2)<7))))))))))).

/* INITIAL STATE */

initially(free(Srvc),true) :- service(Srvc).

/* All services are at coordinate (0,0) */
initially(at(Srvc),loc(0,0)) :- service(Srvc).
initially(at_prev(Srvc),0) :- service(Srvc).

initially(photoBuild(Loc),0) :- location(Loc).
initially(photoBuild_prev(Loc),0) :- location(Loc).

initially(evaluationOK(Loc),false) :- location(Loc).
initially(evaluationOK_prev(Loc),false) :- location(Loc).

initially(infoSent(),false).
initially(infoSent_prev(),false).

initially(enabled(X,Id,Srvc),false) :- task(X), service(Srvc), id(Id).
initially(assigned(X,Srvc),false) :- task(X), service(Srvc), id(Id).

initially(evaluate,false).
initially(finished,false).

/* ACTIONS EXECUTED BY SERVICES */
exog_action(readyToStart(T,Id,Srvc))
    :- task(T), service(Srvc), id(Id).
exog_action(finishedTask(T,Id,Srvc,_V))
    :- task(T), service(Srvc), id(Id).

/* PREDICATES AND ACTIONS FOR MONITORING ADAPTATION */

exog_action(disconnect(Srvc,loc(I,J)))
    :- service(Srvc), gridindex(I), gridindex(J).

/* at(Srvc) assumes the value loc(I,J)
   after exogenous action disconnect(Srvc,loc(I,J)) */
causes_val(disconnect(Srvc,loc(I,J)),at(Srvc),loc(I,J),true).

prim_action(A) :- exog_action(A).
poss(A,true) :- exog_action(A).

causes_val(disconnect(Srvc,L),exogenous,true,true).

/* Fluents in the previous situation */
fun_fluent(at_prev(Srvc)) :- service(Srvc).

fun_fluent(photoBuild_prev(Loc)) :- location(Loc).

fun_fluent(evaluationOK_prev(Loc)) :- location(Loc).

fun_fluent(infoSent_prev()).

causes_val(disconnect(_,_),at_prev(Srvc),X,at(Srvc)=X)
    :- service(Srvc).
causes_val(disconnect(_,_),
    photoBuild_prev(Loc),X,photoBuild(Loc)=X) :- location(Loc).
causes_val(disconnect(_,_),
    evaluationOK_prev(Loc),X,evaluationOK(Loc)=X) :- location(Loc).
causes_val(disconnect(_,_),
    infoSent_prev(),X,infoSent()=X).

proc(hasConnection_prev(Srvc),hasConnectionHelper_prev(Srvc,[Srvc])).

proc(hasConnectionHelper_prev(Srvc,M),
    or(neigh_prev(Srvc,1),
       some(n,
          and(service(n),and(neg(member(n,M)),
          and(neigh_prev(n,Srvc),hasConnectionHelper_prev(n,[n|M]))))))).

proc(neigh_prev(Srvc1,Srvc2),
    some(x1,
    some(x2,
    some(y1,
    some(y2,
    some(k1,
    some(k2,
       and(at_prev(Srvc1)=loc(x1,y1),
       and(at_prev(Srvc2)=loc(x2,y2),
       and(square(x1-x2,k1),
       and(square(y1-y2,k2),sqrt(k1+k2)<7))))))))))).
273
274
/* ADAPTATION DOMAIN-INDEPENDENT FEATURES */
275
276
277
prim_action(finish).
poss(finish,true).
278
279
280
rel_fluent(finished).
causes_val(finish,finished,true,true).
281
282
283
rel_fluent(exogenous).
initially(exogenous,false).
284
285
rel_fluent(adapted).
286
287
288
prim_action(resetExo).
poss(resetExo,true).
causes_val(resetExo,exogenous,false,true).
causes_val(adaptStart,adapted,false,true).
causes_val(adaptFinish,adapted,true,true).
prim_action(adaptFinish).
poss(adaptFinish,true).
prim_action(adaptStart).
poss(adaptStart,true).
/* Adaptation is relevant if some monitored fluent no longer holds
   the value it had before the exogenous event */
proc(relevant,
    or(some(srvc,and(service(srvc),
           and(hasConnection_prev(srvc),
               neg(hasConnection(srvc))))),
    or(some(loc,some(y,and(location(loc),
           and(photoBuild_prev(loc)=y,
               neg(photoBuild(loc)=y))))),
    or(some(loc,some(z,and(location(loc),
           and(evaluationOK_prev(loc)=z,
               neg(evaluationOK(loc)=z))))),
       some(w,and(infoSent_prev=w,neg(infoSent=w))))))
).
proc(goalReached,neg(relevant)).
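In plain terms, `relevant` holds when some monitored fluent deviates from the snapshot taken before the exogenous event, and `goalReached` is its negation. A minimal Python sketch of this comparison over hypothetical fluent/value dictionaries (the dictionary encoding is illustrative; the thesis evaluates the fluents in the situation calculus):

```python
def relevant(prev, curr):
    """True iff any monitored fluent deviates from its snapshot taken
    before the exogenous event (prev, curr: dicts fluent-name -> value)."""
    return any(prev[f] != curr.get(f) for f in prev)

def goal_reached(prev, curr):
    # goalReached is simply the negation of relevant
    return not relevant(prev, curr)

# Hypothetical snapshots: a disconnect makes hasConnection(s2) flip.
snapshot = {"hasConnection(s2)": True, "infoSent": False}
after_disconnect = {"hasConnection(s2)": False, "infoSent": False}
```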
proc(adapt, [adaptStart, ?(writeln('about to adapt')),
    pconc([adaptingProgram, adaptFinish],
          while(neg(adapted), [?(writeln('waiting')),wait]))
]
).
/* While planning, assume each assign is followed by readyToStart and
   each start by the corresponding finishedTask */
proc(adaptingProgram, searchn([?(true),searchProgram],
    [assumptions([
        [ assign([workitem(Task,Id,_I)],Srvc),
          readyToStart(Task,Id,Srvc) ],
        [ start(Task,Id,Srvc,I),
          finishedTask(Task,Id,Srvc,I) ] ])
    ])
).
proc(searchProgram, [star(pi([t,i,n],
[ ?(isPickable([workitem(t,id_30,i)],n)),
assign([workitem(t,id_30,i)],n),
start(t,id_30,n,i),
ackTaskCompletion(t,id_30,n),
release([workitem(t,id_30,i)],n)
]
), 10),
?(goalReached)]
).
/* ABBREVIATIONS - BOOLEAN FUNCTIONS */
proc(isPickable(WrkList,Srvc),
    or(WrkList=[],
       and(free(Srvc),
       and(WrkList=[A|TAIL],
       and(listelem(A),
       and(A=workitem(Task,_Id,_I),
       and(isExecutable(Task,Srvc),
           isPickable(TAIL,Srvc))))))
    )
).
proc(isExecutable(Task,Srvc),
and(findall(Capability,required(Task,Capability),A),
and(findall(Capability,provide(Srvc,Capability),C),subset(A,C)))).
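`isExecutable` reduces to a subset test: a service may execute a task iff every capability the task requires is among those the service provides. The same check in Python (the example capability tables are illustrative, not taken from the thesis scenario):

```python
def is_executable(task, service, required, provided):
    """Mirror of isExecutable: the capabilities required by `task` must be
    a subset of those provided by `service` (dicts name -> set)."""
    return required.get(task, set()) <= provided.get(service, set())

# Hypothetical capability tables.
required = {"takePhoto": {"camera"}, "sendByGPRS": {"gprs"}}
provided = {"s1": {"camera", "gprs"}, "s2": {"camera"}}
```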
% Translations of domain actions to real actions (one-to-one)
actionNum(X,X).
/* PROCEDURES FOR HANDLING THE TASK LIFE CYCLES */
proc(manageAssignment(WrkList),
[atomic([pi(n,[?(isPickable(WrkList,n)), assign(WrkList,n)])])]).
proc(manageExecution(WrkList),
pi(n,[?(assigned(WrkList,n)=true),manageExecutionHelper(WrkList,n)])).
proc(manageExecutionHelper([],_Srvc),[]).
proc(manageExecutionHelper([workitem(Task,Id,I)|TAIL],Srvc),
[start(Task,Id,Srvc,I), ackTaskCompletion(Task,Id,Srvc),
manageExecutionHelper(TAIL,Srvc)]).
proc(manageTermination(WrkList),
    [atomic([pi(n,[?(assigned(WrkList,n)=true), release(WrkList,n)])])]).
proc(manageTask(WrkList),
[manageAssignment(WrkList),
manageExecution(WrkList),
manageTermination(WrkList)]).
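The three procedures above implement the work-item life cycle: a work-item list is atomically assigned to a suitable free service, each item is started and acknowledged in order, and the list is finally released. A hypothetical Python state machine enforcing the same ordering (class and method names are illustrative, not part of the thesis code):

```python
class WorkItemList:
    """Sketch of the manageTask life cycle: assign -> start/ack each item -> release."""

    def __init__(self, items):
        self.items = list(items)      # (task, id) pairs
        self.state = "unassigned"
        self.service = None
        self.log = []

    def assign(self, service):
        # manageAssignment: pick a service for the whole list, atomically
        assert self.state == "unassigned"
        self.service, self.state = service, "assigned"
        self.log.append(("assign", service))

    def execute_all(self):
        # manageExecutionHelper: start and acknowledge each item in sequence
        assert self.state == "assigned"
        for task, wid in self.items:
            self.log.append(("start", task, wid, self.service))
            self.log.append(("ack", task, wid, self.service))
        self.state = "executed"

    def release(self):
        # manageTermination: release the list from the assigned service
        assert self.state == "executed"
        self.state = "released"
        self.log.append(("release", self.service))

wl = WorkItemList([("go", "id_1"), ("takePhoto", "id_2")])
wl.assign("s1")
wl.execute_all()
wl.release()
```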
/* MAIN PROCEDURE FOR INDIGOLOG */
proc(main, mainControl(N)) :- controller(N), !.
proc(main, mainControl(3)). % default one
proc(mainControl(5), prioritized_interrupts(
[interrupt(and(neg(finished),exogenous), monitor),
interrupt(true, [process,finish]),
interrupt(neg(finished), wait)
])).
proc(monitor,[?(writeln('Monitor')),
    ndet(
        [?(neg(relevant)),?(writeln('NonRelevant'))],
        [?(relevant),?(writeln('Relevant')),adapt]
    ), resetExo
]).
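The monitor fires on exogenous events: if the deviation is relevant it triggers adaptation, otherwise it does nothing, and in either case `resetExo` clears the flag. A compact Python sketch of one monitor pass (the flag dictionary is an illustrative stand-in for the fluents `exogenous` and `relevant`):

```python
def monitor_step(state):
    """One pass of the monitor interrupt: if an exogenous event was flagged,
    adapt only when the deviation is relevant, then reset the flag (resetExo)."""
    trace = []
    if state["exogenous"]:
        if state["relevant"]:
            trace.append("adapt")
            state["relevant"] = False   # adaptation re-aligns the process
        state["exogenous"] = False      # resetExo
    return trace

s = {"exogenous": True, "relevant": True}
t1 = monitor_step(s)   # relevant deviation: adaptation runs
t2 = monitor_step(s)   # flag already cleared: nothing to do
```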
proc(branch(Loc),
    while(neg(evaluationOK(Loc)),
        [
         manageTask([workitem(compileQuest,id_1,Loc)]),
         manageTask([workitem(go,id_1,Loc),
                     workitem(takePhoto,id_2,Loc)]),
         manageTask([workitem(evaluatePhoto,id_1,Loc)])
        ]
    )
).
proc(process,
    [rrobin([branch(loc(2,2)),branch(loc(3,5)),branch(loc(4,4))]),
     manageTask([workitem(sendByGPRS,id_29,input)])
    ]
).
Bibliography
[1] M. Adams, A. H. M. ter Hofstede, W. M. P. van der Aalst, and D. Edmond. Dynamic, extensible and context-aware exception handling for
workflows. In On the Move to Meaningful Internet Systems 2007:
CoopIS, DOA, ODBASE, GADA, and IS Proceedings, Part I, volume
4803 of Lecture Notes in Computer Science, pages 95–112. Springer,
2007.
[2] I. Akyildiz, J. S. M. Ho, and Y. B. Lin. Movement-based Location Update and Selective Paging for PCS Networks. IEEE/ACM Transactions
on Networking, 4(4):629–638, 1996.
[3] K. Andresen and N. Gronau. An Approach to Increase Adaptability
in ERP Systems. In Managing Modern Organizations with Information
Technology: Proceedings of the 2005 Information Resources Management Association International Conference, pages 883–885. Idea Group
Publishing, May 2005.
[4] J. Baier and S. McIlraith. On Planning with Programs that Sense. In
KR’06: Proceedings of the 10th International Conference on Principles
of Knowledge Representation and Reasoning, pages 492–502, Lake District, UK, June 2006. AAAI Press.
[5] J. A. Baier, C. Fritz, and S. A. McIlraith. Exploiting Procedural Domain Control Knowledge in State-of-the-Art Planners. In Proceedings
of the International Conference on Automated Planning and Scheduling
(ICAPS), pages 26–33. AAAI Press, 2007.
[6] S. Basagni, I. Chlamtac, V. R. Syrotiuk, and B. A. Woodward. A distance routing effect algorithm for mobility (DREAM). In MobiCom ’98:
Proceedings of the 4th annual ACM/IEEE international conference on
Mobile computing and networking, pages 76–84. ACM, 1998.
[7] D. Battista, M. de Leoni, A. Gaetanis, M. Mecella, A. Pezzullo,
A. Russo, and C. Saponaro. ROME4EU: A Web Service-Based ProcessAware System for Smart Devices. In ICSOC ’08: Proceedings of the 6th
207
208
BIBLIOGRAPHY
International Conference on Service-Oriented Computing, pages 726–
727. Springer-Verlag, 2008.
[8] B. Benatallah, M. Dumas, and Q. Sheng. Facilitating the rapid development and scalable orchestration of composite web services. Distributed
and Parallel Databases, 17, 2005.
[9] D. Berardi, D. Calvanese, G. De Giacomo, R. Hull, and M. Mecella.
Automatic composition of transition-based semantic web services with
messaging. In Proc. VLDB 2005, 2005.
[10] D. Berardi, D. Calvanese, G. De Giacomo, M. Lenzerini, and M. Mecella.
Automatic Service Composition Based on Behavioural Descriptions. International Journal of Cooperative Information Systems, 14(4):333–376,
2005.
[11] D. Berardi, D. Calvanese, G. De Giacomo, and M. Mecella. Composing
web services with nondeterministic behavior. In Proc. ICWS 2006, 2006.
[12] P. Berens. Process-Aware Information Systems, chapter Case handling
with FLOWer: Beyond workflow. John Wiley & Sons, 2005.
[13] J. O. Berger.
Springer, 1985.
Statistical Decision Theory and Bayesian Analysis.
[14] G. Bertelli, M. de Leoni, M. Mecella, and J. Dean. Mobile Ad hoc Networks for Collaborative and Mission-critical Mobile Scenarios: a Practical Study. In WETICE’08: Proceedings of the 17th IEE International
Workshops on Enabling Technologies: Infrastructure for collaboration
enterprises, pages 157–152. IEEE Publisher, 2008.
[15] R. Bobrik, M. Reichert, and T. Bauer. View-based process visualization.
In Proceedings of the 5th International Conference on Business Process
Management BPM 2007, volume 4714 of LNCS, pages 88–95. Springer,
2007.
[16] A. Borgida and T. Murata. Tolerating exceptions in workflows: a unified framework for data and processes. In WACC ’99: Proceedings of
the international joint conference on Work activities coordination and
collaboration, pages 59–68. ACM, 1999.
[17] R. Brown and H.-Y. Paik. Resource-centric worklist visualisation. In
Proceedings of OTM Confederated International Conferences, CoopIS,
DOA, and ODBASE 2005, volume 3760 of LNCS, pages 94–111.
Springer, 2005.
BIBLIOGRAPHY
209
[18] D. Calvanese, G. De Giacomo, M. Lenzerini, M. Mecella, and F. Patrizi.
Automatic Service Composition and Synthesis: the Roman Model. IEEE
Data Engineering Bulletin, 31(3):18–22, 2008.
[19] F. Casati, S. Ceri, B. Pernici, and G. Pozzi. Workflow Evolution. Data
& Knowledge Engineering, 24(3):211–238, 1998.
[20] F. Casati, S. Ilnicki, L. jie Jin, V. Krishnamoorthy, and M.-C. Shan.
Adaptive and Dynamic Service Composition in eFlow. In CAiSE2000:
Proceedings of 12th International Conference Advanced Information Systems Engineering, volume 1789 of Lecture Notes in Computer Science,
pages 13–31. Springer, 2000.
[21] T. Catarci, F. Cincotti, M. de Leoni, M. Mecella, and G. Santucci.
Smart homes for all: Collaborating services in a for-all architecture
for domotics. In CollaborateCom’08: Proc. of The 4th International
Conference on Collaborative Computing: Networking, Applications and
Worksharing. ACM Press, 2009. To appear.
[22] T. Catarci, M. de Leoni, F. De Rosa, M. Mecella, A. Poggi, S. Dustdar, L. Juszczyk, H. Truong, and G. Vetere. The WORKPAD P2P
Service-Oriented Infrastracture for Emergency Management. In WETICE ’07: Proceedings of the 16th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, Washington, DC, USA, 2007. IEEE Computer Society.
[23] T. Catarci, M. de Leoni, A. Marrella, M. Mecella, G. Vetere, B. Salvatore, S. Dustdar, L. Juszczyk, A. Manzoor, and H.-L. Truong. Pervasive
Software Environments for Supporting Disaster Responses. IEEE Internet Computing, 12(1):26–37, 2008.
[24] T. Catarci, F. De Rosa, M. de Leoni, M. Mecella, M. Angelaccio,
S. Dustdar, A. Krek, G. Vetere, Z. M. Zalis, B. Gonzalvez, and G. Iiritano. WORKPAD: 2-Layered Peer-to-Peer for Emergency Management
through Adaptive Processes. In CollaborateCom 2006: Proceedings of
the 2nd International Conference on Collaborative Computing: Networking, Applications and Worksharing. IEEE Computer Society, 2006.
[25] G. Chafle, S. Chandra, V. Mann, and M. G. Nanda. Decentralized orchestration of composite web services. In Proc. WWW 2004 – Alternate
Track Papers & Posters, 2004.
[26] D. Chiu, Q. Li, , and K. Karlapalem. A logical framework for exception handling in ADOME workflow management system. In CAiSE2000:
210
BIBLIOGRAPHY
Proceedings of 12th International Conference Advanced Information Systems Engineering, volume 1789 of Lecture Notes in Computer Science,
pages 110–125. Springer, 2000.
[27] Cosa GmbH. COSA BPM product description. http://www.cosa.
de/project/docs/en/COSA57-Productdescription.pdf, July 2008.
Prompted on 1 February, 2009.
[28] F. D’Aprano, M. de Leoni, and M. Mecella. Emulating mobile ad-hoc
networks of hand-held devices: the octopus virtual environment. In
MobiEval ’07: Proceedings of the 1st international workshop on System
evaluation for mobile platforms, pages 35–40, New York, NY, USA, 2007.
ACM.
[29] G. De Giacomo, Y. Lespérance, H. J. Levesque, and S. Sardina. IndiGolog: A High-Level Programming Language for Embedded Reasoning
Agents, chapter in Multi-Agent Programming: Languages, Platforms and
Applications. Rafael H. Bordini, Mehdi Dastani, Jürgen Dix, Amal El
Fallah-Seghrouchni (Eds.). Springer, 2009. To appear.
[30] G. De Giacomo and H. J. Levesque. An incremental interpreter for highlevel programs with sensing. In H. J. Levesque and F. Pirri, editors,
Logical Foundations for Cognitive Agents: Contributions in Honor of
Ray Reiter, pages 86–102. Springer, Berlin, 1999.
[31] G. De Giacomo, R. Reiter, and M. Soutchanski. Execution Monitoring
of High-Level Robot Programs. In KR’98: Proceedings of the Sixth International Conference on Principles of Knowledge Representation and
Reasoning, pages 453–465, 1998.
[32] G. De Giacomo and S. Sardina. Automatic synthesis of new behaviors
from a library of available behaviors. In IJCAI’07: Proceedings of 20th
International Joint Conference on Artificial Intelligence, pages 1866–
1871, Hyderabad, India, 2007.
[33] M. de Leoni, G. De Giacomo, Y. Lespérance, and M. Mecella. On-line
Adaptation of Sequential Mobile Processes Running Concurrently. In
SAC ’09: Proceedings of the 2009 ACM Symposium on Applied Computing. ACM Press, 2009. To appear.
[34] M. de Leoni, F. De Rosa, S. Dustdar, and M. Mecella. Resource disconnection management in MANET driven by process time plan. In
Autonomics ’07: Proceedings of the 1st ACM/ICST International Conference on Autonomic Computing and Communication Systems. ACM,
2007.
BIBLIOGRAPHY
211
[35] M. de Leoni, F. De Rosa, A. Marrella, A. Poggi, A. Krek, and F. Manti.
Emergency Management: from User Requirements to a Flexible P2P
Architecture. In B. Van de Walle, P. Burghardt, and C. Nieuwenhuis,
editors, Proceedings of the 4th International Conference on Information
Systems for Crisis Response and Management ISCRAM2007, 2007.
[36] M. de Leoni, F. De Rosa, and M. Mecella. MOBIDIS: A Pervasive Architecture for Emergency Management. In WETICE ’06: Proceedings
of the 15th IEEE International Workshops on Enabling Technologies:
Infrastructure for Collaborative Enterprises, pages 107–112. IEEE Computer Society, 2006.
[37] M. de Leoni, S. Dustdar, and A. H. M. ter Hofstede. Introduction to the
1st International Workshop on Process Management for Highly Dynamic
and Pervasive Scenarios (PM4HDPS’08). In BPM 2008 Workshops, volume 17 of LNBIP, pages 241–243. Springer-Verlag, 2009.
[38] M. de Leoni, S. R. Humayoun, M. Mecella, and R. Russo. A Bayesian
Approach for Disconnection Management in Mobile Ad Hoc Networks.
Ubiquitous Computing and Communication Journal, CPE, March 2008.
[39] M. de Leoni, A. Marrella, M. Mecella, S. Valentini, and S. Sardina. Coordinating Mobile Actors in Pervasive and Mobile Scenarios: An AI-based
Approach. In WETICE’08: Proceedings of the 17th IEE International
Workshops on Enabling Technologies: Infrastructure for collaboration
enterprises, pages 82–88. IEEE Publisher, 2008.
[40] M. de Leoni, M. Mecella, and G. De Giacomo. Highly Dynamic Adaptation in Process Management Systems Through Execution Monitoring.
In BPM’07: Proceedings of the 5th Internation Conference on Business
Process Management, volume 4714 of Lecture Notes in Computer Science, pages 182–197. Springer, 2007.
[41] M. de Leoni, M. Mecella, and R. Russo. A bayesian approach for
disconnection management in mobile ad hoc networks. In WETICE
’07: Proceedings of the 16th IEEE International Workshops on Enabling
Technologies: Infrastructure for Collaborative Enterprises, pages 62–67,
Washington, DC, USA, 2007. IEEE Computer Society.
[42] M. de Leoni, W. M. P. van der Aalst, and A. H. M. ter Hofstede. Visual
Support for Work Assignment in Process-Aware Information Systems.
In Proceedings of the 6th International Conference on Business Process
Management (BPM’08), Milan, Italy, September 2-4, volume 5240 of
Lecture Notes in Computer Science. Springer, 2008.
212
BIBLIOGRAPHY
[43] J. Dehnert and P. Rittgen. Relaxed Soundness of Business Processes. In
Proceedings of 19th International Conference on Advanced Information
Systems Engineering, 19th International, volume 2068 of Lecture Notes
in Computer Science, pages 157–170. Springer, 2001.
[44] Y. Dong and Z. Shen-sheng. Approach for workflow modeling using
pi-calculus. Journal of Zhejiang University SCIENCE, 4(6):643–650,
November 2003.
[45] O. V. Drugan, T. Plagemann, and E. Munthe-Kaas. Non-intrusive neighbor prediction in sparse manets. In SECON 2007: Proceedings of the
Fourth Annual IEEE Communications Society Conference on Sensor,
Mesh and Ad Hoc Communications and Networks, pages 172–182. IEEE,
2007.
[46] M. Dumas, W. M. P. van der Aalst, and A. H. M. ter Hofstede. ProcessAware Information Systems: Bridging People and Software Through
Process Technology. Wiley, September 2005.
[47] W. Feler. An Introduction to Probability Theory and its Applications.
Willey, 2nd edition, 1971.
[48] J. Flynn, H. Tewari, D., and O’Mahony. Jemu: A Real Time Emulation
System for Mobile Ad Hoc Networks. In Proceedings of 1st Joint IEI/IEE
Symposium on Telecommunications Systems Research, 2001.
[49] D. Fox, J. Hightower, L. Lao, D. Schulz, and G. Borriello. Bayesian
Filters for Location Estimation. IEEE Pervasive Computing, 2(3):24 –
33, 2003.
[50] M. Fox and D. Long. PDDL2.1: An Extension to PDDL for Expressing
Temporal Planning Domains. Journal of Artificial Intelligence Research
(JAIR), 20:61–124, 2003.
[51] C. Fritz, J. A. Baier, and S. A. McIlraith. ConGolog, Sin Trans: Compiling ConGolog into Basic Action Theories for Planning and Beyond.
In KR2008: Proceedings of the Eleventh International Conference on
Principles of Knowledge Representation and Reasoning, pages 600–610.
AAAI Press, 2008.
[52] M. Ghallab, D. Nau, and P. Traverso. Automated Planning: Theory and
Practice. Morgan Kaufmann Publishers, May 2004.
[53] G. D. Giacomo, M. de Leoni, M. Mecella, and F. Patrizi. Automatic
Workflows Composition of Mobile Services. In ICWS’07: Proceedings of
BIBLIOGRAPHY
213
the 2007 IEEE International Conference on Web Services, pages 823–
830. IEEE Computer Society, 2007.
[54] K. Göser, M. Jurisch, H. Acker, U. Kreher, M. Lauer, S. Rinderle,
M. Reichert, and P. Dadam. Next-generation Process Management
with ADEPT2. In Proceedings of the BPM Demonstration Program
at the Fifth International Conference on Business Process Management
(BPM’07), volume 272 of CEUR Workshop Proceedings. CEUR-WS.org,
2007.
[55] L. Guibas and J. Stolfi. Primitives for the manipulation of general subdivisions and the computation of voronoi. ACM Trans. Graph., 4(2):74–
123, 1985.
[56] C. W. Günther, M. Reichert, and W. M. van der Aalst. Supporting Flexible Processes with Adaptive Workflow and Case Handling. In WETICE’08: Proceedings of the 17th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises, 2008.
[57] P. Gupta and P. R. Kumar. The capacity of wireless networks. IEEE
Transactions on Information Theory, IT-46(2):388–404, March 2000.
[58] D. Hadaller, S. Keshav, T. Brecht, and S. Agarwal. Vehicular opportunistic communication under the microscope. In MobiSys ’07: Proceedings of the 5th international conference on Mobile systems, applications
and services, pages 206–219. ACM Press, 2007.
[59] C. Hagen and G. Alonso. Exception handling in workflow management
systems. IEEE Transactions on Software Engineering, 26(10):943–958,
October 2000.
[60] G. Hansen. Automated Business Process Reengineering: Using the
Power of Visual Simulation Strategies to Improve Performance and
Profit. Prentice-Hall, Englewood Cliffs, 1997.
[61] D. Harel, D. Kozen, and J. Tiuryn. Dynamic Logic. The MIT Press,
2000.
[62] G. Harik, E. Cantú-Paz, D. E. Goldberg, and B. L. Miller. The gambler’s ruin problem, genetic algorithms, and the sizing of populations.
Evolutionary Computation, 7(3):231–253, 1999.
[63] S. L. Hickmott, J. Rintanen, S. Thiébaux, and L. B. White. Planning
via petri net unfolding. In IJCAI 2007: Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 1904–1911.
AAAI Press, 2007.
214
BIBLIOGRAPHY
[64] J. Hidders, M. Dumas, W. M. P. van der Aalst, A. H. M. ter Hofstede,
and J. Verelst. When are two workflows the same? In CATS ’05: Proceedings of the 2005 Australasian symposium on Theory of computing,
pages 3–11. Australian Computer Society, Inc., 2005.
[65] R. Hull and J. Su. Tools for design of composite web services. In Proc.
SIGMOD 2004, pages 958–961, 2004.
[66] S. R. Humayoun, T. Catarci, M. de Leoni, A. Marrella, M. Mecella,
M. Bortenschlager, and R. Steinmann. Designing Mobile Systems in
Highly Dynamic Scenarios. The WORKPAD Methodology. Journal on
Knowledge, Technology & Policy, 2009. To appear.
[67] S. R. Humayoun, T. Catarci, M. de Leoni, A. Marrella, M. Mecella,
M. Bortenschlager, and R. Steinmann. The workpad user interface and
methodology: Developing smart and effective mobile applications for
emergency operators. In HCI International 2009: Proceedings of 13th
International Conference on Human-Computer Interaction, volume 5616
of Lecture Notes in Computer Science. Springer, 2009.
[68] IBM Inc.
An introduction to WebSphere Process Server and
WebSphere Integration Developer. ftp://ftp.software.ibm.com/
software/integration/wps/library/WSW14021-US%EN-01.pdf, May
2008. Prompted on 1 February, 2009.
[69] A. Jardosh, E. M. Belding-Royer, K. C. Almeroth, and S. Suri. Towards
realistic mobility models for mobile ad hoc networks. In MobiCom ’03:
Proceedings of the 9th annual international conference on Mobile computing and networking, pages 217–229. ACM Press, 2003.
[70] K. Jensen. Coloured Petri Nets; Basic Concepts, Analysis Methods and
Practical Use. Springer, 2nd edition, 1997.
[71] D. B. Johnson, D. A. Maltz, and J. Broch. DSR: The Dynamic Source
Routing Protocol for Multi-Hop Wireless Ad Hoc Networks. In C. E.
Perkins, editor, Ad Hoc Networking, pages 139–172. Addison-Wesley,
2001.
[72] B. Kiepuszewski, A. H. M. t. Hofstede, and C. Bussler. On Structured
Workflow Modelling. In CAiSE ’00: Proceedings of the 12th International Conference on Advanced Information Systems Engineering, pages
431–445, London, UK, 2000. Springer-Verlag.
[73] M. Kinateder.
Sap advanced workflow techniques.
https:
//www.sdn.sap.com/irj/servlet/prt/portal/prtroot/docs/
BIBLIOGRAPHY
library/uui%d/82d03e23-0a01-0010-b482-dccfe1c877c4,
Prompted on 1 February, 2009.
215
2006.
[74] E. Kindler. On the semantics of EPCs: Resolving the vicious circle.
Data & Knowledge Engineering, 56(1):23–40, 2006.
[75] Y.-B. Ko and N. H. Vaidya. Location-Aided Routing (LAR) in Mobile
Ad hoc Networks. Wireless Networks, 6:307––321, 2000.
[76] R. A. Kowalski. Using meta-logic to reconcile reactive with rational
agents. Meta-logics and logic programming, pages 227–242, 1995.
[77] A. Kumar, W. Aalst, and H. Verbeek. Dynamic Work Distribution in
Workflow Management Systems: How to Balance Quality and Performance? Journal of Management Information Systems, 18(3):157–193,
2002.
[78] O. Kupferman and M. Y. Vardi. Synthesizing distributed systems. In
Proc. of LICS 2001, page 389, 2001.
[79] B. Kusy, J. Sallai, G. Balogh, A. Ledeczi, V. Protopopescu, J. Tolliver,
F. DeNap, and M. Parang. Radio interferometric tracking of mobile
wireless nodes. In MobiSys ’07: Proceedings of the 5th international
conference on Mobile systems, applications and services, pages 139–151.
ACM Press, 2007.
[80] U. Kuter, E. Sirin, D. Nau, B. Parsia, and J. Hendler. Information gathering during planning for web service composition. In Proc. Workshop
on Planning and Scheduling for Web and Grid Services, 2004.
[81] M. La Rosa, M. Dumas, A. H. M. ter Hofstede, J. Mendling, and
F. Gottschalk. Beyond Control-Flow: Extending Business Process Configuration to Roles and Objects. In ER 2008: Proceedings of 27th International Conference on Conceptual Modeling, volume 5231 of Lecture
Notes in Computer Science, pages 199–215, 2008.
[82] M. Lankhorst. Enterprise Architecture at Work: Modelling, Communication, and Analysis. Springer, 2005.
[83] Y. Lespérance and H.-K. Ng. Integrating Planning into Reactive HighLevel Robot Programs. In Proceedings of the Second International Cognitive Robotics Workshop (in conjunction with ECAI 2000), pages 49–54,
August 2000.
[84] J. Li, C. Blake, D. S. J. De Couto, H. I. Lee, and R. Morris. Capacity
of Ad Hoc Wireless Networks. In Proc. 7th International Conference
216
BIBLIOGRAPHY
on Mobile Computing and Networking (MOBICOM 2001), pages 61–69,
2001.
[85] B. Liang and Z. J. Haas. Predictive Distance-based Mobility Management for Multidimensional PCS Networks. IEEE/ACM Transactions on
Networking, 11(5):718–732, 2003.
[86] P. Luttighuis, M. Lankhorst, R. Wetering, R. Bal, and H. Berg. Visualising business processes. Computer Languages, 27(1/3):39–59, 2001.
[87] L. T. Ly, S. Rinderle, and P. Dadam. Integration and verification of
semantic constraints in adaptive process management systems. Data &
Knowledge Engineering, 64(1):3–23, 2008.
[88] J. Macker, W. Chao, and J. Weston. A low-cost, IP-based Mobile Network Emulator (MNE). In MILCOM’03: Proceedings of the Military
Communications Conference, volume 1, pages 481– 486. IEEE Press,
2003.
[89] P. Mahadevan, A. Rodriguez, D. Becker, and A. Vahdat. Mobinet: a
scalable emulation infrastructure for ad hoc and wireless networks. In
WiTMeMo ’05: The 2005 workshop on Wireless traffic measurements
and modeling, pages 7–12, Berkeley, CA, USA, 2005. USENIX Association.
[90] D. Mahrenholz and S. Ivanov. Real-time network emulation with ns-2.
In DS-RT ’04: Proceedings of the 8th IEEE International Symposium on
Distributed Simulation and Real-Time Applications, pages 29–36. IEEE
Computer Society, 2004.
[91] B. S. Manoj and A. Hubenko Baker. Communication Challenges in
Emergency Response. Communincation of ACM, 50(3):51–53, 2007.
[92] D. V. McDermott. The 1998 AI Planning Systems Competition. AI
Magazine, 21(2):35–55, 2000.
[93] S. A. McIlraith and T. C. Son. Adapting Golog for composition of
semantic web services. In Proc. KR 2002, pages 482–496, 2002.
[94] M. Mecella and B. Pernici. Cooperative information systems based on
a service oriented approach. Journal of Interoperability in Business Information Systems, 1(3), 2006.
[95] B. Medjahed, A. Bouguettaya, and A. K. Elmagarmid. Composing web
services on the semantic web. VLDB Journal, 12(4):333 – 351, 2003.
BIBLIOGRAPHY
217
[96] J. Mendling and W. M. van der Aalst. Formalization and Verification
of EPCs with OR-Joins Based on State and Context. In Proceedings of
19th International Conference on Advanced Information Systems Engineering, 19th International, volume 4495 of Lecture Notes in Computer
Science. Springer, 2007.
[97] J. Mendling, H. M. W. Verbeek, B. F. van Dongen, W. M. P. van der
Aalst, and G. Neumann. Detection and prediction of errors in EPCs of
the SAP reference model. Data & Knowledge Engineering, 64(1):312–
329, 2008.
[98] S. Menotti. SPIDE A Smart Process IDE for Emergency Operators.
Master’s thesis, Faculty of Computer Engineering - SAPIENZA Università di Roma, 2008. Supervisor: Dr. Massimo Mecella. In English.
[99] R. Milner. A Calculus of Communicating Systems, volume 92 of Lecture
Notes in Computer Science. Springer, 1980.
[100] R. Milner. Communication and Concurrency. Prentice Hall, Inc., Upper
Saddle River, NJ, USA, 1989.
[101] R. Müller, U. Greiner, and E. Rahm. AGENTWORK: a workflow system supporting rule-based workflow adaptation. Data & Knowledge Engineering, 51(2):223–256, 2004.
[102] A. L. Murphy, G. P. Picco, and G. C. Roman. LIME: A Coordination
Model and Middleware Supporting Mobility of Hosts and Agents. ACM
Transactions on Software Engineering and Methodologies, 15(3):279 –
328, 2006.
[103] D. Niculescu and B. Nath. Position and Orientation in ad hoc Networks.
Elsevier Journal of Ad Hoc Networks, 2(2):133–151, April 2004.
[104] R. P. P. Nikitin and D. Stancil. Efficient Simulation of Ricean Fading
within a Packet Simulator. In Proceedings of the 51st Vehicular Technology Conference, pages 764–767. IEEE, 2000.
[105] Object Management Group.
Business Process Modeling Notation.
http://www.bpmn.org/Documents/OMG%20Final%20Adopted%
20BPMN%201-0%20Spec%%2006-02-01.pdf, February 2006. Prompted
on 16 February, 2009.
[106] S. Papanastasiou, L. M. Mackenzie, M. Ould-Khaoua, and V. Charissis. On the interaction of TCP and Routing Protocols in MANETs.
In AICT-ICIW ’06: Proceedings of the Advanced Int’l Conference on
218
BIBLIOGRAPHY
Telecommunications and Int’l Conference on Internet and Web Applications and Services, page 62, Washington, DC, USA, 2006. IEEE Computer Society.
[107] C. E. Perkins and P. Bhagwat. Highly Dynamic Destination-Sequenced
Distance-Vector Routing (DSDV) for Mobile Computers. In Proc. SIGCOMM 94, 1994.
[108] C. Petri. Communication with Automata. PhD thesis, Insitut für Instrumentelle Mathematik - Universität Bonn, 1962.
[109] A. Pnueli and R. Rosner. On the synthesis of a reactive module. In
Proc. POPL 1989, pages 179–190, 1989.
[110] A. Pnueli and R. Rosner. Distributed reactive systems are hard to synthesize. In Proc. of FOCS 1990, pages 746–757, 1990.
[111] J. L. Pollock. The logical foundations of goal-regression planning in
autonomous agents. Artificial Intelligence, 106(2):267–334, 1998.
[112] T. V. Project.
The ns Manual.
ns-documentation.html, 01 2009.
http://isi.edu/nsnam/ns/
[113] F. Puhlmann. Soundness Verification of Business Processes Specified in
the Pi-Calculus. In On the Move to Meaningful Internet Systems 2007:
CoopIS, DOA, ODBASE, GADA, and IS Proceedings, Part I, volume
4803 of Lecture Notes in Computer Science, pages 6–23. Springer, 2007.
[114] F. Puhlmann and M. Weske. Using the π-calculus for Formalizing Workflow Patterns. In Proceedings of the 3rd International Conference on
Business Process Management, BPM 2006, volume 3649 of Lecture Notes
in Computer Science, pages 153–168. Springer, 2005.
[115] M. Puzar and T. Plagemann. NEMAN: A Network Emulator for Mobile
Ad-Hoc Networks. In ConTel 2005: Proceedings of the 8th International
Conference on Telecommunications, pages 155–161. IEEE Press, June
2005.
[116] M. Qin, R. Zimmermann, and L. S. Liu. Supporting multimedia streaming between mobile peers with link availability prediction. In MULTIMEDIA ’05: Proceedings of the 13th annual ACM international conference
on Multimedia, pages 956–965, New York, NY, USA, 2005. ACM.
[117] M. Reichert and P. Dadam. ADEPTflex - Supporting Dynamic Changes
of Workflows Without Losing Control. Journal of Intelligent Information
Systems (JIIS), 10(2):93–129, 1998.
BIBLIOGRAPHY
219
[118] M. Reichert, S. Rinderle, U. Kreher, and P. Dadam. Adaptive Process Management with ADEPT2. In ICDE ’05: Proceedings of the 21st
International Conference on Data Engineering, pages 1113–1114. IEEE
Computer Society, 2005.
[119] R. Reiter. Knowledge in Action: Logical Foundations for Specifying and
Implementing Dynamical Systems. MIT Press, September 2001.
[120] N. Russell, W. M. P. van der Aalst, A. H. M. ter Hofstede, and D. Edmond. Workflow resource patterns: Identification, representation and
tool support. In Proceedings of 17th International Conference CAiSE
2005, volume 3520 of LNCS, pages 216–232. Springer, 2005.
[121] S. Sardina, G. De Giacomo, Y. Lespérance, and H. J. Levesque. On the
Semantics of Deliberation in Indigolog—from Theory to Implementation. Annals of Mathematics and Artificial Intelligence, 41(2-4):259–299,
2004.
[122] S. Sardiña, F. Patrizi, and G. De Giacomo. Automatic synthesis of a
global behavior from multiple distributed behaviors. In AAAI 2007:
Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, pages 1063–1069. AAAI Press, 2007.
[123] H. Schonenberg, R. Mans, N. Russell, N. Mulyar, and W. M. P. van der
Aalst. Towards a Taxonomy of Process Flexibility. In Proceedings of the
Forum at the CAiSE’08 Conference, volume 344 of CEUR Workshop
Proceedings, pages 81–84. CEUR-WS.org, 2008.
[124] A. Streit, B. Pham, and R. Brown. Visualization support for managing
large business process specifications. In Proceedings of the 3rd International Conference on Business Process Management BPM 2005, volume
3649 of LNCS, pages 205–219. Springer, 2005.
[125] G. Tagni, A. ten Teije, and F. van Harmelen. Reasoning about repairability of workflows at design time. In Proceedings of the 1st International Workshop on QoS in Self-healing Web Services (QSWS-08), in
conjunction with BPM 2008 6th International Conference on Business
Process Management (BPM 2008), Lecture Notes in Computer Science.
Springer, 2009.
[126] T. O. Team. OMNET++ User Manual. http://www.omnetpp.org/
doc/manual/usman.html, 2006.
[127] Tibco Software Inc.
Introduction to TIBCO iProcess Suite.
www.tibco.com/resources/software/bpm/tibco\_iprocess\
_suite\_whitepaper%.pdf, 2008. Prompted on 1 February, 2009.
220
BIBLIOGRAPHY
[128] P. Traverso and M. Pistore. Automated composition of semantic web
services into executable processes. In Proc. ISWC 2004, volume 3298 of
LNCS, pages 380–394. Springer, 2004.
[129] W. M. van der Aalst and P. Berens. Beyond Workflow Management:
Product-Driven Case Handling. In GROUP2001: Proceedings of the
International ACM SIGGROUP Conference on Supporting Group Work,
pages 42–51. ACM Press, 2001.
[130] W. M. van der Aalst, M. Weske, and D. Grünbauer. Case Handling:
A New Paradigm for Business Process Support. Data and Knowledge
Engineering, 53:129–162, 2005.
[131] W. M. P. van der Aalst. The application of Petri nets to workflow management. Journal of Circuits, Systems, and Computers, 8(1):21–66, 1998.
[132] W. M. P. van der Aalst. Workflow verification: Finding control-flow errors using Petri-net-based techniques. In Proceedings of Business Process Management, Models, Techniques, and Empirical Studies, pages 161–183, London, UK, 2000. Springer-Verlag.
[133] W. M. P. van der Aalst and A. H. M. ter Hofstede. YAWL: yet another
workflow language. Information Systems, 30(4):245–275, 2005.
[134] W. M. P. van der Aalst, A. H. M. ter Hofstede, B. Kiepuszewski, and
A. P. Barros. Workflow Patterns. Distributed and Parallel Databases,
14(1):5–51, 2003.
[135] W. M. P. van der Aalst, B. van Dongen, C. W. Günther, R. S. Mans, A. K. Alves de Medeiros, A. Rozinat, V. Rubin, M. Song, H. M. W. Verbeek, and A. J. M. M. Weijters. ProM 4.0: Comprehensive support for real process analysis. In Proceedings of the 28th International Conference on Applications and Theory of Petri Nets and Other Models of Concurrency ICATPN 2007, volume 4546 of LNCS, pages 484–494. Springer, 2007.
[136] W. M. P. van der Aalst and K. van Hee. Workflow Management: Models,
Methods, and Systems. The MIT Press, 2002.
[137] A. Venkateswaran, V. Sarangan, N. Gautam, and R. Acharya. Impact of mobility prediction on the temporal stability of MANET clustering algorithms. In PE-WASUN '05: Proceedings of the 2nd ACM international
workshop on Performance evaluation of wireless ad hoc, sensor, and
ubiquitous networks, pages 144–151, New York, NY, USA, 2005. ACM
Press.
[138] B. Victor and F. Moller. The Mobility Workbench - a tool for the pi-calculus. In CAV '94: Proceedings of the 6th International Conference on Computer Aided Verification, pages 428–440, London, UK, 1994. Springer-Verlag.
[139] G. Vossen and M. Weske. The WASA2 Object-Oriented Workflow Management System. In SIGMOD 1999: Proceedings ACM SIGMOD International Conference on Management of Data, pages 587–589. ACM
Press, 1999.
[140] B. Weber, S. Rinderle, and M. Reichert. Change Patterns and Change Support Features in Process-Aware Information Systems. In Proceedings of the 19th International Conference on Advanced Information Systems Engineering, volume 4495 of Lecture Notes in Computer Science, pages 574–588. Springer, 2007.
[141] B. Weber, W. Wild, and R. Breu. CBRFlow: Enabling Adaptive Workflow Management Through Conversational Case-Based Reasoning. In
ECCBR 2004: Proceedings of the 7th European Conference on Advances
in Case-Based Reasoning, volume 3155 of Lecture Notes in Computer
Science, pages 434–448. Springer, 2004.
[142] M. Weske. Formal Foundation and Conceptual Design of Dynamic Adaptations in a Workflow Management System. In HICSS01: Proceedings of
the 34th Annual Hawaii International Conference on System Sciences.
IEEE Computer Society, 2001.
[143] D. West. An Implementation and Evaluation of the Ad-Hoc On-Demand Distance Vector Routing Protocol for Windows CE. M.Sc. thesis in computer science, University of Dublin, September 2003.
[144] J. Wielemaker. An Overview of the SWI-Prolog Programming Environment. In WLPE: Proceedings of the 13th International Workshop
on Logic Programming Environments, volume CW371 of Report, pages
1–16, 2003.
[145] W. Wright. Business Visualization Adds Value. IEEE Computer Graphics and Applications, 18(4):39, 1998.
[146] M. T. Wynn, W. M. P. van der Aalst, A. H. M. ter Hofstede, and D. Edmond. Verifying Workflows with Cancellation Regions and OR-Joins: An Approach Based on Reset Nets and Reachability Analysis. In Proceedings of the 4th International Conference on Business Process Management, BPM 2006, volume 4102 of Lecture Notes in Computer Science, pages 389–394. Springer, 2006.
[147] X. Zeng, R. Bagrodia, and M. Gerla. GloMoSim: a library for parallel simulation of large-scale wireless networks. SIGSIM Simulation Digest, 28(1):154–161, 1998.
[148] Y. Zhang and W. Li. An integrated environment for testing mobile ad-hoc networks. In MobiHoc '02: Proceedings of the 3rd ACM international symposium on Mobile ad hoc networking & computing, pages 104–111. ACM, 2002.
[149] P. Zheng and L. M. Ni. EMWIN: emulating a mobile wireless network using a wired network. In WOWMOM ’02: Proceedings of the
5th ACM international workshop on Wireless mobile multimedia, pages
64–71. ACM, 2002.