Sapienza - Università di Roma
Dottorato di Ricerca in Ingegneria Informatica
XXI Ciclo - 2009

Adaptive Process Management in Highly Dynamic and Pervasive Scenarios

Massimiliano de Leoni

Thesis Committee:
Prof. Tiziana Catarci
Prof. Giuseppe De Giacomo
Dr. Massimo Mecella

Reviewers:
Dr. Alfredo Gabaldon
Prof. Jan Mendling

Author's address:
Massimiliano de Leoni
Dipartimento di Informatica e Sistemistica Antonio Ruberti
Sapienza Università di Roma
Via Ariosto 25, I-00185 Roma, Italy
e-mail: [email protected]
www: http://www.dis.uniroma1.it/~deleoni

Ringraziamenti

Now that this thesis is completed and another step of my life has been taken, I cannot help looking back over all these years since, as a fresh bachelor graduate, Prof. Catarci proposed me as a teaching assistant to Dr. Mecella. I said yes, and this adventure began... So I owe heartfelt thanks to Prof. Catarci, who allowed all of this to start and to continue until today. It was 2003 when our collaboration began, although initially only for teaching. An immense thank-you goes to Dr. Massimo Mecella. Without Massimo I could never have written this thesis nor grown so much, both personally and professionally. Massimo has been more than his role required: he has also been a friend and a support in the moments of discouragement throughout my doctoral years. Thanks! Thanks! Thanks! Many thanks also go to Prof. De Giacomo for his scientific support and for the time he devoted to me; he has been a precious mentor on many of the topics touched upon in this thesis. I also wish to thank Dr. Sardina, a truly delightful person, who was always available and ready to help when the techniques developed in this thesis had to be concretely realised. Without him, SmartPM would never have been built. Moreover, I must express my gratitude to Prof. ter Hofstede, who hosted me in his research group for six months, and for the time he devoted to me. During the unfortunately short period I spent there, I managed to grow professionally far more than I had hoped. A greeting also goes to Australia, which has remained in my heart and will forever be my second homeland... Many thanks to the external reviewers for their comments on the content and the presentation of this dissertation. Many thanks to all the collaborators and thesis students who, over the years, supported the development of the various aspects considered in this thesis; a big hug to all my friends and colleagues at the Dipartimento di Informatica e Sistemistica. I do not want to name anyone in particular, lest I forget someone else, and that would not be fair... Furthermore, I want to thank Sara: she began this adventure with me and encouraged me along the whole way; unfortunately, her "role" has meanwhile changed for reasons greater than us. I then wish to express my gratitude to my parents, Pierfrancesco and Maria Rosa, and to my brother Fabrizio who, although they did not approve of my choice, supported me anyway and never "put a spoke in my wheels". Last, but not least, I want to thank my beloved Mariangela. She arrived in my life only recently, but long enough to switch its light back on, the light that had slowly been going out.

Acknowledgements

Now that this thesis is completed and another step of my life has been taken, I cannot help looking back over all these years since, as a fresh bachelor graduate, Prof. Catarci proposed me as a teaching assistant to Dr. Mecella. I accepted, and from then this adventure began. Therefore, I wish to thank Prof. Catarci, who has allowed all of this to begin and to keep going until today. It was 2003 and my collaboration started, even though initially only for teaching purposes. I need to thank Dr. Massimo Mecella infinitely: without him I could never have written this thesis, nor have grown so much, personally and professionally. Massimo has been more than his role would have led him to be: he has also been a friend as well as a support in the moments of discouragement during the years of the Ph.D. programme. Thanks! Thanks! Thanks so much! I need to say "thanks" to Prof. De Giacomo as well, for his scientific support and for the time he devoted to me. He has been a precious mentor for the topics touched on in this thesis. I also wish to thank Dr. Sardina, a really exquisite person, who was always promptly helpful when the techniques conceived in this thesis had to be concretely realised. Without him, SmartPM could never have been developed. Moreover, I cannot refrain from expressing my gratitude to Prof. ter Hofstede, who hosted me in his research group, devoting a lot of his time to me. During the (unfortunately) short time there, I was able to grow much more than I had hoped. A lovely hug also goes to Australia, which is in my heart of hearts and will forever be my second country. I wish to express my thanks to the external referees for their valuable comments on the content and the presentation of this dissertation. Thanks very much to all the collaborators and Master/Bachelor students who have contributed to the development of many practical aspects of this thesis; a lovely hug to all of my friends and to my colleagues of the Dipartimento di Informatica e Sistemistica. I am not willing to name anyone explicitly, to avoid forgetting any, as that would not be fair.
Furthermore, I wish to thank Sara; she started this adventure with me and supported me along this path; unfortunately, her "role" has meanwhile changed for reasons greater than us. I wish to show my appreciation to my parents, Pierfrancesco and Maria Rosa, and to my brother Fabrizio, who, although they did not approve of my choice, supported me anyway, "without throwing a spanner in the works". Last, but not least, I wish to thank my beloved Mariangela. She has entered my life only recently, but long enough to turn on its light, which little by little had been going out.

Contents

1 Introduction
  1.1 Problem Statement
  1.2 Original Contributions
  1.3 Publications and Collaborations
  1.4 Outline of the Thesis
2 Rationale
3 Literature Review
  3.1 Process Modelling Languages
    3.1.1 Workflow Nets
    3.1.2 Yet Another Workflow Language (YAWL)
    3.1.3 Event-driven Process Chains (EPCs)
    3.1.4 π-calculus
    3.1.5 Discussion
  3.2 Related Works on Adaptability
  3.3 Case Handling
4 Framework for Automatic Adaptation
  4.1 Preliminaries
  4.2 Execution Monitoring
  4.3 Process Formalisation in Situation Calculus
  4.4 Monitoring Formalisation
  4.5 A Concrete Technique for Recovery
  4.6 Summary
5 The SmartPM System
  5.1 The IndiGolog Platform
    5.1.1 The top-level main cycle and language semantics
    5.1.2 The temporal projector
    5.1.3 The environment manager and the device managers
    5.1.4 The domain application
  5.2 The SmartPM Engine
    5.2.1 Coding processes by the IndiGolog interpreter
    5.2.2 Coding the adaptation framework in IndiGolog
    5.2.3 Final discussion
  5.3 The Network Protocol
    5.3.1 Protocols and implementations
    5.3.2 Testing Manets
    5.3.3 Final Remarks
  5.4 Disconnection Prediction in Manets
    5.4.1 Related Work
    5.4.2 The Technique Proposed
    5.4.3 Technical Details
    5.4.4 Experiments
  5.5 The OCTOPUS Virtual Environment
    5.5.1 Related Work
    5.5.2 Functionalities and Models
    5.5.3 The OCTOPUS Architecture
  5.6 Summary
6 Adaptation of Concurrent Branches
  6.1 General Framework
  6.2 The adaptation technique
    6.2.1 Formalization
    6.2.2 Monitoring-Repairing Technique
  6.3 An Example from Emergency Management
  6.4 A summary
7 Some Covered Related Topics
  7.1 Automatic Workflow Composition
    7.1.1 Conceptual Architecture
    7.1.2 A Case Study
    7.1.3 The Proposed Technique
    7.1.4 Final remarks
  7.2 Visual Support for Work Assignment in PMS
    7.2.1 Related Work
    7.2.2 The General Framework
    7.2.3 Fundamentals
    7.2.4 Available Metrics
    7.2.5 Implementation
    7.2.6 The YAWL system
    7.2.7 The User Interface
    7.2.8 Architectural Considerations
    7.2.9 Example: Emergency Management
    7.2.10 Final Remarks
  7.3 A summary
8 Conclusion
A The Code of the Running Example

Chapter 1

Introduction

1.1 Problem Statement

Nowadays, organisations constantly try to improve the performance of the processes they are part of. It does not matter whether such organisations deal with classical static business domains, such as loans, bank accounts or insurance, or with pervasive and highly dynamic scenarios. The demands are always the same: seeking more efficiency for their processes in order to reduce the time and the cost of their execution.
According to the definition given by the Workflow Management Coalition (http://wfmc.org), a workflow is "the computerised facilitation or automation of a business process, in whole or part". The Workflow Management Coalition defines a Workflow Management System as "a system that completely defines, manages and executes workflows through the execution of software whose order of execution is driven by a computer representation of the workflow logic". Workflow Management Systems (WfMSs) are also known as Process Management Systems (PMSs), and the two terms are used interchangeably throughout this thesis. Accordingly, this thesis often uses the word "process" in place of "workflow", although the original meaning of the former does not intrinsically refer to its computerised automation. The idea of Process Management Systems as information systems aligned in a process-oriented way was born in the late 1980s with the aim of improving process performance. PMSs are still growing in importance, since the demand for efficiency and effectiveness is more and more crucial in a highly competitive world. PMSs improve efficiency while providing better process control [46, 136]. The use of computer systems prevents process executions from being improvised and guarantees a more systematic process execution, which finally translates into an overall improvement of the response time. In this thesis we do not deal with classical business scenarios, which have been extensively researched, but turn our attention to highly dynamic and pervasive scenarios. In pervasive scenarios, information processing is thoroughly integrated with the physical environment and its objects. As such, people cannot carry out activities remotely; they need to interact actively with the environment and make physical changes to it. Pervasive scenarios comprise, for instance, emergency management, health care or home automation (a.k.a. domotics).
The physical interaction with the environment increases the frequency of unexpected contingencies with respect to classical scenarios. Since pervasive scenarios are very dynamic and turbulent, PMSs should provide a higher degree of operational flexibility/adaptability to suit them. According to Andresen and Gronau [3], adaptability can be seen as the ability to change something to fit occurring changes. Adaptability is to be understood here as the ability of a PMS to adapt/modify processes efficiently and quickly to changed circumstances. If processes were not adapted, they could not be carried out in the changed environment. In pervasive settings, efficiency and effectiveness when carrying out processes are a strong requirement. For instance, in emergency management saving minutes could mean saving injured people or preventing buildings from collapsing; likewise, pervasive health-care processes can cause permanent harm to patients when not executed by given deadlines. In order to improve the effectiveness of process execution, adaptation ought to be as automatic as possible and to require minimal manual human intervention. Indeed, human intervention would cause delays, which might not be acceptable. The main concern of this thesis is to research how to improve the degree of automatic adaptation, in order to react to very frequent changes in the execution environment and adjust processes accordingly. Let us consider a scenario for emergency management, where processes typically show a complexity that is comparable to business settings. It is therefore worthwhile to use a PMS to coordinate the activities of emergency operators within teams. The members of a team are equipped with PDAs and are coordinated through the PMS residing on a leader device (usually an ultra-mobile laptop). In such a PMS, process schemas (in the form of enriched Activity Diagrams) are defined, describing different aspects, such as tasks/activities, control and data flow, task assignment to services, etc.
Every task is associated with a set of conditions which ought to be true for the task to be performed; conditions are defined on the control and data flow (e.g., a previous task has to be completed, or a variable needs to be assigned a value in a specific range). Devices communicate with each other through ad hoc networks. A Mobile Ad hoc NETwork (MANET) is a P2P network of mobile nodes capable of communicating with each other without an underlying infrastructure. Nodes can communicate with their own neighbors (i.e., nodes in radio range) directly by wireless links. Non-neighbor nodes can communicate as well, by using other intermediate nodes as relays that forward packets toward destinations. The lack of a fixed infrastructure makes this kind of network suitable in all scenarios where a network needs to be deployed quickly but the presence of access points is not guaranteed, as in emergency management [91]. The execution of the emergency management process requires such devices to be continually connected to the PMS. However, this cannot be guaranteed: the environment is highly dynamic, and the movement of nodes (that is, devices and related operators) within the affected area, while carrying out assigned tasks, can cause disconnections and, thus, unavailability of nodes. From the collection of actual user requirements [35, 66, 67], it emerges that typical teams are formed by a few nodes (fewer than 10 units); therefore, a simple task reassignment is frequently not feasible, as there may not be two "similar" services available to perform a given task. In this case, adaptability might consist in recovering from the disconnection of a node X, which can be achieved by assigning a task "Follow X" to another node Y in order to maintain the connection. When the connection has been restored, the process can progress again.
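The relaying behaviour described above amounts to a shortest-path search over the connectivity graph induced by radio range. The following is a minimal, hypothetical sketch of that idea (the node names, positions and radio-range value are invented for illustration and are not part of the thesis's implementation):

```python
from collections import deque
from math import dist

RADIO_RANGE = 100.0  # metres; purely illustrative value

# Hypothetical node positions (x, y) in metres
nodes = {"leader": (0, 0), "A": (80, 0), "B": (160, 0), "C": (240, 0)}

def neighbors(n):
    """Nodes directly reachable from n, i.e. within radio range."""
    return [m for m in nodes
            if m != n and dist(nodes[n], nodes[m]) <= RADIO_RANGE]

def relay_path(src, dst):
    """Breadth-first search over the connectivity graph: returns the
    chain of relays (endpoints included) through which packets can be
    forwarded, or None if src and dst lie in disconnected partitions."""
    frontier, parent = deque([src]), {src: None}
    while frontier:
        n = frontier.popleft()
        if n == dst:
            path = []
            while n is not None:
                path.append(n)
                n = parent[n]
            return path[::-1]
        for m in neighbors(n):
            if m not in parent:
                parent[m] = n
                frontier.append(m)
    return None

print(relay_path("leader", "C"))  # multi-hop route via the relays A and B
```

With the positions above, "leader" and "C" are not in direct radio range, yet a relay chain through A and B exists; if C moves out of everyone's range, the function returns None, which is exactly the kind of disconnection the PMS must sense and recover from.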
1.2 Original Contributions

The definitions of adaptability currently available in the literature are too generic for our purposes. This thesis comes up with a more precise definition of process adaptability, which stems from the field of robotics and agent programming [31] and is adapted for process management. Adaptability can be seen as the ability of the PMS to reduce the gap between the virtual reality, the (idealized) model of reality that the PMS uses to deliberate, and the physical reality, the real world with the actual values of conditions and outcomes. For instance, in the aforementioned emergency management scenario, in the virtual reality the PMS assumes nodes to be always connected; but in the physical reality, when nodes are moving, they can lose the wireless connection and, hence, may be unable to communicate. The reduction of this gap requires sufficient knowledge of both kinds of realities (virtual and physical). Such knowledge, harvested by the services performing the process tasks, allows the PMS to sense deviations and to deal with their mitigation. In theory, there are three possibilities to deal with deviations:

1. Ignoring deviations - this is, of course, not feasible in general, since the new situation might be such that the PMS is no longer able to carry out the process instance.

2. Anticipating all possible discrepancies - the idea is to include in the process schema the actions to cope with each of such failures. This can be seen as a try-catch approach, as used in some programming languages such as Java. The process is defined as if exogenous actions could not occur, that is, as if everything ran fine (the try block). Then, for each possible exogenous event, a catch block is designed that specifies how to handle the corresponding event. As already touched on, and widely discussed in Chapter 3, most PMSs use this approach.
For simple and mainly static processes this is feasible and valuable; but, especially in mobile and highly dynamic scenarios, it is practically impossible to take into account all exceptional cases.

3. Devising a general recovery method able to handle any kind of exogenous event - considering again the try/catch metaphor, there exists just one catch block, able to handle any exogenous event, including unexpected ones. The catch block activates the general recovery method to modify the old process P into a process P′ such that P′ can terminate in the new environment and its goals are included in those of P. This approach relies on the execution monitor (i.e., the module in charge of execution monitoring), which detects discrepancies that prevent the process instance from terminating. When such discrepancies are sensed, the control flow moves to the catch block. An important challenge here is to build a monitor able to identify which exogenous events are relevant, i.e. which ones prevent processes from completing successfully, as well as to automatically synthesise P′ during the execution itself.

This thesis aims at achieving adaptability by using the third approach, which seems the most appropriate when dealing with scenarios where the frequency of unexpected exogenous events is relatively high. After an investigation of possible techniques that can be used for automatic adaptation, we focussed our attention on well-established techniques and frameworks from Artificial Intelligence, such as Situation Calculus [119] and automated planning. Those techniques were conceived to coordinate robots and intelligent agents, i.e. in application fields that are far from the main topic of this thesis. Therefore, their applicability to process management has required a significant effort in terms of conceptualisation and formalisation.
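The contrast between approaches 2 and 3 can be sketched in code. Below is a deliberately toy illustration (all names - the tasks, the events, and the breadth-first repair search - are invented for the example and do not reflect the SmartPM implementation): where approach 2 would need one handler per anticipated event, approach 3 uses a single generic handler that searches for a recovery sequence P′ restoring a state in which the process can continue.

```python
from collections import deque

# Toy world model: a state is a set of facts; an action is a tuple
# (name, preconditions, facts_added, facts_removed).
ACTIONS = [
    ("follow_X", frozenset(), frozenset({"X_connected"}), frozenset()),
    ("restart_service", frozenset({"X_connected"}),
     frozenset({"service_up"}), frozenset()),
]

def plan(state, goal):
    """Approach 3 in miniature: a single generic recovery method that
    breadth-first searches for a sequence P' of actions taking the
    physical reality back to a state satisfying the goal, regardless
    of which exogenous event caused the gap."""
    frontier = deque([(frozenset(state), [])])
    seen = {frozenset(state)}
    while frontier:
        state, seq = frontier.popleft()
        if goal <= state:
            return seq          # goal already satisfied by this state
        for name, pre, add, rem in ACTIONS:
            if pre <= state:    # action executable here
                nxt = (state - rem) | add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, seq + [name]))
    return None                 # no recovery sequence exists

# Exogenous event: node X disconnected, so its service went down.
physical = {"leader_connected"}
virtual_goal = {"X_connected", "service_up"}
print(plan(physical, virtual_goal))  # → ['follow_X', 'restart_service']
```

The point of the sketch is that nothing in `plan` refers to a specific exception: the same handler recovers from a disconnection, a failed service, or both, which is what distinguishes approach 3 from a library of pre-programmed catch blocks.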
Then, we have proposed a proof-of-concept implementation, namely SmartPM, which is based on the IndiGolog interpreter developed at the University of Toronto and at RMIT University, Melbourne. The use of an available platform conceived for coordinating robots has raised critical issues when used to integrate generic automatic services and humans, and solving these issues has required a tight collaboration with its designers and developers. Actions are modelled in IndiGolog [121], a logic-based language for robot and agent programming. Fluents denoting the world properties of interest, as well as the pre- and post-conditions of actions, are modelled in Situation Calculus. Such formalisms make it possible to reason over exogenous events and to determine (i) when such events can invalidate the execution of certain processes and (ii) how to recover from them and take the original process back on track. Specifically, when a deviation is sensed that makes the physical reality deviate from the virtual one, we make use of planning mechanisms to find and enact a set of activities that recovers from the mismatch. The first framework proposed is able to deal with any well-structured process, with no restrictions (see Chapter 4). We then propose a second framework that, on the one hand, is more efficient but, on the other hand, poses some restrictions on the structure and the characteristics of the processes and, hence, cannot always be used (see Chapter 6). In sum, the contribution of this thesis to the field of automatic process adaptability is manifold (the references given with the items below point to the candidate's papers addressing each topic):

• The collection of actual requirements from users acting in such pervasive and dynamic scenarios. Requirements collection guarantees that the resulting system is really useful for end users [66, 67, 23, 22, 35, 24].

• The analysis of existing work on the topics of adaptability (a.k.a.
flexibility), exception handling and process modelling, in order to analyse and systematise the available modelling languages and approaches to process adaptability.

• The evaluation of possible alternative approaches. We tried other approaches, which are valuable but partly fail when dealing with unexpected deviations. Finally, we moved beyond the borders of the process management field, towards agent and robot programming. Through this analysis and evaluation, we have also been able to give a precise characterisation of the notion of process adaptability in terms of the gap between the virtual and the physical reality [36, 7, 34].

• The conceptualisation and formalisation of a first set of techniques for the automatic adaptation of any well-structured process [37]. In order to achieve that, we provide some sub-contributions:

- The definition of a precise semantics for formally defining the process structure and the activity conditions. This semantics has been obtained by tailoring Situation Calculus and IndiGolog to process management. Formalising processes using Situation Calculus and IndiGolog has required a significant effort, since such formalisms are not intended for that.

- The formalisation of the concept of equivalence of two processes through bisimulation. A process P running in an environment E is said to be equivalent to a process P′ running in an environment E′ if P achieves the same goals as P′ when P is executed in E and P′ in E′.

- The reduction of the adaptability issue to the problem of finding a plan that recovers from discrepancies, in order to eliminate the mismatch between the physical and the virtual reality.

- The formal proof of the correctness and completeness of the proposed approach.
• The development of SmartPM, a proof-of-concept implementation of the adaptation framework, based on the IndiGolog interpreter developed at the University of Toronto and at RMIT University, Melbourne [39]. The use of a platform specifically intended for robot and agent programming has required a tight collaboration with its designers and developers to tailor it to process management. The aim of this implementation has been to demonstrate the practical feasibility and effectiveness of the approach, beyond the formal proof of soundness. For the sake of testing in a context of mobile ad hoc networks, we have also provided other contributions, specifically to the field of mobile networking:

- The conception and development of a MANET emulator, namely OCTOPUS, which overcomes some issues that were significant in our testing. Section 5.5 describes OCTOPUS and motivates its conception [28].

- The development of a MANET layer that actually works on low-profile devices. Many implementations are available in theory but, in fact, they either do not work on low-profile devices or are only partially fledged (see Section 5.3) [14].

- The development of some sensors able to sense deviations. Specifically, we have developed a module that is able to predict node disconnections before they actually happen [38, 41].

• The conception of a second technique, which aims at overcoming some of the limitations of the first framework. It turns out to be more efficient in computing recovery plans, since it is able to repair individually the parts affected by discrepancies, without having to block the whole process. On the other hand, this approach is applicable only under more restrictive conditions on the structure and the characteristics of processes [33].

We have also contributed to other topics in the field of process management more in general. These topics address other challenging issues concerning pervasive scenarios.
Specifically:

• The formalisation of a first step towards distributing the process orchestration among the devices of the involved services/participants, as well as towards synthesising the process specification on the basis of the available services. Indeed, in pervasive scenarios any device, including the one hosting the engine, may fail at any moment because of the environment. The only way to prevent the engine from being a single point of failure is to distribute the orchestration and the coordination among all available devices. In addition, processes might often be provided only as templates, whose concrete instances are created, on the basis of the available services, when the process has to be enacted [53].

• The conceptualisation and implementation of an innovative "client" tool to distribute tasks to process participants in a way that aids them in choosing the next task to work on. This tool aims to overcome current limitations of the worklist handlers of state-of-the-art Process-aware Information Systems. These worklist handlers typically show a sorted list of work items, comparable to the way e-mails are shown in mail agents. Since the worklist handler is the dominant interface between the system and its users, it is worthwhile to provide a more advanced graphical interface that uses information about work items and users, as well as about process cases which are completed or still running. The proposed worklist handler aims to give process participants a deeper insight into the context in which processes are carried out. This way, participants can be assisted in the selection of the next work item to perform. The approach uses the "map metaphor" to visualise work items and resources (e.g., participants) in a sophisticated manner. Moreover, depending on the "distance notion" chosen, work items are visualised differently: for example, urgent work items of a type that suits the user are highlighted.
The underlying map and distance notions may be of a geographical nature (e.g., a map of a city or an office building), but may also be based on the process design, organisational structures, social networks, due dates, calendars, etc. [42]

1.3 Publications and Collaborations

The following publications have been produced while researching this thesis:

• M. de Leoni, F. De Rosa, M. Mecella: "MOBIDIS: A Pervasive Architecture for Emergency". In Proceedings of the 15th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE 2006), University of Manchester, UK, June 26-28, 2006. Awarded as "Best Paper" of the DMC 2006 workshop.

• T. Catarci, M. de Leoni, M. Mecella, M. Angelaccio, S. Dustdar et al.: "WORKPAD: 2-Layered Peer-to-Peer for Emergency Management through Adaptive Processes". In Proceedings of the 2nd International IEEE Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom 2006), Atlanta, Georgia, USA, November 17-20, 2006.

• M. de Leoni, A. Marrella, F. De Rosa, M. Mecella, A. Poggi, A. Krek, F. Manti: "Emergency Management: from User Requirements to a Flexible P2P Architecture". In Proceedings of the 4th International Conference on Information Systems for Crisis Response and Management (ISCRAM'07), Delft, The Netherlands, May 13-16, 2007.

• F. D'Aprano, M. de Leoni, M. Mecella: "Emulating Mobile Ad-hoc Networks of Hand-held Devices. The OCTOPUS Virtual Environment". In Proceedings of the ACM Workshop on System Evaluation for Mobile Platform: Metrics, Methods, Tools and Platforms (MobiEval), co-located with MobiSys 2007, Puerto Rico, June 11-14, 2007.

• M. de Leoni, M. Mecella, R. Russo: "A Bayesian Approach for Disconnection Management". In Proceedings of the 16th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE 2007), GET/INT, Paris, France, June 18-20, 2007.

• T. Catarci, M.
de Leoni, M. Mecella, S. Dustdar, L. Juszczyk et al.: "The WORKPAD P2P Service-Oriented Infrastructure for Emergency Management". In Proceedings of the 16th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE 2007), GET/INT, Paris, France, June 18-20, 2007.

• G. De Giacomo, M. de Leoni, M. Mecella, F. Patrizi: "Automatic Workflow Composition of Mobile Services". In Proceedings of the IEEE International Conference on Web Services (ICWS 2007), Salt Lake City, USA, July 2007.

• M. de Leoni, M. Mecella, G. De Giacomo: "Highly Dynamic Adaptation in Process Management Systems through Execution Monitoring". In Proceedings of the 5th International Conference on Business Process Management (BPM 2007), Brisbane, Australia, September 24-28, 2007.

• M. de Leoni, F. De Rosa, M. Mecella, S. Dustdar: "Resource Disconnection Management in MANET Driven by Process Time Plan". In Proceedings of the First International ACM Conference on Autonomic Computing and Communication Systems (AUTONOMICS'07), Rome, Italy, October 28-30, 2007.

• T. Catarci, M. de Leoni, M. Mecella, G. Vetere, S. Dustdar et al.: "Pervasive and Peer-to-Peer Software Environments for Supporting Disaster Responses". IEEE Internet Computing Journal, Special Issue on Crisis Management, January 2008.

• M. de Leoni, S. R. Humayoun, M. Mecella, R. Russo: "A Bayesian Approach for Disconnection Management in Mobile Ad-hoc Network". Ubiquitous Computing and Communication Journal, March 2008.

• G. Bertelli, M. de Leoni, M. Mecella, J. Dean: "Mobile Ad hoc Networks for Collaborative and Mission-critical Mobile Scenarios: a Practical Study". In Proceedings of the 17th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE 2008), Rome, Italy, June 23-25, 2008.

• M. de Leoni, A. Marrella, M. Mecella, S. Valentini, S.
Sardina ”Coordinating Mobile Actors in Pervasive and Mobile Scenarios: An AI-based Approach” In Proceedings of the 17th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE 2008), 23-25 June 2008,Rome, Italy. • M. de Leoni, W. M. P. van der Aalst, A.H.M. ter Hofstede ”Visual Support for Work Assignment in Process-aware Information Systems” In Proceedings of the 6th International Conference on Business Process Management (BPM 2008 ), Milan, Italy, 1-4 September 2008. • T. Catarci, F. Cincotti, M. de Leoni, M. Mecella, G. Santucci ”Smart Homes for All: Collaborating Services in a for-All Architecture for Domotics” In Proceedings of the 4th International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom’08 ), Orlando, USA, 13-16 November 2008 • D. Battista, A. De Gaetanis, M. de Leoni et al. ”ROME4EU: A Web Service-based Process-aware Information System for Smart devices” 10 CHAPTER 1. INTRODUCTION In Proceedings of the International Conference on Service Oriented Computing (ICSOC 2008 ), Sydney, Australia, 1-4 December 2008. • M. de Leoni, Y. Lésperance, G. De Giacomo, M. Mecella ”On-line Adaptation of Sequential Mobile Processes Running Concurrently” In Proceedings of the 24th ACM Symposium on Applied Computing (SAC09 ) 8-12 March, 2009, Honolulu, Hawaii, USA. Special Track ”Coordination Models, Languages and Applications” • S. R. Humayoun, T. Catarci, M. de Leoni, A. Marrella, M. Mecella, M. Bortenschlager, R. Steinmann ”Designing Mobile Systems in Highly Dynamic Scenarios. The WORKPAD Methodology.” Springer’s International Journal on Knowledge, Technology & Policy, Volume 22, Number 1 / March, 2009. • S. R. Humayoun, T. Catarci, M. de Leoni, A. Marrella, M. Mecella, M. Bortenschlager, R. 
Steinmann ”The WORKPAD User Interface and Methodology: Developing Smart and Effective Mobile Applications for Emergency Operators” In Proceedings of 13th International Conference on Human-Computer Interaction (HCI International 2009 ), 19-24 July, 2009, San Diego, USA. Session “Designing for Mobile Computing”. • F. Cardi, M. de Leoni, M. Adams, W. M. P. van der Aalst, A.H.M. ter Hofstede Visual Support for Work Assignment in YAWL In Proceedings of the Demonstration Track of 7th International Conference on Business Process Management (BPM 2009), September 2009, Ulm, Germany. To appear. The work described in Section 7.2 has been mostly produced during an internship of Mr. Massimiliano de Leoni at the BPM Group of the Faculty of Information Technology of Queensland University of Technology, Brisbane (Australia). His visit commenced on September 17th, 2007 and ceased on April 07th, 2008 and was supervised by Prof. Arthur H. M. ter Hofstede, co-leader of this group. The implementation of the adaptation framework has been developed in cooperation with Dr. Sebastian Sardina, research assistant at the Agent Group of the RMIT University, Melboune, Australia. In particular, Mr. de Leoni was visiting the group from December 7th, 2008 to December 17th, 2008, with the aim of solving the last details of the proof-of-concept implementation. Mr. Massimiliano de Leoni has also co-chaired a workshop on Process Management for Highly Dynamic and Pervasive Scenarios (PM4HDPS) held in Milan on September 1st, 2008 in conjunction with the 6th International Conference on Business Process Management (BPM’08).3 The workshop aimed at 3 Web site: http://pm4hdps.deleoni.it 1.4. OUTLINE OF THE THESIS 11 Figure 1.1: Outline of the Thesis and relationship among Chapters providing a forum to draw attention to Highly Dynamic and Pervasive settings and to exchange the latest individual research and development ideas. The valuable outcomes are summarized in [37]. 
1.4 Outline of the Thesis

Figure 1.1 diagrams the structure of this Thesis document. Specifically:

• Chapter 2 illustrates in detail the rationale behind the need for the new approach to process adaptability that this Thesis deals with. In particular, it highlights why the currently proposed approaches fail when dealing with Highly Dynamic and Pervasive Scenarios.

• Chapter 3 surveys the literature and describes the works, systems and techniques that have already been proposed in the field of process adaptation. Specifically, it compares the choice of IndiGolog as modelling language with the other languages that are nowadays used by various Process Management Systems. Moreover, it discusses the levels of support for process adaptability/flexibility and exception handling in several of the leading commercial products and in some academic prototypes. Finally, it concludes by arguing that Case Handling is an inappropriate approach to manage the performance of pervasive processes.

• Chapter 4 shows a first approach to handle unexpected exogenous events and to recover process instance executions when exogenous events make their termination impossible.

• Chapter 5 describes the most salient points of the concrete implementation, based on the IndiGolog platform developed by the University of Toronto and RMIT University.

• Chapter 6 illustrates a more efficient adaptation technique, which, however, operates under more restrictive conditions than the one proposed in Chapter 4.

• Chapter 7 introduces some research topics related to process management in pervasive scenarios. The first deals with the problem of synthesizing a process schema according to the available services and distributing the orchestration among all of them. The second touches on the topic of supporting process participants in choosing the next task to work on among the several ones they may be offered.
• Chapter 8 concludes the thesis, surveying the outcomes and sketching future improvements in the field of process adaptation.

Chapter 2

Rationale

Over the last decade there has been increasing interest in Process Management Systems (PMSs), also known as Workflow Management Systems (WfMSs). A PMS is, according to the definition in [46], “a software that manages and executes operational processes involving people, applications, and information sources on the basis of process models”. PMSs are driven by process specifications, i.e., computerised models of the processes to be enacted. The model defines the tasks (also referred to simply as activities) that are part of the processes, as well as their pre- and post-conditions. Pre-conditions are typically defined on the so-called control and data flows. Indeed, the control flow defines the right sequence of task executions: some tasks can be assigned to participants for performance only when others have already been completed. The data flow specifies how the values of process variables change/evolve over time, as well as which variables specific tasks are allowed to read and/or write. Process specifications can define decision points to choose one branch among alternative ones; such choices are driven by formulas over process variables. These formulas are then evaluated at run-time by taking into account the actual variable values. When processes need to be run, instances are created, which possess their own copies of the defined variables. In the PMS literature, instances are often referred to as cases. To be more precise, tasks are never executed: tasks are defined inside the process schema. When process schemas are instantiated as cases, tasks are instantiated as well. A work item is a task instance inside a case and is created as soon as the case reaches the corresponding task in the schema. Work items represent the real pieces of work that participants execute.
For instance, if there exists a task “Approve travel request” for a flight-booking process, a possible work item might be “Approve travel request XYZ1234” of case “Flight booking XYZ1234”. It is worth noting that many work items referring to the same task may be instantiated for a single case. Unless needed, we do not distinguish throughout this thesis between the concept of tasks/activities and work items, bearing anyway in mind that a difference does exist.

At the heart of PMSs there exists an engine that manages the process routing and decides which tasks are enabled for execution by taking into account the control flow, the value of variables and other aspects. Once a task can be assigned, PMSs are also in charge of assigning tasks to proper participants; this step is performed by taking into account the participant “skills” required by single tasks, as well as participants’ roles in their respective organisations. Indeed, a task will be assigned to all of those participants that provide every skill required or have a certain organisational role. Human participants are provided with a client application, often named worklist handler, which is intended to receive notifications of task assignments. Participants can then use this application to pick from the list of assigned tasks which one to work on next.

SmartPM, the adaptive Process Management System conceptualised, formalised and developed in this thesis work, abstracts from the possible participants that it can coordinate. We name them generically services. SmartPM provides a client interface that services can invoke in order to communicate for data exchange and to coordinate the process execution. We assume communication to be one-way, which means that services send requests to SmartPM and close the communication without standing by for a prompt response. When the response is ready, SmartPM will be in charge of contacting the service and informing it of the response.
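The one-way interaction pattern just described can be sketched in a few lines. This is a minimal illustration, not SmartPM's actual interface: all names here (`Engine`, `Service`, `receive`, `dispatch_responses`, `on_response`) are hypothetical.

```python
# Sketch of one-way communication: the service sends a request and closes
# the exchange; the engine contacts the service later through a well-known
# callback. All class and method names are hypothetical.
class Engine:
    def __init__(self):
        self.pending = []

    def receive(self, service, request):
        # The service does not block waiting for an answer:
        # the request is simply queued and the call returns.
        self.pending.append((service, request))

    def dispatch_responses(self):
        # Later, the engine contacts each service with the ready response.
        while self.pending:
            service, request = self.pending.pop(0)
            service.on_response(f"handled:{request}")


class Service:
    def __init__(self):
        self.responses = []

    def on_response(self, response):
        # The well-known interface the engine uses to send back responses.
        self.responses.append(response)


svc = Engine(), Service()
engine, svc = Engine(), Service()
engine.receive(svc, "task-completed")   # one-way: returns immediately
engine.dispatch_responses()             # engine calls back when ready
print(svc.responses)                    # ['handled:task-completed']
```

A legacy service would sit behind a wrapper exposing the same `on_response` callback while translating messages into its own format.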
When SmartPM is communicating with the client, it assumes services to provide well-known and established interfaces, which SmartPM uses to send back responses. Therefore, services have to provide these interfaces, either directly, if services are built for SmartPM, or by implementing a specific wrapper, if services are legacy: a handler that provides the proper interfaces to SmartPM and internally transforms the messages into the form that legacy services are able to understand. We envision two classes of services. The first class includes the automatic services, i.e. those which can execute tasks with no human intervention, whereas the second comprises the human services. For the second class of human-based services, we envision a client application, named in the literature work-list handler, that acts as a service. On the one hand, it handles the communication with the SmartPM engine, receiving notifications of task assignments and informing upon task completion, as any service would do. On the other hand, it is equipped with a Graphical User Interface to inform the human user of the task which she has to work on next. The human users are the real executors of the work that the service is supposed to perform.

Process Management for Highly Dynamic and Pervasive Scenarios: why current solutions do not work properly

Nowadays, Process Management Systems (PMSs) are widely used in many business scenarios, e.g. by government agencies, by insurance companies, and by banks. Despite this widespread usage, the typical application of such systems is predominantly in the context of static scenarios, rather than of pervasive and highly dynamic scenarios. Nevertheless, pervasive and highly dynamic scenarios may be as complex as business scenarios; therefore, they could also benefit from the use of PMSs. Some examples of Highly Dynamic and Pervasive scenarios are:

Emergency situations.
Several devices, robots and/or sensors must be coordinated in accordance with a process schema (e.g., based on a disaster recovery plan) to cope with environmental disasters.

Pervasive healthcare. The purpose is to make healthcare available to anyone, anytime, and anywhere by removing location, time and other constraints, while increasing both the coverage and quality of healthcare.

Ambient intelligence. In this vision, devices/robots work in concert to support people in carrying out their everyday life activities, tasks and rituals in an easy, natural way, using information and intelligence that is hidden in the network connecting these devices. Devices and robots are intelligent agents that act and react to external stimuli. Domotics, sometimes also referred to as Home Automation, is a specialised application area in this field.

In classical PMSs applied to business scenarios, the procedure for handling possible run-time exceptions is generally subject to acknowledgement by the person responsible for the process. This authorization may be provided at run-time for handling deviations caused by a single exceptional event. Or, conversely, it is possible that the person gives the “go-ahead” for all exceptions in a certain class, defining the correct protocol by which they should be handled. In any case, the adaptation is manual and requires human intervention. Conversely, this thesis addresses pervasive and dynamic scenarios, which are characterized by being very unstable. In such scenarios, unexpected events may happen which break the initial conditions and make the executing processes unable to be carried on and terminate successfully. These unforeseen events are quite frequent and, hence, the process can often be invalidated. Deviations are frequent events and often, due to deadline constraints, they must be handled very quickly. For instance, in the management of an occurred earthquake, offering first aid to injured victims ought to be as fast as possible.
Indeed, saving minutes might result in saving people’s lives. Such a requirement rules out waiting for a person’s acknowledgement: adaptation must be as automatic and autonomic as possible. From the surveys in Section 3, it results that all major commercial PMSs and academic prototypes are unable to automatically synthesize a recovery plan/process to deal with exogenous events, unless event handlers were foreseen and designed at design-time. This is feasible in classic, mostly static scenarios, where exogenous events occur quite rarely. Sometimes manual adaptation, or automatic adaptation only for pre-planned event classes, is even mandatory since, as argued before, handling deviations may require either a proper authorization or a specific protocol to exist. This thesis work deals with the issue of devising a set of techniques that can be beneficial for Process Management Systems, so that PMSs can handle any exogenous event, even unforeseen ones, and create proper recovery plans/processes. These techniques have then been concretely implemented in SmartPM, an adaptive Process Management System that is specifically intended for pervasive scenarios.

The user requirements and their consequences on the SmartPM task life-cycle

The SmartPM system is under development in the context of the European-funded project called WORKPAD, which concerns devising a two-level software infrastructure for supporting rescue operators of different organisations during operations of emergency management [23]. In the context of this project, the whole SmartPM system has been devised in cooperation with real end users, specifically “Protezione Civile della Calabria” (Civil Protection and Homeland Security of Calabria). Indeed, the rest of this thesis will explain the various introduced techniques through examples stemming from emergency management. But its exploitation comprises many other possible pervasive scenarios (such as those described above).
According to the Human-Computer Interaction methodology, different prototypes have been proposed to users, who provided feedback with comments [66, 67]. At each iteration cycle the prototype has been refined according to such feedback, until finally meeting the users’ complete satisfaction. From the analysis with final users, we learnt that processes for pervasive scenarios are highly critical and time-demanding, and they often need to be carried out within strictly specified deadlines. Therefore, it is not applicable to use a pull mechanism for task assignment, where SmartPM would assign every task to all process participants qualified for it, letting them decide autonomously which task to execute next. Consequently, SmartPM aims at improving the overall effectiveness of the process execution by assigning each task to just one member and, vice versa, by assigning at most one task to each member. Moreover, these processes are created in an ad-hoc manner upon the occurrence of certain events. These processes are designed on demand, starting from provided templates or simple textual guidelines. In the light of that, these processes are used only once, for the specific setting for which they were created; later, they will not be used anymore. Moreover, process participants are asked to face one situation and, hence, they take part in only one process at a time.

Figure 2.1: The life-cycle model in SmartPM

Taking into account the considerations above, the SmartPM life-cycle model, depicted in Figure 2.1, is specialized with respect to those of other PMSs [120]:

1. When all pre-conditions over data and control flow hold, the SmartPM engine assigns the task to a service, human or automatic, that guarantees the highest effectiveness. The task moves to the Assigned state.

2. The service notifies SmartPM when the corresponding member is willing to begin executing. The task moves to the Running state.

3. The service begins executing it, possibly invoking external applications.
4. When the task is completed, the service notifies SmartPM. The task moves to the final state Completed.

Chapter 3

Literature Review

The idea that Information Systems have to be aligned in a process-oriented way had its roots in the 1970s. Nowadays, such systems are often referred to as Workflow Management Systems (WfMSs) or Process Management Systems (PMSs). In the last decade, competition in a globalized world has become much harder than in the past and, hence, PMSs are gaining more and more momentum. As a consequence, on the one hand, many software companies have developed commercial PMSs; on the other hand, many scientific research groups have focused (and are still focusing) their efforts on coming up with new ideas to improve certain aspects and to provide new features for the next PMS generations. In order to provide effective process support, PMSs should capture real-world processes adequately, avoiding any mismatch between the computerised processes and those in reality. With this intent, several models have been proposed for representing real processes in a form that can capture as many aspects of real processes as possible while remaining manageable by software systems. Any PMS envisions the figure of the Process Designer, who is in charge of modelling business processes by communicating with business domain experts. Process Designers may have neither a strong theoretical background nor be computer scientists. Many proposed process models have tried to balance the necessity of representing real processes precisely with that of being easily comprehensible and manageable by non-theoretical people. Section 3.1 gives an overview of the most used formalisms for process modelling, from which it results that many of them lack solid theoretical foundations.
The process adaptability framework proposed in this thesis requires strong reasoning over the process model, for instance to recognise when adaptation is needed or to automatically synthesize the recovery plan. That is why we are using IndiGolog, a logic programming language used in robotics, which has a strong theoretical basis in the Situation Calculus.

Figure 3.1: Overview of the chapter structure

Another aspect of PMSs when dealing with real processes is providing enough adaptability to realign processes when exogenous events produce deviations. Section 3.2 illustrates how such adaptability, often also referred to as flexibility, is achieved by many PMSs, as well as new techniques and approaches to deal with deviations. Unfortunately, most of the other approaches require experts in charge of manually adapting processes whenever needed. That is applicable in traditional business domains, where exceptional events are infrequent. Manual adaptations may even be mandatory in some cases (e.g., when the recovery requires the explicit authorisation of responsible unit heads). It is not feasible in highly dynamic and pervasive scenarios, where exogenous events (and, hence, recovery plans) are really frequent. A different approach to deal with flexibility is Case Handling, which focuses mainly on cases, the running instances of processes. The Case Handling approach poses fewer constraints on case executions and, hence, intrinsically deals better with providing adaptability. But, being driven by the artifacts produced by cases, its applicability is limited in many pervasive scenarios.
In many pervasive scenarios it is not always possible to represent every process outcome as a well-defined artifact. Section 3.3 discusses these points further.

3.1 Process Modelling Languages

The frameworks for automatic adaptation proposed in this thesis are based on strong reasoning and on other key features that the languages currently proposed for process modelling do not enable. While those languages are valuable in other contexts, they seem to be inappropriate in the light of certain requirements of the adaptation techniques proposed in this thesis. Firstly, appropriate languages for our techniques need to be characterized by a sound and formal semantics. Indeed, activity pre- and post-conditions need to be specified in a formal and unambiguous way, thus allowing process management systems to reason about the successful completion of process instances. Secondly, appropriate languages need to enable both structural and semantic soundness: processes are not only required to complete, but they have to obtain the outcomes they have been designed for. Moreover, appropriate languages should model the non-atomic execution of activities: the techniques proposed for execution monitoring and recovery should be able to check activities even while they are executing; it is insufficient to model only before and after the execution. Moreover, we rely on planning features: for the techniques to be feasible in practice, languages for which planners are unavailable are inappropriate. Finally, execution monitoring concerns the state; event-based languages should not be considered, preferring state-based ones. Indeed, when using event-based languages, the state is implicit, and making it explicit would require an additional step, which would need to be repeated continuously. This section is meant to discuss the most used languages for modelling processes, showing their inappropriateness in the light of the aforementioned requirements.
Sections 3.1.1-3.1.4 present such languages, whereas Section 3.1.5 discusses their pros and cons in the light of the requirements above.

3.1.1 Workflow Nets

The most widely used language for defining process specifications is Workflow nets [131, 136]. Workflow nets allow one to define unambiguous specifications, to formally reason on them, as well as to check for specific properties. The Workflow net language is a subclass of the well-known Petri Nets [108, 136]. Petri nets consist of places, transitions, and directed arcs connecting places and transitions. Petri nets are bipartite graphs in the sense that two places or two transitions cannot be directly connected. There is a graphical notation where places are represented by circles, transitions by rectangles, and connections by directed arcs. Tokens are used to represent the dynamic state and reside on certain places. Each place may contain several tokens: their number and locations inside places identify the current status.

Figure 3.2: An example of Petri Net

Figure 3.2 shows an example of a Petri net, where places and transitions are respectively depicted as circles and rectangles. The black dots on the places represent tokens and their location. An input place of a transition t is one that has an outgoing arc toward t; vice versa, an output place of t has incoming arcs from t. When a transition t fires, one token is removed from each input place and one token is placed in each output place. Of course, a transition can fire only if it is enabled, that is, each input place contains at least one token. In the context of process management, transitions represent activities and their firing represents their execution. Places and connecting arcs represent the process instance state, as well as the process constraints. For instance, in the Petri net above, the location of the two tokens identifies that the transitions Send Acknowledgement and Request and check payment are enabled.
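The enabling and firing rules just described can be sketched in a few lines of code. This is a minimal illustration, with hypothetical place names (`p1`…`p4`) loosely inspired by the two enabled transitions of Figure 3.2.

```python
# Minimal sketch of Petri net enabling and firing, assuming at most
# one arc between a place and a transition. Place/transition names are
# hypothetical, not taken from Figure 3.2 itself.
from collections import Counter

class PetriNet:
    def __init__(self, inputs, outputs, marking):
        self.inputs = inputs              # transition -> input places
        self.outputs = outputs            # transition -> output places
        self.marking = Counter(marking)   # place -> number of tokens

    def enabled(self, t):
        # A transition is enabled iff every input place holds a token.
        return all(self.marking[p] >= 1 for p in self.inputs[t])

    def fire(self, t):
        # Firing removes one token per input place and adds one per output place.
        if not self.enabled(t):
            raise ValueError(f"{t} is not enabled")
        for p in self.inputs[t]:
            self.marking[p] -= 1
        for p in self.outputs[t]:
            self.marking[p] += 1

net = PetriNet(
    inputs={"send_ack": ["p1"], "check_payment": ["p2"]},
    outputs={"send_ack": ["p3"], "check_payment": ["p4"]},
    marking={"p1": 1, "p2": 1},
)
print(net.enabled("send_ack"))   # True: both transitions are enabled
net.fire("send_ack")             # the token moves from p1 to p3
print(dict(net.marking))
```

After firing, the token has moved past the transition, mirroring the completion of the corresponding activity.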
Therefore, the corresponding activities are ready to be assigned to participants and executed. For the sake of brevity, here we formally introduce only an extension by Kurt Jensen [70], named coloured Petri nets, which is better tailored to process management. Coloured Petri nets introduce the association of “colours” to tokens. The data types associated to tokens are called colour sets, where the colour set of a token represents the set of values that the token may have. Like in programming languages, where data values of a certain type are associated to variables, in coloured Petri nets colours of a certain colour set are associated to tokens. Colours are meant to hold application data, including process instance identifiers. Places may have different colour sets, since some additional data can become available while tokens are passing through the net (i.e., activities are executed). A coloured Petri net is a tuple (Σ, P, T, A, N, C, G, E, I) where:

• Σ is a finite set of non-empty types, called colour sets;
• P is a finite set of places;
• T is a finite set of transitions;
• A is a finite set of arc identifiers, such that P ∩ T = P ∩ A = T ∩ A = ∅;
• N : A → (P × T ) ∪ (T × P ) is a node function mapping each arc identifier to the pair (start_node, end_node) of the arc;
• C : P → Σ is a colour function that associates each place with a colour set;
• G : T → BoolExpr is a guard function that maps each transition to a boolean expression BoolExpr over the token colours;
• E : A → Expr is an arc expression function that evaluates to a multi-set over the colour set of the place;
• I is an initial marking of the coloured Petri net, i.e., the initial position of the possible tokens with their respective values.

In coloured Petri nets, the enabling of a certain transition is determined not only by the existence of tokens on the input places but also by the values of the colour sets of such tokens.
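A minimal sketch of coloured enabling may help: a token carries a data value (its colour), and the transition's guard is a predicate over those values. Everything here (the `approve` transition, the `amount` field, the 5000 threshold) is a hypothetical example, not part of the formal definition.

```python
# Sketch of coloured enabling: a transition needs a token on every
# input place AND its guard must hold on the tokens' colours.
# For simplicity we consider only the first token on each input place.
def coloured_enabled(transition, marking, guards):
    """marking: place -> list of token colours; guards: name -> predicate."""
    places = transition["inputs"]
    if any(not marking.get(p) for p in places):
        return False                          # some input place has no token
    tokens = {p: marking[p][0] for p in places}
    return guards[transition["name"]](tokens)  # guard inspects the colours

# Hypothetical example: approve a request only if its amount is small enough.
marking = {"request": [{"amount": 1200}]}
guards = {"approve": lambda toks: toks["request"]["amount"] <= 5000}
t = {"name": "approve", "inputs": ["request"]}
print(coloured_enabled(t, marking, guards))   # True: the guard holds
```

With `{"amount": 9000}` the token would still be present but the guard would fail, so the transition would not be enabled, which is exactly the difference from plain Petri nets.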
A transition is enabled if the guard function for that transition evaluates to true and the arc expressions are satisfied. When a transition fires, the respective tokens are removed from the input places and others are placed in the output places, guided by the arc expressions of, respectively, the ingoing and outgoing edges. In order to represent the dynamic status of coloured Petri nets, there exists a marking function which returns, for each place p ∈ P and for each possible colour value q ∈ C(p), the number of tokens in p with value q:

Let PN = (Σ, P, T, A, N, C, G, E, I). For each p ∈ P, let σ_p be such that C(p) = σ_p, and let there exist a function M_p : σ_p → ℕ. A marking function for PN is defined as follows:

  M(p, q) = M_p(q) if q ∈ σ_p = C(p), and M(p, q) = 0 otherwise.

Petri nets should have specific structural restrictions in order to be properly used for process management. In that case, they are named workflow nets: a Petri net PN = (Σ, P, T, A, N, C, G, E, I) is called a workflow net iff the following conditions hold:

• There is a distinguished place i ∈ P, named initial place, that has no incoming edge.
• There is a distinguished place o ∈ P, named final place, that has no outgoing edge.
• Every place and transition is located on a firing path from the initial to the final place.

Papers [131, 132] have studied the problem of checking soundness. Indeed, a process definition is said to be sound if no run-time execution of its cases may lead to situations of deadlock (the process is not completed but no activity can be executed) or livelock (the process cycles forever, always executing the same activities and never terminating). In those papers, soundness is defined as follows:1

Definition 3.1 (Soundness). Let PN = (Σ, P, T, A, N, C, G, E, I) be a Workflow Net with initial place i and final place o. PN is structurally sound if and only if the following properties hold:

Termination. For every state M reachable from i there exists a firing sequence leading from M to o:
  ∀M : (i →* M) ⇒ (M →* o)

Proper termination. State o is the only state reachable from state i with at least one token in place o:
  ∀M : (i →* M ∧ M ≥ o) ⇒ M = o

No dead transitions. Each transition t ∈ T can contribute to at least one process instance:
  ∀t ∈ T, ∃M, M′ : i →* M →t M′

In some cases designers are only interested in checking whether a process specification allows each defined activity to be reached by some execution. When the final state is reached, there can be tokens left in the net, maybe stuck in deadlock situations. For these concerns, the soundness criterion appears to be too restrictive. In the light of this, paper [43] has introduced the notion of Relaxed Soundness:

Definition 3.2 (Relaxed Soundness). Let PN = (Σ, P, T, A, N, C, G, E, I) be a Workflow Net with initial place i and final place o. PN is relaxed sound if and only if each transition participates in at least one legal process instance starting from the initial state and reaching the final one:
  ∀t ∈ T, ∃M, M′ : i →* M →t M′ →* o

1 The state of a workflow net is here defined in terms of the associated marking function. If ∃q ∈ C(o) such that M(o, q) ≥ 1, then M ≥ o. In addition, if M ≥ o and M(p, q) = 0 for every p ∈ P \ {o} and every q ∈ C(p), then M = o.

Figure 3.3: Basic nodes of the YAWL’s extended workflow nets (from [133])

3.1.2 Yet Another Workflow Language (YAWL)

Yet Another Workflow Language (YAWL) [133] has been developed in order to overcome the lack of a single language that supports all control-flow patterns [134]. It is currently used as the modelling language of the homonymous Process Management System (see Section 7.2.6 for further details). Process specifications are defined in YAWL through so-called extended workflow nets, composed of nodes of the types in Figure 3.3.
An extended workflow net is a tuple (C, i, o, T, F, split, join, rem, nofi) such that:

• C is a set of conditions;
• i ∈ C and o ∈ C are the initial and final conditions;
• T is a set of tasks, such that C and T are disjoint;
• F ⊆ (C \ {o} × T ) ∪ (T × C \ {i}) ∪ (T × T ) is a flow relation such that every node in C ∪ T is on a directed path from i to o;
• split : T ⇀ {And, Xor, Or} is a partial mapping that assigns a split behaviour to tasks;
• join : T ⇀ {And, Xor, Or} is a partial mapping that assigns a join behaviour to tasks;
• rem : T ⇀ 2^(T ∪ C \ {i,o}) specifies the subpart of the extended workflow net that is cleaned when a certain task completes;2
• nofi : T ⇀ ℕ × ℕ∞ × ℕ∞ × {dynamic, static} is a partial function that specifies the number of instances of each task (minimum, maximum, threshold for continuation) and whether instance creation is dynamic or static.3

2 The formalism 2^S is meant to denote the power set of S.

Extended workflow nets are a flavour of workflow nets which is able to handle:

Multiple instances. YAWL is able to concurrently enable multiple instances of specific tasks. The exact number may be determined at run-time according to some variables/conditions evaluated on the process instance that the multi-instance task is part of.

Advanced synchronization patterns. YAWL handles some patterns (such as the or split/join) in a more natural way than workflow nets. Workflow nets are able to specify most of them, but only through artifices that require complex and prolix definitions.

Non-local firing behaviour. Workflow nets can determine whether a transition can or cannot fire on the basis of the input places alone.
YAWL, instead, can enable activities by considering tokens in other places as well; moreover, it allows transitions to delete tokens [146] through the definition of the function rem.

YAWL also allows an extended workflow net to be divided into sub-nets, which are made independent of the main net they are integrated in; therefore, sub-nets can be reused in different specifications. The execution semantics of YAWL activities is given as a well-defined state transition system: every atomic task actually goes through a sequence of four states: (i) the task instance is active; (ii) enabled but not yet running; (iii) currently executing; (iv) completed. Moreover, YAWL allows so-called composite tasks to be defined, which are links to other extended workflow nets. Composite tasks facilitate the modularisation of complex specifications and make existing ones easier to read.

3.1.3 Event-driven Process Chains (EPCs)

Event-driven Process Chains (EPCs) are a rather informal notation developed as part of a holistic modelling approach named the ARIS framework [82, p. 35]. There are several formalisations of the EPC syntax, as the original paper introduces EPCs in an informal way. Here we specifically use the definition given in [96]:4

A tuple EPC = (E, F, C, l, A) is an Event-driven Process Chain if:

• E, F, C are disjoint, finite, non-empty sets;
• l : C → {and, or, xor};
• A ⊆ (E ∪ F ∪ C) × (E ∪ F ∪ C).

Elements of E, F, C are respectively named events, functions and connectors. The mapping l assigns to each connector a specific type, representing the and, or, xor semantics.

3 Formalism N∞ identifies the set of natural numbers plus infinity.
4 The EPC syntax has also been extended with the data and resource perspectives, i.e., process participants and the data objects manipulated by activities. We do not consider such extensions worth describing here.
Moreover, some conditions have to hold:

• The graph (E ∪ F ∪ C, A) has to be connected;
• Every function has exactly one incoming and one outgoing edge;
• There exists at least one start and one end event. Start events are denoted by having exactly one outgoing edge and no incoming edge. Vice versa, end events have no outgoing edge and one incoming edge;
• Each event that is neither start nor end has exactly one incoming and one outgoing edge;
• Each event can be followed only by functions and each function only by events. Events can be followed by multiple functions (and functions by multiple events) if there are intermediate connectors;
• Events cannot be followed by an or or xor split node.

3.1.4 π-calculus

One of the main problems of Workflow Nets is that they provide no suitable means to compose several nets through concurrency operators. Concurrency can anyway be obtained by clever artifices. Unfortunately, such artifices make the model more complex, with consequences for formal verification, which becomes more difficult: by using such artifices, verification of a large model may become computationally infeasible. The use of the π-calculus overcomes this problem: it provides tools for building a high-level system by composing its sub-systems through concurrency operators.

The π-calculus was introduced by Milner [100] to represent concurrent mobile systems and their interactions. The term mobility refers to the way in which process execution evolves. Milner began by studying how computer processes are embodied in computer systems and networks. He observed that computer processes merge together elements for computing and for communicating. As a result, processes make themselves known only through the data they exchange. For instance, CPU computations are visible to external components only through the information stored in the registers.
The syntax of π-calculus

The π-calculus is a flavour of CCS and, like CCS, is based on the concept of name: the channels through which different sub-systems communicate are named, and so are variables and data. The important improvement with respect to CCS is that the π-calculus does not distinguish among the names of the different elements. Therefore, it is possible to send through channels a name representing another channel. The receiver can then parameterise the communication channel on the basis of the name received. In the π-calculus everything is considered a process that exchanges data with other processes exclusively through channels. Specifically, here we refer to the polyadic π-calculus, an extended version that allows tuples of values to be sent and received through channels. The logical conjunction points between processes and channels are named ports. In this section, processes are always uppercase whereas names are lowercase. Moreover, m̃ = (m1, m2, . . . , mn) refers to any sequence of names. The following are the basic constructs of the π-calculus:

The input prefix. Process a(x̃).P receives the sequence x̃ of names on the port a; then, it behaves as P.

The output prefix. Process ā⟨x̃⟩.P sends the sequence x̃ of names on the port a; then, it behaves as P.

The summation. Process P1 + P2 behaves in a way that either P1 or P2 is performed. The choice is nondeterministic and works similarly to the nondeterministic choice between actions of ConGolog and IndiGolog.

The composition. Process P1 | P2 performs both processes P1 and P2. Moreover, both are performed in parallel and can communicate with each other through channels. The abbreviation ∏_{i=1}^{m} Pi = P1 | P2 | . . . | Pm denotes the composition of m processes.

The restriction. Process (νy)P behaves like P, but y is a so-called restricted name. That is to say, y cannot be used as a channel for communicating with the external environment (for example, other processes).

The matching.
Process [x = y].P behaves like P if x and y are the same name. Otherwise, it behaves like the 0 process, i.e., the process that does nothing.

The replication. Process !P behaves like the process obtained by executing an arbitrary number of copies of P in parallel.

Moreover, the expression P[ã/b̃] in the π-calculus refers to the process obtained from P by substituting each name bi ∈ b̃ with the corresponding name ai ∈ ã.

Modeling workflow using π-calculus

A first significant effort in modelling processes in the π-calculus is given in [44]. The approaches that formally model processes by the π-calculus share the idea that everything is a process: resources, activities, work lists and so on. The interaction between process participants and the engine is also modelled in this way. In our opinion, such a fine granularity is not needed; rather, it leads to specifications which are less readable. Paper [114] proposes an alternative approach, which produces specifications that are more slender (and, hence, more readable) than those generated by the aforementioned approach. In addition, this approach seems more solid and feasible, as the paper introduces a mapping into the π-calculus for several different control-flow patterns. In [114], every activity is an independent π-calculus process, and coming-before relationships are modelled by values read and written on channel ports. The complete process definition for a basic activity A is:

A ≝ x.[ã = b̃].τA.ȳ.0

That means the process receives a trigger through port x mapping to an event (e.g., the completion of a preceding activity). Then, the process makes a certain comparison [ã = b̃], performs some internal work τA and, later, notifies its completion by writing on a certain channel port y. Of course, this is the case of a single activity in a sequence. In general, an activity can be enabled only after several preceding activities complete (say m); in addition, its completion can enable o subsequent activities.
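The single-activity encoding A ≝ x.[ã = b̃].τA.ȳ.0 can be mimicked operationally, with queues standing in for channel ports. The sketch below is only an illustration of the intuition (receive a trigger, check a guard, do internal work, signal completion), not a faithful π-calculus semantics; all names in it are ours:

```python
import threading
import queue

def activity(name, trigger, done, guard=lambda: True):
    """Mimics A = x.[a=b].tau_A.y.0: wait on port x, check the
    matching guard [a = b], do internal work tau_A, then signal
    completion on port y. If the guard fails, behave as 0."""
    def run():
        trigger.get()             # input prefix: receive on port x
        if guard():               # matching: [a = b]
            print(f"tau_{name}")  # internal action tau_A
            done.put(name)        # output prefix: send on port y
        # otherwise do nothing (the 0 process)
    return threading.Thread(target=run)

# A sequence of two activities: the completion of A triggers B,
# i.e. A's output port is B's input port.
x, y, z = queue.Queue(), queue.Queue(), queue.Queue()
a = activity("A", trigger=x, done=y)
b = activity("B", trigger=y, done=z)
a.start(); b.start()
x.put("start")   # fire the initial trigger
print(z.get())   # blocks until B completes; prints "B"
a.join(); b.join()
```

Coming-before relationships are thus encoded purely by which queue (port) an activity reads and which it writes, just as in the channel-based encoding of [114].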
Therefore, supposing also n conditions to be checked, the general formalisation of an activity A is the following:

A ≝ {xi}_{i=1}^{m} . {[ãi = b̃i]}_{i=1}^{n} . τA . {ȳi}_{i=1}^{o} . 0

In this way, all of the basic control-flow patterns can be mapped. A more comprehensive discussion of the mappings can be found in [114]. Finally, the whole process specification is built by composing all the different nodes A1, . . . , An:

P ≝ ∏_{i=1}^{n} Ai

As for soundness checking, Puhlmann [113] provides means to characterise different soundness properties, such as relaxed and classical soundness, using bisimulation equivalence. Uppsala Universitet has independently developed the Mobility Workbench (MWB) [138] for manipulating and analysing mobile concurrent systems described in the π-calculus, including business processes.

Language                 Formal        Structural   Semantic    Non-Atomic   Planning      State vs
                                       Soundness    Soundness   Execution                  Event Based
Workflow Net             Yes           Yes          No          No           Early stage   State
YAWL                     Yes           Yes          No          Yes          No            State
EPC                      Semi formal   Partially    No          No           No            Event
π-calculus               Yes           Yes          No          No           No            Event
Graph-based languages    Partially     No           No          No           No            Event

Table 3.1: A comprehensive comparison

3.1.5 Discussion

Table 3.1 summarizes the assessment made in the light of the requirements described at the beginning of this section. An analysis of the assessed results is given below, where every language is discussed separately, taking the requirements into account. As pointed out by the table, no language addresses all the features that the framework proposed in this thesis requires, including the necessity of being based on a notion of state.

Workflow Nets. It is a sound formalism for representing business processes, formal enough to enable reasoning and process verification. Current research directions in terms of verification have been limited to checking the structural soundness according to Definition 3.1. Such checking does not consider the actual environment where processes are enacted.
As a consequence, when running, process instances may get stuck, since some activities might require certain environmental conditions that do not currently hold. No work is currently trying to address such a kind of execution monitoring. In theory, Workflow Nets are suitable as a process modelling language for the adaptation framework in this thesis. Indeed, pre- and post-conditions can be formally specified, and it is precisely and unambiguously defined when a certain state is final and how to pass from one state to another. But there are some drawbacks which limit their application:

1. When transitions fire, tokens are consumed from their input places and others are put on the output places. These steps are atomic in the sense that nothing can happen in the meanwhile (e.g., the firing of other transitions). Considering that a transition firing represents an activity performance, such atomicity is somehow a contradiction: during the activity execution, events can happen and change the environment, and that may cause started activities to become unable to complete. Our adaptation framework has to be able to monitor even during activity performances. Workflow Nets cannot be directly used, unless some artifices are introduced, which would make the complexity of the model explode.

2. Algorithmically, Workflow Nets would allow designers to define processes, and probably also to monitor and recover them, as there exists research on Petri-net based planners (e.g., [63]). Nevertheless, IndiGolog allows, in addition, one to encode the whole framework by itself alone (see next chapter). Indeed, the aspects of monitoring and recovering are directly modelled through IndiGolog procedures in a very natural way. Workflow Nets are a "low-level" formalism and, as such, cannot easily achieve the same results.
For instance, we could concretely code the whole framework through the sole IndiGolog interpreter (see Chapter 5), whereas using Workflow Nets would have required different parts to be developed using different languages, and additional effort would have been needed without gaining concrete benefits. It is also worth saying that Petri-net based planning techniques are not as mature and efficient as those based on logics. Indeed, to the best of our knowledge, there exist no planners which take as input any form of Petri-net encoding.

YAWL. An extended workflow net (C, i, o, T, F, split, join, rem, nofi) can be reduced to a usual workflow net (C, T, F), obviously losing the additional features. It follows that most of the limitations of workflow nets also hold for YAWL. The YAWL formalism overcomes only the limitation concerning the modelling of the temporal aspects of activity executions, since it explicitly models the different states in which an activity can be. On the other hand, there exist no planners for YAWL.

EPC. Event-driven Process Chains allow process designers to model processes from a user-oriented perspective. The alternation of events and functions yields process representations that may become very complex. That is the reason why they are generally used to model processes at a high level, where representations generally remain reasonably small. These high-level process specifications are meant to be read and evaluated by humans and cannot serve as input for Process Management Systems. As also argued in [74], the informal nature of EPCs prevents them from being directly given a proper semantics. Describing in detail the pre- and post-conditions of functions (i.e., the tasks/activities to perform) would result in huge specifications. Moreover, there are no standards for formalising conditions in a way that they can be used for reasoning, as our adaptation framework would require.
As a matter of fact, most of the research work on EPCs is currently addressing the problem of verifying the structural soundness of specifications. For instance, Mendling et al. [97] analyse 604 SAP reference models in order to check whether some of them contain structural errors. Specifically, their correctness criterion is based on relaxed soundness. Indeed, EPCs are argued to be frequently used to capture the expected behaviour without considering unwanted executions leading to deadlocks or livelocks. Mendling et al. [96] define a new, enriched formal definition of EPCs which enables checking for structural soundness. Apart from the verification of formal correctness at design-time, a process represented as an EPC has to be executed at run-time, achieving the outcomes it has been devised for. There exists no research work on verifying whether such processes achieve the expected results in the actual real-world scenario. We are confident that this would in any case be hard: since pre- and post-conditions of activities are not formally represented in a proper way, it is difficult to capture the activity semantics.

π-calculus. It is a formal and sound formalism to represent business processes and reason over them. As for other formalisms, most research work has focused on verifying the structural soundness of process specifications. But nothing is said on how to monitor at run-time the progression of running instances specified in the π-calculus and to check whether they can successfully terminate in the current state of the world. Moreover, the π-calculus is event-based: transitions are modelled explicitly, whereas the states between subsequent transitions are only modelled implicitly. That introduces many critical issues. Firstly, it is difficult to monitor deviations, since these concern the gap between the state expected and that monitored.
Secondly, planners are generally state-based and, hence, a certain definition of state needs to be rebuilt from the message model of the π-calculus. This step, although feasible, requires additional effort without gaining a real benefit.

Graph-based languages. These drawbacks are also shared by most graph-based languages. Graph-based languages, which are not described in detail in dedicated sub-sections, are a collective name for languages which are used by, or simply meant for, commercial or prototypal Process Management Systems in order to define process specifications. This class comprises, for instance, BPMN [105] or the languages used by AgentWork [101] or AdeptFlex [117]. Process elements, such as activities, joins or splits, are represented by nodes that are connected by proper edges. These languages are typically event-based, which represents a serious issue as said before, although they typically allow designers to represent specifications in a more formal way than EPCs. However, these languages are anyway still too informal to check whether deviations that require a recovery plan have happened in the environment.

Figure 3.4: A taxonomy of the adaptation techniques

3.2 Related Works on Adaptability

This section discusses the levels of support for process adaptability/flexibility and exception handling in several of the leading commercial products and some academic prototypes. Figure 3.4 shows a taxonomy of the adaptation techniques. Changes to a process can be classified in two main groups: evolutionary and exceptional changes.
Orthogonally, there is the issue of verifying the soundness of the updated process specifications and/or of the running instances adapted to the occurred changes. Whereas there is a lot of research on structural soundness, as widely discussed in Section 3.1, little work has been done on the semantic soundness of process changes [125, 87]. The most valuable approach is implemented in ADEPT (see later in this section), but activity conflicts are defined manually and not inferred automatically from the activity pre- and post-conditions.

Pre-planned:

    try {
        activity1;
        activity2;
        activity3;
        || subProcess();
    } catch(Disconnection) { ... }
      catch(Devices Down) { ... }
      catch(Exception1) { ... }
      catch(Exception2) { ... }
      catch(Exception3) { ... }
      catch(Exception4) { ... }

Unplanned:

    try {
        activity1;
        activity2;
        activity3;
        || subProcess();
    } catch(AnyException) { /* Generic method */ }

Table 3.2: A Java-like model for describing Automatic Adaptation

Evolutionary changes concern a planned migration of a process to an updated specification which, for instance, implements new legislation, policies or practices in business organisations, hospitals, emergency management, etc. Typically, the inclusion of new evolutionary aspects is made manually by the process designer. When dealing with process specification changes, there is the issue of managing running instances and, possibly, migrating such instances to the updated specification. Simple solutions, such as aborting running instances or continuing with the old specification, may not work, for obvious reasons: aborting, when possible, would cause some work already done to be lost, whereas using the old specification may result in applying old legislation and, hence, would be inappropriate and impracticable. Casati et al. [19] define a complete, minimal, consistent and sound set of modification primitives to modify specifications.
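To give the flavour of such modification primitives, the fragment below sketches how an "insert activity" and a "delete activity" primitive might operate on a process represented as a set of activities plus a coming-before relation. The representation and the function names are ours, for illustration only; they are not taken from [19]:

```python
def insert_activity(activities, edges, new, pred, succ):
    """Insert `new` between `pred` and `succ`, rerouting the edge.
    `edges` is a set of (before, after) pairs (coming-before relation)."""
    if (pred, succ) not in edges:
        raise ValueError("pred and succ are not directly connected")
    activities.add(new)
    edges.discard((pred, succ))
    edges.update({(pred, new), (new, succ)})

def delete_activity(activities, edges, victim):
    """Remove `victim`, reconnecting its predecessors to its successors
    so that the remaining coming-before relation stays connected."""
    preds = {a for (a, b) in edges if b == victim}
    succs = {b for (a, b) in edges if a == victim}
    activities.discard(victim)
    edges.difference_update({e for e in edges if victim in e})
    edges.update({(p, s) for p in preds for s in succs})

# Example: A -> B -> C; insert X between A and B, then delete B.
acts, edges = {"A", "B", "C"}, {("A", "B"), ("B", "C")}
insert_activity(acts, edges, "X", "A", "B")
delete_activity(acts, edges, "B")
print(sorted(edges))   # [('A', 'X'), ('X', 'C')]
```

A "move activity" pattern, as discussed later for Weske [142] and Weber et al. [140], is then simply a delete followed by an insert at the new position.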
This paper also describes the constraints under which running instances can be migrated to new updated specifications. Unfortunately, it does not cover the issue of applying the changes automatically and, hence, a domain expert is supposed to apply them manually. Weske [142] goes beyond this and provides a technique that is able to adapt running cases by adding, deleting and moving activities in order to adhere to a new specification. This technique has then been implemented in WASA [139]. Similarly to Casati et al. [19], Weber et al. [140] suggest a set of change patterns (such as inserting, deleting or moving process fragments) that may be useful when modifying specifications. The set proposed is wider than that in [19]; in addition, the paper reports how many of the change patterns are actually implemented in the most widespread Process Management Systems.

On the other side, there are the exceptional changes, which are characterised by events, foreseeable or unforeseeable, occurring during process instance executions, which may require instances to be adapted in order to be carried out. Since such events are exceptional, process specifications do not need any modification. There are two ways of handling exceptional events. The adaptation can be manual: once events are detected, a responsible person, expert in the process domain, manually modifies the affected instance(s). The adaptation can be automatic: when exceptional events are sensed, the PMS is able to change the schema of the affected instance(s) accordingly, in a way that they can still be completed. Automatic adaptation techniques can be broken down into two further groups: pre-planned and unplanned. Using pre-planned adaptation techniques, a responsible person should foresee at design-time all possible exceptional events and, for each of them, should define a proper handling strategy. This kind of pre-planned approach is named Flexibility by design in [123].
The same paper also introduces Flexibility by underspecification: in certain cases, the designer may be aware that certain exceptional events can occur, but the recovery strategies cannot be known in advance and can only be defined at run-time. Several proposals have been made to define pre-planned policies, such as Control ICN [16] or Event-Condition-Action rules [20]. Unplanned adaptation techniques, conversely, do not require all of the possible expected events to be anticipated: there exists only one strategy, which is able to recover from any deviation. Table 3.2 is meant to clarify the differences between the two groups of techniques by using the Java metaphor. The left-hand side represents pre-planned adaptation, where the process is put in a try block and there exist several catches, one for each expected exceptional event. Each catch block implements the strategy for recovering from the corresponding event. The right-hand side aims at describing unplanned adaptation, where, by contrast, only one catch exists, which describes the generic strategy to recover from any possible event. The remainder of this section is devoted to enumerating how some commercial products and academic prototypes address process adaptation. Table 3.3 summarises the comparison of existing approaches as far as ad-hoc adaptation of single instances is concerned. The last row shows how SmartPM is envisioned in this categorisation. Rows having no checkmark refer to the PMSs that do not allow running instances to be changed directly during execution. In those systems, ad-hoc adaptation is done indirectly: such PMSs allow specifications to be modified, and the corresponding changes are then propagated to running instances.
Product       Manual   Pre-planned (the right policy chosen at run-time)   Unplanned
YAWL                   X
COSA          X        X
Tibco                  X
WebSphere     X
SAP                    X
OPERA         X
ADEPT2        X
ADOME         X
AgentWork              X
CBRFlow       X
WASA          X
SmartPM                                                                    X

Table 3.3: Features provided by the most widespread PMSs for ad-hoc adaptation of single instances

Discussion. SmartPM can be classified as belonging to the group of adaptation strategies that are automatic and unplanned. We are not interested in the problem of migrating process instances to updated models. Indeed, such a problem is generally related to long-term processes; pervasive processes, such as those of emergency management or pervasive health-care, are short-term, as they complete relatively quickly. For instance, the process of saving people under debris or of providing medical assistance to injured people has to be carried out very quickly to limit the risks for the persons involved. We cannot manage adaptation by pre-planned techniques or manually, for several reasons. Firstly, in pervasive scenarios the environment is continually changing and, therefore, events that require processes to be adapted are not exceptional but very frequent. Hence, it is not feasible to think of a responsible person devoted to manually adapting process instances on a very frequent basis. Moreover, this would delay the completion of process instances, and that should be avoided as much as possible, since pervasive processes are typically time-constrained. Pre-planned techniques should be avoided as well. Indeed, pervasive Process Management Systems, such as SmartPM, are expected to be used in environments where a great deal of possible exceptional events occur. Foreseeing all of them is not feasible; even if we handled many exceptional events, we would forget to consider others that may occur. This section has broadly discussed existing approaches concerning process adaptation, and Table 3.3 summarises the discussion.
It is easy to see that almost all approaches address the problem by manually adapting the process instances or through various flavours of pre-planned techniques. The lack of unplanned techniques can be partially motivated by the fact that the majority of PMSs are intrinsically intended for traditional business processes, where every change or recovery policy is generally subject to the chief's approval.

YAWL provides one of the most interesting approaches for adaptability [1] (more details about YAWL are given in Section 7.2.6). In YAWL each activity may be associated with an extensible repertoire of actions, one of which is chosen at run-time to carry out the activity. These actions are named "worklets": a worklet is a small, self-contained, complete process which handles one specific task (action) in a larger and composite process. On the basis of hierarchical Ripple-Down Rules defined contextually, an activity is dynamically replaced by the most appropriate of the available worklets. This approach is pre-planned: the substitution rules are defined at design-time, possibly manually updated at run-time, and never inferred from the process instance state.

The WASA system is totally manual [139] and concerns modifying process specifications according to evolutionary changes. As discussed above in this section, it also focuses on checking whether running instances can be migrated to updated specifications [142].

COSA allows sub-processes to be associated with external "triggers" and events [27]. But the adaptation policies are pre-planned: associations of triggers to sub-processes have to be defined at design-time. COSA also allows manual instance adaptations at run-time by using change patterns such as reordering, skipping, repeating, postponing or terminating steps.

Tibco iProcess Suite provides constructs called "event nodes" [127]. They allow designers to define handlers for expected events, to be activated when they occur.
Policies comprise the possibility of suspending processes either indefinitely or until a deadline occurs. All exceptions for which no handler exists are forwarded to the "default exception queue" to be handled manually.

WebSphere MQ Workflow supports deadlines and, when they occur, branches to pre-planned exception paths and/or sends notification messages to a specific administrator [68]. Administrators can manually suspend, restart or terminate processes, and they can also reallocate tasks.

SAP Workflow allows exception-handling processes to be defined [73]. Although they are defined at design-time, they cannot be associated with exceptional events at that time. At run-time, when an event occurs, these handling processes are proposed to the administrator, who can manually select the most appropriate one. There is no way to define properties in order to filter out some handlers on the basis of the case and the occurred event.

The OPERA prototype allows a handler for a certain event to be associated at design-time with single tasks [59]. Such a handler is launched only when the event occurs and blocks those tasks. OPERA also allows more general handlers for certain events, associated with all tasks, to be defined. When an exceptional event occurs, the process is blocked and, if any, the corresponding handler for that event and that task is invoked. If it cannot recover the execution, or if there is no specific handler for that task and event, the general handler for that event is used. If the latter does not exist or cannot solve the problem, manual adaptation can be used.

ADEPT is able to handle both exceptional events and evolutionary changes [20]. All changes are achieved by manual interventions, although ADEPT provides minimal support to facilitate such operations. As for evolutionary changes, it also supports the migration of running instances to the updated specifications.
Version 2 introduces new features [118, 54], such as checking the structural and semantic correctness of changes [87], but the ADEPT approach should still be considered as totally manual. ADEPT is one of the few works dealing with the issue of checking for semantic correctness, and this aspect is very valuable for Process Management Systems like SmartPM, where adaptation is intended to be completely automatic. Unfortunately, the semantic correctness checking relies on a significant configuration effort. Indeed, checking is not computed automatically on the basis of pre- and post-conditions of activities; it relies on semantic constraints that are defined manually by designers at design-time. This is also related to the semi-formality of ADEPT's process modelling language, which does not allow automatic reasoning. ADEPT2 relies on two relations between pairs of activities, namely dependency and exclusion. These relations allow one to specify, respectively, (i) which activities depend on each other (the first of the pair can be executed only if the second has also been executed) and (ii) which activities are not compatible in achieving the outcomes of a process instance.

The ADOME system provides some support for manual changes as well as for pre-planned policies [26]. An exception handler is linked to a certain task, instead of being associated with events. When a given exceptional event makes it impossible to execute a certain task, the recovery policy for that task is used, if any.

AgentWork provides the ability to modify process instances by dropping and adding individual tasks based on Event-Condition-Action rules [101]. These rules are formally defined in ACTIVETFL, which combines Frame Logic, a logic based on the notion of objects, with some features of temporal logics. Consequently, AgentWork is comprised in the pre-planned approaches.
Since the graph-based model used by AgentWork is not very formal, there is no way to check for semantic soundness and conflicts. Therefore, some rules may generate incompatible recovery actions.

CBRFlow uses a case-based reasoning approach to support the adaptation of workflow specifications to changing circumstances [141]. Case-based reasoning (CBR) is the process of solving new problems based on the solutions of similar past problems: users are supported in adapting process specifications by taking into account how other specifications have been modified in the past to follow evolutionary changes.

3.3 Case Handling

Case Handling aims at providing less rigidity than usual Process Management Systems by leveraging both process orientation and data orientation to route the execution of processes [129, 130]. Flower [12] is one of the few systems that use the Case Handling paradigm.

Case Handling meets the requirements of some application scenarios where process participants are highly skilled. In such cases, organizations want participants to be more autonomous in driving and controlling the case executions. Being more rigid, traditional Process Management technologies do not allow expert participants to make processes deviate from the prescribed schema. In some scenarios, on the one hand, business process management is still valuable but, on the other hand, participants should be able to perform a broad range of activities and, consequently, to drive how processes are carried out. Let us consider the scenario of a process for the delivery of certain goods. At a certain stage, participants may be required to fill in a certain form to provide some data, which also includes some information for the mail delivery (e.g., the mail address, the post code). The next step consists in sending the goods and could be done as soon as the delivery information is available.
With traditional Process Management technology, goods can be sent only when the whole form is filled in, including information which is not directly needed for the delivery. Case Handling approaches allow the next step to be enabled as soon as the delivery information is entered into the form and committed, even though the form itself is not yet wholly completed. Indeed, in the case handling approach work-item enablement is mainly driven by data availability, whereas process management steers enablement by the control flow (the comes-before relationship). In case handling, every activity is associated with at least one data object definition. There exist two main types of association: mandatory and restricted. If a data object is mandatory for a certain activity, then it has to be entered in order to complete that activity. A data object is restricted to a certain activity set if it can be entered only by activities in that set. The mandatory and restricted associations are orthogonal; for instance, if a data object is mandatory for and restricted to a given activity, it is going to be entered by that activity (and by no other). Data objects may also be free, in the sense that they can be entered at any moment. More information about case handling can be found in the aforementioned papers. On the one hand, highly dynamic and pervasive scenarios could benefit from less rigidity, which is somehow intrinsic to these settings. Furthermore, the weak activity boundaries and the data-driven nature of Case Handling obviate in several cases the need for changes. Finally, the environment in which processes run in dynamic scenarios is often continuously changing and, hence, it is really valuable for a participant to be able to go beyond the old process schema and decide to handle multiple activities in one go. On the other hand, Case Handling systems suffer from some shortcomings: 1.
Case Handling is data-driven: a certain state is reached when some data become available. Consequently, handling such cases has to be performed mainly within the system itself: activity outcomes are always represented by the data produced. In many pervasive scenarios, such as healthcare or emergency management processes, the main effects of activities are not represented in the systems themselves. For instance, the outcomes of saving or aiding victims are not naturally definable as data being updated or manipulated. 2. The nature of case handling makes it quite difficult to modify cases which are already running, as also argued in [129, 130, 56]. Generally, adaptability in case handling is designed to affect only new cases created after the modification. In highly dynamic and pervasive scenarios, changes should affect exactly the running cases, as they are fired by exogenous events which somehow invalidated them. 3. Activities are no longer pushed to participants by the system. The system becomes a discreet support for accomplishing activities rather than a way to control case progression in a mechanical way. This might be problematic in some highly dynamic and pervasive scenarios where process participants have to be continuously pushed by the PMS to perform the assigned work as fast as possible. Therefore, the Case Handling approach is not actually feasible for the scenarios at which we are aiming. Günther et al. [56] start discussing a possible integration of case handling and typical Process Management. They explore how ideas from case handling can be introduced into Process Management Systems to gain the corresponding benefits, and what changes would be required. Unfortunately, the discussion is still at an initial stage and, hence, unable to address the concerns of process management for pervasive scenarios.
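The mandatory/restricted associations and the data-driven enablement described above can be sketched in a few lines; the following Python encoding is purely illustrative (hypothetical activity and data-object names, not the Flower implementation):

```python
# Illustrative sketch of case-handling enablement: an activity's follow-up step
# becomes enabled as soon as the data it needs is committed, even if the form
# producing that data is not yet complete.
class Case:
    def __init__(self, mandatory, restricted):
        self.mandatory = mandatory    # activity -> data objects it must produce to complete
        self.restricted = restricted  # data object -> activities allowed to enter it
        self.data = {}                # committed data objects

    def enter(self, activity, obj, value):
        allowed = self.restricted.get(obj)
        if allowed is not None and activity not in allowed:
            raise ValueError(f"{obj} is restricted; {activity} may not enter it")
        self.data[obj] = value

    def enabled(self, activity, needed):
        # data-driven enablement: only the needed data objects matter
        return all(obj in self.data for obj in needed)

    def can_complete(self, activity):
        # all mandatory data objects must have been entered
        return all(obj in self.data for obj in self.mandatory.get(activity, set()))

case = Case(
    mandatory={"fill_form": {"address", "post_code", "phone"}},
    restricted={"address": {"fill_form"}},
)
case.enter("fill_form", "address", "Via Ariosto 25")
case.enter("fill_form", "post_code", "00185")
# "send_goods" needs only the delivery data, so it is enabled even though
# "fill_form" is not yet completable (the phone field is still missing).
```

Here `send_goods` becomes enabled before `fill_form` completes, which is exactly the deviation from control-flow-driven enablement discussed in the delivery example.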
Chapter 4

A General Framework for Automatic Adaptation

This chapter is devoted to describing a general conceptual framework for SmartPM, the adaptive Process Management System (PMS) that is the object of this thesis. The chapter aims at presenting a practical technique for solving adaptation, which is based on planning in AI. Moreover, we prove the correctness and completeness of the approach. In SmartPM, process specifications associate every task with a set of capabilities that the service executing it has to provide. Every task can be assigned to a given service that provides all the required capabilities, if a set of conditions hold. Conditions are defined on control and data flow (e.g., a previous task has to be finished, a variable needs to be assigned a value in a specific range, etc.). This kind of conditions can be considered as "internal": they are handled internally by the PMS and, thus, are easily controllable. Another type of conditions exists, the "external" ones: they depend on the environment in which process instances are carried out. These conditions are more difficult to keep under control, and continuous monitoring is required to detect discrepancies. Indeed, we can distinguish between a physical reality and a virtual reality [31]: the physical reality consists of the actual values of conditions, whereas the virtual reality is the model of reality that the PMS uses in making deliberations. A PMS builds the virtual reality by assuming that the effects of tasks/actions meet expectations (i.e., that they modify conditions correctly) and that no exogenous events capable of modifying conditions break out. When the PMS realizes that one or more events caused the two kinds of reality to deviate, there are three possibilities to deal with such a discrepancy: 1. Ignoring deviations – this is, of course, not feasible in general, since the new situation might be such that the PMS is no longer able to carry out the process instance.
2. Anticipating all possible discrepancies – the idea is to include in the process schema the actions to cope with each such failure. As discussed in Chapter 3, most PMSs use this approach. For simple and mainly static processes, this is feasible and valuable; but, especially in mobile and highly dynamic scenarios, it is quite impossible to take all exceptional cases into account. 3. Devising a general recovery method able to handle any kind of exogenous event. As discussed in Chapter 3, the process is defined as if exogenous actions causing deviations could not occur, that is, as if everything ran fine (the try block). Whenever the execution monitor (i.e., the module intended for execution monitoring) detects discrepancies that prevent the process instance from terminating, the control flow moves to the (unique) catch block. The catch block activates the general recovery method to modify the old process P into a process P′ such that P′ can terminate in the new environment and its goals are included in those of P. Here the challenge is to automatically synthesize P′ during the execution itself, without specifying a priori all the possible catches. In summary, this chapter aims (i) at introducing a general conceptual framework in accordance with the third approach described above, and (ii) at presenting a practical technique, in the context of this framework, that is able to automatically cope with anomalies. We prove the correctness and completeness of such a technique, which is based on planning techniques in AI. This chapter extends the framework published in [40] and revises it in the light of the subsequent operationalisation, which was devised after the paper. Section 4.1 introduces some preliminary notions, namely Situation Calculus and IndiGolog, which are used as formalisms to reason about processes and exogenous events.
This section is not meant to give a comprehensive and fully formal introduction to these notions; it aims mostly at giving an overall insight to readers who are not expert in such topics. Section 4.2 presents the general conceptual framework to address adaptivity in highly dynamic scenarios, and introduces a running example. Section 4.3 presents the proposed formalization of processes, and Section 4.4 deals with adaptiveness. Section 4.5 presents the specific technique and proves its correctness and completeness. The chapter also introduces and carries on a concrete example, which is continually extended to cover and better explain the different concepts introduced. This example is then operationalised in Chapter 5.

4.1 Preliminaries

SmartPM uses the situation calculus (SitCalc) to formalize adaptation. The SitCalc is a logic formalism designed for representing and reasoning about dynamic domains [119]. We will not go over the situation calculus in detail; we merely note the following components: there is a special constant S0 used to denote the initial situation, namely the situation in which no actions have yet occurred; there is a distinguished binary function symbol do, where do(a, s) denotes the successor situation of s resulting from performing the action a; relations (resp. functions) whose values vary from situation to situation are called fluents, and are denoted by predicate (resp. function) symbols taking a situation term as their last argument. There is a special predicate Poss(a, s) used to state that action a is executable in situation s. We abbreviate with do([a1, ..., an−1, an], s) the term do(an, do(an−1, ..., do(a1, s))), which denotes the situation obtained from s by performing the sequence of actions a1, ..., an. Within this language, we can formulate domain theories which describe how the world changes as the result of the available actions.
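The basic ingredients just listed (the initial situation S0, the do function, and fluents taking a situation as last argument) can be mirrored directly in code; the encoding below is purely illustrative (situations as tuples of actions, a made-up holding fluent), not part of the thesis' formalization:

```python
# A minimal encoding of situation-calculus terms: a situation is the history
# of actions performed since the initial situation S0.
S0 = ()  # initial situation: no actions have yet occurred

def do(a, s):
    """The successor situation do(a, s)."""
    return s + (a,)

def do_seq(actions, s):
    """do([a1, ..., an], s) = do(an, do(an-1, ..., do(a1, s)))."""
    for a in actions:
        s = do(a, s)
    return s

# A relational fluent takes the situation as its last argument; its value is
# determined by the action history (here: the most recent pickup/drop).
def holding(obj, s):
    for a in reversed(s):
        if a == ("pickup", obj):
            return True
        if a == ("drop", obj):
            return False
    return False  # nothing was ever picked up

s = do_seq([("pickup", "box"), ("drop", "box"), ("pickup", "box")], S0)
```

After the three actions above, `holding("box", s)` is true, while `holding("box", S0)` is false: the same fluent has different truth values in different situations.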
Here, we use action theories of the following form:

• Axioms describing the initial situation, S0.
• Action precondition axioms, one for each primitive action a, characterizing Poss(a, s).
• Successor state axioms, one for each relational fluent F. The successor state axiom for a particular fluent F captures the effects and non-effects of actions on F and has the following form:

F(x⃗, do(a, s)) ⇔ Φ_F(x⃗, a, s)   (4.1)

where Φ_F(x⃗, a, s) is a formula fully capturing the truth value of fluent F on objects x⃗ when action a is performed in situation s (x⃗, a, and s are all the free variables in Φ_F).
• Unique names axioms for the primitive actions.
• A set of foundational, domain-independent axioms Σ for situations, as in [119].

A formula is uniform in situation s if s is the only situation term that appears in it. Sometimes, we use situation-suppressed formulas; these are uniform formulas with situation arguments suppressed (e.g., G denotes the situation-suppressed expression for G(s)). Finally, we can introduce an ordering among situations:

s ≤ s′ ⇔ ∃[a1, ..., an−1, an]. do([a1, ..., an−1, an], s) = s′

On top of these theories of action, one can define complex control behaviors by means of high-level programs expressed in Golog-like programming languages.

Construct                       Meaning
a                               primitive action
φ?                              wait for a condition
δ1 ; δ2                         sequence
δ1 | δ2                         nondeterministic branch
π x. δ                          nondeterministic choice of argument
δ*                              nondeterministic iteration
if φ then δ1 else δ2            conditional
while φ do δ                    while loop
δ1 ∥ δ2                         concurrency with equal priority
δ‖                              concurrent iteration
⟨φ → δ⟩                         interrupt
proc P(x⃗) δ endProc             procedure definition
P(θ⃗)                            procedure call
Σ(δ)                            search operator

Table 4.1: IndiGolog constructs
Specifically, we focus on IndiGolog [121], which provides a set of programming constructs sufficient for defining every well-structured process as defined in [72]. IndiGolog is a logic-based language born to program the behaviour of intelligent agents and robots. It derives from ConGolog, to which it basically adds the lookahead search operator. Such an operator makes it possible to simulate the execution of a process with the aim of searching for a successful termination before actually performing the program in the real world. In its turn, ConGolog extends the original Golog by introducing constructs for the concurrent execution of different operations. Table 4.1 summarizes the constructs of IndiGolog used in this thesis. In the first line, a stands for a situation calculus action term whereas, in the second line, φ stands for a formula over situation calculus predicates including fluents, which are then evaluated in the current situation when the IndiGolog program execution reaches φ. The constructs listed include some nondeterministic ones: (δ1 | δ2), which nondeterministically chooses between programs δ1 and δ2; π x. δ, which nondeterministically picks a binding for the variable x and performs the program δ for this binding of x; and δ*, which performs δ zero or more times. π x1, ..., xn. δ is an abbreviation for π x1. ... π xn. δ. The constructs if φ then δ1 else δ2 and while φ do δ are the synchronized versions of the usual if-then-else and while-loop. They are synchronized in the sense that testing the condition φ does not involve a transition per se: the evaluation of the condition and the first action of the chosen branch are executed as an atomic unit. So these constructs behave similarly to the test-and-set atomic instructions used to build semaphores in concurrent programming. We also have constructs for concurrent programming.
In particular, (δ1 ∥ δ2) expresses the concurrent execution (interpreted as interleaving) of the programs δ1 and δ2. Observe that a program may become blocked when it reaches a primitive action whose preconditions are false or a wait action φ? whose condition φ is false. Then, execution of (δ1 ∥ δ2) may continue provided another program executes next. Another concurrent programming construct is (δ1 ⟩⟩ δ2), where δ1 has higher priority than δ2, and δ2 may only execute when δ1 is done or blocked. Finally, an interrupt ⟨φ → δ⟩ has a trigger condition φ and a body δ. If the interrupt gets control from higher-priority processes and the condition φ is true, the interrupt triggers and the body is executed. Once the body completes execution, the interrupt may trigger again. ⟨x⃗ : φ → δ⟩ is an abbreviation for ⟨∃x⃗.φ → πx⃗.δ⟩. Finally, the search operator Σ(δ) is used to specify that lookahead should be performed over the (nondeterministic) program δ to ensure that nondeterministic choices are resolved in a way that guarantees its successful completion. Formally, two predicates are introduced to specify program transitions:

• Trans(δ′, s′, δ″, s″): given a program δ′ and a situation s′, returns (i) a new situation s″ resulting from executing a single step of δ′, and (ii) the remaining program δ″ to be executed.
• Final(δ′, s′): returns true when the program δ′ can be considered successfully completed in situation s′.

The predicate Trans for programs without procedures is characterized by the following set of axioms:

1. Empty program:
   Trans(nil, s, δ′, s′) ⇔ false.
2. Primitive actions:
   Trans(a, s, δ′, s′) ⇔ Poss(a[s], s) ∧ δ′ = nil ∧ s′ = do(a[s], s).
3. Test/wait actions:
   Trans(φ?, s, δ′, s′) ⇔ φ[s] ∧ δ′ = nil ∧ s′ = s.
4. Sequence:
   Trans(δ1; δ2, s, δ′, s′) ⇔ ∃γ. δ′ = (γ; δ2) ∧ Trans(δ1, s, γ, s′) ∨ Final(δ1, s) ∧ Trans(δ2, s, δ′, s′).
5.
Nondeterministic branch:
   Trans(δ1 | δ2, s, δ′, s′) ⇔ Trans(δ1, s, δ′, s′) ∨ Trans(δ2, s, δ′, s′).
6. Nondeterministic choice of argument:
   Trans(πv.δ, s, δ′, s′) ⇔ ∃x. Trans(δ[v/x], s, δ′, s′).
7. Nondeterministic iteration:
   Trans(δ*, s, δ′, s′) ⇔ ∃γ. (δ′ = γ; δ*) ∧ Trans(δ, s, γ, s′).
8. Synchronized conditional:
   Trans(if φ then δ1 else δ2 endIf, s, δ′, s′) ⇔ φ[s] ∧ Trans(δ1, s, δ′, s′) ∨ ¬φ[s] ∧ Trans(δ2, s, δ′, s′).
9. Synchronized loop:
   Trans(while φ do δ endWhile, s, δ′, s′) ⇔ ∃γ. (δ′ = γ; while φ do δ) ∧ φ[s] ∧ Trans(δ, s, γ, s′).
10. Concurrent execution:
   Trans(δ1 ∥ δ2, s, δ′, s′) ⇔ ∃γ. δ′ = (γ ∥ δ2) ∧ Trans(δ1, s, γ, s′) ∨ ∃γ. δ′ = (δ1 ∥ γ) ∧ Trans(δ2, s, γ, s′).
11. Prioritized concurrency:
   Trans(δ1 ⟩⟩ δ2, s, δ′, s′) ⇔ ∃γ. δ′ = (γ ⟩⟩ δ2) ∧ Trans(δ1, s, γ, s′) ∨ ∃γ. δ′ = (δ1 ⟩⟩ γ) ∧ Trans(δ2, s, γ, s′) ∧ ¬∃ζ, s″. Trans(δ1, s, ζ, s″).
12. Concurrent iteration:
   Trans(δ‖, s, δ′, s′) ⇔ ∃γ. δ′ = (γ ∥ δ‖) ∧ Trans(δ, s, γ, s′).

By using Trans and Final we can define a predicate Do(δ, s, s′) which, given a program δ and a starting situation s, holds for all possible situations s′ that result from executing δ starting from s, such that s′ is final with respect to the program remaining to be executed. Formally:

Do(δ, s, s′) ⇔ ∃δ′. Trans*(δ, s, δ′, s′) ∧ Final(δ′, s′)

where Trans* is the reflexive and transitive closure of Trans. Notice that there may be more than one resulting situation s′, since IndiGolog programs can be nondeterministic (e.g., due to concurrency). To cope with the impossibility of backtracking actions executed in the real world, IndiGolog incorporates a new programming construct, namely the search operator. Let δ be any IndiGolog program, which provides different alternative executable actions.
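Before turning to the search operator's semantics, note that the Trans/Final rules above lend themselves to a direct prototype. The following toy interpreter is an assumed Python encoding (not the thesis' Prolog implementation) covering only primitive actions, tests, sequence and nondeterministic branch, with Do derived as the closure of Trans:

```python
# Programs are tagged tuples; situations are tuples of executed actions.
# trans yields all single-step successors; do_all enumerates Do(delta, s, s').
NIL = ("nil",)

def final(delta, s):
    tag = delta[0]
    if tag == "nil":
        return True
    if tag == "seq":
        return final(delta[1], s) and final(delta[2], s)
    if tag == "choice":
        return final(delta[1], s) or final(delta[2], s)
    return False  # actions and tests are never final

def trans(delta, s, poss):
    """Yield all (delta', s') such that Trans(delta, s, delta', s')."""
    tag = delta[0]
    if tag == "act":
        if poss(delta[1], s):                     # Poss(a, s)
            yield NIL, s + (delta[1],)            # s' = do(a, s)
    elif tag == "test":
        if delta[1](s):                           # phi[s]
            yield NIL, s                          # no situation change
    elif tag == "seq":
        d1, d2 = delta[1], delta[2]
        for g, s1 in trans(d1, s, poss):
            yield ("seq", g, d2), s1
        if final(d1, s):
            yield from trans(d2, s, poss)
    elif tag == "choice":
        yield from trans(delta[1], s, poss)
        yield from trans(delta[2], s, poss)

def do_all(delta, s, poss):
    """All final situations reachable by delta from s (Trans* followed by Final)."""
    if final(delta, s):
        yield s
    for d1, s1 in trans(delta, s, poss):
        yield from do_all(d1, s1, poss)
```

For instance, with a precondition function that blocks action "c", the program a; (b | c) has a single complete execution, ending in the situation ("a", "b"): the nondeterministic branch is resolved toward the executable alternative.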
When the interpreter encounters the program Σ(δ), before choosing among the alternative executable actions of δ, it performs reasoning in order to decide on a step which still allows the rest of δ to terminate successfully. More precisely, according to [30], the semantics of the search operator is that

Trans(Σ(δ), s, Σ(δ′), s′) ⇔ Trans(δ, s, δ′, s′) ∧ ∃s*. Do(δ′, s′, s*).

If δ is the entire program under consideration, Σ(δ) emulates complete offline execution. Finally, our adaptation procedure will make use of regression (see [4] and [119]). Let ϕ(do([a1, ..., an], s)) be a SitCalc formula with situation argument do([a1, ..., an], s). Then, Rs(ϕ(do([a1, ..., an], s))) is the formula with situation argument s which denotes the facts/properties that must hold before executing a1, ..., an in situation s for ϕ(do([a1, ..., an], s)) to hold (aka the weakest preconditions for obtaining ϕ). To compute the regressed formula Rs(ϕ(do([a1, ..., an], s))) from ϕ(do([a1, ..., an], s)), one iteratively replaces every occurrence of a fluent with the right-hand side of its successor state axiom (Formula 4.1) until every atomic formula has a situation argument that is simply s.

4.2 Execution Monitoring

The general framework is based on execution monitoring formally represented in SitCalc [83, 31]. After each action, the PMS has to align the internal world representation (i.e., the virtual reality) with the external one (i.e., the physical reality), since they could differ due to unforeseen events.

Figure 4.1: Execution Monitoring

When using IndiGolog for process management, tasks are considered as predefined sequences of actions (see later) and processes as IndiGolog programs. Before a process starts to be executed, the PMS takes the initial context from the real environment as the initial situation, together with the program (i.e., the process) δ0 to be carried out.
The initial situation S0 is given by first-order logic predicates. At each execution step, the PMS, which has complete knowledge of the internal world (i.e., its virtual reality), assigns a task to a service. The only assignable tasks are those whose preconditions are fulfilled. A service can collect from the PMS the data which are required in order to execute the task. When a service finishes executing the task, it alerts the PMS of its completion. The execution of the PMS can be interrupted by the monitor when a misalignment between the virtual and the physical reality is sensed. When this happens, the monitor adapts the program to deal with such a discrepancy. Figure 4.1 illustrates such execution monitoring. The kind of monitor described here is a specialised version of the one proposed by Soutchanski et al. [31]. At each step, the PMS advances the process δ in the situation s by executing an action, resulting in a new situation s′ with the process δ′ remaining to be executed. The state¹ is represented as first-order formulas defined on situations. The current state corresponds to the boolean values of these formulas evaluated on the current situation. Both the situation s′ and the process δ′ are given as input to the monitor. The monitor collects data from the environment through sensors (here a sensor is any software or hardware component that enables the retrieval of contextual information). If a deviation is sensed between the virtual reality as represented by s′ and the physical reality as s″, the PMS internally generates a discrepancy e⃗ = (e1, e2, ..., en), which is a sequence of actions called exogenous events such that s″ = do(e⃗, s′).² Notice that the process δ′ may fail to be correctly executed (i.e., by assigning all tasks as required) in s″. If so, the monitor adapts the process by generating a new process δ″ that pursues at least each of δ′'s goals and is executable in s″.
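The sense/adapt/resume cycle just described can be sketched abstractly as follows; all names are illustrative placeholders (step stands for the PMS engine, sense for the sensors, still_executable and recover for the Relevant/Recovery checks formalized later), not the SmartPM implementation:

```python
# Abstract sketch of the execution-monitoring loop of Figure 4.1: advance the
# process one step, align the virtual reality with sensed exogenous events,
# and adapt the remaining process when it is no longer executable.
def run(delta, s, step, sense, still_executable, recover, is_final):
    while not is_final(delta, s):
        delta, s = step(delta, s)          # PMS advances: (delta, s) -> (delta', s')
        exo = sense(s)                     # exogenous events observed by the monitor
        if exo:
            s = s + tuple(exo)             # align virtual reality: s'' = do(e, s')
            if not still_executable(delta, s):
                delta = recover(delta, s)  # synthesize delta'' executable in s''
    return delta, s
```

With a toy instantiation in which a "storm" event invalidates the remaining tasks until a "shelter" task is inserted, the loop executes the original tasks, absorbs the exogenous event, and resumes from the recovered process.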
At this point, the PMS is resumed and the execution continues from δ″ and s″. We end this section by introducing our running example, stemming from project WORKPAD, described in Chapter 2.

Example 4.1. The example is meant to code a possible process for managing the aftermath of an earthquake: a team is sent to the affected area to make an assessment, which comprises taking some valuable photos, compiling a questionnaire and sending all these data to the headquarter. Here we assume that it is already known which buildings have to be assessed, namely buildings A, B and C. The team is equipped with PDAs on which some software services are installed, and members communicate with each other through a manet. For each building, an actor compiles a questionnaire by using a certain software service, that is, a specific application installed on some actor devices. Compiling questionnaires can be done anywhere: that is, no movement is required. Then, another actor/service has to be sent to the specific building to collect some pictures (this, conversely, requires movement). Finally, according to the information filled in the questionnaire, a third actor/service evaluates the effectiveness of the collected pictures. In order to evaluate the pictures properly, a certain minimum number of pictures is required as input: if fewer are provided, the evaluation cannot be done. Once evaluated, if the pictures are judged as not effective, the task of taking new pictures is scheduled again (as well as the evaluation of the new pictures). When these steps have been performed for the three buildings A, B and C, the collected data (questionnaires and pictures) are sent to the headquarter.

¹ Here we refer to as state both the tasks' state (e.g., performable, running, terminated, etc.) and the process variables on which task firing and process routing are defined.
² Note that the action sequence e⃗ might not be the one that really occurred.
Figure 4.2: A possible process to be carried on in disaster management scenarios (for each building A, B and C: compile the questionnaire, move to the destination, take photos of the destination and evaluate them, looping while ¬evaluationOK holds; when evaluationOK holds for all buildings, send the data to the headquarter)

Coordination and data exchange require manet nodes to be continually connected to each other. But this is not guaranteed in a manet. The environment is highly dynamic, since nodes move in the affected area to carry out assigned tasks. Movements may cause disconnections and, thus, unavailability of nodes and, consequently, unavailability of the provided services. Therefore, processes should be adapted, not simply by assigning tasks in progress to other services, but also by considering possible recovery of the services.

4.3 Process Formalisation in Situation Calculus

Next we detail the general framework proposed above by using Situation Calculus and IndiGolog. We use some domain-independent predicates to denote the various objects of interest in the framework:

• service(a): a is a service
• task(x): x is a task
• capability(b): b is a capability
• provide(a, b): the service a provides the capability b
• require(x, b): the task x requires the capability b

Every task execution is a sequence of four PMS actions: (i) the assignment of the task to a service, resulting in the service no longer being free; (ii) the notification to the service to start executing the task. Then, the service carries out the task and, after receiving the service's notification of the task's conclusion, (iii) the PMS acknowledges the successful task termination.
Finally, (iv) the PMS releases the service, which becomes free again. We formalise these four actions as follows:

• Assign(a, x): task x is assigned to service a
• Start(a, x, p): service a is allowed to start the execution of task x, with input p
• AckTaskCompletion(a, x): service a has successfully concluded the execution of x
• Release(a, x): service a is released with respect to task x

In addition, services can execute two actions:

• readyToStartTask(a, x): service a declares to be ready to start performing task x
• finishedTask(a, x, q): service a declares to have completed the execution of task x, returning output q

The terms p and q denote arbitrary sets of inputs/outputs, which depend on the specific task. The special constant ∅ denotes an empty input or output. The interleaving of actions performed by the PMS and the services is as follows. After the assignment of a certain task x by Assign(a, x), when service a is ready to start executing, it executes the action readyToStartTask(a, x). At this stage, the PMS executes the action Start(a, x, p), after which a starts executing task x. When a completes task x, it executes the action finishedTask(a, x, q). Specifically, we envision that the actions finishedTask(·) are those in charge of changing the properties of the world as the result of executing tasks. When x is completed, the PMS is allowed at any moment to execute sequentially AckTaskCompletion(a, x) and Release(a, x). The program coding the process will be executed by only one actor, specifically the PMS. Therefore, the actions readyToStartTask(·) and finishedTask(·) are considered as external and, hence, are not coded in the program itself. For each specific domain, we have several fluents representing the properties of situations. Some of them are modelled independently of the domain, whereas others, the majority, are defined according to the domain.
The domain-independent fluents can always be formulated as defined in this chapter. Among them, we have the fluent free(a, s), which denotes the fact that service a is free, i.e., no task has been assigned to it, in situation s. The corresponding successor state axiom is as follows:

free(a, do(t, s)) ⇔ (free(a, s) ∧ ∀x. t ≠ Assign(a, x)) ∨ (¬free(a, s) ∧ ∃x. t = Release(a, x))   (4.2)

This says that a service a is considered free in the current situation if and only if a was free in the previous situation and no task has just been assigned to it, or a was not free and it has just been released. There is also the domain-independent fluent enabled(x, a, s), which represents whether service a has notified that it is ready to execute a certain task x, so as to enable it. The corresponding successor state axiom is:

enabled(x, a, do(t, s)) ⇔ (enabled(x, a, s) ∧ ∀q. t ≠ finishedTask(a, x, q)) ∨ (¬enabled(x, a, s) ∧ t = readyToStartTask(a, x))   (4.3)

This says that enabled(x, a, s) holds in the current situation if and only if it held in the previous one and no action finishedTask(a, x, q) has been performed, or it was false in the previous situation and readyToStartTask(a, x) has been executed. This fluent enforces the constraints that the PMS can execute Start(a, x, p) only after a has performed readyToStartTask(a, x), and that it can execute AckTaskCompletion(a, x) only after finishedTask(a, x, q). This can be represented by two preconditions on the actions Start(·) and AckTaskCompletion(·):

∀p. Poss(Start(a, x, p), s) ⇔ enabled(x, a, s)
∀p. Poss(AckTaskCompletion(a, x), s) ⇔ ¬enabled(x, a, s)   (4.4)

provided that AckTaskCompletion(a, x) never comes before Start(a, x, p).
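Axioms (4.2) and (4.3) can be transcribed directly as recursive predicates over situations encoded as action sequences. The following Python sketch is an illustrative encoding (actions as tagged tuples, and the assumption that every service is initially free), not the SmartPM implementation:

```python
# Fluents free(a, s) and enabled(x, a, s) evaluated by recursion on the
# situation: each case mirrors one disjunct of the successor state axiom.
S0 = ()

def free(a, s):
    if not s:
        return True  # assumption: every service is free in S0
    *rest, t = s
    prev = free(a, tuple(rest))
    # (free held and no Assign(a, _) just occurred) or (it did not hold and
    # a Release(a, _) just occurred)
    return (prev and not (t[0] == "Assign" and t[1] == a)) or \
           (not prev and t[0] == "Release" and t[1] == a)

def enabled(x, a, s):
    if not s:
        return False  # no service has declared readiness in S0
    *rest, t = s
    prev = enabled(x, a, tuple(rest))
    # (enabled held and no finishedTask(a, x, _) just occurred) or
    # (it did not hold and readyToStartTask(a, x) just occurred)
    return (prev and not (t[0] == "finishedTask" and t[1:3] == (a, x))) or \
           (not prev and t == ("readyToStartTask", a, x))
```

Running the full interleaving Assign, readyToStartTask, finishedTask, Release makes the service not free after the assignment, enabled between readiness and completion, and free again after the release, as the axioms prescribe.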
Furthermore, we introduce a domain-independent fluent started(a, x, p, s) that holds if and only if an action Start(a, x, p) has been executed but the dual AckTaskCompletion(a, x) has not yet:

started(a, x, p, do(t, s)) ⇔ (started(a, x, p, s) ∧ t ≠ AckTaskCompletion(a, x)) ∨ (¬∃p′. started(a, x, p′, s) ∧ t = Start(a, x, p))   (4.5)

In addition, we make use, in every specific domain, of a predicate available(a, s) which denotes whether service a is available in situation s for task assignment. However, available is domain-dependent and, hence, needs to be defined specifically for every domain. Knowing whether a service is available is very important for the PMS when it has to perform assignments. Indeed, a task x is assigned to the best service a which is available and provides every capability required by x. The fact that a certain service a is free does not imply that it can be assigned tasks (e.g., in the example described above it has to be free as well as indirectly connected to the coordinator). The definition of available(·) must enforce the following condition:

∀a, s. available(a, s) ⇒ free(a, s)   (4.6)

We do not give explicit preconditions for tasks: we assume tasks can always be executed. We assume that, given a task, if some conditions do not hold, then the outcomes of that task are not as expected (in other terms, it fails). We illustrate such notions on our running example.

Example 4.1 (cont.). We formalize the scenario in Example 4.1:

• at(a, loc, s) is true if service a is located at coordinates loc = ⟨loc_x, loc_y, loc_z⟩ in situation s. In the starting situation S0, for each service a_i, we have at(a_i, loc_i, S0), where location loc_i is obtained through GPS sensors.
• evaluationOK(loc, s) is true if the photos taken at location loc are judged as having good quality, with evaluationOK(loc, S0) = false for each location loc.
• infoSent(s) is true in situation s if the collected information has been successfully forwarded to the headquarter; infoSent(S0) = false.
• photoBuild(loc, n, s) is true if n photos have been taken at location loc. In the starting situation S0, photoBuild(loc, 0, S0) holds for all locations loc.

Before giving the successor state axioms for the above fluents, we define some abbreviations:

• available(a, s): states that service a is available if it is connected to the coordinator device (denoted by Coord) and is free.
• connected(w, z, s): true if, in situation s, the services w and z are connected through possibly multi-hop paths.
• neigh(w, z, s): holds if the services w and z are in radio range in situation s.

Their definitions are as follows:

neigh(w0, w1, s) ≝ at(w0, p0, s) ∧ at(w1, p1, s) ∧ ‖p0 − p1‖ < rrange

connected(w0, w1, s) ≝ neigh(w0, w1, s)
  ∨ (∃w2. neigh(w0, w2, s) ∧ neigh(w2, w1, s))
  ∨ (∃w2, w3. neigh(w0, w2, s) ∧ neigh(w2, w3, s) ∧ neigh(w3, w1, s))
  ∨ ...
  ∨ (∃w2, w3, ..., wn. neigh(w0, w2, s) ∧ neigh(w2, w3, s) ∧ ... ∧ neigh(wn, w1, s))
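The unbounded disjunction defining connected is just reachability in the radio-range graph. The sketch below is an illustrative Python encoding (positions as coordinate tuples, an assumed rrange parameter, and an is_free callback standing in for the free fluent) that computes it by breadth-first closure of neigh:

```python
import math

def neigh(p, q, rrange):
    """Two nodes are neighbours if their Euclidean distance is below rrange."""
    return math.dist(p, q) < rrange

def connected(w0, w1, pos, rrange):
    """Multi-hop connectivity: breadth-first closure of the neigh relation."""
    frontier, seen = [w0], {w0}
    while frontier:
        w = frontier.pop()
        if w == w1:
            return True
        for z, pz in pos.items():
            if z not in seen and neigh(pos[w], pz, rrange):
                seen.add(z)
                frontier.append(z)
    return False

def available(w, pos, rrange, is_free, coord="Coord"):
    """available(w, s) = free(w, s) and connected(w, Coord, s)."""
    return is_free(w) and connected(w, coord, pos, rrange)
```

For instance, with nodes at (0,0), (1,0), (2,0) and a radio range of 1.5, the node at (2,0) is connected to the coordinator at (0,0) through the intermediate node, while an isolated node at (10,0) is not.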
available(w, s) ≝ free(w, s) ∧ connected(w, Coord, s)

The successor state axioms for this domain are:

at(a, loc, do(t, s)) ⇔ (at(a, loc, s) ∧ ∀loc′. t ≠ finishedTask(a, Go, loc′)) ∨ (¬at(a, loc, s) ∧ t = finishedTask(a, Go, loc) ∧ started(a, Go, loc, s))

evaluationOK(loc, do(t, s)) ⇔ evaluationOK(loc, s) ∨ (∃a, n. t = finishedTask(a, Evaluate, ⟨loc, OK⟩) ∧ photoBuild(loc, n, s) ∧ ∃p. started(a, Evaluate, p, s) ∧ n ≥ threshold)

infoSent(do(t, s)) ⇔ infoSent(s) ∨ (∃a. t = finishedTask(a, SendToHeadquarter, OK) ∧ ∃p. started(a, SendToHeadquarter, p, s))

photoBuild(loc, n, do(t, s)) ⇔
  (∃a, m, o. photoBuild(loc, m, s) ∧ t = finishedTask(a, TakePhoto, ⟨loc, o⟩) ∧ n = m + o ∧ at(a, loc, s) ∧ ∃p. started(a, TakePhoto, p, s))
  ∨ (∃a, o. photoBuild(loc, n, s) ∧ t = finishedTask(a, TakePhoto, ⟨loc, o⟩) ∧ ¬at(a, loc, s) ∧ ∃p. started(a, TakePhoto, p, s))
  ∨ (∀a, o. photoBuild(loc, n, s) ∧ t ≠ finishedTask(a, TakePhoto, ⟨loc, o⟩))

It is worth noting that all the fluents denoting world properties of interest are changed by finishedTask, as already mentioned. Moreover, the value of the fluent photoBuild(loc, n, s) is updated by the execution of task TakePhoto only if the executor is at location loc; otherwise, the photos taken are not considered valuable. Even if that is not formally a precondition of the task (the task can be executed in any case), in fact it is a condition that has to hold in order for the task to produce the expected outcome.

4.4 Monitoring Formalisation

Next we formalize how the monitor works. Intuitively, the monitor takes the current program δ′ and the current situation s′ from the PMS's virtual reality and, analyzing the physical reality through sensors, introduces fake actions in order to get a new situation s″ which aligns the virtual reality of the PMS with the sensed information. Then, it analyzes whether δ′ can still be executed in s″ and, if not, it adapts δ′ by generating a new correctly executable program δ″.
Specifically, the monitor's work can be abstractly defined as follows (we do not model how the situation s″ is generated from the sensed information):

  Monitor(δ′, s′, s″, δ″) ⇔
    (Relevant(δ′, s′, s″) ∧ Recovery(δ′, s′, s″, δ″))
    ∨ (¬Relevant(δ′, s′, s″) ∧ δ″ = δ′)        (4.7)

where: (i) Relevant(δ′, s′, s″) states whether the change from the situation s′ into s″ is such that δ′ cannot be correctly executed anymore; and (ii) Recovery(δ′, s′, s″, δ″) is intended to hold whenever the program δ′, originally to be executed in the situation s′, is adapted to δ″ in order to be executed in the situation s″. Formally, Relevant is defined as follows:

  Relevant(δ′, s′, s″) ⇔ ¬SameConfig(δ′, s′, δ′, s″)

where SameConfig(δ′, s′, δ″, s″) is true if executing δ′ in s′ is "equivalent" to executing δ″ in s″ (see later for further details). In this general framework we do not give a definition for SameConfig(δ′, s′, δ″, s″). However, we consider any definition of SameConfig to be correct if it denotes a bisimulation [99].

Definition 4.1. A predicate SameConfig(δ′, s′, δ″, s″) is correct if for every δ′, s′, δ″, s″:

1. Final(δ′, s′) ⇔ Final(δ″, s″)
2. ∀a, δ1. Trans(δ′, s′, δ1, do(a, s′)) ⇒
     ∃δ2. Trans(δ″, s″, δ2, do(a, s″)) ∧ SameConfig(δ1, do(a, s′), δ2, do(a, s″))

3. ∀a, δ2. Trans(δ″, s″, δ2, do(a, s″)) ⇒
     ∃δ1. Trans(δ′, s′, δ1, do(a, s′)) ∧ SameConfig(δ2, do(a, s″), δ1, do(a, s′))

Intuitively, a predicate SameConfig(δ′, s′, δ″, s″) is correct if δ′ and δ″ are either both terminable or neither is. Furthermore, for each action a performable by δ′ in the situation s′, δ″ in the situation s″ must enable the performance of the same action (and vice versa). Moreover, the resulting configurations (δ1, do(a, s′)) and (δ2, do(a, s″)) must still satisfy SameConfig.

Main()
1  (EvalTake(LocA) ‖ EvalTake(LocB) ‖ EvalTake(LocC));
2  π.a0 [available(a0) ∧ ∀c. require(c, SendByGPRS) ⇒ provide(a0, c)];
3  Assign(a0, SendByGPRS);
4  Start(a0, SendByGPRS, ∅);
5  AckTaskCompletion(a0, SendByGPRS);
6  Release(a0, SendByGPRS);

EvalTake(Loc)
1  π.a1 [available(a1) ∧ ∀c. require(c, CompileQuest) ⇒ provide(a1, c)];
2  Assign(a1, CompileQuest);
3  Start(a1, CompileQuest, Loc);
4  AckTaskCompletion(a1, CompileQuest);
5  Release(a1, CompileQuest);
6  while ¬evaluationOK(Loc)
7  do
8    π.a2 [available(a2) ∧ ∀c. require(c, Go) ⇒ provide(a2, c)];
9    Assign(a2, Go);
10   Start(a2, Go, Loc);
11   AckTaskCompletion(a2, Go);
12   Start(a2, TakePhoto, Loc);
13   AckTaskCompletion(a2, TakePhoto);
14   Release(a2, TakePhoto);
15   π.a3 [available(a3) ∧ ∀c. require(c, EvaluatePhoto) ⇒ provide(a3, c)];
16   Assign(a3, EvaluatePhoto);
17   Start(a3, EvaluatePhoto, Loc);
18   AckTaskCompletion(a3, EvaluatePhoto);
19   Release(a3, EvaluatePhoto);

Figure 4.3: The IndiGolog program corresponding to the process in Figure 4.2
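The three conditions of Definition 4.1 can be checked mechanically on finite abstractions of the configuration space. The following sketch, with all names and the toy systems our own, verifies whether a candidate relation between configurations is a bisimulation; for simplicity it assumes deterministic transition systems (one successor per action), which reduces conditions 2 and 3 to matching action sets plus related successors.

```python
# Check that a candidate relation over configurations is a bisimulation
# in the sense of Definition 4.1, for finite deterministic systems.
# steps maps a configuration to {action: successor configuration};
# finals is the set of terminating configurations. Illustrative only.

def is_bisimulation(relation, steps, finals):
    for c1, c2 in relation:
        # Condition 1: both configurations final, or neither.
        if (c1 in finals) != (c2 in finals):
            return False
        succ1, succ2 = steps.get(c1, {}), steps.get(c2, {})
        # Conditions 2 and 3: every step of one side is matched by the
        # other, and the successor pair is again in the relation.
        if set(succ1) != set(succ2):
            return False
        if any((succ1[a], succ2[a]) not in relation for a in succ1):
            return False
    return True

# Two tiny systems that mimic each other step by step.
steps = {"p0": {"go": "p1"}, "q0": {"go": "q1"}}
finals = {"p1", "q1"}
print(is_bisimulation({("p0", "q0"), ("p1", "q1")}, steps, finals))  # True
print(is_bisimulation({("p0", "q1")}, steps, finals))                # False
```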
The use of the bisimulation criterion to state when a predicate SameConfig(···) is correct derives from the notion of equivalence introduced in [64]. When comparing the execution of two formally different business processes, the internal states of the processes may be ignored, because what really matters is the process behaviour that can be observed. This view reflects the way a PMS works: what is of interest is the set of tasks that the PMS offers to its environment, in response to the inputs that the environment provides.

Next we turn our attention to the procedure to adapt the process, formalised by Recovery(δ′, s′, s″, δ″). Formally, it is defined as follows:

  Recovery(δ′, s′, s″, δ″) ⇔ ∃δa, δb. δ″ = δa; δb
    ∧ Deterministic(δa) ∧ Do(δa, s″, sb) ∧ SameConfig(δ′, s′, δb, sb)        (4.8)

where Deterministic(δ) holds if δ uses neither the concurrency constructs nor non-deterministic choices. Recovery determines a process δ″ consisting of a deterministic program δa (i.e., a program not using the concurrency construct) and an arbitrary program δb. The aim of δa is to lead from the situation s″ in which adaptation is needed to a new situation sb where SameConfig(δ′, s′, δb, sb) is true. Notice that during the actual recovery phase δa we disallow concurrency, because we need full control over the execution of each service in order to reach a recovered state. The actual recovered program δb can then again allow for concurrency.

4.5 A Concrete Technique for Recovery

In the previous sections we have given a general description of how adaptation can be defined and performed. Here we choose a specific technique that is actually feasible in practice. Our main step is to adopt a specific definition for SameConfig, here denoted as SAMECONFIG, namely:

  SAMECONFIG(δ′, s′, δ″, s″) ⇔ SameState(s′, s″) ∧ δ′ = δ″        (4.9)

In other words, SAMECONFIG states that (δ′, s′) and (δ″, s″) are the same configuration if (i) all fluents have the same truth values in both s′ and s″ (SameState)3, and (ii) δ″ is actually δ′. The following shows that SAMECONFIG is indeed correct.

Theorem 4.1. SAMECONFIG(δ′, s′, δ″, s″) is correct.

Proof. We show that SAMECONFIG is a bisimulation. Indeed:

• Since SameState(s′, s″) requires all fluents to have the same values in both s′ and s″, we have that Final(δ, s′) ⇔ Final(δ, s″).

• Since SameState(s′, s″) requires all fluents to have the same values in both s′ and s″, it follows that the PMS is allowed, for the same process δ′, to assign the same tasks both in s′ and in s″; moreover, for each action a and situations s′ and s″ s.t. SameState(s′, s″), we have that SameState(do(a, s′), do(a, s″)) holds. As a result, for each a and δ1 such that Trans(δ′, s′, δ1, do(a, s′)), we have that Trans(δ′, s″, δ1, do(a, s″)) and SAMECONFIG(δ1, do(a, s′), δ1, do(a, s″)). Similarly for the other direction.

Hence, the thesis holds.

Next, let us denote by LinearProgram(δ) a program constituted only by sequences of actions, and let us define RECOVERY as:

  RECOVERY(δ′, s′, s″, δ″) ⇔ ∃δa, δb. δ″ = δa; δb
    ∧ LinearProgram(δa) ∧ Do(δa, s″, sb) ∧ SAMECONFIG(δ′, s′, δb, sb)        (4.10)

The next theorem shows that we can adopt RECOVERY as a definition of Recovery without loss of generality.

Theorem 4.2. For every process δ′ and situations s′ and s″, there exists a δ″ such that Recovery(δ′, s′, s″, δ″) if and only if there exists a δ″ such that RECOVERY(δ′, s′, s″, δ″), where in the latter we use SAMECONFIG as SameConfig.
3 Observe that SameState can actually be defined as a first-order formula over the fluents, as the conjunction of F(s′) ⇔ F(s″) for each fluent F.

Proof. Observe that the only difference between the two definitions is that in one case we allow only linear programs (i.e., sequences of actions) as δa, while in the other case we also allow deterministic ones, which may include if-then-else, while loops, procedures, etc.

(⇒) Trivial, as linear programs are deterministic programs.

(⇐) Let us consider the recovery process δ″ = δa; δb, where δa is an arbitrary deterministic program. Then, by definition of Recovery, there exists a (unique) situation s such that Do(δa, s″, s). Now, that s has the form s = do(an, do(an−1, ..., do(a2, do(a1, s″)) ...)). Let us consider the linear program p = (a1; a2; ...; an). Obviously we have Do(p, s″, s). Hence, the process δ″ = p; δb is a recovery process according to the definition of RECOVERY.

The nice feature of RECOVERY is that it asks to search for a linear program that achieves a certain formula, namely a situation sb such that SameState(s′, sb). Moreover, restricting to sequential programs obtained by planning without concurrency does not prevent any recoverable process from being adapted. In sum, we have reduced the synthesis of a recovery program to a classical planning problem in AI [52]. As a result, we can draw on the well-developed planning literature for our aim. In particular, if the services and the input and output parameters are finite, then recovery can be reduced to propositional planning, which is known to be decidable in general (and for which very well performing software tools exist).

Theorem 4.3. Assume a domain in which services and input and output parameters are finite. Then, given a process δ′ and situations s′ and s″, it is decidable to compute a recovery process δ″ such that RECOVERY(δ′, s′, s″, δ″) holds.

Proof.
In domains in which services and input and output parameters are finite, the actions and fluents instantiated with all possible parameters are finite as well. Hence, we can phrase the domain as a propositional one, and the thesis follows from the decidability of propositional planning [52].

Example 4.1 (cont.). In the running example, let us consider two cases of discrepancies causing significant deviations.

Case 1. The process is between lines 10 and 11 in the execution of the procedure invocation EvalTake(LocA). A certain service a2 is assigned to tasks Go and TakePhoto. Suddenly, an appropriate sensor predicts that a2 will soon move out of range and, hence, disconnect from the coordinator device.4 The sensor generates and executes the action finishedTask(a2, Go, RealPosition), where RealPosition is the position where the node is going to disconnect. After this action, in the resulting situation s′ the fluent at(a2, RealPosition, s′) holds accordingly. The monitor infers that the exogenous event causes a significant deviation (i.e., connected(a2, Coord) does not hold and the process cannot be completed). Hence, it uses a planner to build a recovery program pursuing the goal connected(a2, Coord) ∧ at(a2, LocA) ∧ φ.5 Formula φ denotes the conjunction of all fluents in situation-suppressed form (holding fluents appear affirmed, non-holding fluents negated). The planner will very likely build a recovery program similar to the following:

  δa = [ Assign(a3, Go);
         Start(a3, Go, RealPosition);
         AckTaskCompletion(a3, Go);
         Release(a3, Go);
         AckTaskCompletion(a2, Go);
         Start(a2, Go, LocA);
         AckTaskCompletion(a2, Go); ]

where a3 is a free team member that has been judged the best suited to go after a2.

4 Section 5.4 describes a Bayesian approach for predicting disconnections before they actually happen. Such an approach has also been implemented in SmartPM.
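The reduction behind Theorem 4.3 can be illustrated by a toy propositional planner: a breadth-first search over world states that returns a linear action sequence (a candidate δa) achieving the goal. The STRIPS-style action encoding, the fluent names, and the two ground actions below are our own illustrative assumptions, not SmartPM's actual planner.

```python
from collections import deque

# Breadth-first search for a linear recovery program: find the shortest
# sequence of ground actions leading from the deviated state to one
# satisfying the goal. States and goals are frozensets of true fluents;
# each action maps to (preconditions, add-list, delete-list).

def plan(state, goal, actions):
    frontier = deque([(state, [])])
    seen = {state}
    while frontier:
        s, prefix = frontier.popleft()
        if goal <= s:
            return prefix                      # linear program delta_a
        for name, (pre, add, delete) in actions.items():
            if pre <= s:
                s2 = (s - delete) | add
                if s2 not in seen:
                    seen.add(s2)
                    frontier.append((s2, prefix + [name]))
    return None                                # no recovery exists

# Toy domain inspired by Case 1: reconnect a2, then send it to LocA.
actions = {
    "go(a2,LocA)":   (frozenset({"connected(a2)"}),
                      frozenset({"at(a2,LocA)"}), frozenset()),
    "reconnect(a2)": (frozenset(), frozenset({"connected(a2)"}), frozenset()),
}
start = frozenset()                            # a2 disconnected, wrong place
goal = frozenset({"connected(a2)", "at(a2,LocA)"})
print(plan(start, goal, actions))   # ['reconnect(a2)', 'go(a2,LocA)']
```

Breadth-first search returns a shortest plan, and because the ground actions and fluents are finite the search space is finite, mirroring the decidability argument in the proof above.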
Consequently, the program after the deviation is δ′ = δa; δb, where δb is the original one from line 12.

Case 2. The process is currently executing at any point among lines 14-17 of procedure EvalTake(LocA). At this point, the number of photos taken is greater than the constant threshold. For some reason some of those photos are suddenly lost (e.g., the files have been corrupted); hence, fluent photoBuild(LocA, val, s) holds with val < threshold. The monitor senses a significant deviation and, hence, plans a proper recovery program pursuing the goal photoBuild(LocA, threshold, s) ∧ φ:

  δa = [ π.a5 [available(a5) ∧ ∀c. require(c, TakePhoto) ⇒ provide(a5, c)];
         Assign(a5, TakePhoto);
         Start(a5, TakePhoto, LocA);
         AckTaskCompletion(a5, TakePhoto);
         Release(a5, TakePhoto); ]

5 Observe that if the positions are discretised, so as to become finite, this recovery can be achieved by a propositional planner.

The example has shown that the proposed approach is not based on the idea of capturing expected exceptions. Other approaches rely on rules to define the behaviours when special events that cause deviations are triggered. Here we simply model (a subset of) the running environment and the actions' effects, without defining how to manage the adaptation. Modelling the environment, even in detail, is feasible, whereas modelling all possible exceptions is often impossible.

4.6 Summary

This chapter has presented the formal foundation of a general approach, based on execution monitoring, for automatic process adaptation in dynamic scenarios. Such an approach (i) is practical, relying on well-established planning techniques, and (ii) does not require the definition of the adaptation strategy in the process itself (as most current approaches do). We have also given the basic concepts of situation calculus and IndiGolog, which are used extensively throughout this thesis.
The approach proposed in this chapter has been formally proven to be correct and complete, and we have shown its application to a significant example stemming from a real scenario, specifically emergency management. This example will be used in the next chapter as a running example when discussing the concrete implementation based on the IndiGolog interpreter developed by the Cognitive Robotics Group of the University of Toronto.

Chapter 5

The SmartPM System

This chapter is devoted to describing SmartPM, the concrete implementation of the framework described in Chapter 4. For this aim, we used the IndiGolog platform developed by the University of Toronto in collaboration with the Agent Group of RMIT University, Melbourne. Specifically, Section 5.1 overviews the IndiGolog platform used for SmartPM, whereas Section 5.2 shows the concrete choices we made in order to tailor the theoretical framework to a concrete implementation.

The concrete implementation of SmartPM has encountered two main groups of issues. Firstly, the IndiGolog platform was targeted at agent and robot programming; hence, using it for process management, which is a rather different application field, has been quite difficult. For example, we needed the inclusion of the construct atomic, to define a sequence of actions that has to be executed as a whole and cannot be interleaved with actions of concurrently executing sequences. Such a construct makes little sense in the field of robots, whereas it is quite important in process management, where it introduces the concept of transaction. In fact, the development has been carried out in tight collaboration with the conceivers and developers of the platform, who very kindly made some changes to meet our requirements. Secondly, the theoretical framework did not consider the features and limitations that are actually present in the platform.
For instance, the theoretical framework assumed the ability to stop the process and restructure it by placing the recovery program beforehand. In practice, the platform does not allow changing the program that codes the process once it has started. In order to overcome this limitation, we resorted to using interrupts at different priorities (see Section 5.2).

Nowadays, in many pervasive scenarios, such as emergency management or health care, it is not feasible to assume that the area or the house (or whatever else) is equipped with access points providing Wi-Fi networks. In order for devices, operators and services to communicate, a wireless network relying on no fixed infrastructure must be quickly deployed for the time the communication is necessary. As already mentioned in the Introduction, a Mobile Ad hoc Network (manet) is a P2P network of mobile nodes capable of communicating with each other without an underlying infrastructure. Nodes can communicate with their own neighbours (i.e., nodes in radio range) directly by wireless links. Non-neighbour nodes can communicate as well, by using other intermediate nodes as relays that forward packets toward destinations. Therefore, manets seem appropriate in pervasive scenarios, since they can also operate where the presence of access points is not guaranteed, as in emergency management [91].

Sections 5.3 and 5.4 present two interesting pieces of research carried out in order to apply SmartPM concretely to many pervasive scenarios. The former describes a network layer that lets devices and services communicate in manet settings. The latter describes the development and testing of an algorithm that raises an alert when mobile devices are about to go out of range of the others, making the services installed on them unavailable. These signals represent exogenous events to be caught by the PMS, which should build a recovery plan trying to avoid the service unavailability.
In order to test the effectiveness of SmartPM and of the techniques supporting its usage in manet scenarios, the best solution would be on-field tests. But these would require many people moving around in large areas, and the repeatability of the experiments would be compromised. In such cases, it is better to emulate: during emulation, some software or hardware pieces are not real, whereas others are exactly the ones of actual systems. The "nice" feature is that software systems are not aware of working on layers that are partially or totally emulated. Therefore, the software is not changed to fit the emulation environment; it can be used in real settings with few or no changes. Section 5.5 describes octopus, a specific emulator to test the SmartPM PMS on manets, together with the aforementioned components.

5.1 The IndiGolog Platform

This section describes the IndiGolog-based platform that we have used to implement the framework described in Chapter 4.1 Part of this section is a summary of the work published in [29], by kind agreement with its authors.

The agent platform described here is a logic-programming implementation of IndiGolog that allows the incremental execution of high-level Golog-like programs. This implementation of IndiGolog is modular and easily extensible so as to deal with any external platform, as long as suitable interfacing modules are programmed (see below).

1 Available at http://sourceforge.net/projects/indigolog/.

Although most of the code is written in vanilla Prolog, the overall architecture is implemented in the well-known open source SWI-Prolog2 [144]. SWI-Prolog provides flexible mechanisms for interfacing with other programming languages such as Java or C, allows the development of multi-threaded applications, and provides support for socket communication and constraints.
Generally speaking, the IndiGolog implementation provides an incremental interpreter of high-level programs, as well as a framework for dealing with the real execution of these programs on concrete platforms or devices. This amounts to handling the real execution of actions on concrete devices (e.g., a real robot platform), the collection of sensing outcome information (e.g., retrieving some sensor's output), and the detection of exogenous events happening in the world. To that end, the architecture is modularly divided into six parts, namely: (i) the top-level main cycle; (ii) the language semantics; (iii) the temporal projector; (iv) the environment manager; (v) the set of device managers; and finally (vi) the domain application. The first four modules are completely domain-independent, whereas the last two are designed for specific domains. The architecture is depicted in Figure 5.1.

5.1.1 The top-level main cycle and language semantics

The IndiGolog platform codes the sense-think-act loop well known in the agent community [76]:

1. check for exogenous events that have occurred;
2. calculate the next program step; and
3. if the step involves an action, execute the action.

While executing actions, the platform keeps an updated history, i.e., the sequence of actions performed so far. The main predicate of the main cycle is indigo/2; a goal of the form indigo(E,H) states that the high-level program E is to be executed online at history H. The first thing the main cycle does is to assimilate all exogenous events that have occurred since the last execution step. After all exogenous actions have been assimilated and the history progressed as needed, the main cycle goes on to actually execute the high-level program E. First, if the current program can terminate in the current history, then the top-level goal indigo/2 succeeds.
Otherwise, the interpreter checks whether the program can evolve a single step by relying on predicate trans/4 (explained below). If the program evolves without executing any action, then the history remains unchanged and we continue to execute the remaining program from the same history. If, however, the step involves performing an action, then this action is executed and incorporated into the current history, together with its sensing result (if any), before continuing the execution of the remaining program.

2 http://www.swi-prolog.org/

Figure 5.1: The IndiGolog implementation architecture. Links with a circle ending represent goals posted to the circled module (as from [29])

As mentioned above, the top-level loop relies on two central predicates, namely final/2 and trans/4. These predicates implement the relations Trans and Final, giving the single-step semantics for each of the constructs in the language. It is convenient, however, to use an implementation of these predicates defined over histories instead of situations. Indeed, the constructs of the IndiGolog interpreter never deal with situations explicitly: they always assume to work on the current situation. So, for example, these are the corresponding clauses for sequence (represented as a list), tests, nondeterministic choice of programs, and primitive actions:

final([E|L],H) :- final(E,H), final(L,H).
trans([E|L],H,E1,H1) :- final(E,H), trans(L,H,E1,H1).
trans([E|L],H,[E1|L],H1) :- trans(E,H,E1,H1).

final(ndet(E1,E2),H) :- final(E1,H) ; final(E2,H).
trans(ndet(E1,E2),H,E,H1) :- trans(E1,H,E,H1).
trans(ndet(E1,E2),H,E,H1) :- trans(E2,H,E,H1).

trans(?(P),H,[],H) :- eval(P,H,true).

trans(E,H,[],[E|H]) :- action(E), poss(E,P), eval(P,H,true).
/* Obs: no final/2 clauses for action and test programs */

These Prolog clauses are almost directly "lifted" from the corresponding axioms for Trans and Final.
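For readers less familiar with Prolog, the same single-step relations can be transliterated into Python. The sketch below is our own illustration, not part of the actual interpreter: it covers sequences, nondeterministic choice, tests and primitive actions, and omits preconditions (poss) and sensing for brevity.

```python
# Programs: [] is the empty sequence; a list is a sequence; ("ndet", p, q)
# a nondeterministic choice; ("test", f) a test on the history; ("act", a)
# a primitive action. Histories are lists of executed actions.

def final(e, h):
    if e == []:                       # empty sequence terminates
        return True
    if isinstance(e, list):           # sequence [E|L]
        return final(e[0], h) and final(e[1:], h)
    if e[0] == "ndet":
        return final(e[1], h) or final(e[2], h)
    return False                      # actions and tests are never final

def trans(e, h):
    """Yield every single-step evolution (remaining program, history)."""
    if isinstance(e, list) and e:
        if final(e[0], h):            # first clause: skip a finished head
            yield from trans(e[1:], h)
        for e1, h1 in trans(e[0], h):  # second clause: step the head
            yield [e1] + e[1:], h1
    elif isinstance(e, tuple) and e[0] == "ndet":
        yield from trans(e[1], h)
        yield from trans(e[2], h)
    elif isinstance(e, tuple) and e[0] == "test":
        if e[1](h):                   # test succeeds: no history change
            yield [], h
    elif isinstance(e, tuple) and e[0] == "act":
        yield [], h + [e[1]]          # execute: extend the history

prog = [("ndet", ("act", "a"), ("act", "b")), ("act", "c")]
for _, h1 in trans(prog, []):
    print(h1)                         # ['a'] then ['b']
```

The two clauses for sequences map to the two branches of the list case, and nondeterminism becomes a generator yielding every admissible next configuration.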
Predicates action/1 and poss/2 specify the actions of the domain and their corresponding precondition axioms; both are defined in the domain axiomatisation (see below). More importantly, eval/3 is used to check the truth of a condition at a certain history, and is provided by the temporal projector, described next.

A naive implementation of the search operator would deliberate from scratch at every point of its incremental execution. It is clear, however, that one could do better than that, and cache the successful plan obtained to avoid planning in most cases:

final(search(E),H) :- final(E,H).
trans(search(E),H,path(E1,L),H1) :-
    trans(E,H,E1,H1), findpath(E1,H1,L).

/* findpath(E,H,L): solve (E,H) and store the path in list L */
/* L = list of configurations (Ei,Hi) expected along the path */
findpath(E,H,[(E,H)]) :- final(E,H).
findpath(E,H,[(E,H)|L]) :- trans(E,H,E1,H1), findpath(E1,H1,L).

So, when a search block is solved, the whole solution path found is stored as the sequence of configurations that are expected. If the actual configurations match, then steps are performed without any reasoning (first final/2 and trans/4 clauses for program path(E,L)). On the other hand, if the actual configuration does not match the one expected next, for example because an exogenous action occurred and the history thus changed, re-planning is performed to look for an alternative path (code not shown).

5.1.2 The temporal projector

The temporal projector is in charge of maintaining the agent's beliefs about the world and evaluating formulas relative to a history. The projector module provides an implementation of predicate eval/3: a goal eval(F,H,B) states that formula F has truth value B, usually true or false, at history H. Predicate eval/3 is used to define trans/4 and final/2, as the legal evolutions of high-level programs may often depend on what things are believed true or false.
We assume then that users provide definitions for each of the following predicates, for fluent f, action a, sensing result r, formula w, and arbitrary value v:

fun_fluent(f): f is a functional fluent;
rel_fluent(f): f is a relational fluent;
prim_action(a): a is a ground action;
init(f,v): v is the value of fluent f in the starting situation;
poss(a,w): it is possible to execute action a provided formula w is known to be true;
causes_val(a,f,v,w): action a affects the value of f: when a occurs and w holds, f takes on value v.

Formulas are represented in Prolog using the obvious names for the logical operators and with all situations suppressed; histories are represented by lists of the form o(a, r), where a represents an action and r a sensing result. We will not go over how formulas are recursively evaluated; we just note that there exists a predicate kTrue(w,h), the main and top-level predicate, which tests whether formula w is known to be true at history h. Finally, the interface of the module is defined as follows:

eval(F,H,true) :- kTrue(F,H).
eval(F,H,false) :- kTrue(neg(F),H).

5.1.3 The environment manager and the device managers

Because the architecture is meant to be used with concrete agent/robotic platforms, as well as with software/simulation environments, the online execution of IndiGolog programs must be linked with the external world. To that end, the environment manager (EM) provides a complete interface with all the external devices, platforms, and real-world environments that the application needs to interact with. In turn, each external device or platform that is expected to interact with the application (e.g., a robot, a software module, or even a user interface) is assumed to have a corresponding device manager, a piece of software that is able to talk to the actual device, instruct it to execute actions, and gather information and events from it. The device manager understands the "hardware" of the corresponding device and provides a high-level interface to the EM.
It provides an interface for the execution of actions (e.g., assign, start, etc.), the retrieval of sensing outcomes for actions, and the occurrence of exogenous events (e.g., disconnect, as well as finishedTask). Because actual devices are independent of the IndiGolog application and may be in remote locations, device managers are meant to run in different processes and, possibly, on different machines; they then communicate with the EM via TCP/IP sockets. The EM, in contrast, is part of the IndiGolog agent architecture and is tightly coupled with the main cycle. Still, since the EM needs to be open to the external world regardless of any computation happening in the main cycle, the EM and the main cycle run in different (but interacting) threads, though in the same process and Prolog run-time engine.3

So, in a nutshell, the EM is responsible for executing actions in the real world and gathering information from it, in the form of sensing outcomes and exogenous events, by communicating with the different device managers. More concretely, given a domain high-level action (e.g., assign(WrkList, Srvc)), the EM is in charge of: (i) deciding which actual "device" should execute the action; (ii) ordering its execution by the device via its corresponding device manager; and finally (iii) collecting the corresponding sensing outcome. To realise the execution of actions, the EM provides an implementation of exec/2 to the top-level main cycle: exec(A,S) orders the execution of action A, returning S as its sensing outcome.

When the system starts, the EM starts up all device managers required by the application and sets up communication channels to them using TCP/IP stream sockets. Recall that each real-world device or environment has to have a corresponding device manager that understands it.

3 SWI-Prolog provides a clean and efficient way of programming multi-threaded Prolog applications.
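The request/reply exchange between the EM and a device manager can be sketched as follows. The line-delimited JSON wire format, the port handling, and all names below are our own assumptions for illustration, not the platform's actual protocol.

```python
import json
import socket
import threading

def device_manager(server_sock):
    """Serve one action request and reply with a sensing outcome."""
    conn, _ = server_sock.accept()
    with conn, conn.makefile("r") as rf:
        msg = json.loads(rf.readline())
        # Pretend to execute the action on the device, then report back.
        reply = {"type": "sensing", "action": msg["action"], "outcome": "ok"}
        conn.sendall((json.dumps(reply) + "\n").encode())

def em_execute(action):
    """EM side: start a device manager, order an action, collect outcome."""
    server = socket.socket()
    server.bind(("127.0.0.1", 0))            # ephemeral port
    server.listen(1)
    t = threading.Thread(target=device_manager, args=(server,))
    t.start()
    with socket.create_connection(server.getsockname()) as em:
        em.sendall((json.dumps({"type": "exec", "action": action}) + "\n").encode())
        outcome = json.loads(em.makefile("r").readline())["outcome"]
    t.join()
    server.close()
    return outcome

print(em_execute("start(a1,TakePhoto)"))     # ok
```

Running the device manager in its own thread (here) or process (in the real architecture) keeps the main cycle free to run until a message arrives, as described above.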
After this initialisation process, the EM enters a passive mode in which it asynchronously listens for messages arriving from the various device managers. This passive mode allows the top-level main cycle to execute without interruption until a message arrives from some device manager. In general, a message can be an exogenous event, a sensing outcome of some recently executed action, or a system message (e.g., a device being closed unexpectedly). The incoming message is read and handled in an appropriate way and, in some cases, the top-level main cycle is notified of the occurred event.

5.1.4 The domain application

From the user perspective, probably the most relevant aspect of the architecture is the specification of the domain application. Any domain application must provide:

1. An axiomatisation of the dynamics of the world. Such an axiomatisation depends on the temporal projector to be used.

2. One or more high-level agent programs that dictate the different agent behaviours available. In general, these will be IndiGolog programs.

3. All the necessary execution information to run the application in the external world. This amounts to specifying which external devices the application relies on (e.g., the device manager for the ER1 robot), and how high-level actions are actually executed on these devices (that is, by which device each high-level action is to be executed). Information on how to translate high-level symbolic actions and sensing results into the device managers' low-level representations, and vice versa, can also be provided.

5.2 The SmartPM Engine

According to the framework defined in Chapter 4, the PMS interrupts the execution of processes when a misalignment between the virtual and the physical reality is sensed. When this happens, the monitor adapts the program to deal with such discrepancies. This section describes how the adaptation framework has been concretely implemented in SmartPM.
Figure 5.2 shows conceptually how SmartPM has been integrated into the IndiGolog interpreter.

Figure 5.2: Architecture of the PMS.

Figure 5.3: The SPIDE Tool

At the beginning, we envision a responsible person designing the process specification through a graphical tool, namely SPIDE (Figure 5.3 shows a screen shot), which generates a corresponding XML file [98]. Specifically, it is meant to generate the XML specification file containing a formal domain theory, as well as the process schema and the action conditions. SPIDE tailors the approach proposed in [81] to SmartPM: it allows the definition of specific templates with a finite number of open options. When an instance needs to be created, an operator chooses the proper template from a repository and closes the open points, thus transforming the abstract template into a concrete process specification.

The XML-to-IndiGolog Parser component translates a SPIDE XML specification into three conceptual parts:

Domain Program. The IndiGolog program corresponding to the designed process. It also includes some helper procedures to handle task executions, the interaction with the external services, and other features.

Domain Axioms. It comprises the action theory: the set of fluents modelling world properties of interest, the set of available tasks, and the successor-state axioms describing how the actions applied to tasks change the fluents. Some parts of the axiomatisation are in fact independent of the domain and, hence, remain unchanged when passing from one domain to another. Other axioms, on the contrary, are modelled according to the domain and describe how domain-dependent fluents change as a result of task executions.

Execution Monitor. This part is always generated in the same way and does not take the specific domain into account.
Specifically, more details on the first two parts are given in Section 5.2.1, whereas Section 5.2.2 describes how the monitoring/recovery mechanism has been coded in SmartPM using IndiGolog.

When the specification has been translated into the Domain Program and Axioms, the Communication Manager (CM) starts up all the device managers, which are essentially drivers enabling the PMS to communicate with the services and sensors installed on devices. The PMS holds one device manager for each device hosting services. After this initialization, CM activates the IndiGolog Engine, which is in charge of executing IndiGolog programs by realising the main cycle described in Section 5.1.1. Then, CM enters a passive mode in which it listens for messages arriving from the devices through the device managers. In general, a message can be an exogenous event harvested by a certain sensor installed on a given device, as well as a message notifying the beginning or the completion of a certain task. The Communication Manager can be invoked by the IndiGolog Engine whenever the latter produces an action for execution.

The IndiGolog Engine relies on two further modules, named Transition System and Temporal Projector. The former is used to compute the evolution of IndiGolog programs according to the statements' semantics, whereas the latter is in charge of holding the current situation throughout the execution, making it possible to evaluate the fluent values. On the one side, the Execution Monitor makes use of CM, which notifies the occurrence of exogenous events; on the other side, it relies on the Temporal Projector to get the updated values of fluents.

5.2.1 Coding processes by the IndiGolog interpreter

This sub-section describes how processes can be concretely coded as IndiGolog programs by using the interpreter described in Section 5.1. Interested readers may look at Example 5.1, which shows the most significant parts of the interpreter code.4

The process requires a model definition for the predicates that are defined in Section 4.3: service(a), task(x), capability(b), provide(a, b), require(x, b). In addition, we introduce predicate identifiers(i), which defines the valid identifiers for tasks. Indeed, the process specification may comprise certain tasks more than once; of course, different instances of the same task have to be distinguished, as they are different pieces of work.

Example 5.1. The following is the code of the IndiGolog interpreter giving a definition of the aforementioned predicates for the running example. Specifically, the example assumes the team to be composed of five services, all humans, that are univocally identified by a number. Predicate domain(N,X) is made available by the IndiGolog interpreter itself; it holds when element N is in list X.

/* Available services */
services([1,2,3,4,5]).
service(Srvc) :- domain(Srvc,services).

/* Tasks defined in the process specification */
tasks([TakePhoto,EvaluatePhoto,CompileQuest,Go,SendByGPRS]).
task(Task) :- domain(Task,tasks).

/* Capabilities relevant for the process of interest */
capabilities([camera,compile,gprs,evaluation]).
capability(B) :- domain(B,capabilities).

/* The list of identifiers that may be used to distinguish
   different instances of the same task */
task_identifiers([id_1,id_2,id_3,id_4,id_5,id_6,id_7,id_8,id_9,
  id_10,id_11,id_12,id_13,id_14,id_15,id_16,id_17,id_18,id_19,
  id_20]).
id(D) :- domain(D,task_identifiers).

/* The capabilities required for each task */
required(TakePhoto,camera).
required(EvaluatePhoto,evaluation).
required(CompileQuest,compile).
required(SendByGPRS,gprs).

/* The capabilities provided by each service */
provide(1,gprs).
provide(1,evaluation).
provide(2,compile).
provide(2,evaluation).

4 Appendix A lists all the code of the interpreter.
provide(2,camera).
provide(3,compile).
provide(4,evaluation).
provide(4,camera).
provide(5,compile).

Tasks with their identifiers and inputs are packaged into elements workitem(Task, Id, Input) of predicate listelem(·). Work-item elements can be grouped into lists identified by elements worklist(·). The following is the corresponding Prolog code:

worklist([]).
worklist([ELEM | TAIL]) :- worklist(TAIL), listelem(ELEM).

Indeed, actions assign(·) and release(·) take elements worklist(·) as input. In fact, this implementation assigns one worklist(·) to one proper service that is capable of executing all tasks in the list. The assignment of lists of tasks to services, rather than single tasks, is motivated by the fact that we may want to constrain multiple tasks to be executed by the same service.

Example 5.1 (cont.). The example shows the definition of the different types of valid work items and their input parameters. Specifically, the first definition of listelem below gives the definition of work items of tasks Go, CompileQuest, EvaluatePhoto and TakePhoto; the second gives the definition of work items of SendByGPRS. The former group relies on the definition of predicate location, which represents the possible locations in the geographic area of interest.

/* Definition of predicate location(...) identifying locations
   in the geographic area of interest */
gridsize(10).
gridindex(V) :- gridsize(S), get_integer(0,V,S).
location(loc(I,J)) :- gridindex(I), gridindex(J).

/* member(ELEM,LIST) holds if ELEM is contained in LIST */
member(ELEM,[HEAD|_]) :- ELEM=HEAD.
member(ELEM,[_|TAIL]) :- member(ELEM,TAIL).

/* Definition of predicate listelem(workitem(Task,Id,I)).
   It identifies a task Task with id Id and input I */
listelem(workitem(Task,Id,I)) :- id(Id), location(I),
  member(Task,[Go,CompileQuest,EvaluatePhoto,TakePhoto]).
listelem(workitem(SendByGPRS,Id,input)) :- id(Id).
According to the framework of Chapter 4, there exist two classes of fluents: domain-dependent and domain-independent. The domain-independent fluents are enabled and free, as defined in the framework of Chapter 4, as well as assigned(LWrk, Srvc), which is not part of the theoretical framework and has been introduced for implementation reasons (see below in this section). Fluent assigned(·) holds if a certain work list LWrk is assigned to a service Srvc as a result of the execution of action assign(LWrk, Srvc). On the basis of some of these fluents we can define the four PMS actions, which are named Primary Actions in the terminology of the IndiGolog interpreter: assign, start, ackTaskCompletion and release. The domain-dependent fluents can be represented in any form, relational or functional, and their successor-state axioms can be as complex as the domain needs.

Example 5.1 (cont.). For the sake of brevity, we show below only the definitions of fluents assigned(·) and enabled(·) and their successor-state axioms. As for the actions, assign and release can be executed in any case, whereas start(Task, Id, Srvc, I) can be executed only if a certain work list LWrk is assigned to Srvc and there exists an element workitem(Task,Id,I) in LWrk. Moreover, Task has to be enabled for Srvc, which means Srvc has previously executed action readyToStart(Task, Id, Srvc). The IndiGolog interpreter defines two procedures, and(F1, F2) and or(F1, F2). The first is true if F1 and F2 both hold in the current situation; the second if at least one of F1 and F2 holds. F1 and F2 are formulas that may be conjunctions or disjunctions of sub-formulas, which may include fluents, procedures, generic predicates, etc.

/* Indicates that list LWrk of workitems has been assigned to service Srvc */
rel_fluent(assigned(LWrk,Srvc)) :- worklist(LWrk), service(Srvc).

/* assigned(LWrk,Srvc) holds after action assign(LWrk,Srvc) */
causes_val(assign(LWrk,Srvc),assigned(LWrk,Srvc),true,true).

/* assigned(LWrk,Srvc) holds no longer after action release(LWrk,Srvc) */
causes_val(release(LWrk,Srvc),assigned(LWrk,Srvc),false,true).

/* Indicates that task Task with id Id can be started by service Srvc */
rel_fluent(enabled(Task,Id,Srvc)) :- task(Task), service(Srvc), id(Id).

/* enabled(Task,Id,Srvc) holds after service Srvc calls
   readyToStart(Task,Id,Srvc) */
causes_val(readyToStart(Task,Id,Srvc),enabled(Task,Id,Srvc),true,true).

/* enabled(Task,Id,Srvc) holds no longer after service Srvc calls
   exogenous action finishedTask(Task,Id,Srvc,V) */
causes_val(finishedTask(Task,Id,Srvc,_),enabled(Task,Id,Srvc),false,true).

/* ACTIONS AND PRECONDITIONS (INDEPENDENT OF THE DOMAIN) */
prim_action(assign(LWrk,Srvc)) :- worklist(LWrk), service(Srvc).
poss(assign(LWrk,Srvc), true).

prim_action(ackTaskCompletion(Task,Id,Srvc)) :- task(Task), service(Srvc), id(Id).
poss(ackTaskCompletion(Task,Id,Srvc), neg(enabled(Task,Id,Srvc))).

prim_action(start(Task,Id,Srvc,I)) :- listelem(workitem(Task,Id,I)), service(Srvc).
poss(start(Task,Id,Srvc,I), and(enabled(Task,Id,Srvc),
  and(assigned(LWrk,Srvc), member(workitem(Task,Id,I),LWrk)))).

prim_action(release(LWrk,Srvc)) :- worklist(LWrk), service(Srvc).
poss(release(LWrk,Srvc), true).

Below we show some of the fluents that have been defined for the running example. Specifically, we show fluents at(Srvc) and evaluationOK(Loc). Careful readers may note that at is defined here as a functional fluent, which returns locations, rather than as a relational fluent. In addition, we show the abbreviation hasConnection(Srvc), which returns true if Srvc is connected to Service 1 through a possibly multi-hop path. Indeed, Service 1 is supposed to be deployed on the device that hosts the SmartPM engine.
Such an abbreviation makes use of the IndiGolog procedure some(n, F(n)), which returns true if there exists a value n that makes formula F(n) hold.

/* at(Srvc) indicates the position of service Srvc */
fun_fluent(at(Srvc)) :- service(Srvc).
causes_val(finishedTask(Task,Id,Srvc,V),at(Srvc),loc(I,J),
  and(Task=Go,V=loc(I,J))).

rel_fluent(evaluationOK(Loc)) :- location(Loc).
causes_val(finishedTask(Task,Id,Srvc,V), evaluationOK(loc(I,J)), true,
  and(Task=EvaluatePhoto, and(V=(loc(I,J),OK),
  and(photoBuild(loc(I,J),N), N>3)))).

proc(hasConnection(Srvc),hasConnectionHelper(Srvc,[Srvc])).
proc(hasConnectionHelper(Srvc,M),
  or(neigh(Srvc,1),
     some(n, and(service(n), and(neg(member(n,M)),
       and(neigh(n,Srvc), hasConnectionHelper(n,[n|M]))))))).

The realisation of the execution cycle of a work list (i.e., a list of work items) is based on procedure isPickable(WrkList, Srvc). It holds if WrkList is a list of proper work items and Srvc is capable of performing every task defined in every work item in such a list (i.e., Srvc provides all of the capabilities required). In order to add a certain work list WrkList to the process specification, designers should use procedure manageTask(WrkList), which takes care of (i) assigning all tasks in work list WrkList to one proper service, (ii) performing start(·) and ackTaskCompletion(·), waiting for readyToStart(·) and finishedTask(·) from services, as well as (iii) releasing services when all tasks in the work list have been executed.

Example 5.1 (cont.). Procedure manageTask(WrkList) is internally composed of three sub-procedures. Firstly, it calls manageAssignment(WrkList), which picks a certain Srvc and performs assign(WrkList, Srvc). Then, procedure manageExecution(WrkList) is invoked; this procedure executes actions start(Task, Id, Srvc, I) and ackTaskCompletion(Task, Id, Srvc) one by one for each work item workitem(Task, Id, I) in list WrkList.
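The recursive hasConnectionHelper above implements a cycle-free reachability check toward Service 1 over the neigh relation. The same logic can be sketched in Python; the neighbor pairs below are invented purely for illustration, and nothing here is SmartPM code.

```python
# Multi-hop connectivity toward service 1, mirroring
# hasConnection/hasConnectionHelper: Srvc is connected if some chain
# of neighbors, free of repetitions, reaches service 1.
# The NEIGH relation is a made-up example topology.
NEIGH = {(1, 2), (2, 3), (4, 5)}

def neigh(a, b):
    # The radio-neighborhood relation is symmetric.
    return (a, b) in NEIGH or (b, a) in NEIGH

def has_connection(srvc, visited=None):
    # `visited` plays the role of the accumulator M in
    # hasConnectionHelper, preventing loops over seen services.
    visited = visited or [srvc]
    if neigh(srvc, 1):
        return True
    return any(
        neigh(n, srvc) and n not in visited
        and has_connection(n, visited + [n])
        for n in range(1, 6)          # services are numbered 1..5
    )

assert has_connection(3)       # 3 -> 2 -> 1
assert not has_connection(5)   # 5 -> 4, but 4 never reaches 1
```

As in the Prolog version, the search succeeds as soon as any neighbor chain touches Service 1 and fails once every cycle-free path is exhausted.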
Finally, the last sub-procedure is manageTermination(WrkList), which makes the picked service Srvc free again by using the PMS action release. It is worth noting the use of the IndiGolog construct atomic([a1; ...; an]) to provide an atomic execution of an action sequence a1, ..., an. Here atomicity is intended in the sense that all of these actions are performed sequentially and any other procedure is blocked until the whole sequence has been executed. For instance, in procedure manageAssignment the atomic construct is used to prevent the same service from being picked by different concurrent executions of procedure manageAssignment, which would otherwise cause obvious inconsistencies.

Procedure isExecutable uses the IndiGolog construct findall(elem, formula, set), which works as follows: it takes all instances of elem that make formula true and puts all of them in set. elem and formula are unified by the same term name; that means formula has to have a non-ground term named elem. That said, in procedure isExecutable(Task, Srvc), term A denotes the set of all capabilities required by task Task, whereas C denotes all capabilities provided by service Srvc. When can Srvc execute Task? When the set A of the capabilities required by Task is a subset of C, the capabilities provided by Srvc.

proc(isPickable(WrkList,Srvc),
  or(WrkList=[],
     and(free(Srvc),
         and(WrkList=[A|TAIL],
             and(listelem(A),
                 and(A=workitem(Task,_Id,_I),
                     and(isExecutable(Task,Srvc),
                         isPickable(TAIL,Srvc))))))).

proc(isExecutable(Task,Srvc),
  and(findall(Capability,required(Task,Capability),A),
      and(findall(Capability,provide(Srvc,Capability),C),
          subset(A,C)))).

/* PROCEDURES FOR HANDLING THE TASK LIFE CYCLES */
proc(manageAssignment(WrkList),
  [atomic([pi(Srvc,[?(isPickable(WrkList,Srvc)),
                    assign(WrkList,Srvc)])])]).

proc(manageExecution(WrkList),
  pi(Srvc,[?(assigned(WrkList,Srvc)=true),
           manageExecutionHelper(WrkList,Srvc)])).
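The findall/subset idiom of isExecutable boils down to a set-inclusion test between required and provided capabilities, and isPickable extends it to a whole work list. A Python rendering of that logic, using the capability data of the running example (the function names are ours, not part of the interpreter):

```python
# Capability matching behind isExecutable/isPickable: a service can
# execute a task iff the capabilities the task requires are a subset
# of those the service provides. Data reproduces the running example.
REQUIRED = {"TakePhoto": {"camera"}, "EvaluatePhoto": {"evaluation"},
            "CompileQuest": {"compile"}, "SendByGPRS": {"gprs"},
            "Go": set()}              # Go requires no capability
PROVIDE = {1: {"gprs", "evaluation"},
           2: {"compile", "evaluation", "camera"},
           3: {"compile"}, 4: {"evaluation", "camera"}, 5: {"compile"}}

def is_executable(task, srvc):
    # findall(...) builds the two sets; subset(A, C) is Python's <=.
    return REQUIRED[task] <= PROVIDE[srvc]

def is_pickable(worklist, srvc, free):
    # srvc must be free and able to execute every task in the list.
    return free(srvc) and all(is_executable(t, srvc)
                              for t, _id, _input in worklist)

assert is_executable("TakePhoto", 2)
assert not is_executable("SendByGPRS", 2)   # only service 1 has gprs
assert is_pickable([("Go", "id_1", (5, 5)),
                    ("TakePhoto", "id_2", (5, 5))],
                   4, free=lambda s: True)
```

The last assertion mirrors the rationale for assigning whole work lists: service 4 is picked only because it can perform both Go and TakePhoto.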
proc(manageExecutionHelper([],Srvc),[]).
proc(manageExecutionHelper([workitem(Task,Id,I)|TAIL],Srvc),
  [start(Task,Id,Srvc,I),
   ackTaskCompletion(Task,Id,Srvc),
   manageExecutionHelper(TAIL,Srvc)]).

proc(manageTermination(WrkList),
  [atomic([pi(n,[?(assigned(WrkList,n)=true),
                 release(WrkList,n)])])]).

proc(manageTask(WrkList),
  [manageAssignment(WrkList),
   manageExecution(WrkList),
   manageTermination(WrkList)]).

Finally, if the framework is properly configured, the program that codes a process turns out to be quite simple. Specifically, for the running example it is the following:

proc(branch(Loc),
  while(neg(evaluationOK(Loc)),
    [manageTask([workitem(CompileQuest,id_1,Loc)]),
     manageTask([workitem(Go,id_1,Loc),
                 workitem(TakePhoto,id_2,Loc)]),
     manageTask([workitem(EvaluatePhoto,id_1,Loc)])
    ])).

proc(process,
  [rrobin([branch(loc(2,2)),branch(loc(3,5)),branch(loc(4,4))]),
   manageTask([workitem(SendByGPRS,id_29,input)])
  ]).

The next sub-section describes how adaptability is realized in this implementation.

5.2.2 Coding the adaptation framework in IndiGolog

Figure 5.4 shows how the actual implementation of the adaptation framework is coded by the IndiGolog interpreter. In the remainder of this section, we use the name exogenous events for the unexpected exogenous actions executed by the environment. Service actions readyToStart(·) and finishedTask(·) are not unexpected but, rather, "good" expected actions that change the fluents toward the achievement of the process goals. Specifically, the framework implementation relies on three additional domain-independent fluents:

finished(s). In the starting situation it is false. It is turned to true when the process is carried out. Indeed, before finishing the execution, the process itself executes the action finish. The corresponding successor-state axiom is, hence, the following:

    finished(do(t, s)) ⇔ finished(s) ∨ t = finish

adapting(s). In the starting situation it is false.
It is turned to true when a recovery plan starts being built, and it is turned back to false when the recovery plan has been found and executed. In order to set and unset this fluent, there exist two actions, adaptStart and adaptFinish. The successor-state axiom is, hence, as follows:

    adapting(do(t, s)) ⇔ (adapting(s) ∧ t ≠ adaptFinish) ∨ (¬adapting(s) ∧ t = adaptStart)

exogenous(s). In the starting situation it is false. It is turned to true when any exogenous action occurs. Action resetExo, when executed, restores the fluent to value false.

Main()
1  ⟨(¬finished ∧ exogenous) → Monitor()⟩;
2  ⟨¬finished → (Process(); finish)⟩;
3  ⟨¬finished → wait⟩;

Monitor()
1  if (Relevant())
2    then Adapt();
3  resetExo;

Adapt()
1  adaptStart;
2  ( AdaptingProgram(); adaptFinish )
3  ∥
4  ( while (adapting) do wait() );

AdaptingProgram()
1  Σ( SearchProgram,
2     assumptions(
3       [⟨assign(workitem(Task, Id, Input), Srvc), readyToStart(Task, Id, Srvc)⟩,
4        ⟨start(Task, Id, Srvc, Input), finishedTask(Task, Id, Srvc, Input)⟩]
5     ) )

SearchProgram()
1  ( π(Task, Input, Srvc);
2    isPickable(workitem(Task, Id_Adapt, Input), Srvc)?;
3    assign([workitem(Task, Id_Adapt, Input)], Srvc);
4    start(Task, Id_Adapt, Srvc, Input);
5    ackTaskCompletion(Task, Id_Adapt, Srvc);
6    release([workitem(Task, Id_Adapt, Input)], Srvc)
7  )*;
8  (GoalReached)?;

Figure 5.4: The process adaptation procedure represented using the IndiGolog formalism.

Main Procedure. The main procedure of the whole IndiGolog program is Main, which involves three interrupts running at different priorities. All these interrupts are guarded by fluent finished(·): when it holds, the process execution has completed successfully and, therefore, the interrupts cannot fire anymore. The first, highest-priority interrupt fires when an exogenous event has occurred (i.e., condition exogenous is true). In such a case, the Monitor procedure is invoked.
If no exogenous event has been sensed, the second interrupt triggers and the execution of the actual process is attempted. If the process cannot progress either, the third is activated, which consists just in waiting. The fact that the process cannot be carried on does not necessarily mean that the process is stuck forever. For instance, the process may be unable to progress because a certain task cannot be assigned to any qualified service (i.e., the pick operator is unable to find any service providing all the capabilities required by that task), as all of them are currently involved in the performance of other tasks. If we did not add the third interrupt, when the process is unable to progress, IndiGolog would consider the program as failing.

The monitoring/repairing procedure. The Monitor procedure checks, through procedure Relevant, whether the exogenous event has been relevant, i.e., whether some fluents have changed their values as a consequence. If so, the Adapt procedure is launched, which builds the recovery program/process. Whether the changes are relevant or not, the procedure concludes by executing action resetExo, which turns fluent exogenous(·) to false.

Let us describe how procedure Relevant works. The IndiGolog interpreter used in this realization always evaluates situation-suppressed fluents, where the situation is intended to be the current one. Therefore, there is no way to access past situations in order to check whether the application of the exogenous event has changed some fluents. In the light of this, for each fluent F(x⃗) defined in the action theory D, we define another fluent F_prev(x⃗) that keeps the value of F in the previous situation:

    ∀a, v. F_prev(x⃗, do(a, s)) = v ⇔ F(x⃗, s) = v

When an exogenous event occurs, before applying the corresponding action on the fluents, we copy the value of each fluent F to F_prev. Then, we apply the changes to every fluent as a consequence of the action and, finally, we check for changes through procedure Relevant.
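Since the interpreter can only evaluate fluents in the current situation, the F_prev copies are the only handle on the pre-event state. The snapshot-and-compare mechanism can be sketched as follows (representing the fluent store as a dictionary is our assumption, not how the Prolog interpreter stores fluents):

```python
# Relevance check via previous-value fluents: before applying an
# exogenous action, snapshot every fluent F into its F_prev copy;
# after applying it, the event is relevant iff some value changed.
def snapshot(fluents):
    """Copy F to F_prev for every fluent instance (the F_prev axiom)."""
    return dict(fluents)

def relevant(fluents, fluents_prev):
    """Holds iff some fluent deviates from its pre-event copy."""
    return any(fluents[k] != fluents_prev.get(k) for k in fluents)

fluents = {("at", 4): (5, 5), ("free", 1): True}
prev = snapshot(fluents)
fluents[("at", 4)] = (9, 9)    # exogenous event moves service 4
assert relevant(fluents, prev)
prev = snapshot(fluents)       # re-sync before the next event
assert not relevant(fluents, prev)
```

The comparison is exactly the negation of the enumeration formula used by Relevant: a deviation is reported as soon as one fluent instance disagrees with its copy.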
At a higher level, procedure Relevant should be modeled in second-order logic as follows (using the situation-suppressed form for fluents):

    ⋀_{F ∈ D} ∀x⃗. domain_F(x⃗) ⇒ ∃a. F(x⃗) = a ∧ F_prev(x⃗) = a

where domain_F(x⃗) holds when x⃗ is an appropriate input for F. Of course, being based on Prolog, the IndiGolog interpreter does not accept formulas in second-order logic. The only solution is to enumerate explicitly all fluents (say n), connecting them by and operators and defining predicates domain_Fi(x⃗):

    φ ≝ (∀x⃗. domain_F1(x⃗) ⇒ ∃a. F1(x⃗) = a ∧ F1_prev(x⃗) = a)
      ∧ (∀x⃗. domain_F2(x⃗) ⇒ ∃a. F2(x⃗) = a ∧ F2_prev(x⃗) = a)
      ∧ ...
      ∧ (∀x⃗. domain_Fn(x⃗) ⇒ ∃a. Fn(x⃗) = a ∧ Fn_prev(x⃗) = a)        (5.1)

If Equation 5.1 does not hold, the exogenous event has caused a relevant deviation. Please note that the number of existing fluents and appropriate inputs is finite and, hence, the approach of enumerating all of them is practically realizable.

Procedure Adapt invokes AdaptingProgram in order to build and execute the recovery program while, at the same time, waiting until the recovery program has been totally executed. Procedure AdaptingProgram builds the recovery plan as follows. Let φ be the formula representing the state that has to be restored (i.e., the formula in Equation 5.1 instantiated on the action theory of the current process domain). Theorem 4.2 guarantees that if there exists some recovery program for a certain deviation, then there exists also a linear one. Therefore, we can focus on searching for linear programs. Specifically, the linear recovery program can be abstracted as follows:

    δ̃_rec = (π a. a)*; (φ)?

The program above states to iterate a non-deterministic number of times, each iteration non-deterministically picking an action a and executing it. Finally, the condition φ is checked.
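The lookahead that the search operator performs over (π a. a)*; (φ)? can be pictured as a breadth-first search over action sequences, stopping at the first sequence whose resulting state satisfies the goal; by the linearity argument above, searching for such flat sequences suffices. The following is a toy sketch under that reading; the domain, actions and goal are invented for illustration, and the real search runs inside the Prolog interpreter, not through code like this.

```python
from collections import deque

# Blind lookahead over (pi a. a)*; (phi)? : breadth-first search for
# the shortest action sequence whose final state satisfies the goal.
def find_linear_plan(state, actions, apply_action, goal, max_len=6):
    queue = deque([(state, [])])
    while queue:
        s, plan = queue.popleft()
        if goal(s):
            return plan              # a linear recovery program
        if len(plan) < max_len:
            for a in actions:
                queue.append((apply_action(s, a), plan + [a]))
    return None                      # no recovery program within the bound

# Toy domain: move a service right/up on the grid until it is at (3, 6).
apply_action = lambda s, a: ((s[0] + 1, s[1]) if a == "right"
                             else (s[0], s[1] + 1))
plan = find_linear_plan((1, 4), ["right", "up"], apply_action,
                        goal=lambda s: s == (3, 6))
assert plan is not None and len(plan) == 4   # two "right" and two "up"
```

Breadth-first exploration resolves both non-deterministic choices at once: the number of iterations (the depth at which the goal first holds) and the action picked at each cycle.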
When executing δ̃_rec, the non-determinism has to be resolved by choosing the number of iterations as well as the action a to be picked at each cycle. If we use the IndiGolog search operator and execute Σδ̃_rec, the interpreter uses the mechanism for off-line lookahead so as to resolve the non-deterministic choices in a way that lets the whole program terminate. Therefore, the following program is exactly the recovery plan:

    δ_rec = Σ[ (π a. a)*; (φ)? ]        (5.2)

In the practical implementation, program δ̃_rec corresponds to procedure SearchProgram, where formula φ is named GoalReached. In addition, we have restricted the search space in the light of the fact that we already know the specific pattern of the sequence of actions required for the whole execution of a single task. In this way, the search directly discards action sequences that do not respect such a pattern, without evaluating them.

Recall that fluents are not changed by PMS actions. If we considered only such actions, the search for a recovery plan meant to achieve goal GoalReached would fail, as no PMS action can change fluents. Therefore, the search operator should try to find the recovery plan on the basis of some assumptions [121]. Specifically, for SmartPM there are two assumptions: the first is that the action readyToStart(Task, Id, Srvc) performed by a certain service Srvc is expected to follow the PMS action assign(workitem(Task, Id, Input), Srvc); the second concerns the PMS action start(Task, Id, Srvc, Input), which is supposed to be followed by finishedTask(Task, Id, Srvc, Input). Once these assumptions are specified, the search operator considers that, for instance, start(·) may contribute to the achievement of a certain goal φ, given that a proper start(·) is going to be followed by a corresponding finishedTask(·), and the latter action is able to change fluents. What happens if the assumptions are not respected?
For example, the action finishedTask may not follow start, or it may return parameter values different from those assumed. In those cases, the recovery plan is considered failed, and a new recovery plan is searched for by applying Equation 5.2 again with the new values of the fluents; if found, it is executed. That means we do not repair the recovery plan; instead, we create a new plan achieving φ starting from the current situation, discarding the previous one. The code of the implementation is available in Appendix A; specifically, the features for monitoring and repairing are coded between lines 222 and 345.

Some screen shots. We would like to close the explanation of the SmartPM engine by showing some screen shots of the PMS. Figure 5.5 depicts the main window of SmartPM showing the log of all actions (both exogenous and of the PMS). Specifically, Figure 5.5(a) shows all the actions performed by the PMS and service 5, ranging from assign to ackTaskCompletion. In the window, it is easy to see the presence of rows starting with =======> EXOGENOUS EVENT, which represent the actions executed by service 5; these are considered by the PMS as exogenous events, though "good" ones. Figure 5.5(b) shows the logging of the behaviour of the system as a result of exogenous event disconnect(4,loc(9,9)). This exogenous event, a "bad" one, is launched to notify that service 4, which is performing a task requiring it to move to location (9,9), is predicted to become disconnected there. This is a significant deviation of the physical reality from the virtual one, and it requires the PMS to adapt the process by building a recovery plan. The final plan consists in moving a free service, namely service 1, to a location, specifically (3,6), in order to make sure that service 4 will stay connected when reaching destination location loc(9,9).

Figure 5.5: The main window of the IndiGolog interpreter used by SmartPM. (a) The actions executed for performing task Go for location (5,5); (b) the recovery planning for handling exogenous event disconnect of service 4.

Figure 5.6: The proof-of-concept Work-list Handler for SmartPM (subfigures a, b and c) and a Work-list Handler that we developed at an earlier stage and are willing to integrate with SmartPM (subfigure d).

Although the example may seem trivial, it shows the power of the SmartPM approach: no designer specified how to deal with disconnections; in this case, the disconnection has been handled easily, since there is a service that is not occupied performing other tasks. Had this not been the case, the PMS would automatically have chosen a different strategy.

Figures 5.6(a)-(c) depict the proof-of-concept Work-list Handler, which emulates the graphic user interface for PDAs supporting the distribution of tasks to human services. We developed it for the sake of testing the actual functioning of the SmartPM PMS. We believe it is also worth showing, in Figure 5.6(d), the Work-list Handler of ROME4EU, a previous attempt to deal with unexpected deviations (see [7]). It does not do any reasoning able to detect discrepancies and, hence, it is not able to recover from unexpected contingencies (it uses a pre-planned recovery approach). Nevertheless, it is valuable as it runs entirely on PDAs, whereas many other PMSs do not. We plan to integrate the ROME4EU work-list handler into SmartPM so as to provide a tool for task distribution to human operators equipped with PDAs.

5.2.3 Final discussion

This section has been devoted to describing how SmartPM has been implemented by using the IndiGolog platform developed by the University of Toronto.
Processes are coded as IndiGolog programs. We have shown in Section 5.2.1 the feasibility of the approach of coding processes in IndiGolog. We have also underlined the program parts that may remain unchanged when passing from one process domain to another and those which have to be defined case by case. We have presented SPIDE, the graphic tool that allows designers to define abstract process templates graphically and, upon instantiation, create their concrete specifications. SPIDE specifications are exported as XML files, which include the information needed to generate the required IndiGolog programs and the whole domain theory.

Furthermore, thanks to the use of IndiGolog, SmartPM has made it possible to represent the adaptation features directly as IndiGolog procedures. The adaptation is based on the IndiGolog search operator, which relies on a quite inefficient planning mechanism implemented in Prolog. Therefore, the current implementation should be considered a proof of concept rather than a final implementation.5

5 That is the main motivation why we do not provide here any testing results for judging the performance.

The next step we are currently working on is to overcome the intrinsic planning inefficiency of Prolog by making use of efficient state-of-the-art planners to build the recovery program/process. As also claimed in [29], this step should be theoretically and practically feasible. Some authors have already considered the problem of integrating Golog-like programs with planners, which are mostly compliant with PDDL [92, 50]. PDDL is an action-centred language, inspired by the well-known STRIPS formulations of planning problems. In addition to STRIPS, PDDL allows one to express a type structure for the objects in a domain, typing the parameters that appear in actions and constraining the types of arguments to predicates.
At its core, it is a simple standardisation of the syntax for expressing this familiar semantics of actions, using pre- and post-conditions to describe the applicability and effects of actions. Fritz et al. [5] develop an approach for compiling Golog-like task specifications, together with the associated domain definition, into a PDDL 2.1 planning problem that can be solved by any PDDL 2.1 compliant planner. Baier et al. [4] describe techniques for compiling Golog programs that include sensing actions into domain descriptions that can be handled by operator-based planners. Fritz et al. [51] show how ConGolog concurrent programs, together with the associated domain specification, can be compiled into an ordinary situation calculus basic action theory; moreover, they show how the specification can be compiled into PDDL under some assumptions.

As for the client, Figure 5.6 has shown the current version of the work-list handler, just a proof of concept for the sake of testing the SmartPM engine. As future development, we envision two types of work-list handler: a full-fledged version for ultra-mobile devices and a "compact" version for PDAs. First steps have already been taken in these directions. The ultra-mobile version has been operationalized for a different PMS (see Section 7.2). The same holds for the PDA version, which has been developed during this thesis within the ROME4EU Process Management System, a previous valuable attempt to deal with unexpected deviations (see [7]).

In conclusion, we wish to underline once more that the proposed approach, of which this section has shown an implementation, is not another way to catch pre-planned exceptions. Other approaches rely on rules to define the behaviours when special events are triggered. Here we simply model (a subset of) the running environment and the actions' effects, without considering any possible exceptional events.
We argue that, in most cases, modeling the environment, even in detail, is easier than modeling all possible exceptions.

5.3 The Network Protocol

This section aims at describing an implementation of a manet layer for PDAs and PCs that allows multi-hop communication. Indeed, current operating systems allow adding devices to mobile ad-hoc networks (i.e., mobile networks without access points), but two devices that are not in radio range cannot communicate. By implementing multi-hop communication features, devices that are not in radio range can exchange data packets using intermediate nodes as relays. Passing from node to node, the packets reach the appropriate receivers, conceptually in the same way as packets flow through the public world-wide Internet to arrive at servers (and vice versa back at clients).

We are willing to use SmartPM in an emergency management scenario where services communicate with the PMS through manets. Therefore, we decided to implement a concrete multi-hop manet layer, starting from a pre-existing implementation by the U.S. Naval Research Lab. We extended it in order to work on the latest generation of PDAs and low-profile devices. In order to verify the actual feasibility of data packet exchange in manet networks, we performed emulation by using octopus so as to let PDAs really exchange packets. An important concern is that, when testing, since all nodes are in the same laboratory room, the interference among nodes was significantly higher than if those nodes were placed in a real area. Nevertheless, we discovered and proved a relationship between laboratory and on-the-spot results, thus being able to derive on-the-spot performance levels from those obtained in the laboratory.

Section 5.3.1 compares with relevant work and describes some technical aspects of the U.S. Naval Research Lab implementation from which we started. Section 5.3.2 shows how the tests have been conducted and the results obtained.
Section 5.3.3 gives some final remarks, which influenced the use of SmartPM in manet scenarios.

5.3.1 Protocols and implementations

The purpose of this section is to give an overview of protocols and actually available implementations providing multi-hop delivery features in manets, pointing out pros and cons. Routing protocols for manets can be divided into (i) topology-based and (ii) position-based. Position-based routing needs information about the current physical position of a node, which can be acquired through "localization services" (e.g., a GPS); these have very recently become easily available on PDAs (e.g., [75, 6]). Topology-based protocols use information about the existing links between node pairs. These protocols can be classified by the "time of route calculation": (i) proactive, (ii) reactive and (iii) hybrid.

A proactive approach to manet routing seeks to constantly maintain updated topology knowledge, known to every node. This results in a constant overhead of routing traffic, but no initial delay in communication. Example protocols are OLSR and DSDV [107]. Reactive protocols seek to set up routes on demand: if a node is willing to initiate a communication with another node to which it has no route, the routing protocol will try to establish such a route upon request. DSR [71], AODV6 and DYMO7 are all reactive protocols. Finally, hybrid protocols use both proactive and reactive approaches, as ZRP8 does.

For these routing protocols some implementations exist, mainly for laptops, and only a few of them work on PDAs. Protocols that require special equipment on board devices or on the field, such as position-based protocols, were discarded in our study because we aim at using off-the-shelf devices and at operating with no existing infrastructure (e.g., in emergency management).
Moreover, we notice that reactive protocols in general have worse performance than proactive ones in terms of reactiveness to topology changes; conversely, proactive protocols require more bandwidth [106]. A working implementation of AODV is WINAODV [143]; DYMO is the most recent project and hence is still in the standardization stage; an implementation for PDAs does not exist yet. Three working OLSR implementations are available. OLSRD has a strong development community and can be extended through plug-ins. The “OLSRD for Windows 2000 and PocketPc” implementation is a porting of the laptop OLSR version to mobile devices. But these two projects, designed for older Windows CE versions, do not seem to work properly on the latest Windows CE version (Windows Mobile 6). The NRL (US Naval Research Lab) implementation offers QoS functionalities, appears to be a mature project and works on Unix/Windows/WinCE. Although it does not seem to work on the latest Windows Mobile-based PDAs, it turned out to be a good starting point to extend with some features. NRLOLSR is a research-oriented OLSR implementation, evolved from OLSR draft version 3. It is written in C++ according to an object-oriented paradigm, and built on top of the NRL Protolib library to guarantee system portability.

Relevant links: AODV: http://www.faqs.org/rfcs/rfc3561.html; DYMO: http://tools.ietf.org/html/draft-ietf-manet-dymo-02; ZRP: http://www.tools.ietf.org/id/draft-ietf-manet-zone-zrp-04.txt; OLSRD: http://www.olsr.org; OLSRD for Windows 2000 and PocketPc: http://www.grc.upv.es/calafate/olsr/olsr.htm; NRL OLSR: http://cs.itd.nrl.navy.mil/work/olsr/index.php; Protolib: http://cs.itd.nrl.navy.mil/work/protolib/

Figure 5.7: MAC interference among a chain of nodes. The solid-line circle denotes a node’s valid transmission range. The dotted-line circle denotes a node’s interference range.
Node 1’s transmission will corrupt node 4’s transmission to node 3.

Protolib works with Linux, Windows, WinCE, OpenZaurus, ns-2 and Opnet; it can also work with IPv6. It provides a system-independent interface, so NRLOLSR does not make any direct system calls to the device operating system: timers, socket calls, route table management and address handling are all managed through Protolib calls. To work with WinCE, Protolib uses the RawEther component to handle raw messages at a low level and gain access to the network interface cards. The core OLSR code is used for all supported systems; porting NRLOLSR to a new system only requires re-defining existing Protolib function calls. NRLOLSR has non-standard command line options for research purposes, such as “shortest path first route calculations”, fuzzy and slowdown options, etc. Moreover, it uses a link-local multicast address instead of broadcast by default.

5.3.2 Testing Manets

One of the most significant tests described later concerns the throughput in a chain of n links, when the first node wishes to communicate with the last. In this chain, every node is placed at the maximum coverage distance from the previous and the next node in the chain, as in Figure 5.7. In the shared air medium, an 802.11x compliant device cannot receive and/or send data in the presence of interference caused by another device which is already transmitting. From other studies (e.g., [84]) we know that every node is able to communicate only with the previous and the next node, whereas it can also interfere with any other node located at a distance less than or equal to twice the maximum coverage distance. Therefore, if several devices are within twice the radio range, only one of them at a time is able to transmit data. In our tests for the chain throughput, all devices are in the same laboratory room, which means they are in a medium-sharing context; the chain topology is just emulated by octopus.
Of course, with all devices in the laboratory, the level of interference is much higher than on the field; hence, the throughput decreases significantly. We have devised a way to compute a theoretical on-field throughput for a chain from the results obtained in the laboratory. Let Q_field(n) be the throughput in a real field for a chain of n links (i.e., n+1 nodes). We want to define a method to compute it starting from the laboratory-measured throughput Q_lab(n). Specifically, we aim at finding a function Conv(n) such that

    Q_field(n) = Conv(n) · Q_lab(n)    (5.3)

in order to derive on-field performance. We rely on the following assumptions:

1. The first node in the chain wishes to communicate with the last one (e.g., by sending a file). The message is split into several packets, which pass one by one through all intermediate nodes in the chain.

2. Time is divided into slots. At the beginning of each slot all nodes, but the last one, try to forward to the next node in the chain a packet, which slot by slot arrives at the last node.

3. Communications happen on the TCP/IP stack. Hence, every node that has not delivered a packet has to transmit it again.

4. The laboratory throughput is Q_lab(n) = α / n^β, for some values of α and β. This assumption is realistic, as it complies with several theoretical works, such as [84, 57].

We have proved the following statement:

Statement. Let us consider a chain formed by (n+1) nodes connected through n links. On the basis of the assumptions above, it holds that

    Conv(n) = (⌊n/3⌋ + 1)^(β/2)    (5.4)

where ⌊·⌋ denotes truncation to the closest lower integer.

Proof. From the first assumption, we can say that, if the i-th node succeeds in transmitting, then the (i−1)-th, (i−2)-th, (i+1)-th and (i+2)-th cannot. Let us name the following events: (i) D_n is the event of delivering a packet in a chain of n links and (ii) S_n^i is the event of delivering at the i-th attempt.
Let T_{i,n} be the probabilistic event of delivering a packet in a network of n links (i.e., n+1 nodes) after i transmission attempts (note this is different from S_n^i, since T_{i,n} implies that delivery did not succeed up to the (i−1)-th attempt). For all n, the probability of delivering after one attempt is the same as the probability of delivering a packet: P(T_{1,n}) = P(D_n). Conversely, the probability P(T_{2,n}) is equal to the probability of not delivering at the first attempt, P(¬S_n^1), and of delivering at the second attempt, P(S_n^2):

    P(T_{2,n}) = P(S_n^2 ∩ ¬S_n^1) = P(S_n^2) · P(¬S_n^1 | S_n^2)    (5.5)

Since, for all i, the events S_n^i are independent and P(S_n^i) = P(D_n), Equation 5.5 becomes:

    P(T_{2,n}) = P(S_n^2) · P(¬S_n^1) = P(D_n) · (1 − P(D_n))

In general, the probability of delivering a packet to the destination node at the i-th attempt is:

    P(T_{i,n}) = P(S_n^i) · P(¬S_n^{i−1}) · … · P(¬S_n^1) = P(D_n) · (1 − P(D_n))^{i−1}    (5.6)

According to Equation 5.6, we can compute the average number of transmissions (the mean of a geometric distribution) as follows:

    T̄_n = Σ_{i=1}^{∞} i · P(T_{i,n}) = Σ_{i=1}^{∞} i · P(D_n) · (1 − P(D_n))^{i−1} = 1 / P(D_n)    (5.7)

In a laboratory, all nodes are in the same radio range. Therefore, independently of the number of nodes,

    P(D_n^lab) = 1/n    (5.8)

On the field, we have to distinguish on the basis of the number of links. Up to 2 links (i.e., 3 nodes), all nodes interfere and, hence, just one node out of 2 or 3 can deliver a packet in a time slot; so, P(D_1^field) = 1 and P(D_2^field) = 1/2. For n = 3, 4, 5 links, two nodes succeed: P(D_n^field) = 2/n. For n = 6, 7, 8 links, there are 3 nodes delivering: P(D_n^field) = 3/n. Hence, in general, we can state:

    P(D_n^field) = (⌊n/3⌋ + 1) / n    (5.9)
By applying Equations 5.8 and 5.9 to Equation 5.7, we derive the expected number of transmissions needed to deliver a packet over a single link:

    T̄^lab(n) = n        T̄^field(n) = n / (⌊n/3⌋ + 1)    (5.10)

Fixing the number of packets to be delivered, we can define a function f that expresses the throughput as a function of the number of sent packets. If we have a chain of n links and we want to deliver a single packet from the first to the last node in the chain, then we have altogether to send the number n of links times the expected number of transmissions per link T̄_n. Therefore:

    Q_lab(n) = f(T̄^lab(n) · n) = f(n²)
    Q_field(n) = f(T̄^field(n) · n) = f(n² / (⌊n/3⌋ + 1))    (5.11)

From our laboratory experiments described in Section 5.3.2, as well as from other theoretical results [84], we can state f(n²) = α / n^β, i.e., f(x) = α / x^(β/2). By considering this together with Equations 5.11, the following holds:

    Q_field(n) / Q_lab(n) = f(n² / (⌊n/3⌋ + 1)) / f(n²)  ⇒  Q_field(n) = Q_lab(n) · (⌊n/3⌋ + 1)^(β/2)    (5.12)

The test-bed and experiments

The test-bed devices are all off-the-shelf and certified for the 802.11b standard. Specifically, we used one HP iPAQ 5550 (CPU 400 MHz) running PocketPC 2003/WinCE 4.2 and three ASUS P527 (CPU 200 MHz) equipped with Windows Mobile 6.0/WinCE 5.0. These are complemented by 4 PDAs emulated through the PDA emulator of MS Visual Studio .NET; such emulated PDAs run on usual laptops, and their guaranteed performance levels are slightly lower than those of the ASUS PDAs. Therefore, in every test, manets are composed only of (real or emulated) PDAs. We built the ad hoc network with 802.11b, and we connected all the devices with encryption and the RTS/CTS ability turned off.
One more workstation (equipped with a wireless card) runs the octopus emulator and plays the role of gateway: devices are supposed to send any packet to the target destination, but actually every packet is captured by octopus, which decides whether or not to forward it to the destination by checking whether the sender and the destination node are neighbors in the virtual map it keeps. Each device runs the NRLOLSR protocol implementation specific for its operating system (WinCE or Windows XP). We ran three kinds of tests: the performance of the chain topology; some tuning of the protocol; some tests with moving devices.

Figure 5.8: Test results for a manet chain in the laboratory, and estimated on-the-spot results

Performance of the chain topology. The aim of this test is to measure the maximum transfer rate on a chain. To obtain the measurements, an application for Windows CE was built (using the .NET Compact Framework 2.0), which transfers a file from the first to the last node on top of TCP/IP, reporting the elapsed time. All the devices use the routing protocol with the default settings and HELLO INTERVAL set to 0.5 seconds. octopus emulates the chain topology and grabs all broadcast packets. When a node wants to communicate with another node, it sends packets directly to it if the latter is in its neighborhood; otherwise, it sends them along the routing path. Both real and emulated devices were used; each reported value is the mean value of five test runs. Figure 5.8 shows the throughput outcomes. The blue curve tracks the laboratory results; as stated in assumption 4, we found through interpolation that the curve follows the trend Q_lab(n) = α / n^β with α = 385 and β = 1.21. The green curve is the maximum theoretical throughput computed by Equation 5.4. We believe the actual throughput we can rely on when developing applications lies between the green and the blue curves.
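The conversion of Equations 5.3 and 5.4 can be sketched in a few lines of Python, plugging in the interpolated values α = 385 and β = 1.21 reported above (the script is an illustration of the formulas, not part of SmartPM):

```python
ALPHA, BETA = 385.0, 1.21  # interpolated from the laboratory runs (Figure 5.8)

def q_lab(n: int) -> float:
    """Laboratory throughput (KB/s) for a chain of n links: Q_lab(n) = alpha / n^beta."""
    return ALPHA / n ** BETA

def conv(n: int) -> float:
    """Conversion factor of Equation 5.4: Conv(n) = (floor(n/3) + 1)^(beta/2)."""
    return (n // 3 + 1) ** (BETA / 2)

def q_field(n: int) -> float:
    """Estimated on-field throughput, Equation 5.3."""
    return conv(n) * q_lab(n)

# Lab vs. estimated on-field throughput for chains of 1..8 links.
for n in range(1, 9):
    print(n, round(q_lab(n), 1), round(q_field(n), 1))
```

Note that Conv(n) ≥ 1: the on-field estimate is never below the laboratory measurement, because in the laboratory every node interferes with every other one.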
Tuning of the protocol. NRLOLSR has many parameters that can be changed, but only a few of them have a strong impact on the protocol effectiveness. We focus on HELLO INTERVAL, which is the most important value because it influences the reactivity to topology changes. We test how increasing or decreasing this parameter affects the topology knowledge and, hence, the reactivity of the network. As every mobility pattern can be stepwise considered as a crossing of chains of nodes, we investigate a single chain, considering it as a “building block”. The scenario is as shown in Figure 5.7: the nodes in the chain are fixed and not moving; each node knows only its two neighbors; at time t, node 1 enters the range of node 2; we compute the time elapsed between t and the first application message sent by node 6 and received by node 1. To do this, a client/server application was built that continuously sends UDP messages from the first node to the last node; this, indeed, introduces a small delay that can be ignored. This interval is referred to as FPT (First Packet Time) and it can be broken down as follows:

    FPT = 2 · chain_time + build_route_time    (5.13)

where chain_time is the time taken by the packet to travel along the whole chain and come back, and build_route_time is the fraction of time needed by the head node to build the new routing table and choose the correct path for the packet. To catch the exact time, in this test the head node and the entering node are laptops instead of PDAs, so that it is easy to use network sniffer software (which is not available on PDAs). Again, the mobility emulation is provided by the octopus machine. Figure 5.9 shows the trend of FPT with different values of HELLO INTERVAL. Each reported value is the mean value of eight runs. The curve decreases linearly except at the last point, where the interval is set to 0.1 seconds: for intervals of less than 0.5 seconds, the FPT increases.
The minimum around 0.5 s is due to the inability of the devices to keep up with the network load. The value of the minimum depends on the CPU, the RAM and, in general, on the hardware configuration of the PDA: more powerful devices should yield a smaller minimum. All these values refer to one single traffic flow; so, in real scenarios, where the traffic is very high and there are multiple flows, it is important to choose an interval value that allows fast topology reactivity and does not overload the devices too much.

Figure 5.9: Time elapsed to establish a direct communication in a chain of five nodes

Tests with moving devices. This kind of test aims to determine whether or not the NRLOLSR implementation is suitable for a real environment where nodes are often moving. Indeed, in a real field it is important not to break the communication during node movements. If a team member is transmitting information to another team member, and the node topology changes, all data must be delivered successfully, provided that the sender and the receiver are connected at all times through a multi-hop path, possibly changing over time. In order to emulate a setting of moving devices, we investigate three topologies, shown in Figure 5.10, where the dashed line shows the trajectory followed by a moving device. Such topologies are designed so that (i) the moving node is always connected to at least one other node, and (ii) each node is connected in some way to at least one other node, i.e., there are no disconnected nodes (no partitions in the manet). A WinCE application is used that continually sends 1000-byte-long TCP/IP packets between node S and node D. We tested every topology five times and every run was 300 seconds long. The outcomes are quite good for every topology: during every run, all data packets were correctly delivered to the destination.
We experienced only some delays when the topologies were changing due to a node movement. Indeed, while a new path is set up, data transmission incurs 100% losses, since the old path cannot be used for delivering. At the application level we use reliable TCP and, hence, packet delivery is delayed, since every single packet has to be transmitted again and again until the new path is built up. TCP defines a timeout for retries; if a packet cannot be delivered within a certain amount of time, an error is returned at the application level and no more attempts are made. In order not to incur TCP timeouts, the node motion speed is crucial: if nodes move too fast, topologies change too frequently and, hence, the protocol is not reactive enough to keep routes updated. In the tested topologies, we found that the maximum speed such that TCP timers never expire is around 18 m/s (65 km/h).

Figure 5.10: Dynamic topologies for testing TCP/IP disconnections

5.3.3 Final Remarks

The results depicted in Figure 5.8 allow one to carefully take into account the throughput that a manet of real devices can nowadays support. Surely, on the basis of the previous discussions, any configuration of a manet will present a performance that lies in the area between the two lines, one being the possible worst case and the other the possible best case. We have shown that for more than 5 devices we have a throughput of about 50 Kbytes/sec. As a matter of fact, the data exchanged between services and the PMS engine are quite limited in size and compatible with such a limited bandwidth. The fact that SmartPM itself works in manet scenarios does not mean that the integrated services do: services to be integrated should be conceived and developed so as to limit the bandwidth they require.
5.4 Disconnection Prediction in Manets

This section illustrates a technique to predict disconnections of devices in Mobile Ad-hoc Networks before they actually occur. When working on the spot, team members move in the affected area to carry out the tasks assigned to services. When using manets, movements may cause disconnections and, hence, unavailability of nodes and, finally, unavailability of the provided services. The SmartPM adaptation features should be able to realize when devices are about to disconnect and enact an appropriate recovery plan, so as not to lose such devices and the services they provide. This section presents a specific sensor that is able to predict disconnections before they actually occur. Indeed, once a device disconnects, it gets out of control and, hence, SmartPM can no longer generate appropriate recovery plans involving actions for such a device, with the result of reducing the effectiveness of such plans. Figure 5.11 shows how the disconnection predictor is located in the overall SmartPM architecture. The prediction is done by a central entity called Disconnection Prediction Coordinator, which is currently implemented in C#. When the disconnection of a given device a is predicted, the coordinator informs the corresponding Prediction Manager, which is physically located inside the SmartPM architecture. This manager generates, for each service s installed on a, an exogenous action/event disconnect(s, loc). Parameter loc is a location pair (x, y) identifying the location where a (and all its services) is predicted to move to once disconnected. Finally, the Communication Manager notifies the IndiGolog engine of the occurred event. Our predictive technique is based on a few assumptions:

1. Each device is equipped with specific hardware that allows it to know its distance from the surrounding connected (i.e., within radio range) devices.
This is not a very strong assumption, as either devices are equipped with GPS, or specific techniques and methods (e.g., TDOA - time difference of arrival, SNR - signal-to-noise ratio, the Cricket compass, etc.) are easily available. Kusy et al. [79] present a precise technique to track multiple wireless nodes simultaneously; it relies on measuring the position of tracked mobile nodes through radio interferometry, and is guaranteed to significantly reduce the error with respect to GPS. Moreover, Hadaller et al. [58] have devised techniques to mitigate the error when computing node positions through GPS: in their experiments, the error was reduced to 3 meters when nodes are not moving and to 20 meters when nodes move at 80 km/h.

2. There are no landmarks (i.e., static devices with GPS) in the manet; we are indeed interested in very dynamic manets, where the availability of landmarks cannot be assumed.

3. At start-up, all devices are connected (i.e., for each device there is a path - possibly multi-hop - to any other device). The reader should note that we are not requiring that each device is within the radio range of (i.e., one hop away from) any other device (tight connection); we only require a loose connection, which can be guaranteed by appropriate routing protocols, such as the implementation described in Section 5.3.

4. A specific device in the manet, referred to as coordinator, is in charge of centrally predicting disconnections. As all devices can communicate at start-up and the ultimate goal of our work is to maintain such connections through predictions, it is possible to centrally collect all the information from all devices. The coordinator may coincide with the node hosting the SmartPM core engine, but may be any other node in the same network.

Figure 5.11: The architecture of the disconnection predictor in SmartPM.
The predictive technique is essentially as follows: at a given time instant t_i, the coordinator device collects all distance information from the other devices (thanks to assumptions 1 and 3); on the basis of such information, the coordinator builds a probable connection graph, that is, the probable graph at the next time instant t_{i+1} in which the predicted connected devices are highlighted. On the basis of such a prediction, the coordinator layer takes appropriate actions (which are not further considered in this section). The remainder of this section starts by evaluating the state of the art of mobility prediction; then, we delve into the technique we propose.

5.4.1 Related Work

Much research on mobility prediction has been carried out (and is still in progress), above all for cellular phone systems [2, 85]. These approaches are based on Markov models, which predict a mobile user's future location on the basis of her current and past locations. The aim is to predict whether a mobile user is leaving the current cell (crossing the cell boundaries) and which new cell she is entering. Such information is then used for channel reservation in the new cell: anticipating the reservation should lower the probability of a call being dropped during handoff (in cellular telecommunications, the term handoff refers to the process of transferring an ongoing call or data session from one channel connected to a core network or cell to another) due to the absence of a free channel for the call in the new cell. The main differences with our approach derive from the different scenarios: manets versus mobile phone networks. Indeed, a peculiarity of manets is their higher mobility compared with phone networks: in manets, links between pairs of devices disappear very frequently. That does not happen with phone cells, which are very big: leaving a cell and entering a new one is rare compared with how often manet links fall down. We use a centralized approach as in cellular networks, where a coordinator collects information to allow prediction.
The difference is that our approach takes into account the knowledge of all distances among all users. Indeed, we do not have any base station; therefore, we do not just have to predict the distance of each mobile device to it: we are interested in the distance from any device to any other. In the literature, several approaches predict the state of connectivity of manet nodes. The most common approaches assume that some of the nodes are aware of their location through GPS systems, in order to study node motions and predict disconnections. In [103] the authors perform positioning in the network using range measurements and angle-of-arrival measurements; but their method requires a fraction of the nodes to disseminate their location information, so that the other nodes can triangulate their position. In [116] the probability that a connection will be continuously available during a period of time is computed only if at least one node knows its position and its speed through GPS. Our approach is more generic, as it does not require any specific localization technique: any hardware allowing a node to know its distances is fine. In [137] manets are considered as a combination of clusters of nodes, and the impact (i.e., the performance) of two well-defined mobility prediction schemes on the temporal stability of such clusters is studied; unlike our approach, the authors use pre-existing predictive models, while the novelty of our approach consists in the formalization of a new model based on Bayesian filtering techniques. In [45] neighbor prediction in manets is enacted through a suitable particle filter using the information inside the routing table of each node, which is continuously updated by the underlying manet protocol.
The first drawback is that it can operate only with those protocols that work by updating routing tables. Moreover, since it is based only on routing-table updates, it predicts how long pairs of nodes are going to stay connected on the basis of how long they have been connected in the past: it considers neither whether pairs of nodes are moving closer or drifting apart, nor the node motion speed. Our approach also takes such information into account, making the prediction more accurate. Fox et al. [49] address the issue of robot location estimation: for each position p_i and each robot r_j, the technique gives the probability for r_j to be in p_i. This approach cannot be easily used to compute when nodes are going to disconnect.

5.4.2 The Technique Proposed

Bayesian Filters

Bayes filters [13] probabilistically estimate/predict the current state of a system from noisy observations. Bayes filters represent the state at time t by a random variable Θ_t. At each point in time, a probability distribution Bel_t(θ) over Θ_t, called belief, represents the uncertainty. Bayes filters aim to sequentially estimate such beliefs over the state space, conditioned on all information contained in the sensor data. To illustrate, let us assume that the sensor data consist of a sequence of time-indexed sensor observations z_1, z_2, ..., z_n. The belief Bel_t(θ) is then defined by the posterior density over the random variable Θ_t conditioned on all sensor data available at time t:

    Bel_t(θ) = p(θ | z_1, z_2, ..., z_t)    (5.14)

Generally speaking, the complexity of computing such a posterior density grows exponentially over time, because the number of observations increases over time; to make the computation tractable, the following two assumptions are necessary:

1. The system's dynamics are Markovian, i.e., the observations are statistically independent;

2. The devices are the only subjects capable of changing the environment.
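Under these assumptions, the belief can be maintained recursively, alternating a prediction step through the system dynamics with an update step weighting the new observation. A minimal discrete sketch of this recursion (the two-state example and all its probabilities are purely illustrative, not taken from the thesis):

```python
# Recursive Bayes filter over a discrete state space.
# bel: current belief P(state); trans: transition model P(next | current);
# obs_model: observation likelihood P(z | state).

def bayes_step(bel, trans, obs_model, z):
    # Prediction: propagate the belief through the (Markovian) dynamics.
    pred = {s2: sum(trans[s1][s2] * bel[s1] for s1 in bel) for s2 in bel}
    # Update: weight by the likelihood of the new observation and normalize.
    upd = {s: obs_model[s][z] * pred[s] for s in pred}
    eta = sum(upd.values())
    return {s: p / eta for s, p in upd.items()}

# Illustrative two-state example: are two nodes "close" or "far"?
bel = {"close": 0.5, "far": 0.5}
trans = {"close": {"close": 0.8, "far": 0.2}, "far": {"close": 0.3, "far": 0.7}}
obs_model = {"close": {"near_reading": 0.9, "far_reading": 0.1},
             "far": {"near_reading": 0.2, "far_reading": 0.8}}

for z in ["near_reading", "near_reading", "far_reading"]:
    bel = bayes_step(bel, trans, obs_model, z)
```

The key point is that each step only needs the previous belief and the latest observation, so the cost per step is constant rather than growing with the observation history.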
On the basis of the above two assumptions, the belief at a time instant t can be expressed as the combination of a prediction factor Bel_{t−1}(θ) (the belief at the previous time instant) and an update factor that adjusts the prediction on the basis of the observations at time t. In our approach, the random variable Θ_t ranges over [0, 1] and we use the Beta(α, β) function as belief distribution to model the behavior of the system, according to the following equation:

    Bel_t(θ) = Beta_{(α_t, β_t)}(θ)    (5.15)

The beta distribution is a family of continuous probability distributions defined on the interval [0, 1] and parameterized by two positive shape parameters. The probability density function of the beta distribution is:

    Beta_{(α,β)}(x) = x^{α−1} (1 − x)^{β−1} / ∫_0^1 u^{α−1} (1 − u)^{β−1} du

While the mean value and the variance are closed-form expressions, the cumulative distribution function can only be computed through numerical analysis. Mean value and variance are defined as follows:

    E(X) = α / (α + β)        Var(X) = αβ / ((α + β)² (α + β + 1))

In Bayesian filtering, the values α and β represent the state of the system and vary according to the following equations:

    α_{t+1} = α_t + z_t
    β_{t+1} = β_t + (1 − z_t)    (5.16)

In our approach, the observation z_t represents the variation of the relative distance between nodes (i, j), normalized with respect to the radio range, in the time period [t−1, t]. It is used to update the two parameters α and β of the Beta function according to Equation 5.16. The evaluated Beta(α, β) function predicts the value of θ_{t+1}^{(i,j)}, estimating the relative distance that will be covered by the nodes (i, j) in the next time period [t, t+1].

timer: a timer expiring every T seconds.
iBuffer[x,y]: a bi-dimensional square matrix storing the distance between every pair of nodes x and y.
bayesianBuffer[x,y]: a bi-dimensional square matrix storing a triple (α, β, distance) for each pair of nodes x and y.
upon delivery by node i of tuple(i, j, dist)
1  iBuffer[i, j] ← dist

upon expiring of timer()
1  localBuffer ← iBuffer
2  /* empty the intermediate buffer */
3  for (i, j) ∈ iBuffer
4      do iBuffer[i, j] ← RADIO_RANGE
5
6  for (i, j) ∈ localBuffer
7      do if localBuffer[i, j] = RADIO_RANGE
8             then observation ← 1
9             else observation ← (localBuffer[i, j] − bayesianBuffer[i, j].distance) / RADIO_RANGE
10                 observation ← (observation + 1) / 2
11        bayesianBuffer[i, j].distance ← localBuffer[i, j]
12        bayesianBuffer[i, j].alpha ← u · bayesianBuffer[i, j].alpha + observation
13        bayesianBuffer[i, j].beta ← u · bayesianBuffer[i, j].beta + (1 − observation)

Figure 5.12: Pseudo-code of the Bayesian algorithm for predicting node distances.

Prediction of distances

Our approach relies on clock cycles whose period is T. The pseudo-code for the coordinator is described in Figure 5.12. We assume the iBuffer data structure to be stored only at the coordinator (Team Leader) and accessed only by local threads in a synchronized way. For each ordered pair (i, j) of nodes, in the n-th cycle, the monitor stores two float parameters, α_n^{(i,j)} and β_n^{(i,j)}, and the last observed distance d_{n−1}^{(i,j)}. Let us assume a node k joins the manet during the m-th clock cycle. Then, for each manet node j, we initialize α_m^{(k,j)} = β_m^{(k,j)} = 1. In such a way we get the uniform distribution over [0, 1] and, so, every distance d_{m+1}^{(k,j)} gets the same probability. In each time period T, each node i sends a set of tuples (i, j, d_j) to the coordinator, where j is the unique name of a neighboring node and d_j is the distance to j. The coordinator continuously collects such tuples (i, j, d_j) coming from the nodes in an intermediate buffer. We make no assumption about clock synchronization: every node collects and sends information to the coordinator according to its own clock, which is in general shifted with respect to those of the other nodes.
The monitor performs prediction according to the same clock period T: at the beginning of the generic n-th clock cycle, upon timer expiration, it copies the tuples (i, j, d_n^j) from the intermediate buffer into another one and then empties the former buffer to get it ready for updated values. In the clock cycle, for each collected tuple (i, j, d_j), the monitor updates the parameters through a Bayesian filter as follows:

    α_{n+1}^{(i,j)} = u · α_n^{(i,j)} + o_n^{(i,j)}
    β_{n+1}^{(i,j)} = u · β_n^{(i,j)} + (1 − o_n^{(i,j)})    (5.17)

where o_n^{(i,j)} is an observation and u ∈ [0, 1] is a constant value. The constant u allows old observations to age: as new observations arrive, the previous ones get less and less relevance. Indeed, old observations do not capture the updated status of manet connectivity and motion. The value of the observation can be computed from the relative distance variation between i and j, scaled by the radio range:

    Δdr_n^{(i,j)} = (d_n^{(i,j)} − d_{n−1}^{(i,j)}) / radio_range    (5.18)

where radio_range is the maximum distance at which two nodes can communicate with each other. Possibly, d_n^{(i,j)} can be missing in cycle n: the distance between i and j could be missing because i and j are not in radio range, or because the packets sent by i to the coordinator are lost or delivered late. It is straightforward to prove that Δdr_n^{(i,j)} ranges in the interval [−1, 1]. This range is not suitable for the Bayesian filter, since observations should be between 0 and 1; so we map the value of Equation 5.18 into the range [0, 1] as follows (if a node has entered the manet in this cycle, we assume o_n^{(i,j)} = 0.5, i.e., it is not moving):

    o_n^{(i,j)} = (Δdr_n^{(i,j)} + 1) / 2    if d_n and d_{n−1} are available
    o_n^{(i,j)} = 1                          if d_n is unavailable
    o_n^{(i,j)} = 1/2                        if d_n is available but d_{n−1} is not    (5.19)

In sum, our Bayesian approach estimates the variation of the future distance between every pair of nodes, normalized in the [0, 1] range. Values greater than 0.5 mean that nodes are drifting apart, while smaller values mean they are moving closer; if the value is equal to 0.5, node i is estimated not to move with respect to j.
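The per-pair update of Equations 5.17-5.19, together with the distance prediction of Equation 5.20 below, can be sketched as follows (function names and the constants RADIO_RANGE and U are illustrative):

```python
RADIO_RANGE = 100.0  # meters (illustrative, e.g., IEEE 802.11)
U = 0.9              # aging constant u in [0, 1] (illustrative value)

def observation(d_n, d_prev):
    """Map the distance variation into an observation in [0, 1] (Eq. 5.19)."""
    if d_n is None:                       # distance missing: treated as drifting apart
        return 1.0
    if d_prev is None:                    # no previous sample: assumed not moving
        return 0.5
    delta = (d_n - d_prev) / RADIO_RANGE  # Eq. 5.18, ranges in [-1, 1]
    return (delta + 1.0) / 2.0

def update(alpha, beta, d_n, d_prev):
    """Aged Bayesian update of the Beta parameters (Eq. 5.17)."""
    o = observation(d_n, d_prev)
    return U * alpha + o, U * beta + (1.0 - o)

def predict_distance(alpha, beta, d_n):
    """Predicted distance at the next cycle (Eq. 5.20), with theta = E[Beta]."""
    theta = alpha / (alpha + beta)        # mean of the Beta distribution
    return d_n + (2.0 * theta - 1.0) * RADIO_RANGE
```

For instance, starting from the uniform prior α = β = 1, a pair of nodes whose reported distance grows from 50 m to 60 m produces an observation above 0.5, so the predicted distance for the next cycle exceeds the current one.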
The parameters α and β are the inputs of the Beta distribution Beta(α, β), whose expectation θ_(n+1)^(i,j) = E[Beta(α_(n+1)^(i,j), β_(n+1)^(i,j))] is the estimated variation of the distance between i and j, as a fraction of the radio range, at the beginning of the (n+1)-th clock cycle.

At this stage we can estimate the distance between nodes i and j at the beginning of the (n+1)-th clock cycle. That can be done from Equation 5.19 by replacing the observation term o_n^(i,j) with the estimated value θ_(n+1)^(i,j). Hence:

    d̂_(n+1)^(i,j) = d_n^(i,j) + Δd̂_n^(i,j) = d_n^(i,j) + (2θ_(n+1)^(i,j) − 1) · radio_range     (5.20)

Since d_n^(i,j) = d_n^(j,i), it should also hold that d̂_(n+1)^(i,j) = d̂_(n+1)^(j,i). However, in general d̂_(n+1)^(i,j) ≠ d̂_(n+1)^(j,i): the distance reported by i about the couple (i, j) can differ from the one reported by j about the same couple. This is because distances are collected at the beginning of clock cycles which can be shifted with respect to each other; the information can thus differ, as it is collected in different moments. Therefore, the estimated distance d̂_(n+1)^{i,j} is computed by considering both d̂_(n+1)^(i,j) and d̂_(n+1)^(j,i), through different weights:

    d̂_(n+1)^{i,j} = rel_(n+1)^(i,j) · d̂_(n+1)^(i,j) + rel_(n+1)^(j,i) · d̂_(n+1)^(j,i)

where rel_(n+1)^(i,j) is a reliability factor for the estimation, inversely proportional to σ_(n+1)^(i,j) = sqrt(Var(Beta(α_(n+1)^(i,j), β_(n+1)^(i,j)))):

    rel_(n+1)^(i,j) = (1/σ_(n+1)^(i,j)) / (1/σ_(n+1)^(i,j) + 1/σ_(n+1)^(j,i)) = σ_(n+1)^(j,i) / (σ_(n+1)^(i,j) + σ_(n+1)^(j,i))

[16] If a node has entered the manet in this cycle, we assume o_n^(i,j) = 0.5, i.e., it is not moving.

Connected Components Computation

Disconnection prediction depends on a parameter γ, which stands for the fraction of the radio range below which the predictive technique does not signal a disconnection anomaly[17].
Let P(disc_(n+1)^(i,j)) = P(d̂_(n+1)^(i,j) ≥ γ · radio_range); two nodes i and j are predicted to disconnect if and only if

    rel_(n+1)^(i,j) · P(disc_(n+1)^(i,j)) + rel_(n+1)^(j,i) · P(disc_(n+1)^(j,i)) > 1/2     (5.21)

i.e., two nodes i and j are estimated to be disconnecting if and only if it is more probable for their distance to be greater than γ · radio_range than to be smaller than such a value. The prediction can be tuned to be more conservative by lowering γ (i.e., the fraction of the radio range within which disconnections are not predicted). If we consider Equation 5.20, then:

    P(disc_(n+1)^(i,j)) = P(d_n^(i,j)/radio_range + (2θ^(i,j) − 1) ≥ γ)
                        = P(θ^(i,j) ≥ (1 + γ)/2 − d_n^(i,j)/(2 · radio_range))     (5.22)

where the last term in Equation 5.22 is directly computable from the estimated beta distribution:

    P(θ > k) = ∫_k^1 Beta(α^(i,j), β^(i,j))

Once the algorithm predicts which links will exist at the next cycle, we can easily compute the connected components (i.e., the sets of nodes that are predicted to be connected). Afterwards, on the basis of the connected components, disconnection anomalies are identified by the monitor. Connected components are computed through the "Mobile Gambler's Ruin" algorithm below, where an edge between a couple of nodes exists in the connection graph if and only if Equation 5.21 is false.

Note that the techniques for evaluating communication distances could introduce an error: since our model is based on a Markov chain made of communication distances between devices, and since the measured distances could include an approximation error compared to the real communication distances, this error could affect our model. Let us assume that for every S_(i,j)^t there is an average error ΔS introduced by the real measure.

[17] As an example, in IEEE 802.11 with a radio range of 100 meters, γ equal to 0.7 means that at a communication distance of 70 meters the prediction algorithm signals a probable disconnection.
Thus, by observing that our model is linear, it follows that ΔS is spread over all the measures but does not depend on t, so S_p^(t+1)_(i,j) is actually S_p^(t+1)_(i,j) ± ΔS. The exact value of ΔS depends on which technique is used for distance evaluation but, as it is typically small compared to S_p^(t+1)_(i,j), the average error of our prediction model is only partially affected.

The "Mobile Gambler's Ruin" (MGR) algorithm is derived from the Markov chain model of the well-known gambler's ruin problem [47, 62]. This study of the device movements and the consequent distance prediction is based on Markov chains, because the success of a prediction depends only on the events of the previous time frame. Instead of using a Markovian process in the time domain, we focus on the spatial domain and build a matrix which is similar to the one of the original gambler's ruin model, but with different elements. Let us consider a square matrix of |E| × |E| elements, where |E| = m is the total number of mobile devices in the manet. We build M = (m_ij) as an m × m symmetric matrix, in which m_ij = 1 if Equation 5.21 is false and m_ij = 0 if the equation is true[18].

FUNCTION MGR()
1  numcomps ← 0
2  Comps ← new array of integer[m]
3  for i ← 0 to (m − 1)
4     do if Comps[i] = 0
5        then numcomps ← numcomps + 1
6             Comps[i] ← numcomps
7             CCDFSG(M, i, numcomps, Comps[])
8  return Comps[]

SUB CCDFSG(M, i, numcomps, Comps[])
1  for j ← 0 to (m − 1)
2     do if Comps[j] = 0 and M[i, j] = 1
3        then Comps[j] ← numcomps
4             CCDFSG(M, j, numcomps, Comps[])

FUNCTION TEST CONNECTION(i, j, Comps[])
1  if Comps[i] = Comps[j]
2     then TEST ← true
3     else TEST ← false
4  return TEST

Figure 5.13: Pseudo-code of the MGR algorithm.

Every diagonal element m_ii is equal to 1, since P(disc_(n+1)^(i,i)) = 0.

[18] The matrix is of course symmetric, since m_ij = m_ji always holds.
This follows by definition: the distance of a mobile device from itself is always equal to 0. The matrix M = (m_ij) can be considered as the adjacency matrix of an (undirected) graph whose nodes are the devices and where an arc exists between two nodes if they are foreseen as direct neighbors. The strategy of the MGR algorithm, whose pseudo-code is given in Figure 5.13, is to find the connected components of the graph (through the CCDFSG procedure) and then, given two devices e_i and e_j, to verify whether they belong to the same connected component (the TEST CONNECTION function); if so, e_i and e_j will still communicate in the next time period, otherwise they will lose their connection within the next time period. Using this strategy, after building the matrix M = (m_ij), we can verify which devices are connected, directly (i.e., one hop) or indirectly (i.e., multi-hop), and thus decide when disconnection management techniques should be activated in order to keep the connection between the involved devices. The aim of such techniques should be to obtain a unique connected component in the graph.

The MGR algorithm computes the connected components starting from the matrix that represents the graph. The output of the MGR program is the Comps array, in which each i-th element contains an integer value corresponding to the connected component the i-th device belongs to. For example, if we have a set of devices E = {e_1, ..., e_m} and they form a graph with k connected components, we will have an output vector of this shape:

    ( 0 0 ... 1 ... 2 ... k − 1 )     (5.23)

Figure 5.14: The components of the actual implementation.
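To make the whole prediction pipeline concrete, the following Java sketch puts together Equation 5.20 (distance estimation), Equation 5.22 (disconnection probability, computed here by numerically integrating the Beta density) and the MGR connected-component computation of Figure 5.13. The class, the method names and the 100-meter radio range are illustrative assumptions, not the thesis implementation.

```java
// Illustrative sketch of Equations 5.20 and 5.22 plus the MGR algorithm.
class DisconnectionPrediction {
    static final double RADIO_RANGE = 100.0; // meters (assumed, as for IEEE 802.11)

    /** theta = E[Beta(a, b)]. */
    static double betaMean(double a, double b) { return a / (a + b); }

    /** d-hat_{n+1} = d_n + (2*theta - 1) * radio_range (Equation 5.20). */
    static double estimateNextDistance(double dn, double a, double b) {
        return dn + (2 * betaMean(a, b) - 1) * RADIO_RANGE;
    }

    /** P(theta > k) for Beta(a, b), via midpoint-rule integration of the
     *  unnormalized density (this avoids the need for a Gamma function). */
    static double betaTail(double a, double b, double k) {
        if (k <= 0) return 1.0;
        if (k >= 1) return 0.0;
        int n = 100000;
        double total = 0, tail = 0;
        for (int s = 0; s < n; s++) {
            double x = (s + 0.5) / n;
            double f = Math.pow(x, a - 1) * Math.pow(1 - x, b - 1);
            total += f;
            if (x > k) tail += f;
        }
        return tail / total;
    }

    /** P(disc_{n+1}) = P(theta >= (1+gamma)/2 - d_n/(2*radio_range)), Eq. 5.22. */
    static double discProbability(double dn, double a, double b, double gamma) {
        return betaTail(a, b, (1 + gamma) / 2.0 - dn / (2.0 * RADIO_RANGE));
    }

    /** MGR: label every node with the id of its connected component. */
    static int[] mgr(boolean[][] m) {
        int[] comps = new int[m.length];       // 0 = not yet visited
        int numComps = 0;
        for (int i = 0; i < m.length; i++)
            if (comps[i] == 0) {
                comps[i] = ++numComps;
                ccdfsg(m, i, numComps, comps); // CCDFSG in Figure 5.13
            }
        return comps;
    }

    static void ccdfsg(boolean[][] m, int i, int comp, int[] comps) {
        for (int j = 0; j < m.length; j++)
            if (comps[j] == 0 && m[i][j]) {
                comps[j] = comp;
                ccdfsg(m, j, comp, comps);
            }
    }

    /** TEST CONNECTION: same component means still (multi-hop) connected. */
    static boolean testConnection(int i, int j, int[] comps) {
        return comps[i] == comps[j];
    }
}
```

For instance, with a uniform prior Beta(1, 1), a current distance of 50 meters and γ = 0.7, the integration bound is k = 0.85 − 0.25 = 0.6, so the disconnection probability evaluates to 0.4 and no anomaly is signalled.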
Thus, for two different devices e_i and e_j, we only have to test, using the TEST CONNECTION program, whether they have the same value in the vector (5.23); this gives us confidence about the probability of their being still connected in the next time period.

5.4.3 Technical Details

We implemented the Bayesian algorithm on actual devices. We coded it in MS Visual C# .NET, as this enables writing applications once and deploying them on any device for which a .NET framework exists (PCs and PDAs included). In this section, we describe the technical details of the packages and classes implementing the Bayesian algorithm. We can identify two sides in the implementation, as described in Figure 5.14: the code running on the coordinator device, which realizes the prediction, and the code running on the generic peers, which send information about their neighbors to the coordinator. The code of the generic peers is conceptually easy. It is basically composed of two modules:

it.uniroma1.dis.Octopus. We tested our algorithm by means of the octopus virtual environment described in Section 5.5. octopus is intended to emulate small manets and holds a virtual map of the area where nodes are arranged. This module is intended to query octopus for knowing the node's neighbors and their distances.

BayesianPredClient. This module includes two internal timers. The first timer has a clock T, where T is the same as defined in Figure 5.12. For each clock period, it gets information about the neighbors (who they are and how far they are) by using the it.uniroma1.dis.Octopus module. Then, it arranges such information in a proper packet, which is sent to the coordinator. Upon expiration of the second timer, the client sends a command to octopus to change the position of the node which this device is mapped to. Of course, this timer also uses the it.uniroma1.dis.Octopus module.

The core of the coordinator predictor is the BayesianPredServer module.
In this specific case, it is worth breaking it down into its component classes:

DistanceServer. This module implements a TCP/IP server to retrieve the neighboring information from the peers (sent by them through the module BayesianPredClient). At the same time, it stores the retrieved information in the intermediate buffer, which is implemented by the module Buffer. It corresponds to the event handler for upon delivering of a tuple from a peer, as defined in Figure 5.12.

Buffer. It implements the intermediate buffer, written by the DistanceServer module and read/emptied by PredictiveTimer. This module guarantees synchronized accesses.

PredictiveTimer. This is a timer that fires every T seconds. It implements the event upon expiring of timer, as defined in Figure 5.12. Consistently with the pseudo-code, it accesses the Buffer module to get new information from the other peers, as well as the BayesianBuffer module. The latter stores the information needed to compute, for each couple of nodes, Equations 5.17 and 5.19. This module also uses the it.uniroma1.dis.Octopus module. Indeed, the Team Leader is a node itself and can be involved in disconnections; therefore, it has to ask octopus for its neighbors and predict the distances to any other node.

BayesianBuffer, BayesianTuple. The BayesianBuffer class handles and stores the triples (α^(i,j), β^(i,j), d^(i,j)), each one represented by a BayesianTuple object.

A second module composes the coordinator architecture and is named PMSManager. It is in charge of communicating with the proper device management of the SmartPM engine to report when disconnections are predicted by the module BayesianPredServer, specifically by the class PredictiveTimer. The device manager, in its turn, will generate an appropriate exogenous action disconnect(·) to inform the IndiGolog engine about the disconnection.
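The synchronized swap-and-clear behaviour of the Buffer module can be sketched as follows; this is a minimal illustration with hypothetical method names, not the actual C# classes of SmartPM.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the intermediate buffer shared between the
// DistanceServer (writer) and the PredictiveTimer (reader/emptier).
class IntermediateBuffer {
    private final Map<String, Double> distances = new HashMap<>();

    /** Called by the TCP server upon delivery of a tuple (i, j, dist). */
    synchronized void put(String i, String j, double dist) {
        distances.put(i + "->" + j, dist);
    }

    /** Called by the predictive timer at each tick: take a snapshot
     *  of the collected tuples and empty the buffer atomically. */
    synchronized Map<String, Double> drain() {
        Map<String, Double> snapshot = new HashMap<>(distances);
        distances.clear();
        return snapshot;
    }
}
```

Synchronizing both methods on the buffer instance guarantees that a timer tick never observes a half-written tuple, mirroring the "synchronized accesses" requirement stated above.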
5.4.4 Experiments

We conclude the section on the Bayesian algorithm for disconnection prediction by presenting the results of some experiments performed to verify the accuracy of the predictions. Therefore, testing does not involve the SmartPM adaptive PMS. In order to test the implementation, we used emulation by octopus (see Section 5.5). This allows testing the feasibility of an actual implementation beyond the theoretical soundness of the approach. octopus keeps a map of virtual areas, which users can design and display through a GUI. Such a GUI enables users to place virtual nodes in the map and bind each one to a different real device. Furthermore, users can add obstacles possibly existing in a real scenario: ruins, walls and buildings.

The test-bed consists of nine machines (PCs and PDAs). In addition to these, a further PC hosts the octopus virtual environment. Each of the nine machines is bound to a different virtual node of octopus' virtual map. We set the testing virtual map as 400 × 300 meters wide and the communication radio range as 100 meters. At the beginning, nodes are located in the virtual map in a random fashion so as to form one connected component. Afterwards, every S seconds each node chooses a point (X, Y) in the map and begins heading towards it at a speed of V m/s. Both S and V are Gaussian random variables: mean and variance are set, respectively, to 450 and 40 seconds for S and to 3 and 1.5 m/s for V. The couple (X, Y) is chosen uniformly at random in the virtual map. Of course, the devices used in the tests do not actually move: nodes move only in the virtual map. For this purpose, devices send particular commands to a specific octopus socket to instruct node motions.

The first set of experiments was intended to verify the percentage error obtained for different values of the clock period T.
The error here is defined as the gap between the distance d̂_n estimated at the (n−1)-th clock cycle and the actual measure d_n at the n-th clock cycle, scaled with respect to the radio range. Figure 5.15(a) shows the outcome for clock periods equal to 15, 20, 30, and 45 seconds. We set the parameter u of Equation 5.17 to the value 0.5 and performed ten tests per clock period; every test was 30 minutes long. The results show, as expected, that the error percentage grows as the clock period increases. Probably the most reasonable value for real scenarios is 30–45 seconds (smaller values are not practically feasible, since manets would probably be overloaded by "distance" messages). Consider the greatest clock period we tested: the error ranges between 24.34% and 26.8% (i.e., roughly 25 meters).

Figure 5.15: Experiment results of the disconnection prediction module. (a) The smallest and largest measured error percentage, changing clock periods. (b) The measured error percentage, changing the weight of past observations.

Afterwards, in a second set of tests, we fixed the clock period to 30 seconds, testing values of u equal to 0.01, 0.05, 0.1, 0.2, ..., 0.8. We even tripled the frequency with which nodes start moving. The outcomes are depicted in Figure 5.15(b), where the x-axis corresponds to the u values and the y-axis to the error percentage. The trend is parabolic: the minimum is obtained for u = 0.3, where the error is 17.44%, and the maximum for u = 0.8, where the error is 21.54%.
Small values of u mean that the past is scarcely considered, whereas large values mean that the past is strongly taken into account. This matches our expectation: we get the best results for intermediate values, i.e., the best tuning is obtained when we consider the past neither too little nor too much.

SmartPM and other possible applications can rely on such predictions. Indeed, setting the polling time to 30 seconds, we obtained errors around 18% for u = 0.3. If the range is supposed to be 100 meters, the mean error is around 18 meters. Hence, if we set γ = 0.75 (i.e., disconnections are predicted when nodes are more than 75 meters apart), we would be sure to predict every actual disconnection. That means no disconnection goes unhandled, although the coordination layer (distributed or centralized, local or global) will also be alerted about some false positives, enacting recovery actions to handle disconnections that are not real.

5.5 The OCTOPUS Virtual Environment

This section describes an emulator, namely octopus, that we developed with the purpose of testing SmartPM in pervasive scenarios based on manets. Indeed, when developing any software system (including SmartPM), one needs to study alternatives for the design and implementation of software modules, analyze possible trade-offs and verify whether specific protocols, algorithms and applications actually work. There exist three ways to perform analysis and tests: (i) simulation, (ii) emulation and (iii) field tests. Clearly, on-field tests would be the preferred solution, but they require many people moving around in large areas, and the repeatability of the experiments would be compromised. Simulation and emulation allow performing several experiments in a cheaper and more manageable fashion than field tests. Simulators and emulators (i.e., the hardware and/or software components enabling simulation or emulation) do not exclude each other.
Simulation can be used at an earlier stage: it enables testing algorithms and evaluating their performance before actually starting the implementation on real devices. Simulators, such as NS2[19] [112], GlomoSim [147] or OMNeT++ [126], allow for several kinds of hardware through appropriate software modules (such as different device types, like PDAs or smartphones, or networks, like Ethernet or WLAN 802.11). Even if the application code written on top of simulators can be quickly written and its performance easily evaluated, such code must be thrown away and rewritten when developers want to migrate to real devices.

The emulators' approach is quite different: during emulation, some software or hardware pieces are not real, whereas others are exactly the ones of actual systems. All emulators (for instance, MS Virtual PC or the PDA emulator in MS Visual Studio .NET) share the same idea: software systems are not aware of working on top of layers that are partially or totally emulated. On the other hand, performance levels can be worse: operating systems running on Microsoft Virtual PC work slower than on a real PC with the same characteristics. Anyway, software running on emulators can be deployed on actual systems with very few or no changes.

On the basis of such considerations, we developed octopus, a complete emulation environment for manets.[20] Our emulator is intended to emulate small-scale manets (10–20 nodes). Instead of making the whole manet stack virtual, which would require the duplication of a large amount of code, we decided to emulate only the physical MAC layer, leaving the rest of the stack untouched. octopus keeps a map of virtual areas that users can design and display through a GUI. Such a GUI enables users to place virtual nodes in the map and bind each one to a different real device. Further, users can add obstacles possibly existing in a real scenario: ruins, walls, buildings.
The result is that real devices are unaware of octopus: they believe they are sending packets to their destinations. Actually, packets are captured by octopus, which plays the role of a gateway. The emulator analyzes the sender and the receiver and takes into account the distance of the corresponding virtual nodes, the probability of losses, as well as the obstacles screening direct sight[21]. On the basis of such information, it decides whether to deliver the packet to the receiver. The advantage of octopus is that, at any moment, programmers can remove it and perform field manet tests without any kind of change.

The aim here is to present octopus and its novel features. Later, we investigate existing solutions by taking into account several comparison dimensions, specifically:

Minimal initial effort. The amount of time necessary to learn and start using the emulator. Several emulators require writing complex scripts to model channels in detail. We are interested in algorithms for the application layer (and not for the network one), whose performance is only slightly modified by the channel and network parameters.

Portability. This feature gets a twofold meaning: on the one hand, it means that code can be ported to non-emulated environments with few or no changes. On the other hand, we refer to portability as the ability to enable, during emulation, the use of several platforms, such as PCs with Linux or Windows and PDAs with Windows CE or PalmOS.

Handling of Obstacles. The virtual map, which the emulator holds, should allow users to insert obstacles representing walls, ruins, buildings. Virtual nodes should move in the map by passing around such obstacles without going over them.

[19] NS2 enables both simulation and emulation. Here, we refer to NS2's simulation features.
[20] octopus can be downloaded at: http://www.dis.uniroma1.it/∼deleoni/Octopus.
[21] We assume that, whenever two nodes are not directly visible, every packet sent from one node to the other is dropped.
Movements should be as realistic as possible, according to well-known patterns.

Run-time Event Support. During experiments, the destinations of the nodes are required to be defined at run-time, according to the behavior of the client applications. Essentially, movements cannot be defined in a batch way; conversely, during emulations, nodes have to interactively inform the emulator about the movement towards given destinations.

To the best of our knowledge, octopus is the first manet emulator enabling clients to interactively influence changes in the topology, upon the firing of events which were not defined before the beginning of the emulation. Other emulators require the specification in batch mode, i.e., before the emulation is started, of which events fire and when. In addition, octopus allows for the inclusion of any kind of device, even PDAs or smartphones, and of any application, whereas other approaches support only some platforms or applications coded in specific languages. Finally, octopus supports and handles possible obstacles, packet losses and enhanced movement models, like Voronoi [69]. Please note that, though octopus was built for testing SmartPM, its applicability is broader and comprises all software systems that developers are willing to test on manets without having to write code that is thrown away after the experiments.

5.5.1 Related Work

There exist several mobile emulators in the literature, even if they do not provide the features we need for our purposes.

Emulator      | Initial effort | Code needs changes? | Platform        | Obstacle handling | Run-time support
Patched NS2   | High           | No                  | Linux           | Yes (little)      | No
MobiEmu       | Low            | No                  | Linux           | No                | No
MNE           | Medium         | Yes                 | Linux           | No                | No
MobiNet       | Medium         | No                  | All             | No                | No
EMWIN         | Low            | No                  | All             | No                | No
NEMAN         | Low            | No                  | Linux           | No (but planned)  | No (but planned)
JEmu          | Low            | Yes                 | All (only Java) | No                | No

Table 5.1: Summary of the features provided by some manet emulators.

NS2 [112] on its own enables the emulation of wired networks only.
Anyway, Magdeburg University has developed an NS2 patch to perform wireless emulation [90]. This patched NS2 version can emulate an arbitrary IEEE 802.11 network by connecting real devices to the emulator machine. This solution actually enables building applications as if the emulator were not present and switching between a real and an emulated environment without any change. Anyway, it has some drawbacks: (i) client hosts have to be Linux-based and, thus, Windows-based computers or PDAs cannot be used; (ii) complex TCL scripts need to be written to set up all the emulated aspects of the wireless links. Such a detailed manet configuration makes sense when people want to emulate protocols of the lower layers and it is important to consider several physical parameters. But in the case where we want to test application software (whose performance and correctness are only slightly affected by such parameters), we would like to easily configure emulated manets through a GUI so as to minimize the initial effort. Moreover, (iii) NS2, even patched, does not allow placing obstacles on the map. At most, people can define some Voronoi paths for node movements to get a similar result, assuming them to be around obstacles. However, we want two virtual nodes to be unable to communicate with each other if they are not in direct sight (e.g., a building is located between them). This is not possible with NS2. Finally, (iv) the events possibly occurring during the emulation are decided at batch time in TCL scripts; so, clients cannot effect any change in the node topology.

Other emulators, such as MobiEmu [148], MNE [88], MobiNet [89], EMWIN [149] and NEMAN [115], show similar problems. EMWIN is one of the few emulators supporting any kind of device. It works in a distributed fashion: so-called emulator nodes are real machines physically attached to a fast Ethernet switch. Emulator nodes can be installed on whichever platform, PCs or PDAs.
Every emulator node represents a sort of virtual hub to which up to 8 Virtual Mobile Nodes (VMNs) can be connected. Therefore, EMWIN can emulate any platform (PDAs included), even if it does not handle obstacles, nor does it allow inserting new events at run-time.

JEmu [48] replaces, for each client, the lowest layer of the communication stack with an emulated one. The emulated stack sends packets to the JEmu server, which decides, taking into account certain information (e.g., distance, collisions, etc.), whether the actual destination can receive them (obstacles, though, are not handled). If so, the emulator forwards these packets to the JEmu client of the receiver. JEmu is totally written in Java, so it works only with Java software. Furthermore, applications need many changes to be emulated by JEmu.

Table 5.1 summarizes the features we are interested in for octopus. In this table, "Patched NS2" refers to NS2 enhanced with Magdeburg's patch. Its "little" obstacle support means that people might define Voronoi paths for node movements, assuming the paths to be around obstacles. The NEMAN entries marked as "planned" are the features which the authors will implement in future releases: specifically, they plan to handle obstacles and to enable applications to influence the link topology at run-time.

As the table shows, no emulator allows applications to modify the node topology at run-time. All emulators are based on the same way of use: at design-time, possibly through a GUI, users set up the scenario and a virtual map, binding virtual wireless nodes to real devices, as well as the moments when events fire, such as the reaching of a given position. Afterwards, the applications to be emulated are run on the devices. When such a preparation phase finishes, the emulation starts and events fire according to the specified schedule. We want to enable the firing of events which were unforeseen during the arrangement of the emulation scenarios.
In the "real world", events such as movements are caused by the users, who interact with the applications on board of the devices. In general, and especially when testing novel prototypes of application software on top of manets, the applications on the devices may influence the link topology and the node motion (e.g., devices move when executing tasks). Therefore, batch emulations might be completely useless. Moreover, obstacles are not handled by other emulators. We think that these aspects are important to make emulations realistic; so, we introduced such novel features in octopus.

5.5.2 Functionalities and Models

octopus provides functionalities to emulate a wireless local area network through an intuitive and user-friendly graphic interface. The main features provided by octopus are described as follows:

Integrated graphical scenario editor. The emulation scenario setup is fully managed through a GUI and there is no need to know or use any scripting language at all. This choice has been made to allow the average user, even with only basic network knowledge, to focus mainly on the experimental aspects.

Real-time node mobility management. In our target experiments, the destinations that nodes want to reach have to be defined at run-time, according to the behavior of the client applications. Essentially, movements cannot be defined in a batch way before the emulation starts; on the contrary, during emulations, nodes have to somehow inform the emulator about their movements towards a given destination[22]. This feature is implemented as a TCP server listening for special "movement" commands sent by the software on board of the devices. We know this breaks our constraint, which states that software on devices does not have to be modified when removing the emulator. Anyway, the changes, if any, are extremely limited: basically, they consist in "commenting out" the invocations to octopus.
In this case, we could not avoid violating the constraint: since those events are generated at run-time by the software on the devices, only such software can send those commands. However, if this feature is not needed, the software on the devices actually does not have to be modified when the emulator is removed.

Packet losses. The emulation system supports user-defined packet loss policies, described by a customizable range-based function pd(r). The function pd(r) = k means that the probability of a packet sent by a node being delivered to a node r meters away is k. octopus also supports a more advanced modelling of packet losses based on Ricean fading, which is also more compliant with the real behaviour of wireless communication channels. Section 5.5.2 gives more details.

Obstacle-aware mobility model. Two movement models are available in octopus: Way-Point and Voronoi [69]. The first one assumes that nodes move straight along the line joining the starting and the destination point. The latter is more realistic and takes into account even the possible obstacles along the path. The devised algorithm is based on the Voronoi plane tessellation model. Section 5.5.2 gives more details about this algorithm.

Broadcast address emulation support. In some algorithms, we may want peers to broadcast a message to every peer in radio range. Since the devices are connected through a real LAN[23], we cannot use the normal broadcast IP address (i.e., x.x.x.255), as it would send the packet to all the peers in the network without considering the routing table: we want to broadcast only to the virtual neighbors. This issue is resolved by adding a customizable virtual broadcast address instead of the usual one.

In the following, some details of octopus are given.

[22] This makes sense when the behavior of the client applications is controlled by humans.
[23] octopus and the other actual devices have to be deployed in the same LAN, in order for octopus to be able to reach the other devices.
Voronoi Mobility Model

In order to provide realistic mobility management, the nodes living in the emulated environment move avoiding obstacles. As a matter of fact, humans follow predefined paths to reach a place, such as roads and sidewalks: emulated environments should show similar behaviors. octopus allows defining polygonal obstacles in the virtual map and generates the graph of all the possible path segments that do not cross them. The algorithm we have devised derives from the original Voronoi algorithm, which assumes a given set P of points in the plane and builds some special lines describing closed polygons in the plane. Each polygon includes exactly one p_i ∈ P of the original points: for each p_i, the corresponding polygon contains all the points which are closer to p_i than to any other p_j ∈ P. Since obstacles are polygons and not simply points, a generalization is needed:

1. Generate a "sampled" version of every obstacle by sampling every side of every obstacle and replacing it with a sequence of points. The sampling rate can be defined by users.

2. Generate the Voronoi diagram by considering the points generated at step 1.

3. Remove the segments crossing one or more obstacles, that is, all the segments having at least one of their two vertices inside an obstacle.

The octopus Voronoi diagram is computed as the dual of the Delaunay triangulation [55], as this actually has a lower realization complexity. Each segment generated by the Voronoi algorithm represents a possible part of the path that nodes are forced to follow in order to move without crossing an obstacle.

Packet Loss Models

In order to model the packet losses due to the unreliability of the physical channel, octopus comes with two channel models: a first model relies on the definition of a customized function; a second one is based on Ricean fading.

Customized packet loss function.
The first model gives advanced users the possibility to define their own loss function pd(r), which specifies the probability of delivering a packet when the potential receiver is r meters away from the sender. For instance, users can model perfect reliability by defining pd(r) = 1 ∀r ∈ [0, rrange], where rrange is the radio range of the specific transmission technology, e.g., 100 meters for IEEE 802.11b/g and 10 meters for Bluetooth.

Since obstacles are present in the virtual area, we assume that radio signals do not pass through them; this means that each packet sent by a node to another is surely dropped if the two nodes are not in direct sight. In the real world, a wireless device may estimate its distance to the others by signal-to-noise ratio (SNR) techniques: the larger the physical distance, the higher the noise in the communication channel. However, this gives only an approximate "communication distance" between two peers: the method does not return the exact physical distance, because of other factors, such as thin obstacles between devices or other interferences, which increase the noise. So, the communication distance d_c^em and the real distance d_r^em may differ. It is too difficult (and perhaps even impossible) to emulate all the physical factors affecting the communication distance. Therefore, octopus defines the communication distance between two nodes a and b as follows:

    d_c^em(a, b) = d_r^em(a, b)   if a and b are in direct sight
    d_c^em(a, b) = ∞              if at least one obstacle divides a and b

The probability of delivering a packet is given by evaluating the user-defined loss function on input d_c^em. So, the probability of delivering to a node b a packet sent by a node a is:

    p_{a,b} = pd(d_c^em(a, b)) ∈ [0, 1]

When a actually wants to send a packet to b, octopus computes p_{a,b}; then, it generates a random value x ∈ [0, 1] from a uniform distribution.
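The obstacle-aware delivery decision just described can be sketched as follows (a minimal sketch with hypothetical names; octopus actually performs this check inside its gateway while forwarding LAN packets):

```python
import math, random

def communication_distance(a, b, in_direct_sight):
    """d_c^em: the real distance when no obstacle intervenes, infinity otherwise."""
    return math.dist(a, b) if in_direct_sight else math.inf

def delivered(a, b, in_direct_sight, pd, rng=random.random):
    """Compute p_{a,b} = pd(d_c^em(a, b)), draw x uniformly in [0, 1],
    and deliver iff x <= p_{a,b}."""
    p = pd(communication_distance(a, b, in_direct_sight))
    return rng() <= p

# perfect reliability within a 100 m radio range, as in the text
pd = lambda r: 1.0 if r <= 100.0 else 0.0

print(delivered((0, 0), (30, 40), True, pd))   # distance 50 m, p = 1: delivered
```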
Finally, octopus follows the rule "if x ≤ p_{a,b} then deliver else drop" to decide whether the packet has to be delivered or dropped.

Ricean Fading. The second way octopus can model packet losses is based on Ricean fading, which is extensively used to model wireless channels. Ricean fading is a stochastic model for the radio-propagation anomaly caused by the partial cancellation of a radio signal by itself [104]. These anomalies are generated by small changes of the elements in the environment where the wireless signal propagates (e.g., objects changing their position, people moving in the area, doors or windows opening or closing). In such situations, the signal arrives at the receiver through different paths at different points in time, and the different "versions" of the signal interfere with each other. We do not detail here how the channel has been modelled to take Ricean fading into account; it is only worth noting that the reduction of the signal strength (and, consequently, the probability of packet losses) is characterized by the Ricean distribution:

    f(x) = (x / σ²) · exp(−(x² + ν²) / (2σ²)) · I₀(xν / σ²)

where ν and σ are two parameters that depend on some aspects of the channel of interest, and I₀ is the modified Bessel function of the first kind with order zero.

Figure 5.16: An OCTOPUS screenshot.

Figure 5.17: The OCTOPUS class diagram.

5.5.3 The OCTOPUS Architecture

octopus is completely written in Java; in particular, it has been tested both on Windows and on Linux. The octopus architecture relies on four modules: Environment Manager.
It is the core module, and the behaviour of octopus depends on its settings. Users can set up several parameters, such as the area size, node positions, radio ranges and obstacles. It also computes the Voronoi graph. This module is used by the Gateway module to obtain the information needed to decide whether a packet has to be delivered. Gateway. octopus plays the role of a gateway: this module intercepts all packets sent by the nodes involved in the emulation and addresses every network issue. It decides whether to forward each packet by taking into account the distance information from the Environment Manager. The Gateway module implements the packet-dropping policy described in Section 5.5.2. Server. This module implements the TCP server, listening on port 8888. The server is intended to receive commands from applications about events to trigger (like movements) and to reply to queries coming from clients; for instance, a client can ask which nodes are its neighbours or what the distance to them is. The communication protocol is a simple textual protocol. We have also realized a C# module masking the socket accesses behind an easy API. GUI. In order to minimize the effort needed to set up the initial scenario and bind the virtual nodes to the actual devices, octopus is provided with a Graphical User Interface. It enables every configuration aspect to be performed in a friendly fashion, without requiring users to learn any special scripting language. At design time users can insert nodes, obstacles and buildings in the virtual area by "point-and-click", as in any drawing software. The GUI allows users to load/save scenarios and settings from/to XML files, without having to set up scenarios from scratch every time. At run time, the GUI shows the exact position of the virtual nodes on the map. Figure 5.16 depicts an octopus screenshot: the right panel shows the virtual area, whereas the top part is used at design time to configure scenarios (nodes, positions, etc.)
The left panel describes the node mappings and other information, and also allows users to change the position of nodes by manually firing events. The gray rectangles and lines represent, respectively, obstacles and Voronoi lines, which nodes follow during their motion. If the proper option is active (as in the figure), the GUI shows virtual neighbour nodes by a blue line connecting each pair of nodes in radio range. Another option makes the GUI draw a circle centered on every node to show its radio range. Figure 5.17 shows the classes composing octopus and classifies them with respect to the modules described above: Environmental Manager. The Octopus class is a singleton (i.e., at most one instance may exist) and derives from the SimulationEnvironment class. SimulationEnvironment describes the physical environment to be emulated and also manages the mobility aspect through the VoronoiGraph class. The SimulationEnvironment class contains a list of MobileNode, Location and Obstacle instances in order to have a complete description of the environment. Since the Delaunay triangulation is the dual of Voronoi but computationally more efficient, a DelaunayTriangulation class is used by the VoronoiGraph class. The DijkstraPathFinder class is used by VoronoiGraph to compute a path from a source point to a destination. Gateway. The network level is managed by the GatewayManager class, which uses the JPCap library24 to capture and forward LAN packets. To evaluate whether a packet has to be delivered or lost, the GatewayManager is supported by the FunctionManager, which makes use of the JEP library to parse the user-defined loss function. Server & GUI. The octopus TCP/IP server is multi-threaded and implemented by the OctopusServer class: since it manages multiple connections at the same time, each connection is handled by a different OctopusClientThread object.

5.6 Summary

This chapter has presented the SmartPM system, i.e.
a Process Management System that features automatic adaptation based on execution monitoring. SmartPM has been built on top of the IndiGolog interpreter developed by the University of Toronto and RMIT University, Melbourne. Section 5.1 has given an overview of the interpreter platform and of how it can be used for specifying IndiGolog programs. After that, the SmartPM engine has been introduced in detail, describing how processes can be concretely coded in IndiGolog (Section 5.2). Programs that describe processes are ideally composed of a part that is mostly static and does not depend on the process, and a second part that codes the specific process. The static part codes execution monitoring and planning; it is worth highlighting that even monitoring and planning are directly representable (and, in fact, practically represented) as IndiGolog procedures. This makes IndiGolog programs self-contained as regards process execution and adaptability. The strength of this chapter is that every aspect theoretically described in the previous chapter has been concretely implemented and tested.

24 JPCAP Web site – http://netresearch.ics.uci.edu/kfujii/jpcap/doc

Finally, we have complemented SmartPM with some external modules to enable its use for emergency management. Specifically, we have presented a technique based on Bayesian filtering for detecting one particular type of change in the execution environment (Section 5.4.2): the disconnection of the devices of rescue operators. We have also provided SmartPM with a network protocol, discussing conceptual and technical aspects of the network traffic (Section 5.3). Finally, this chapter has described octopus, a manet emulator that we have used for testing the disconnection sensor (Section 5.5). Nevertheless, octopus can be useful for experimentation in a variety of application domains, i.e.
all those domains in which one wants to test the concrete implementation of algorithms for manets and check their practical feasibility.

Chapter 6

Adaptation of Independent and Concurrent Process Branches

This chapter aims at improving on what is described in Chapter 4: we propose a novel adaptation technique that is more efficient, being able to exploit concurrent branches. In the framework described in Chapter 4, whenever a process δ needs to be adapted, it is blocked and a recovery program consisting of a sequence of actions h = [a1, a2, ..., an] is placed before it, so as to obtain a new process δ' = [h; δ]. The original process may consist of different concurrently running branches δ = δa ∥ δb ∥ ... ∥ δn, and in case of adaptation all of them are temporarily interrupted. Thus, all branches can resume execution only after the whole recovery sequence has been executed. Although one knows which branches cannot progress, the framework in Chapter 4 cannot adapt such branches individually. Indeed, it is not known whether the different branches act upon the same variables/conditions, and adapting branches one by one could change some variables/conditions and, hence, "break" other branches, which would become unable to progress. Therefore, we refine here that approach by automatically identifying whether concurrent branches are independent (i.e., neither working on the same variables nor affecting the same conditions). If they are independent, we can automatically synthesize a recovery process h for δ that affects only the interested branch (say δa), without having to block the other branches:

    δ' = [h; δa] ∥ δb ∥ ... ∥ δn

In order to apply the proposed technique, some additional effort is required of process designers with respect to the technique of Chapter 4. Indeed, the technique is made possible by annotating processes in a "declarative" way.
We assume that the process designer can annotate actions/sequences with the goals they are intended to achieve. On the basis of such declared goals, independence among branches can be verified and, later, a recovery process that affects only the branch of interest, without side effects on the others, can be synthesized. The framework described here is an extension of the one in [33]. Section 6.1 gives the overall idea of the adaptation approach, pointing out the general framework. Section 6.2 presents the sound and complete technique for adapting "broken" processes. Section 6.3 outlines an example stemming from emergency management scenarios, showing the use of the proposed technique. Unlike for the approach of Chapter 4, we have not yet developed a prototype that exploits the technique proposed here.

6.1 General Framework

The general framework we introduce here is derived from that of Chapter 4. Like the previous one, it considers processes as IndiGolog programs, and conditions are expressed in the SitCalc. The actions that compose processes are of four types: Assign(a, x), Start(a, x, p), AckTaskCompletion(a, x) and Release(a, x). Services can execute two actions, readyToStart(a, x) and finishedTask(a, x, q). Parameters a, x, p and q identify, respectively, services, tasks, inputs and outputs. The actions performed by the PMS and by services work and are interleaved in the same way as described in Section 4.3. The only difference here is that all assignments of tasks belonging to parallel branches must be done before entering the branches themselves, and releases can happen only after completing all parallel branches; we explain the reason for these constraints later. These actions act on some domain-independent fluents, specifically free(·) and enabled(·), whose definitions are exactly the same as in Equations 4.2 and 4.5.
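For illustration only, the life-cycle of a single task on a single service can be sketched as a tiny state machine. All names are illustrative, and the exact interleaving is an assumption based on the action names and on the description of Section 4.3; the actual semantics is given by the SitCalc axioms (Equations 4.2 and 4.5).

```python
# Toy life-cycle of task x on service a, mirroring the PMS/service actions.
# "unassigned" corresponds to free(a); "running" corresponds to enabled(a, x).
class TaskLifecycle:
    def __init__(self):
        self.state = "unassigned"       # free(a) holds
    def assign(self):                   # PMS: Assign(a, x)
        assert self.state == "unassigned"
        self.state = "assigned"
    def ready_to_start(self):           # service: readyToStart(a, x)
        assert self.state == "assigned"
        self.state = "ready"
    def start(self, inputs):            # PMS: Start(a, x, p) -> enabled(a, x)
        assert self.state == "ready"
        self.state = "running"
    def finished(self, outputs):        # service: finishedTask(a, x, q)
        assert self.state == "running"
        self.state = "done"
    def ack(self):                      # PMS: AckTaskCompletion(a, x)
        assert self.state == "done"
        self.state = "acked"
    def release(self):                  # PMS: Release(a, x) -> free(a) again
        assert self.state == "acked"
        self.state = "unassigned"

t = TaskLifecycle()
t.assign(); t.ready_to_start(); t.start({"p": 1})
t.finished({"q": 2}); t.ack(); t.release()
print(t.state)  # "unassigned": the service is free again
```

The assertions enforce the ordering constraints; in the framework itself, the analogous role is played by the action pre-conditions on the fluents free(·) and enabled(·).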
As in the approach of Chapter 4, there exist other fluents that denote significant domain properties, whose values are modified by the service actions finishedTask(·). Similarly, the PMS advances the process δ in the situation s by executing an action, resulting in a new situation s' with a process δ' remaining to be executed. The state is represented by fluents, which are defined on situations. The process execution is continuously monitored to detect any deviation between physical reality and virtual reality. The PMS collects data from the environment through sensors (here a sensor is any software or hardware component able to retrieve contextual information). If a deviation is sensed between the virtual reality as represented by s' and the physical reality as s'e, the PMS internally generates a discrepancy e = (e1, e2, ..., en), which is a sequence of actions, called exogenous events, such that s'e = do(e, s'). Let us consider the case in which the remaining process to be executed δ is composed of n parallel sub-processes running concurrently:

    δ = [p1 ∥ ... ∥ pn]

where every sub-process pi = [a1,i, ..., am,i] is a sequence of simple actions.1 The process designers are assumed to have associated, before the process enactment, every sub-process pi with the goal Gi that pi is meant to achieve. In addition, the concurrent sub-processes are also annotated with an invariant condition C, expressed in the SitCalc.2 Independence of these sub-processes is maintained under this condition C, which must hold in every situation. Checking for independence is a key point of the adaptation technique proposed in this work (see the next section). When a divergence between the virtual and physical reality is sensed because of exogenous events, one or more concurrent sub-processes can be broken (i.e., they no longer achieve the associated goals).
1 If this assumption does not hold, the approach in Chapter 4 is still usable and we do not propose any improvement.
2 Goals Gi and invariant conditions C are given as arbitrary SitCalc formulas that take a situation as their last argument.

For each broken branch pi, the recovery procedure generates a handler hi, i.e., an action sequence that, when placed before pi, allows p'i = (hi; pi) to reach goal Gi while remaining independent of every parallel branch pj (with j ≠ i) with respect to invariant C.

6.2 The adaptation technique

This section describes the approach we use to adapt a process composed of concurrent sequential sub-processes. We first give the formal foundations of our adaptation technique, presenting the results that the "monitor and repair" cycle relies upon. Then, we describe the "monitor and repair" cycle itself, and discuss the conditions under which the technique is sound and complete.

6.2.1 Formalization

In order to formally capture the concept of independence among processes, we introduce some preliminary definitions.

Definition 6.1. A ground action a preserves the achievement of goal G by a sequence of ground actions [a1, ..., am] under condition C if executing a at any point during [a1, ..., am] does not affect any of the conditions that are required for the goal G to be achieved by [a1, ..., am]; moreover, executing a preserves C in any situation. Formally:

    PreserveAch(a, G, [a1, ..., am], C)  =def
        ∀s. C(s) ⇒ C(do(a, s)) ∧
            (G(do([a1, ..., am], s)) ⇒ G(do([a, a1, ..., am], s))) ∧
            (G(do([a2, ..., am], s)) ⇒ G(do([a, a2, ..., am], s))) ∧
            ... ∧
            (G(do(am, s)) ⇒ G(do([a, am], s))) ∧
            (G(s) ⇒ G(do(a, s))).

We then extend the notion above to the case of action sequences:

Definition 6.2. A sequence of ground actions [a1, ..., am] preserves the achievement of goal G by a sequence of ground actions p under condition C if every action in [a1, ..., am] preserves the achievement of goal G by p under condition C. Formally:

    PreserveAch([a1, ..., am], G, p, C)  =def
        ∧_{i: 1 ≤ i ≤ m} PreserveAch(ai, G, p, C).

Given this, we can define a notion of independence of processes.

Definition 6.3. A set of (sequential) processes p1, ..., pn, where each pi achieves goal Gi, are independent with respect to goals G1 to Gn under condition C if, for all i and all j ≠ i, pj preserves the achievement of goal Gi by pi under condition C. Formally:

    IndepProcess([p1, ..., pn], [G1, ..., Gn], C)  =def
        ∧_{i: 1 ≤ i ≤ n} ∧_{j: 1 ≤ j ≤ n ∧ j ≠ i} PreserveAch(pj, Gi, pi, C).

Basically, Definition 6.3 looks at the independence of each and every pair of concurrent (sub-)processes. If we assume that every process is composed of m actions, checking this independence is polynomial in the number of actions and of concurrent processes. Specifically, it requires

    (n choose 2) × m² = O(m² × n²)    (6.1)

checks of PreserveAch(·) as in Definition 6.1 (one for each pair of actions in the concurrent processes).

Firstly, we show that if the concurrent sub-processes are independent and some of them progress, then the parts of them that remain to be executed will always remain independent:

Theorem 6.1. For each i ≤ n and for all suffixes p'i of pi

    D |= IndepProcess([p1, ..., pn], [G1, ..., Gn], C) ⇒
         IndepProcess([p'1, ..., p'n], [G1, ..., Gn], C).

Proof. By Definition 6.3, IndepProcess([p1, ..., pn], [G1, ..., Gn], C) holds iff, for all i ∈ [1, n] and all j ∈ [1, n] \ {i}, PreserveAch(pj, Gi, pi, C). Let us fix arbitrary values for i ∈ [1, n] and j ∈ [1, n] \ {i}, and let pj = [a1,j, a2,j, ..., am,j]. By Definition 6.2:

    PreserveAch(pj, Gi, pi, C)  ⇔  ∧_{k: 1 ≤ k ≤ m} PreserveAch(ak,j, Gi, pi, C)    (6.2)

Let us fix p'j = [at,j, ..., am,j], an arbitrary suffix of pj starting at position t.
By Equation 6.2:

    ∧_{k: t ≤ k ≤ m} PreserveAch(ak,j, Gi, pi, C)

and, consequently:

    PreserveAch(p'j, Gi, pi, C).

Moreover, since Definition 6.1 already quantifies over all suffixes of the target sequence, if PreserveAch(p'j, Gi, pi, C) holds, then PreserveAch(p'j, Gi, p'i, C) holds as well for any suffix p'i of pi. The values i and j, as well as the process suffixes, were fixed arbitrarily. Therefore, for all suffixes p'i of each pi and all suffixes p'j of each pj with j ≠ i, the following holds:

    PreserveAch(p'j, Gi, p'i, C).

Hence, the thesis is proven.

Next, we show that if n processes p1, ..., pn achieve their respective goals G1, ..., Gn and are independent according to Definition 6.3 with respect to a certain condition C, then any interleaving of the execution of the processes' actions will achieve each process's goal, and condition C will continue to hold. Let D be the current domain theory, i.e., the set of all fluents, all actions acting on those fluents, as well as all pre-conditions. Then:

Theorem 6.2.

    D |= IndepProcess([p1, ..., pn], [G1, ..., Gn], C) ⇒
         ∀s. G1(do(p1, s)) ∧ ... ∧ Gn(do(pn, s)) ∧ C(s) ⇒
             [∀s'. Do([p1 ∥ ... ∥ pn], s, s') ⇒ G1(s') ∧ ... ∧ Gn(s') ∧ C(s')].

Proof. By induction on the total length of all processes, where |pi| denotes the length of process pi, i.e., the number of its actions. If all processes are empty sequences, then the result trivially follows (base case). Assume the result holds when the total length of all processes is k; we show that it must also hold for k + 1 (induction step). Assume C(s), G1(do(p1, s)) ∧ ... ∧ Gn(do(pn, s)) and Do([p1 ∥ ... ∥ pn], s, s'), with the processes such that Σ_{i=1..n} |pi| = k + 1. Let us assume that branch pi = [a1,i, a2,i, ..., am,i] evolves by executing action a1,i. Let p'i = [a2,i, ..., am,i] be what is left of pi after executing a1,i, and let s1 = do(a1,i, s). By applying Theorem 6.1:

    IndepProcess([p1, ..., pn], [G1, ..., Gn], C) ⇒
    IndepProcess([p1, ..., pi−1, p'i, pi+1, ..., pn], [G1, ..., Gn], C).

Since IndepProcess([p1, ..., pn], [G1, ..., Gn], C) holds, for all j ≠ i we have PreserveAch(pi, Gj, pj, C) and, in particular, PreserveAch(a1,i, Gj, pj, C). Therefore, by Definition 6.1:

    G1(do(p1, s1)) ∧ ... ∧ Gi−1(do(pi−1, s1)) ∧ Gi(do(p'i, s1)) ∧
    Gi+1(do(pi+1, s1)) ∧ ... ∧ Gn(do(pn, s1)) ∧ C(s1)

In order to complete the proof, it remains to prove that

    IndepProcess([p1, ..., pi−1, p'i, pi+1, ..., pn], [G1, ..., Gn], C) ∧
    G1(do(p1, s1)) ∧ ... ∧ Gi−1(do(pi−1, s1)) ∧ Gi(do(p'i, s1)) ∧
    Gi+1(do(pi+1, s1)) ∧ ... ∧ Gn(do(pn, s1)) ∧ C(s1) ⇒
    [∀s'. Do([p1 ∥ ... ∥ pi−1 ∥ p'i ∥ pi+1 ∥ ... ∥ pn], s1, s') ⇒
        G1(s') ∧ ... ∧ Gn(s') ∧ C(s')].

And that holds by the induction hypothesis, as |p1| + ... + |pi−1| + |p'i| + |pi+1| + ... + |pn| = k.

Next, building on the previous results, we show that such an "independent handler" h can be used for handling a discrepancy e breaking a process p'i, while allowing all other processes to execute concurrently and achieve their respective goals:

Theorem 6.3. Let p'i be the process broken by a discrepancy e.

    D |= ∀s̃, e. IndepProcess([p'1, ..., p'i−1, [h; p'i], p'i+1, ..., p'n],
                              [G1, ..., Gi−1, Gi, Gi+1, ..., Gn], C) ∧
         G1(do(p'1, do(e, s̃))) ∧ ... ∧ Gi−1(do(p'i−1, do(e, s̃))) ∧
         Gi+1(do(p'i+1, do(e, s̃))) ∧ ... ∧ Gn(do(p'n, do(e, s̃))) ∧
         Gi(do([h; p'i], do(e, s̃))) ∧ C(do(e, s̃)) ⇒
         [∀s'. Do([p'1 ∥ ... ∥ p'i−1 ∥ [h; p'i] ∥ p'i+1 ∥ ... ∥ p'n], do(e, s̃), s') ⇒
             G1(s') ∧ ... ∧ Gi−1(s') ∧ Gi(s') ∧ Gi+1(s') ∧ ... ∧ Gn(s') ∧ C(s')].

Proof. The result derives trivially from applying Theorem 6.2 where:

• pi =def [h; p'i];
• pk =def p'k for all k such that 1 ≤ k ≤ n ∧ k ≠ i;
• s =def do(e, s̃).

Finally, we show that adding an action sequence h for handling a discrepancy e that breaks a process pi preserves process independence, provided that h is built as independent of every sub-process different from pi with respect to condition C. Let R(Gi, do(pi)) be the situation-suppressed expression for the regression R^s(Gi(do(pi, s))).

Theorem 6.4. Let pi be the process broken by a discrepancy e.

    D |= IndepProcess([p1, ..., pi−1, pi, pi+1, ..., pn],
                      [G1, ..., Gi−1, Gi, Gi+1, ..., Gn], C) ∧
         ∧_{j: 1 ≤ j ≤ n ∧ j ≠ i} ( PreserveAch(h, Gj, pj, C) ∧
                                     PreserveAch(pj, R^s(Gi(do(pi, s))), h, C) ) ⇒
         IndepProcess([p1, ..., pi−1, [h; pi], pi+1, ..., pn],
                      [G1, ..., Gi−1, Gi, Gi+1, ..., Gn], C).

Proof. Let us denote h = [h1, ..., hm] and pi = [a1,i, ..., al,i], and let us fix an arbitrary process pj ≠ pi without loss of generality. By hypothesis

    ∧_{k: 1 ≤ k ≤ m} PreserveAch(hk, Gj, pj, C)    (6.3)

as well as

    ∧_{k: 1 ≤ k ≤ l} PreserveAch(ak,i, Gj, pj, C)    (6.4)

From Equations 6.3 and 6.4, by Definition 6.2 it results that:

    PreserveAch([h; pi], Gj, pj, C)    (6.5)

Let se be the situation after the occurrence of the discrepancy e. Handler h is built such that:

    ∃s̃. R^s̃(Gi(do(pi, s̃))) ∧ do(h, se) = s̃ ∧ Gi(do(pi, s̃))

In the light of this:

    PreserveAch(pj, R(Gi, do(pi)), h, C) ⇒ PreserveAch(pj, Gi, h, C)

Therefore, since PreserveAch(pj, Gi, pi, C) holds by the independence hypothesis, the following holds:

    PreserveAch(pj, Gi, [h; pi], C)    (6.6)

Since Equations 6.5 and 6.6 are true for every process pj and the processes are independent by hypothesis, the thesis holds.
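The pairwise check behind Definition 6.3 can be illustrated on a toy model in which PreserveAch is approximated by a purely syntactic, sufficient condition: an action preserves another branch's goal if it writes no variable that the branch reads or writes. The record structure and names are hypothetical; the real check is on SitCalc formulas.

```python
# Sufficient syntactic check for branch independence, in the spirit of
# Definitions 6.1-6.3. Each action is a toy record with "reads"/"writes"
# variable sets; branches are lists of such actions.
from itertools import combinations

def preserves(action, other_branch):
    """action touches no variable the other branch depends on."""
    for b in other_branch:
        if action["writes"] & (b["reads"] | b["writes"]):
            return False
    return True

def independent(branches):
    # pairwise check over action pairs: O(m^2 * n^2) comparisons, as in Eq. 6.1
    for p, q in combinations(branches, 2):
        if not all(preserves(a, q) for a in p):
            return False
        if not all(preserves(a, p) for a in q):
            return False
    return True

act = lambda r, w: {"reads": set(r), "writes": set(w)}
p1 = [act({"x"}, {"x"}), act({"x"}, {"y"})]
p2 = [act({"z"}, {"z"})]
p3 = [act({"y"}, {"w"})]
print(independent([p1, p2]))  # True: the branches touch disjoint variables
print(independent([p1, p3]))  # False: p1 writes y, which p3 reads
```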
6.2.2 Monitoring-Repairing Technique

On the basis of the results in the previous section, we propose in Figure 6.1 an algorithm for adaptation. This algorithm, which is meant to run inside the PMS, relies on two arrays giving information about the status of the n processes concurrently running: whether each has completed or not and, in case of completion, whether successfully or unsuccessfully. Initially every element of both arrays is set to false. Routine monitor relies on every process pi sending a message to the PMS when it either terminates successfully (message successfullyCompleted(i)) or an exception is sensed such that pi can no longer terminate successfully3 (message exception(ie, se), where ie is the "broken" process and se is the resulting situation after the discrepancy occurrence).

completed[]: array of n elements
succeeded[]: array of n elements

initially()
1  for i ← 1 to n
2    do completed[i] ← false
3       succeeded[i] ← false

SUB monitor([p1, ..., pn], [G1, ..., Gn], C, si)
1  if (¬IndepProcess([p1, ..., pn], [G1, ..., Gn], C))
2    then throw exception
3  while (∃i. ¬completed[i])
4    do m ← waitForMessage()
5       if m = successfullyCompleted(i)
6         then completed[i] ← true
7              succeeded[i] ← true
8       if m = exception(ie, se)
9         then h ← buildHandler(ie, se, [p1, ..., pn], [G1, ..., Gn], C)
10             if h = fail
11               then completed[ie] ← true
12                    throw exception
13               else
14                    pie ← [h; pie]
15                    start(pie)

FUNCTION buildHandler(ie, se, [p1, ..., pn], [G1, ..., Gn], C)
1  for i ← 1 to n
2    do pi ← remains(pi, se)
3  h ← planByRegres(R(Gie, do(pie)), se,
4        [p1, ..., pn]/pie, [G1, ..., Gn]/Gie, C)
5  return h

FUNCTION planByRegres(Goalh, si, [p1, ..., pn−1], [G1, ..., Gn−1], C)
1  if D |= Goalh(si)
2    then return nil
3    else a ← chooseAction(Goalh, si, [p1, ..., pn−1],
4              [G1, ..., Gn−1], C)
5         if a = nil
6           then
7                return fail
8           else
9                h' ← planByRegres(R^s(Goalh(do(a, s))), si,
10                    [p1, ..., pn−1], [G1, ..., Gn−1], C)
11               if h' = fail
12                 then return fail
13                 else return [h'; a]

FUNCTION chooseAction(Goalh, si, [p1, ..., pn−1], [G1, ..., Gn−1], C)
1  choose an action a s.t. {∃s. (¬Goalh(s) ∧ Goalh(do(a, s)))} ∧
2    ∀i. 1 ≤ i ≤ (n − 1) ⇒ PreserveAch(a, Gi, pi, C)
3      ∧ PreserveAch(pi, Goalh, a, C)
4  return a  // nil if there exists no selectable action

Figure 6.1: Pseudo-code for the adaptation technique.

3 That is, D ⊭ R^snow(Gie(do(pie, snow))), where snow is the current situation after sensing a discrepancy.

We assume that the situation representing the current state in the real world is known and that we have complete knowledge of the fluents in that situation. Moreover, we assume that in every situation we can access the fluent values of every past situation.4 Finally, we assume as well that every process pi is consumed, reducing its size with the execution of tasks: pi always denotes the part of the process that still needs to be executed, with the parts already executed cut out.

The routine is applicable if all processes are independent of each other. Therefore, before starting its monitoring and repairing, it checks whether the process independence assumption holds (lines 1-2). If not, it throws an exception, assuming that in this case an alternative and more intrusive approach would be used [40]. Later on, in the "monitor and repair" cycle, we listen for arriving messages (line 4). If a message concerns the successful completion of a sub-process, then the arrays are updated accordingly (lines 5-7). Otherwise, the message is about a sub-process pie that has been broken by a discrepancy: pie is implicitly halted and we call function buildHandler to search for an adaptation handler h. If such a handler h is found, it is prefixed to the broken process pie, which becomes (h; pie) (line 14).
Finally, the adapted process is started again (line 15).

How does the buildHandler function synthesize this handler? Lines 1-2 update all processes pi so that they represent the sub-parts that remain to be executed. Then, the function invokes a regression planner (line 3) [111, 52], which searches for a plan backwards from the goal description. Specifically, the regression planner tries to generate a sequential plan that, starting from the current situation se, arrives at some situation sh such that pie can be executed again and achieve Gie, i.e., R^sh(Gie(do(pie, sh))). The regression-planning procedure planByRegres recursively and incrementally builds a plan,5 checking that every selected action is independent of each pj (with j ≠ ie) with respect to the invariant condition C. Indeed, Theorems 6.4 and 6.3 ensure that if the handler only includes actions that are independent of each pj (with j ≠ ie), then, for all possible interleavings, process (h; pie) will achieve its goal Gie and every other process pj with j ≠ ie will continue to achieve its goal Gj. Observe that Theorem 6.1 ensures that if the processes were originally independent with respect to their goals and no exceptions are raised, they remain independent as they evolve.

4 This could be done by logging and storing them in a repository.
5 Here we assume that plans are returned in the form of IndiGolog programs.

The technique proposed in this chapter is proved to be sound and complete if exactly one branch is broken and needs to be recovered. Conversely, if a discrepancy breaks several processes, say k, we repair them one by one up to the k-th, i.e., the i-th is repaired when the first i − 1 have already been repaired. Of course, this approach is greedy: we repair the i-th branch without considering the next k − i branches to be recovered; in addition, the order of the repairs is arbitrary. For instance, there might be different choices for repairing a given i-th branch.
Some of them would make impossible to repair one the next k − i branch, whereas other choices would not. Since the technique does not take into account the next branches, all of choices would be equivalent, and, hence, one might be done such that some branches are not repairable anymore. Or, even, choosing a certain repairing sequence allows to repair, whereas others do not. That is why the technique is only sound for multiple breaks: it could not find a repairing plan, even if it does exist. More formally, let ~e be a discrepancy which breaks k processes, namely p1 , p2 , . . . , pk , and let s̄ be the situation before the discrepancy has occurred. Then: Theorem 6.5 (Soundness). If the algorithm in Figure 6.1 produces handlers: ~h1 = buildHandler(1, do(~e, s̄), [~ p1 , p~2 . . . , p~n ], [G1 , . . . , Gn ], C) ~h2 = buildHandler(2, do(~e, s̄), [(~h1 ; p~1 ), p~2 . . . , p~n ], [G1 , . . . , Gn ], C) ... ~hk = buildHandler(k, do(~e, s̄), [(~h1 ; p~1 ), (~h2 ; p~2 ), . . . , (~hk−1 , p~k−1 ), p~k , . . . , p~n ], [G1 , . . . , Gn ], C) and then and IndepProcess([~ p1 , . . . , p~n ], [G1 , . . . , Gn ], C)∧ G1 (do(~ p1 , s̄)) ∧ . . . ∧ Gn (do(~ pn , s̄)) ∀s0 .Do([(~h1 ; p~1 ) k . . . k (~hk ; p~k ) k p~k+1 k p~n ], do(~e, s̄), s0 ) ⇒ G1 (s0 ) ∧ . . . ∧ Gi−1 (s0 ) ∧ Gi (s0 ) ∧Gi+1 (s0 ) ∧ . . . ∧ Gn (s0 ) ∧ C(s0 )] IndepProcess([(~h1 ; p~1 ), (~h2 ; p~2 ), . . . , (~hk ; p~k ), p~k+1 , . . . , p~n ]), [G1 , . . . , Gk , Gk+1 , . . . , Gn ], C). Proof. The adaptation algorithm will repair firstly process p~1 since we are assuming that it is enqueued as first. Let h1 be the handler produced by routine planByRegres to repair it. Handler ~h1 is built such that V ~ ~i ) i:2≤i≤n PreserveAch(h1 , Gi , p ∧PreserveAch(~ pi , R(G1 , p~1 )), ~h1 ) 140 CHAPTER 6. ADAPTATION OF CONCURRENT BRANCHES Since for hypothesis there holds IndepProcess([~ p1 , . . . , p~n ], [G1 , . . . , Gn ], C), by Theorem 6.4 that follows: IndepProcess([(~h1 ; p~1 ), p~2 , . . . , p~n ]), [G1 , . . . 
, Gn ], C) (6.7) Let ~h2 be the handler produced by routine planByRegres to repair p~2 . Handler ~h2 is built such that PreserveAch(~h2 , G1 , (~h1 ; p~1 ))∧ PreserveAch((~h1 ; p~1 ), R(G2 , do(~ p2 )), ~h2 ) V ~ ~i ) i:3≤i≤n PreserveAch(h2 , Gi , p ∧PreserveAch(~ pi , R(G2 , do(~ p2 )), ~h2 ) From Equation 6.7 and by Theorem 6.4, that follows: IndepProcess([(~h1 ; p~1 ), (~h2 ; p~2 ), . . . , p~n ]), [G1 , . . . , Gn ], C). Therefore, after repairing p~k , we obtain: IndepProcess([(~h1 ; p~1 ), (~h2 ; p~2 ), . . . , (~hk ; p~k ), p~k+1 , . . . , p~n ]), [G1 , . . . , Gk , Gk+1 , . . . , Gn ], C). (6.8) For all i ∈ [1, k], hi has been built such ¡ as do(hi , do(~e, s))¢ takes to any situation s̃ where Rs̃ (G(pi , s̃)) and, hence, G do([~h1 ; p~1 ], do(~e, s)) . From this result and from Equation 6.8, Theorem 6.2 proves: ∀s0 .Do([(~h1 ; p~1 ) k . . . k (~hk ; p~k ) k p~k+1 k p~n ], do(~e, s̄), s0 ) ⇒ G1 (s0 ) ∧ . . . ∧ Gi−1 (s0 ) ∧ Gi (s0 ) ∧Gi+1 (s0 ) ∧ . . . ∧ Gn (s0 ) ∧ C(s0 )] Theorem 6.6 (Completeness). If ¡ ∃~h. ∀s0 .Do([~ p1 k . . . k p~i−1 k (~h; p~i ) k p~i+1 k . .¢. k p~n ], 0 do(~e, s̄), s ) ⇒ G1 (s0 ) ∧ . . . ∧ Gn (s0 ) ∧ C(s0 ) and IndepProcess([p1 , . . . , pi−1 (~h; pi ), , pi+1 , . . . , pn ], [G1 , . . . , Gn ], C) then buildHandler returns a repairing handler: ~h = buildHandler(i, do(~e, s̄), [p1 , . . . , pn ], [G1 , . . . , Gn ], C) 6.2. THE ADAPTATION TECHNIQUE 141 such that ∀s0 .Do([~ p1 k . . . k p~i−1 k (~h; p~i ) k p~i+1 k . . . k p~n ], do(~e, s̄), s0 ) ⇒ G1 (s0 ) ∧ . . . ∧ Gn (s0 ) ∧ C(s0 ) (6.9) and indepP rocesses([p1 , . . . , (h; pi ), pn ], [G1 , . . . , Gi , . . . , Gn ], C) (6.10) Proof. Let us assume p~i be the process broken by a discrepancy ~e. Let se = do(~e, s) be the resulting situation. If there exist handlers meeting the requirements of the hypothesis, the invocation of procedure buildHandler returns one of them, namely ~h: ~h = buildHandler(i, do(~e, s̄), [p1 , . . . , pn ], [G1 , . . . 
, Gn ], C) if and only if planByRegress returns ~h: ~h = planByRegres(R(Gi , do(~ pi ))), se , [p1 , . . . , pn ]/pi , [G1 , . . . , Gn ]/Gi , C) Therefore, we move to prove that if any handler ~h exists such that it achieves any goal Goalh and hypotheses 6.9 and 6.10 hold, then it can be returned by planByRegres. We prove by induction on the length of ~h. Let us assume that there may be several plans h~1 , . . . , h~n . Let fix an arbitrary one, namely ~h, whose length is k: ~h = [h1 , . . . , hk ] Let denote Goalh = R(Gi , do(~ pi )). If k = 0 (base step), then Goalh already holds, and, hence, planByRegres returns nil (line 2 of the procedure). Otherwise, let us assume by induction hypothesis that the plan h~0 = [h1 , . . . , hk−1 ], being (k − 1)-long, can be returned by invoking: ~h = planByRegres(Rs (Goalh (do(a, s))), si , [p1 , . . . , pn−1 ], [G1 , . . . , Gn−1 ], C) In fact, since do(h~0 , se ) = se and Goalh (do(a, se)) holds, regression Rs (Goalh (do(a, s))) is true in se. 142 CHAPTER 6. ADAPTATION OF CONCURRENT BRANCHES In order that h~0 could be the returned handler [h0 , a], it is to prove a can be coinciding with hk . Please note that, given that in situation se = do(h~0 , se ) regression Rs (Goalh (do(a, s))) holds, the following must also hold: ¬Goalh (e s) (6.11) In addition, since handler ~h is such that Goalh (do(~h, se )) is true, hk has to be the action turning formula Goalh (·) to true: Goalh (do(hk , se)) (6.12) Moreover, since h is one of the handler meeting the hypothesis: IndepProcess([p1 , . . . , pi−1~h, pi+1 , . . . , pn ], [G1 , . . . , Gn ], C) then ∀j. (j 6= i) ⇒ P reserveAch(hk , Gj , pj , C)∧ P reserveAch(pj , Goalh , hk , C) (6.13) Therefore, action hk meets the constraints in Equations 6.11, 6.12 and 6.13, according to which procedure chooseAction picks an action a. In the light of this, hk is, in fact, an action returnable by such a procedure. 
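When the domain is finite and propositional (a case discussed at the end of this section), the backward construction performed by planByRegres can be illustrated with a small goal-regression search. The sketch below is our own simplification, not the thesis implementation: the STRIPS-style action model, the `regress` helper and the sample actions are all assumptions, and the independence test is reduced to a caller-supplied predicate standing in for the PreserveAch checks of Theorems 6.3 and 6.4.

```python
# Hypothetical STRIPS-style action model (names and structure are ours,
# not the thesis's): preconditions plus add/delete effects over fluents.
class Action:
    def __init__(self, name, pre, add, delete):
        self.name, self.pre = name, set(pre)
        self.add, self.delete = set(add), set(delete)

def regress(goal, act):
    """Weakest condition that must hold before act so that goal holds after,
    or None if act destroys, or does not contribute to, the goal."""
    if act.delete & goal or not (act.add & goal):
        return None
    return (goal - act.add) | act.pre

def plan_by_regression(goal, state, actions, independent, depth=10):
    """Backward search from goal; independent(a) stands in for the check
    that action a preserves every other branch."""
    if goal <= state:
        return []                 # goal already holds: return the empty plan
    if depth == 0:
        return None
    for act in actions:
        if not independent(act):
            continue              # only independent actions may be chosen
        subgoal = regress(goal, act)
        if subgoal is not None:
            prefix = plan_by_regression(subgoal, state, actions,
                                        independent, depth - 1)
            if prefix is not None:
                return prefix + [act.name]
    return None

# Toy domain: reach A, then send the report about the injured from there.
go = Action("go(A)", pre=[], add=["at(A)"], delete=["at(B)"])
send = Action("send(A)", pre=["at(A)"], add=["infoSent(A)"], delete=[])
plan = plan_by_regression({"infoSent(A)"}, {"at(B)"}, [go, send], lambda a: True)
```

Here the search regresses infoSent(A) through send(A) to the subgoal at(A), then through go(A) to the empty goal, yielding the plan [go(A), send(A)].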
Note that the above procedure becomes easily realizable in practice if the PMS works in a finite domain (e.g., using discretized positions based on actual GPS positions) and propositional logic is sufficient. In that case, one can use an off-the-shelf regression planner such as those mentioned in [111, 52].

6.3 An Example from Emergency Management

In this section, we discuss an example of adaptation in a process concerning emergency management. A team is sent to the affected area. Actors are equipped with PDAs, which are used to carry out process tasks and to communicate with each other through a Mobile Ad hoc Network (manet). A possible process for coping with the aftermath of an earthquake is depicted in Figure 6.2. Some actors assess the area for dangerous, partially collapsed buildings. Meanwhile, others give first aid to the injured people, send information about the required ambulances, and fill in a questionnaire about the injured people, as required by the headquarters. The corresponding IndiGolog program is depicted in Figure 6.3. In the activity diagram in Figure 6.2, we have labeled every task with the fluents (in situation-suppressed form) that hold after the successful task execution. It is worth noting that, as already mentioned in Section 6.1, all assignments in the IndiGolog program in Figure 6.3 are made together before the parallel branches are started. The reason is that we want to make all branches independent. Indeed, if the assignments were done inside the branches themselves, then an assignment made in one branch might depend on assignments made in some parallel branches. For the sake of simplicity, we detail only those fluents that are used later in the example (see later in this section for the successor state axioms):

proxy(w, y, s) is true if, in situation s, service y can work as proxy for service w.
In the starting situation S0, for every pair of services w, y, we have proxy(w, y, S0) = false, denoting that no proxy has yet been chosen for w.

at(w, p, s) is true if service w is located at coordinate p = ⟨px, py, pz⟩ in situation s. In the starting situation S0, for each service wi, we have at(wi, pi, S0), where location pi is obtained through GPS sensors.

infoSent(d, s) is true in situation s if the information concerning the injured people at destination d has been successfully forwarded to the headquarters. For all destinations d, infoSent(d, S0) = false.

evaluationOK(s) is true if the photos taken are judged as having a good quality, with evaluationOK(S0) = false.

assisted(z, s) is true if the injured people in area z have been supported through first-aid medical assistance. For all z, assisted(z, S0) = false.

We assume that the process designers have defined the following goals for the three concurrent sub-processes (as required by the framework):

  G1(s) ≝ Q1Compiled(A, s) ∧ evaluationOK(s)
  G2(s) ≝ assisted(A, s)
  G3(s) ≝ Q2Compiled(A, s) ∧ infoSent(A, s)

In addition, in this example we use the invariant condition C(s) = true for all situations s, meaning that we are not using any assumption to show process independence. Before formally specifying the aforementioned fluents, we define some abbreviations:

available(w, s): states that service w is available, i.e., it is connected to the coordinator device (denoted by Coord) and is free.

connected(w, z, s): true if, in situation s, services w and z are connected through possibly multi-hop paths.

neigh(w, z, s): holds if services w and z are in radio range in situation s.

Their definitions coincide with those given in the example of Chapter 4 and, hence, we do not duplicate them here.
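To fix intuitions, the initial situation S0 and the three branch goals can also be rendered in executable form. The following fragment is purely illustrative (the dict-of-fluents encoding and all variable names are our own assumptions, not the thesis's IndiGolog representation):

```python
# Illustrative encoding of S0: each fluent of the example becomes an entry
# of a plain dict (sets hold the arguments for which the fluent is true).
S0 = {
    "proxy": set(),                    # no (w, y) proxy pair chosen yet
    "at": {"w1": (0.0, 0.0, 0.0)},     # positions would come from GPS sensors
    "infoSent": set(),                 # nothing forwarded to the headquarters
    "evaluationOK": False,             # no photo judged good yet
    "assisted": set(),                 # no area assisted yet
    "Q1Compiled": set(),
    "Q2Compiled": set(),
}

# The three branch goals G1-G3 of the example, as boolean tests on a state.
def G1(s): return "A" in s["Q1Compiled"] and s["evaluationOK"]
def G2(s): return "A" in s["assisted"]
def G3(s): return "A" in s["Q2Compiled"] and "A" in s["infoSent"]

def C(s):  # the trivial invariant condition used in the example
    return True
```

As expected, none of the three goals holds in S0, while the trivial invariant does.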
The successor state axioms for the aforementioned fluents are as follows:

  at(w, loc, do(t, s)) ⇔
    [ at(w, loc, s) ∧ ∀loc'. t ≠ finishedTask(w, Go, loc') ]
    ∨ [ ¬at(w, loc, s) ∧ t = finishedTask(w, Go, loc) ∧ started(w, Go, loc, s) ]

  proxy(w, y, do(x, s)) ⇔
    [ proxy(w, y, s) ∧ ∀q. x ≠ finishedTask(w, FindProxy, q) ]
    ∨ [ x = finishedTask(w, FindProxy, ∅) ∧ ∃p. started(w, FindProxy, p, s) ∧ isClosestAvail(w, y, s) ]

  infoSent(loc, do(t, s)) ⇔ infoSent(loc, s)
    ∨ [ ∃w. t = finishedTask(w, SendToHeadquarter, ⟨loc, OK⟩) ∧ at(w, loc, s)
        ∧ ∃y. (proxy(w, y, s) ∧ neigh(w, y, s)) ∧ ∃p. started(w, SendToHeadquarter, p, s) ]

  evaluationOK(do(t, s)) ⇔ evaluationOK(s)
    ∨ [ ∃w. t = finishedTask(w, Evaluate, OK) ∧ ∃p. started(w, Evaluate, p, s)
        ∧ photoTaken(s) ∧ n ≥ threshold ]

  assisted(z, do(x, s)) ⇔ assisted(z, s)
    ∨ [ ¬assisted(z, s) ∧ ∃w. x = finishedTask(w, FirstAid, ∅)
        ∧ ∃p. started(w, FirstAid, p, s) ∧ at(w, z, s) ]

In the above, we use the abbreviation isClosestAvail(w, y, s), which holds if y is the geographically closest service to w that is able to act as proxy; if there is no available proxy in w's radio range, y = nil:

  isClosestAvail(w, y, s) ≝
    [ available(y, s) ∧ at(w, pw, s) ∧ at(y, py, s) ∧ provide(y, proxy) ∧ neigh(w, y, s)
      ∧ ∀z. (z ≠ y ∧ available(z, s) ∧ provide(z, proxy) ∧ neigh(w, z, s) ∧ at(z, pz, s)
             ⇒ ‖pz − pw‖ > ‖py − pw‖) ]
    ∨ [ y = nil ∧ ¬∃z. (available(z, s) ∧ provide(z, proxy) ∧ neigh(w, z, s)) ]

Automatic Adaptation: an example. To show how our automatic adaptation technique is meant to work, let us consider an example of a discrepancy and of a handler plan to cope with it. First, let us consider the case where the process execution reaches line 6 of procedure ReportAssistanceInjured. At this stage, a proxy has been found, namely wpr, and the information about the injured people is about to be sent to the headquarters to request a sufficient number of ambulances to take them to a hospital. Let s̄ be the current situation.
Of course, wpr has been selected so as to be in radio range of actor w4: neigh(w4, wpr, s̄). Let us now assume that wpr moves away, for whatever reason, to a position p'. This corresponds to a discrepancy

  e = [ Assign(wpr, Go); Start(wpr, Go, p'); AckTaskCompletion(wpr, Go); Release(wpr, Go) ]

After the internal execution of the discrepancy, the new current situation is s_e = do(e, s̄), where neigh(w4, wpr, s_e) = false. Therefore, action finishedTask(w4, InformInjured) does not make infoSent(A) become true, as it was supposed to. Since sub-processes EvalTake, AssistInjured and ReportAssistanceInjured are independent, the latter, which is affected by the discrepancy, can be repaired without having to stop the other processes. The goal given to the regression planner is Goal_h = photoTaken() ∧ infoSent(A) ∧ Q1Compiled(A) in situation-suppressed form. The planner may return the following plan, which achieves Goal_h while preserving independence:

  Start(wpr, Go, A); AckTaskCompletion(wpr, Go);
  Start(w4, InformInjured, A); AckTaskCompletion(w4, InformInjured);

Adaptation is performed by inserting this plan after line 5 of procedure ReportAssistanceInjured, ensuring that the sub-process can achieve its goal without interfering with the other sub-processes.

6.4 A summary

In this chapter we have proposed a sound and complete technique for adapting sequential processes running concurrently. Under the assumption of independence of the different processes, such a technique improves on the one proposed in Chapter 4, while adopting the same general framework based on AI planning techniques. In the previous approach, whenever a process needs to be adapted, all the concurrently running branches are interrupted, and a sequence of actions h = [a1, a2, ..., an] is placed before them. Therefore, all of the branches can only resume after the execution of the whole sequence.
The adaptation technique proposed here works by identifying whether concurrent branches are independent (i.e., neither working on the same variables nor affecting the same conditions). If they are independent, it can synthesize a recovery process that affects only the branch of interest, without having to block the other branches. Concurrency is a key characteristic of business processes, so the independence of different branches is likely to yield benefits in practice. The proposed technique is made possible by annotating processes in a "declarative" way. We assume that the process designer can annotate actions/sequences with the goals they are intended to achieve; on the basis of such declared goals, the independence among branches can be verified, and then a recovery process which affects only the branch of interest, without side effects on the others, is synthesized.

[Figure 6.2: A possible process to be carried out in disaster management scenarios]

Main()
 1  π.w0 [available(w0) ∧ ∀c. require(c, QuestBuildings) ⇒ provide(w0, c)];
 2  Assign(w0, QuestBuildings);
 3  π.w1 [available(w1) ∧ ∀c. require(c, TakePhoto) ⇒ provide(w1, c)];
 4  Assign(w1, TakePhoto);
 5  π.w2 [available(w2) ∧ ∀c. require(c, EvalPhoto) ⇒ provide(w2, c)];
 6  Assign(w2, EvalPhoto);
 7  π.w3 [available(w3) ∧ ∀c. require(c, FirstAid) ⇒ provide(w3, c)];
 8  Assign(w3, FirstAid);
 9  π.w4 [available(w4) ∧ ∀c. require(c, InformInjured) ⇒ provide(w4, c)];
10  Assign(w4, InformInjured);
11  π.w5 [available(w5) ∧ ∀c. require(c, InjuredQuest) ⇒ provide(w5, c)];
12  Assign(w5, InjuredQuest);
13  (EvalTake(w0, w1, w2, Loc) ∥ AssistInjured(w3, Loc) ∥
14   ReportAssistanceInjured(w4, w5, Loc));
15  Release(w0, QuestBuildings);
16  Release(w1, TakePhoto);
17  Release(w2, EvalPhoto);
18  Release(w3, FirstAid);
19  Release(w4, InformInjured);
20  Release(w5, InjuredQuest);
21  π.w6 [available(w6) ∧ ∀c. require(c, SendByGPRS) ⇒ provide(w6, c)];
22  Assign(w6, SendByGPRS);
23  Start(w6, SendByGPRS, ∅);
24  AckTaskCompletion(w6, SendByGPRS);
25  Release(w6, SendByGPRS);

EvalTake(w0, w1, w2, Loc)
 1  Start(w0, QuestBuildings, Loc);
 2  AckTaskCompletion(w0, QuestBuildings);
 3  Start(w1, Go, Loc);
 4  AckTaskCompletion(w1, Go);
 5  Start(w1, TakePhoto, Loc);
 6  AckTaskCompletion(w1, TakePhoto);
 7  Start(w2, Evaluate, Loc);
 8  AckTaskCompletion(w2, Evaluate);

AssistInjured(w3, Loc)
 1  Start(w3, Go, Loc);
 2  AckTaskCompletion(w3, Go);
 3  Start(w3, FirstAid, Loc);
 4  AckTaskCompletion(w3, FirstAid);

ReportAssistanceInjured(w4, w5, Loc)
 1  Start(w4, Go, Loc);
 2  AckTaskCompletion(w4, Go);
 3  Start(w4, FindProxy, Loc);
 4  AckTaskCompletion(w4, FindProxy);
 5  Start(w4, InformInjured, Loc);
 6  AckTaskCompletion(w4, InformInjured);
 7  Start(w5, Go, Loc);
 8  AckTaskCompletion(w5, Go);
 9  Start(w5, InjuredQuest, Loc);
10  AckTaskCompletion(w5, InjuredQuest);

Figure 6.3: The IndiGolog program corresponding to the process in Figure 6.2

Chapter 7

Some Covered Related Topics

In the previous chapters, we have discussed a technique to deal with the issue of automatically adapting process instances when some exogenous events produce discrepancies that make it impossible for them to be completed. We have also described a concrete implementation and, hence, shown the feasibility of the approach. The approach implemented so far can be summarized at a high level as in Figure 7.1(a): the PMS engine of SmartPM is deployed on a certain device, usually an ultra-mobile laptop (such as the Asus Eee PC[1]), and other devices exist on which some services, human-controlled and/or automatic, are installed. We envision moving from a centralized approach to a distributed one, where the workflow (i.e., process) execution is not orchestrated by a sole node, but all devices contribute in a distributed fashion to carrying out the workflow.
Indeed, generally speaking, devices may not be powerful enough, and they might not be continuously connected to a central orchestrator. The different local PMSs coordinate through the appropriate exchange of messages, conveying synchronization information and the outputs of the actions performed by the services. The technique described successfully applies the "Roman Model" for Service Composition [18] to workflow management. Section 7.1 is a first step towards the conceptualization of decentralized orchestrators. In the proposed approach, we do not need the fine granularity of explicitly specifying the PMS actions assign, start, ackTaskCompletion and release. Therefore, we model workflows as finite-state automata, and sequences of PMS actions as transitions (i.e., arcs) of the automata. At the same time, the distributed approach of Section 7.1 aims at finding a solution for the challenging issue of synthesizing the schema of a workflow according to the available services. Typically, as widely demonstrated in WORKPAD as well as in many other research projects[2], generic workflows for pervasive scenarios are designed a priori; then, just before a team is dropped off in the operation field, they need to be customized on the basis of the services currently offered by the mobile devices of the operators actually composing the team.

[Footnote 1: http://eeepc.asus.com/]

[Figure 7.1: The centralized vs distributed approach to process management (some arrows not shown to preserve the figure readability): (a) the centralized approach; (b) the distributed approach]

In both the centralized and the distributed approach, some services are human-based, in the sense that the work performed by such services is done by humans with the support of a so-called work-list handler. So far, we have developed only a proof-of-concept implementation for the sake of testing the SmartPM features.
In the near future, we aim at providing SmartPM with full-fledged work-list handlers. We envision two types of work-list handler: a version for ultra-mobile PCs and a lighter version for PDAs, "compact" but providing fewer features. First steps have already been taken in both directions. As for the PDA version, during this thesis we have already implemented one for the ROME4EU Process Management System, a previous valuable attempt to deal with unexpected deviations (see [7]). A screenshot of the system is depicted in Figure 5.6(d). We plan to port it to the SmartPM system. As for the more powerful version for ultra-mobile PCs, Section 7.2 illustrates a possible version. It refers to a new visual tool that can aid users in selecting the "right" task among a potentially large number of tasks offered to them. Indeed, in many scenarios, many different processes need to be carried out at the same time by the same organisation. Therefore, participants can be confronted with a large number of processes and, hence, of tasks among which to pick the next one to work on. Many tools present assigned tasks as a mere list, without giving any contextual information helpful for such a choice. At the moment, the tool works in concert with the YAWL Process Management System, but the framework is applicable to any PMS in general, and to SmartPM in particular.

7.1 Automatic Workflow Composition

This section proposes a novel technique, sound, complete and terminating, able to automatically synthesize such distributed orchestrators, given (i) the target generic workflow to be carried out, in the form of a finite transition system, and (ii) the set of behaviorally-constrained services, again in the form of (nondeterministic) finite transition systems. This technique deals with the problem of synthesizing the distributed orchestrator in the presence of services with constrained behaviours.

[Footnote 2: Cf. SHARE (http://www.share-project.org), EGERIS (http://www.egeris.org), ORCHESTRA (http://www.eu-orchestra.org), FORMIDABLE (http://www.formidable-project.org)]

This issue has some similarities with that of automatically synthesizing composite services starting from available ones [93, 95, 80, 128, 65, 9]. In particular, [9] considers the issue of automatic composition in the case in which the available services are behaviorally constrained, and [11] the case in which the available services are behaviorally constrained and the results of the invoked actions cannot be foreseen, but only observed afterwards. All the previous approaches consider the case in which the synthesized orchestrator is centralized. On the other hand, the issue of distributed orchestration has been considered in the context of Web service technologies [8, 94, 25], but with emphasis on the required run-time architectures. Our work can exploit such results, even if they need to be cast into the mobile scenario (in which service providers are less powerful). The remainder of the section is organized as follows. In Section 7.1.1, the general framework is presented. Section 7.1.2 presents a complete example, in which a target workflow, possible available services and the automatically synthesized orchestrators are shown. Section 7.1.3 presents the proposed technique, and finally Section 7.1.4 presents some discussion and future work.

7.1.1 Conceptual Architecture

As previously introduced, we consider scenarios in which a team consists of different operators, each one equipped with a PDA or a similar hand-held device running specific applications. The interplay of (i) software functionalities running on the device and (ii) human activities to be carried out by the corresponding operator is regarded as a service; suitably composed and orchestrated, such services form the workflow that the team needs to carry out.
Such a workflow is enacted, at run-time, by the PMS/orchestrator (a.k.a. workflow management system). The behavior of a service is modeled by its possible sequences of actions. Such sequences can be nondeterministic; indeed, nondeterminism stems naturally when modeling services in which the result of each action on the state of the service cannot be foreseen. Consider, as an example, a service that allows taking photos of a disaster area; after invoking the operation, the service can be in a state photo OK (if the overall quality is appropriate), or in a different state photo bad, if the operator has taken a wrong photo, the light was not optimal, etc. Note that the orchestrator of a nondeterministic service can invoke an operation but cannot control its result. In other words, the behavior of the service is only partially controllable, and the orchestrator needs to cope with such partial controllability. Note also that if the orchestrator observes the state in which the service is after an operation, then it can understand which transition, among those nondeterministically possible in the previous state, has been undertaken by the service. We assume that the orchestrator can indeed observe the states of the available services, and it takes advantage of this in choosing how to continue executing the workflow. The workflow is specified on the basis of a set of available actions (i.e., those potentially available) and a blackboard, i.e., a conceptual shared memory in which the services provide information about the output of an action (cf. complete observability w.r.t. the orchestrator). Such a workflow is specified a priori (i.e., it encodes predefined procedures to be used by the team, e.g., in emergency management), without knowing which services will effectively be available for its enactment. The issue is then how to compose (i.e., realize) such a workflow by suitably orchestrating the available services.
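The notion of a partially controllable service whose state is observable only after each operation can be sketched as follows. This is a minimal illustration under our own assumptions (the class API is invented; the camera example mirrors the photo OK / photo bad discussion above):

```python
import random

# Our own model of a partially controllable service: the orchestrator picks
# the operation, the service nondeterministically picks which of the possible
# transitions to take, and the orchestrator observes the resulting state.
class NondetService:
    def __init__(self, transitions, state="S0"):
        # transitions: (state, operation) -> list of possible successor states
        self.transitions = transitions
        self.state = state

    def can(self, op):
        return (self.state, op) in self.transitions

    def invoke(self, op, rng=random):
        successors = self.transitions[(self.state, op)]
        self.state = rng.choice(successors)  # outcome not under orchestrator control
        return self.state                    # ...but observable once it happened

# A camera-like service: taking a photo may succeed or fail.
camera = NondetService({
    ("S0", "take_photo"): ["photo_OK", "photo_bad"],
    ("photo_bad", "retake"): ["photo_OK", "photo_bad"],
})

outcome = camera.invoke("take_photo")
while outcome == "photo_bad":            # orchestrator reacts to the observed state
    outcome = camera.invoke("retake")
```

The loop at the end is exactly the orchestrator coping with partial controllability: it cannot force photo OK, but by observing each outcome it can decide to request a retake until the good state is reached.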
In the proposed scenario, such a composition of the workflow is useful when a team leader, before arriving on the operation field, needs to derive the orchestration by observing (i) the available devices and operators constituting the team (i.e., the available services) and (ii) the target workflow the team is in charge of. At run-time (i.e., when the team is effectively on the operation field), the orchestrator coordinates the different services in order to enact the workflow. As a matter of fact, the orchestrator is distributed, i.e., there is no coordination device hosting the orchestrator; conversely, each and every device hosts a local orchestrator, which is in charge of invoking the services residing on its own device. The various local orchestrators have to communicate with each other in order to agree on the right sequence of services to be called. The communications among orchestrators, and between a local orchestrator and its services, are carried out through an appropriate middleware, which offers message broadcasting and a possible realization of the blackboard [102]. From an implementation point of view, the blackboard is also realized in a distributed fashion.

7.1.2 A Case Study

Let us consider a scenario where a disastrous event (e.g., an earthquake) breaks out. The scenario is very similar to those already introduced in other chapters: after first assistance is given to the people involved, a civil-protection team is sent to the affected area. Team members, equipped with mobile devices, need to document damage directly on a situation map, so that subsequent activities (e.g., reconstruction jobs) can be scheduled. Specifically, their work is supposed to focus on three buildings A, B and C. For each building, a report has to be prepared. Those reports should contain: (i) a preliminary questionnaire giving a description of the building and (ii) some photos of the building conditions.
Filling in questionnaires does not require staying very close to the buildings, whereas taking photos does. We suppose the team to be composed of three mobile services MS1, MS2, MS3, whose capabilities include compiling questionnaires and taking/evaluating building pictures, in addition to a repository service RS, which is able to forward the documents (questionnaires and pictures) produced by the mobile units to a remote storage in a central hall. Services can read and write some shared boolean variables, namely {qA, qB, qC, pA, pB, pC, available}, held in a blackboard. Each service has its own capabilities and limitations, basically depending on technological, geographical and historical reasons: e.g., a team member who, in the past, visited building A makes its respective unit able to compile questionnaire A; a unit close to building B can move there, and so on. Mobile services are described by state-transition diagrams where nondeterministic transitions are allowed. The diagrams of Figures 7.2(a)–7.2(d) describe, respectively, units MS1–MS3 and RS. An edge outgoing from a state s is labeled by a triple E[C]/A, where both [C] and A are optional, with the following semantics: when the service is in state s, if the set of events E occurs and condition C holds, then the service (i) changes state according to the edge and (ii) executes action A. In this context, a set of events represents a set of requests assigned to the service, which can be satisfied only if the condition (or guard) holds. Actions correspond to writing messages on the blackboard, while the actual fulfillment of requests is implicitly assumed whenever a state transition takes place. In other words, each set of events represents a request for some tasks, which are actually performed, provided the respective condition holds, during the transition. Moreover, blackboard writes can possibly be performed. For instance, consider Figure 7.2(a).
Initially (state S0), MS1 is able to serve the requests {compile qB} (compile the questionnaire about building B), {read pC} (get a photo of building C from the repository), {move A} (move to, or possibly around, building A) and {req space} (ask the remote storage to free some space). In all such cases, neither conditions nor actions are defined, meaning that, e.g., {move A} simply requires the unit to reach (i.e., actually move to) building A, independently of any condition and without writing anything on the blackboard. After building A is reached (S1), a photo can be taken ({take pA}). A request for this yields a nondeterministic transition, due to the presence of two different outgoing edges labeled with the same event and non-mutually-exclusive conditions (indeed, no condition is defined at all). Note that, besides possibly leading to different states (S2 or S3), a nondeterministic transition may, in general, give rise to different blackboard writes, as happens, e.g., if a request for {eval pC} is assigned when the service is in state S5. State S2 is reached when, due to lack of light, the photo comes out too dark. Then, only photo modification ({modify pA}, which makes it lighter) is allowed. On the other hand, state S3 (the photo is quite fine) also gives the possibility to ask the repository for additional space while photo modification is being performed ({modify pA, req space}).

[Figure 7.2: Mobile services: (a) Mobile Service MS1; (b) Mobile Service MS2; (c) Mobile Service MS3; (d) Repository Service RS]

[Figure 7.3: The target workflow]
In that case, {available=T} is written on the blackboard, which announces that some space is available in the repository and, thus, additional data can be stored there. Moreover, state S3 allows for serving a {write pA} request, which has the effect of writing the taken photo into the remote storage. Such a task can be successfully completed only if there is available space, as required by condition [available]; in that case, it is followed by the action {pA=T}, in order to announce the availability, in the storage, of a picture of building A. Now, consider the request for {read pC} outgoing from state S0. Such a task gets a photo of building C, if any, from the remote storage, and forces a service transition to state S5. Then, {eval pC} can be requested with the aim of checking whether or not the photo captures relevant aspects of building C, and consequently accepting or rejecting it. Recall that the photo might not be in the storage; if so, a {pC=F} write is performed. Otherwise, either {pC=T} or {pC=F} can be written on the blackboard, depending on whether the picture is accepted or not. Finally, we complete the description of the service by observing that task {write qB} can be requested in order to write a filled questionnaire into the remote storage, assuming it is small enough to be written without satisfying any additional space condition. The semantics of the other actions, e.g., write qA, is straightforward and, consequently, the diagrams of units MS2, MS3 and RS can be similarly interpreted. RS is a service representing an interface between the mobile units and the communication channel used for sending data to the remote storage. In fact, task forward must be performed by RS whenever a mobile unit is asked to write some data (e.g., write pC or write qB). Forwarding means receiving data from the mobile services and writing it to the remote storage.
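The E[C]/A edge semantics described above can be rendered concretely as guarded transitions over a shared blackboard. The sketch below is entirely our own illustration (the class API is invented, and the three MS1 edges shown are a deterministic simplification of the actual diagram): an edge fires for a request set E only when its guard C holds, and firing performs the blackboard writes A.

```python
# Our own sketch of the E[C]/A edge semantics over a shared blackboard:
# an edge fires for request set E only if guard C holds on the blackboard;
# firing changes the service state and performs the blackboard writes A.
blackboard = {"qA": False, "qB": False, "qC": False,
              "pA": False, "pB": False, "pC": False, "available": True}

class GuardedService:
    def __init__(self, edges, state="S0"):
        # edges: list of (state, events, guard, next_state, writes)
        self.edges, self.state = edges, state

    def request(self, events, bb):
        for (src, ev, guard, dst, writes) in self.edges:
            if src == self.state and ev == events and guard(bb):
                self.state = dst
                bb.update(writes)      # action A: write messages on the blackboard
                return True
        return False                   # no edge can serve the request here

true = lambda bb: True
ms1 = GuardedService([
    ("S0", frozenset({"move_A"}), true, "S1", {}),
    ("S1", frozenset({"take_pA"}), true, "S3", {}),
    # write_pA is guarded by [available] and announces the photo with pA=T
    ("S3", frozenset({"write_pA"}), lambda bb: bb["available"], "S4", {"pA": True}),
])

ms1.request(frozenset({"move_A"}), blackboard)
ms1.request(frozenset({"take_pA"}), blackboard)
ms1.request(frozenset({"write_pA"}), blackboard)
```

After the three requests, the service has moved, taken the photo and, since [available] held, written it to the storage, announcing this with pA=T on the blackboard.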
For security reasons, only the mobile services are trusted systems that can ask the storage to free space (req_space) and can access the storage for reading (e.g., read_pC), while sending data can be performed only by the repository service. After each forwarding, it may happen that the storage becomes full. This is why the forward task is non-deterministic and may yield either an {available=T} or an {available=F} write on the blackboard. On the other hand, a mobile service performing a {req_space} guarantees that the remote storage will free some space; consequently, req_space is deterministic and yields an {available=T} write on the blackboard. Finally, RS is allowed to send the remote storage a commit message, which asks the storage to compress the last received data, consequently making those files no longer available for reading. The goal of the team is to collect both questionnaires and photos for all buildings. Figure 7.3 shows a graphical representation of the desired workflow where, initially: (i) all services are assumed to be in state S0 and (ii) the blackboard state is {qA=qB=qC=pA=pB=pC=F, available=T}. Edges outgoing from each state are labeled by sets of pairs [C]/T, with the following semantics: if, in the current state, condition (guard) C holds, then task T must be assigned to some service. Hence, each state transition may require, in general, the execution of a set of tasks. Observe that the target workflow is deterministic, that is, no two guards appearing inside different sets labeling different edges outgoing from the same state can be true at the same time. Intuitively, after all questionnaires have been filled and one photo per building has been taken, the target workflow requires services to iterate between states S3-S8 until a good photo of each building has been sent to the remote storage. Then, the team must be ready to perform the operation again.
In order to guarantee that pictures actually capture relevant aspects of the buildings, a sort of peer-review strategy is adopted: each photo a unit writes into the remote storage must be read, evaluated and approved/rejected by a second unit. Both approval and rejection are publicly announced by writing a proper message on the blackboard (indeed, writing {pC=T} or {pC=F} suffices). When all documents have been sent (questionnaires are not subject to the review process), a commit message is sent to the remote storage and the team can start a new iteration. Finally, Figures 7.4 and 7.5 present a solution to the distributed composition problem, consisting of a set of local orchestrators which, upon execution, coordinate the services so as to realize the target workflow of Figure 7.3. Recall that each mobile service is attached to a local orchestrator, which is able both to assign tasks to the service itself and to broadcast messages. In order to accomplish their task, that is, realizing workflow transitions by properly assigning a subset of the workflow requests to the respective services, local orchestrators need to access, for each transition: (i) the set of workflow requests and (ii) the whole set of messages the other orchestrators sent. For this reason, both workflow requests and orchestrator messages are broadcasted.
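The deterministic selection of the next set of tasks in a target workflow of this kind can be sketched as follows; every satisfied guard on the unique enabled edge contributes its task to the requested set. The edge encoding and names are illustrative assumptions, not the thesis' implementation.

```python
# Illustrative sketch: stepping a deterministic target workflow whose edges
# carry sets of [guard]/task pairs evaluated on the blackboard.

def step(state, edges, bb):
    """Return (next_state, requested_tasks) for the unique enabled edge."""
    enabled = []
    for (src, pairs, dst) in edges:
        if src != state:
            continue
        tasks = {task for guard, task in pairs if guard(bb)}
        if tasks:
            enabled.append((dst, tasks))
    assert len(enabled) == 1, "a target workflow must be deterministic"
    return enabled[0]

# Fragment resembling state S0 of Figure 7.3: compile every missing questionnaire.
edges = [("S0",
          [(lambda bb: not bb["qA"], "compile_qA"),
           (lambda bb: not bb["qB"], "compile_qB"),
           (lambda bb: not bb["qC"], "compile_qC")],
          "S1")]
bb = {"qA": False, "qB": True, "qC": False}
nxt, tasks = step("S0", edges, bb)
print(nxt, sorted(tasks))   # S1 ['compile_qA', 'compile_qC']
```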
Each orchestrator transition is labeled by a pair I/O, meaning: if, in the current state, I occurs, then perform O, where I = ⟨A, M, s⟩ and O = ⟨A′, M′⟩ with the following semantics: A is the set of tasks the workflow requests; M is the set of (broadcasted) messages the orchestrator received (including its own messages); s is the state reached by the attached service after the tasks assigned by the orchestrator (A′, see below) have been performed; A′ ⊆ A is the subset of actions the local orchestrator assigns to the attached service; and M′ is the set of messages the orchestrator broadcasts after the service has performed A′. The notation has been compacted by introducing some shortcuts for set representation. In detail: (i) "..." stands for "any set of elements": for instance, in the transition between states S0 and S1 of the local orchestrator for MS1 (Figure 7.4(a)), the set {... commit} represents any set (of tasks) containing commit; (ii) an element with the prefix "-" stands for "anything but that element, possibly nothing": for instance, in the first (from top) transition between states S4 and S5 of Figure 7.4(a), the set {... modify_pA, -req_space} stands for "any set (of tasks) including modify_pA and not including req_space". Observe that local orchestrators are deterministic, that is, at each state there is no ambiguity about which transition, if any, has to be selected. In general, this is due to the presence of messages, which are useful for selecting which tasks are to be assigned to each service. As an example, observe that the third and fourth transitions of Figure 7.4(a) can be performed when the same set of tasks ({... req_space, modify_pA}) is requested by the workflow. The choice of which one is to be assigned to the attached service depends on the messages the orchestrator received, which somehow represent the other services' current capabilities.
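The compacted set notation ("..." admits any further elements, a "-" prefix forbids an element) can be read as a small pattern language over task sets. The following interpreter is only an illustration of that reading; names are assumptions.

```python
# Illustrative interpreter for the compacted set notation used in the
# orchestrator transition labels: "..." means "any further elements",
# and an entry "-x" means "x must not occur".

ELLIPSIS = "..."

def matches(pattern, actual):
    required = {p for p in pattern if p != ELLIPSIS and not p.startswith("-")}
    forbidden = {p[1:] for p in pattern if p.startswith("-")}
    open_set = ELLIPSIS in pattern
    if not required <= set(actual):
        return False
    if forbidden & set(actual):
        return False
    return open_set or set(actual) == required

# {... modify_pA, -req_space}: any task set with modify_pA but not req_space.
print(matches({"...", "modify_pA", "-req_space"}, {"modify_pA", "take_pA"}))   # True
print(matches({"...", "modify_pA", "-req_space"}, {"modify_pA", "req_space"})) # False
```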
So, in state S4, when the set of requested tasks includes both req_space and modify_pA: (i) if the received messages include m13 (that is, the message the local orchestrator for MS1 sends when its service reaches state S3 from S1), then the orchestrator assigns the tasks {modify_pA, req_space} to the service; (ii) otherwise, the set of assigned tasks is {modify_pA} and, consequently, there will be another local orchestrator assigning a set of tasks including req_space to its respective service, basically depending on the messages it received. The orchestrators for MS2 and MS3 are roughly similar. The only noticeable difference is in the transition between states S4 and S5, where the local orchestrator for MS3 assigns the same action modify_pB to the attached service independently of the other actions to be assigned. Indeed, the orchestrators for MS1 and MS2 make this assignment dependent on the actions which are to be assigned to the other services.
[Figure 7.4: Local orchestrators for the services of Figure 7.2 (to be continued). State diagrams of (a) the local orchestrator for MS1 and (b) the local orchestrator for MS2; the diagrams cannot be reproduced in text form.]
[Figure 7.5: Local orchestrators for the services of Figure 7.2 (continued) and the target workflow of Figure 7.3. State diagrams of (a) the local orchestrator for MS3 and (b) the local orchestrator for RS, where each τ-transition abbreviates ⟨{...}, {...}, {...}⟩ / ⟨{}, {}⟩; the diagrams cannot be reproduced in text form.]

7.1.3 The Proposed Technique

The formal setting. A Workflow Specification Kit (WfSK) K = (A, V) consists of a finite set of actions A and a finite set of variables V, also called the blackboard, which can assume only a finite set of values. Actions have known (but not modeled here) effects on the real world, while they do not change the blackboard directly. Using a WfSK K one can define workflows over K. Formally, a workflow W over K is a tuple W = (S, s0, G, δ_W, F), where:

• S is a finite set of workflow states;
• s0 ∈ S is the single initial state;
• G is a set of guards, i.e., formulas whose atoms are equalities (interpreted in the obvious way) involving variables and values;
• δ_W ⊆ S × G × (2^A − {∅}) × S is the workflow transition relation: (s, g, A, s′) ∈ δ_W denotes that in state s, if the guard g is true in the current blackboard state, then the set of (concurrent) actions A ⊆ A is executed and the workflow changes state to s′; we insist that such a transition relation is actually deterministic: for no two distinct transitions (s, g1, A1, s1) and (s, g2, A2, s2) in δ_W do we have g1(γ) = g2(γ) = true, where γ is the current blackboard state;
• finally, F ⊆ S is the set of final states of the workflow, that is, the states in which the workflow can stop executing.
In other words, a workflow is a finite-state program whose atomic instructions are sets of actions from A (more precisely, invocations of actions), and which branches on conditions evaluated over the current state of the blackboard V. What characterizes our setting, however, is that actions in the WfSK do not have a direct implementation; instead, they are realized through the available services. In other words, action executions are not independent of one another: they are constrained by the services that include them. A service is essentially a program for a client (actually the orchestrator, as we have seen). Such a program, however, leaves the selection of the set of actions to perform next to the client (orchestrator) itself. More precisely, at each step the program presents to the client (orchestrator) a choice of available sets of (concurrent) actions; the client (orchestrator) selects one of such sets; the actions in the selected set are executed concurrently; and so on. Formally, a service S is a tuple S = (S, s0, G, C, δ_S, F) where:

• S is a finite set of states;
• s0 ∈ S is the single initial state;
• G is a set of guards, as described for workflows;
• C is a set of partial variable assignments for V, used to update the state of the blackboard;
• δ_S ⊆ S × G × (2^A − {∅}) × C × S is the service transition relation, where (s, g, A, c, s′) ∈ δ_S denotes that in state s, if the guard g is true in the current blackboard state and the execution of the set of actions A ⊆ A is requested, then the blackboard state is updated according to c and the service changes state to s′;
• finally, F ⊆ S is the set of states that can be considered final, that is, the states in which the service can stop executing, but does not necessarily have to.
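The two tuples above can be transcribed almost literally into data structures. The sketch below is only a transcription aid (field names follow the formal definitions, not any concrete implementation), with guards kept abstract as formula strings.

```python
# Illustrative transcription of the workflow and service tuples.
from dataclasses import dataclass, field

@dataclass
class Workflow:
    states: set
    s0: str
    guards: set
    delta: set = field(default_factory=set)   # tuples (s, g, A, s2), A a nonempty frozenset
    final: set = field(default_factory=set)   # states where the workflow may stop

@dataclass
class Service:
    states: set
    s0: str
    guards: set
    writes: set = field(default_factory=set)  # the set C of partial assignments
    delta: set = field(default_factory=set)   # tuples (s, g, A, c, s2)
    final: set = field(default_factory=set)   # states where the service MAY stop

w = Workflow(states={"S0", "S1"}, s0="S0", guards={"pA & pB & pC"},
             delta={("S0", "pA & pB & pC", frozenset({"commit"}), "S1")},
             final={"S1"})
print(w.s0, "S1" in w.final)
```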
Observe that, in general, services are nondeterministic, in the sense that they may allow more than one transition with the same set A of actions and compatible guards evaluating to the same truth value. (Note that this kind of nondeterminism is of a devilish nature: the actual choice is out of the client's, i.e. the orchestrator's, control.) As a result, when the client (orchestrator) instructs a service to execute a given set of actions, it cannot be certain of which choices it will have later on, since that depends on which transition is actually executed: nondeterministic services are only partially controllable.

To each service we associate a local orchestrator. A local orchestrator is a module that can be (externally) attached to a service in order to control its operation. It has the ability to activate/resume its controlled service by instructing it to execute a set of actions. Also, the orchestrator has the ability to broadcast messages from a given set M after observing how the attached service evolved w.r.t. the delegated set of actions, and to access all messages broadcasted by the other local orchestrators at every step. Notice that a local orchestrator is not even aware of the existence of the other services: all it can do is access their broadcasted messages. Lastly, the orchestrator has full observability of the blackboard state.

A (message-extended) service history h_S^+ for a given service S = (S, s0, G, C, δ_S, F), starting in a blackboard state γ0, is any finite sequence of the form (s^0, γ^0, M^0) · A^1 · (s^1, γ^1, M^1) · · · (s^{ℓ−1}, γ^{ℓ−1}, M^{ℓ−1}) · A^ℓ · (s^ℓ, γ^ℓ, M^ℓ), for some ℓ ≥ 0, such that for all 0 ≤ k ≤ ℓ and 0 ≤ j ≤ ℓ − 1:

• s^0 = s0;
• γ^0 = γ0;
• A^k ⊆ A;
• (s^j, g, A^{j+1}, c, s^{j+1}) ∈ δ_S, with g(γ^j) = true and c(γ^j) = γ^{j+1}; that is, service S can evolve from its current state s^j to state s^{j+1} while updating the blackboard state from γ^j to γ^{j+1} according to what is specified in c;
• M^0 = ∅ and M^k ⊆ M, for all k ∈ {0, ..., ℓ}.

The set H_S^+ denotes the set of all service histories for S. Formally, a local orchestrator O = (P, B) for a service S is a pair of functions of the following form:

P : H_S^+ × 2^A → 2^A;  B : H_S^+ × 2^A × S → 2^M.

Function P states which actions A′ ⊆ A to delegate to the attached service at local service history h_S^+ when the actions A were requested. Function B states which messages, if any, are to be broadcasted under the same circumstances, given that the attached service has just moved to state s after executing the actions A′. We attach one local orchestrator O_i to each available service S_i. In general, local orchestrators can have infinitely many states. A distributed orchestrator is a set X = (O_1, ..., O_n) of local orchestrators, one for each available service S_i. We call the pair D = (S, O), constituted by a service S and its local orchestrator O, a device. A workflow mobile environment (WfME) is a finite set of devices E = (D_1, ..., D_n) defined over the same WfSK K.

Local Orchestrator Synthesis. The problem we are interested in is the following: given n services S_1, ..., S_n over a WfSK K = (A, V), an initial blackboard state γ0, and a workflow W over K, synthesize a distributed orchestrator, i.e., a team of n local orchestrators, such that the workflow is realized by concurrently running all services under the control of their respective orchestrators. More precisely, let S_1, ..., S_n be the n services, each with S_i = (S_i, s_{i0}, G_i, C_i, δ_i, F_i), let γ0 be the initial state of the blackboard, and let W = (S_W, s_{W0}, G_W, δ_W, F_W) be the workflow to be realized.
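The pair of functions O = (P, B) can be pictured as two callables over the local (message-extended) service history. The following is only an interface sketch with hypothetical names; a real orchestrator returned by the synthesis procedure would encode a far more selective policy.

```python
# Interface sketch of a local orchestrator O = (P, B).

class LocalOrchestrator:
    def __init__(self, policy, broadcast):
        self.P = policy      # P(history, requested_tasks) -> delegated subset
        self.B = broadcast   # B(history, requested_tasks, new_state) -> messages

# A trivial orchestrator: delegate everything requested and announce the
# state the attached service reached (a placeholder policy, for illustration).
orch = LocalOrchestrator(
    policy=lambda h, A: A,
    broadcast=lambda h, A, s: {f"m_{s}"},
)
print(orch.P([], {"write_pA"}), orch.B([], {"write_pA"}, "S0"))
```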
We start by observing that the workflow (being deterministic) is completely characterized by its set of traces, that is, by the set of infinite action sequences that are faithful to its transitions, together with the finite sequences that in addition lead to a final state. More formally, a trace for W is a sequence of pairs (g, A), where g ∈ G is a guard over V and A ⊆ A is a non-empty set of actions, of the form t = (g^1, A^1) · (g^2, A^2) · · · such that there exists an execution history for W (analogous to the execution histories defined for services, except that it does not include messages) of the form (s^0, γ^0) · A^1 · (s^1, γ^1) · · · where g^i(γ^{i−1}) = true for all i ≥ 1. If the trace t = (g^1, A^1) · · · (g^ℓ, A^ℓ) is finite, then there exists a finite execution history (s^0, γ^0) · · · (s^ℓ, γ^ℓ) with s^ℓ ∈ F_W.

Now, given a trace t = (g^1, A^1) · (g^2, A^2) · · · of the workflow W, we say that a distributed orchestrator X = (O_1, ..., O_n) realizes the trace t iff for all ℓ and for all "system histories" h^ℓ ∈ H^ℓ_{t,X} (formally defined below) with g^{ℓ+1}(γ^ℓ) = true in the last configuration of h^ℓ, the set Ext_{t,X}(h^ℓ, A^{ℓ+1}) is nonempty, where Ext_{t,X}(h, A) is the set of (|h| + 1)-length system histories of the form h · [A_1, ..., A_n] · (s_1^{|h|+1}, ..., s_n^{|h|+1}, γ^{|h|+1}, M^{|h|+1}) such that:

• (s_1^{|h|}, ..., s_n^{|h|}, γ^{|h|}, M^{|h|}) is the last configuration in h;
• A = A_1 ∪ · · · ∪ A_n, that is, the requested set of actions A is fulfilled by putting together all the actions executed by the services;
• P_i(h|_i, A) = A_i for all i ∈ {1, ..., n}, that is, the local orchestrator O_i instructed service S_i to execute the actions A_i;
• (s_i^{|h|}, g_i, A_i, c_i, s_i^{|h|+1}) ∈ δ_i with g_i(γ^{|h|}) = true, that is, service S_i can evolve from its current state s_i^{|h|} to state s_i^{|h|+1} w.r.t. the (current) variable assignment γ^{|h|};
• γ^{|h|+1} ∈ C(γ^{|h|}), where C = {c_1, ...
, c_n} is the set of the partial variable assignments c_i due to each of the services, and C(γ^{|h|}) is the set of blackboard states obtained from γ^{|h|} by applying each of c_1, ..., c_n in every possible order;
• M^{|h|+1} = B_1(h|_1, A, s^{|h|+1}) ∪ · · · ∪ B_n(h|_n, A, s^{|h|+1}), that is, the set of broadcasted messages is the union of all messages broadcasted by the local orchestrators.

The set H^k_{t,X} of all histories that implement the first k actions of trace t as prescribed by X is defined as follows:

• H^0_{t,X} = {(s_{10}, ..., s_{n0}, γ0, ∅)};
• H^{k+1}_{t,X} = ⋃_{h^k ∈ H^k_{t,X}} Ext_{t,X}(h^k, A^{k+1}), for k ≥ 0.

In addition, if a trace is finite and ends after m actions, and all along all its guards are satisfied, we require that all histories in H^m_{t,X} end with all services in a final state. Finally, we say that a distributed orchestrator X = (O_1, ..., O_n) realizes the workflow W if it realizes all of its traces.

In order to understand the above definitions, let us observe that, intuitively, the team of local orchestrators realizes a trace if, as long as the guards in the trace are satisfied, the orchestrators can globally perform all actions prescribed by the trace (each local orchestrator instructs its service to do some of them). In order to do so, each local orchestrator can use the history of its own service together with the (global) messages that have been broadcasted so far. In some sense, implicitly through such messages, each local orchestrator gets information on the other services' local histories in order to take the right decision. Furthermore, at each step, each local orchestrator broadcasts messages; such messages will be used at the next step by all the local orchestrators to choose how to proceed.

Our technical results make use of some outcomes given in [122], which can be summarised by the following theorem.

Theorem 7.1. There exists a sound, complete and terminating procedure for computing a distributed orchestrator X = (O_1, ...
, O_n) that realizes a workflow W over a WfSK K relative to services S_1, ..., S_n over K and a blackboard state γ0. Moreover, each local orchestrator O_i returned by such a procedure is finite-state and requires a finite number of messages (more precisely, of message types).

Observe that the problem definition places no finiteness limitation on the number of states of the local orchestrators, nor on the number of messages to be exchanged; the finite-state solutions returned by the procedure therefore involve no loss of generality. The synthesis procedure is based on the general techniques proposed in [10, 11, 32], relying on a reduction of the problem to the satisfiability of a Propositional Dynamic Logic formula [61] whose models roughly correspond to orchestrators (there are also works that study alternatives based on model-checking techniques). From a realization point of view, such a procedure can be implemented through the same basic algorithms behind the success of the description-logics-based reasoning systems used for OWL (http://www.omg.org/uml/), such as FaCT (http://www.cs.man.ac.uk/∼horrocks/FaCT/), Racer (http://www.sts.tu-harburg.de/∼r.f.moeller/racer/) and Pellet (http://www.mindswap.org/2003/pellet/), and hence its applicability appears to be quite promising. The reader should note that the technique is not exploited at run time, but before the execution of the services and the local orchestrators effectively happens; therefore, the requirements of mobile scenarios are not violated (to give a concrete example, the synthesis can be run on a laptop in the jeep taking the team to the operation field).

7.1.4 Final remarks

This section has studied the workflow composition problem within a general distributed setting; the solutions proposed here are therefore palatable to a wide range of contexts, e.g., nomadic teams in emergency management, in which we have multiple independent agents and a centralized solution is not conceivable.
We plan to concretely implement this approach in the context of the research project WORKPAD, extensively introduced in Section 2, as well as in another project, SM4All [21]. SM4All is investigating an innovative platform for collaborating smart embedded services in pervasive and person-centric environments, through the use of semantic techniques and workflow composition. In conclusion, the kind of problems we dealt with are special forms of reactive process synthesis [109, 110]. It is well known that, in general, distributed solutions are much harder to obtain than centralized ones [110, 78]. This has not hampered our approach, since we allow for equipping the local controllers with autonomous message-exchange capabilities, even if such capabilities are not present in the services that they control.

7.2 Visual Support for Work Assignment in Process Management Systems

This section describes a novel work-list handler that is able to support process participants in choosing the next work item to work on. The work-list handler component takes care of work distribution and authorisation issues by assigning work items to appropriate participants. Typically, it uses a so-called "pull mechanism", i.e., work is offered to all resources that qualify, and the first resource to select the work item will be the only one executing it. To allow users to "pull the right work items in the right order", basic information is provided, e.g., task name, due date, etc. However, given that the work list is the main interface of the PMS with its users, it seems important to provide support that goes beyond a sorted list of items. If work items are selected by less qualified users than necessary, or if users select items in a non-optimal order, then the performance of the overall process is hampered. Assume a situation where multiple resources have overlapping roles and authorisations and where there are times at which work is piling up (i.e., any normal business).
In such a situation, the questions listed below are relevant.

• "What is the most urgent work item I can perform?"
• "What work item is, geographically speaking, closest to me?"
• "Is there another resource that can perform this work item and that is closer to it than me?"
• "Is it critical that I handle this work item, or are there others that can also do it?"
• "How are the work items divided over the different departments?"

To our knowledge, commercial as well as open-source PMSs present work lists simply as a list of work items, each with a short textual description. Some products sort the work items in a work list using a certain priority scheme specified at design time and not updated at run time. To support the user in a better way and assist her in answering the above questions, we use diagrams. A diagram can be a geographical diagram (e.g., the map of a university's campus), but other diagrams can be used as well, e.g., process schemas, organisational diagrams, Gantt charts, etc. Work items are visualised as dots on diagrams. By not fixing the type of diagram, but making this choice configurable, different types of relationships can be shown, thus providing a deeper insight into the context of the work to be performed. Work items are shown on diagrams; moreover, on some diagrams resources can also be shown, e.g., the geographical position of a user. Besides the "diagram metaphor" we also use the "distance metaphor". Seen from the viewpoint of the user, some work items are close while others are farther away. This distance may be geographic, e.g., a field-service engineer may be far away from a malfunctioning printer at the other side of the campus. However, many other distance metrics are possible. For example, one can support metrics capturing familiarity with certain types of work, levels of urgency, and organisational distance.
It should be noted that the choice of metric is orthogonal to the choice of diagram, thus providing a high degree of flexibility in context visualisation. Resources could, for example, opt to see a geographical map where work items, whose positions are calculated based on a function supplied at design time, display their level of urgency. This section proposes different types of diagrams and distance metrics. Moreover, the framework has been implemented and integrated in YAWL (www.yawlfoundation.org). YAWL is an open-source workflow system based on the so-called workflow patterns. However, the framework and its implementation are set up in such a way that they can easily be combined with other PMSs.

The section is structured as follows. Section 7.2.1 discusses the state of the art in work-list visualisation in PMSs, whereas Section 7.2.2 provides a detailed overview of the general framework. Section 7.2.5 focusses on the implementation of the framework and highlights some design choices in relation to user and system interfaces. In Section 7.2.9 the framework is illustrated through a case study. Section 7.2.10 summarises the contributions and outlines avenues of future work aimed at improving the operationalisation of the framework.

7.2.1 Related Work

Little work has been conducted in the field of work-list visualisation. Visualisation techniques in the area of PMSs have predominantly been used to aid in the understanding of process schemas and their run-time behaviour, e.g. through simulation [60] or process mining [135]. Although the value of business process visualisation is acknowledged, both in the literature [15, 86, 124, 145] and in industry, little work has been done in the context of visualising work items. The aforementioned body of work does not provide specific support for context-dependent work item selection.
This is addressed, though, in the work by Brown and Paik [17], whose basic idea is close to the proposal here. Images can be defined as diagrams, and mappings can be specified between work items and these diagrams. Work items are visualised through the use of intuitive icons, and the colour of a work item changes according to its state. However, the chosen approach does not work so well in real-life scenarios where many work items may have the same position (especially in coarse-grained diagrams), as icons with the same position are placed alongside each other. This may lead to a situation where a diagram is completely obscured by its work items. In our approach, these items are coalesced into a single dot whose size is proportionate to their number. By gradually zooming in on such a dot, the individual work items can become visible again. In addition, in [17] there is no concept similar to our distance notion, which is an ingredient that can provide significant assistance with work item selection. Finally, the work of Brown and Paik does not take the visualisation of the positions of resources into account. Also related is the work presented in [77], where proximity of work items is considered without discussing their visualisation.

Most PMSs present work lists as a simple enumeration of their work items, their textual descriptions, and possibly information about their priority and/or their deadlines. This holds both for open-source products, e.g. jBPM and Together Workflow, and for commercial systems, such as SAP Netweaver and FLOWer. An exception is TIBCO's iProcess Suite, which provides a richer type of work-list handler that partially addresses the problem of supporting resources with work item selection. Figure 7.6 depicts a screenshot of this work-list handler. In the bottom-left corner a resource's work list is shown, and above this the lengths of the work lists of other resources are shown.
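The coalescing idea described above (work items mapping to nearly the same position become a single dot whose size grows with the number of hidden items) can be sketched with a simple grid-based grouping. The function names, the grid granularity and the radius formula are all assumptions for illustration.

```python
# Illustrative sketch of coalescing overlapping work item dots.
from collections import Counter

def coalesce(positions, cell=10):
    """Group item coordinates into grid cells; dot radius grows with count."""
    cells = Counter((round(x / cell), round(y / cell)) for x, y in positions)
    return [((cx * cell, cy * cell), 4 + 2 * n) for (cx, cy), n in cells.items()]

items = [(101, 52), (103, 49), (250, 80)]   # the first two items nearly overlap
dots = coalesce(items)
print(len(dots))   # 2 dots are drawn instead of 3 overlapping icons
```

Zooming in would correspond to shrinking the cell size, so that the merged items separate into individual dots again.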
Web sites: jBPM - http://www.jboss.com/products/jbpm; Together Workflow - http://www.together.at/together/prod/tws/; SAP Netweaver - http://www.sap.com/usa/platform/netweaver; FLOWer - http://global.pallas-athena.com/products/bpmflower product/; iProcess Suite - http://www.tibco.com/software/business process management/

Figure 7.6: TIBCO's iProcess Client

By clicking on a work item, a resource can see it on a Google Map, positioned where it should be executed. The iProcess Suite also supports a kind of look-ahead in the form of a list of "predicted" work items and their start times. One can also learn about projected deadline expirations and exception flows. This is achieved through the use of expected durations specified at design time for the various tasks. Our visualisation framework is more accurate, as it can take the actual execution times of past work items of a task into account, through the use of log files, when making predictions for new work items of that task. Basically, the iProcess Suite provides support for some specific views (geographical position, deadline expiration), but these are isolated from each other. Our approach allows these views (and others) to be combined (e.g. a geographical view where deadlines are also visualised), thus enabling the use of views that may prove useful in certain contexts. Our approach also generalises over the type of diagram and goes beyond support for a single diagram as in the iProcess Suite (a geographical map).

7.2.2 The General Framework

The proposed visualisation framework is based on a two-layer approach: (1) diagrams and (2) the visualisation of work items based on a distance notion. A work item is represented as a dot positioned at certain coordinates on a background diagram. A diagram is meant to capture a particular perspective of the context of the process.
Since a work item can be associated with several perspectives, it can be visualised in several diagrams (at different positions). Diagrams can be designed as needed.

Table 7.1: Examples of diagrams and mappings (process context view, followed by the possible diagram and mapping).

• The physical environment where tasks are going to be performed. A real geographical diagram (e.g., a Google map); work items are placed where they should be performed and resources are placed where they are located.
• The process schema of the case that work items belong to. The process schema is the diagram, and work items are placed on top of the tasks they are instances of.
• Deadline expiration of work items. The diagram is a time-line whose origin is the current time; work items are placed on the time-line at the latest moment at which they can start without their deadline expiring.
• The organisation that is in charge of carrying out the process. The diagram is an organizational chart; work items are associated with the role required for their execution, and resources are also shown, based on their organizational position.
• The materials that are needed for carrying out work items. The diagram is a multidimensional graph whose axes are the materials needed for work item execution. Assume that materials A and B are associated with axes x and y respectively; then a work item is placed at coordinates (x, y) if it needs a quantity x of material A and a quantity y of material B.
• Costs versus benefits in executing work items. The axes represent "Revenue" (the amount of money received for the performance of work items) and "Cost" (the expense of their execution); a work item is placed at coordinates (x, y) if the revenue of its execution is x and its cost is y. In this case one is best off executing work items close to the x axis and far from the origin.

When the use of a certain diagram
is envisaged, the relationship between work items and their position on the diagram should be specified through a function determined at design time. Table 7.1 gives some examples of context views and the corresponding work item mappings. Several active "views" can be supported, whereby users can switch from one view to another. Resources can (optionally) see their own position on the diagram, and work items are coloured according to the value of the applicable distance metric. Additionally, it may be helpful to show executing work items as well as the positions of other resources. Naturally, these visualisations are governed by the authorisations that are in place.

Our framework assumes a generic lifecycle model as described in [120], which is slightly more elaborate than the SmartPM one. First, a work item is created, indicating that it is ready for distribution. The item is then offered to appropriate resources. A resource can commit to the execution of the item, after which it moves to the allocated state. The start of its execution leads it to the next state, started, after which it can successfully complete, it can be suspended (and subsequently resumed) or it can fail altogether. During runtime, a workflow engine (in our case the YAWL engine) informs the framework about the lifecycle states of work items.

7.2.3 Fundamentals

In this section the various notions used in our framework, e.g. work item and resource, are defined formally.

Definition 7.1 (Work item). A work item w is a tuple (c, t, i, y, e, l), where:
• c is the identifier of the case that w belongs to.
• t is the identifier of the task of which w is an instance.
• i is a unique instance number.
• y is the timestamp capturing when w moved to the "offered" state.
• e is the (optional) deadline of w.
• l represents the (optional) GPS coordinates where w should be executed.
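As a minimal illustration of Definition 7.1, a work item can be represented as a record whose fields mirror the tuple components; the class name and the concrete values below are invented for the sketch, not part of the framework:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# A work item w = (c, t, i, y, e, l) of Definition 7.1 as a record.
# y, e and l are optional: y is unset until w is offered, e is the
# optional deadline and l the optional GPS location.
@dataclass(frozen=True)
class WorkItem:
    c: str                                   # case identifier
    t: str                                   # task identifier
    i: int                                   # unique instance number
    y: Optional[float] = None                # timestamp of entering "offered"
    e: Optional[float] = None                # deadline
    l: Optional[Tuple[float, float]] = None  # GPS coordinates

w1 = WorkItem(c="case-12", t="TakePhotos", i=4, y=1000.0, l=(153.02, -27.47))
w2 = WorkItem(c="case-12", t="TakePhotos", i=5)

# Two work items are siblings iff they instantiate the same task.
print(w1.t == w2.t)  # True
```

The optional fields default to `None`, matching the remark that y and l may be undefined for work items that are not yet offered or have no specific execution location.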
Dimensions y and l may be undefined in case the work item w is not yet offered or no specific execution location exists, respectively. The e value concerns timers, which may be defined in YAWL processes. A process region may be associated with a timer; when the timer expires, the work items that are part of the region are cancelled. Note that a work item can potentially be part of more than one cancellation region and that this has implications for the definition of e. In such a case, the latest possible completion time with respect to these cancellation regions is assumed.

Definition 7.2 (Resource). A resource r is a pair (j, l), where:
• j is the identifier of the resource.
• l represents the (optional) GPS coordinates where the resource is currently located.

The notation w_x is used to denote the projection on dimension x of work item w, while the notation r_y is used to denote the projection on dimension y of resource r. For example, w_t yields the task of which work item w is an instance. Work items w' and w'' are considered to be siblings iff w'_t = w''_t.

The set Coordinates consists of all possible coordinates. Elements of this set will be used to identify various positions on a given map.

Definition 7.3 (Position function). Let W and R be the sets of work items and resources. Let M be the set of available maps. For each available map m ∈ M, there exists a (partial) function position_m : W ∪ R ↛ Coordinates which returns the current coordinates of work items and available resources on map m.

• distance_Familiarity(w, r): How familiar is resource r with performing work item w. This can be measured through the number of sibling work items the resource has already performed.
• distance_Geo_Distance(w, r): How close is resource r to work item w compared to the closest resource that was offered w. For the closest resource this distance is 1. In case w does not have a specific GPS location where it should be executed, this metric returns 1 for all resources.
• distance_Popularity(w, r): The ratio of logged-in resources having been offered w to all logged-in resources. This metric is independent of the resource r making the request.
• distance_Urgency(w, r): The ratio between the current timestamp and the latest timestamp at which work item w can start without being likely to expire. The latter timestamp is obtained as the difference between w_e, the latest timestamp by which w has to be finished without expiring, and w's estimated duration. This estimation is based on past executions of sibling work items of w by r.
• distance_Past_Execution(w, r): How familiar is resource r with work item w compared to the familiarity of all other resources that w has been offered to. More information about this metric is provided in the text.

Table 7.2: Distance metrics currently provided by the implementation.

For a map m ∈ M, the function position_m may be partial, since some elements of W and/or R may not have an associated position. Consider, for example, the case where a work item can be performed at any geographical location, or where it does not really make sense to associate a resource with a position on a certain map. As the various attributes of work items and resources may vary over time, it is important to see the class of functions position_m as time dependent.

To formalise the notion of distance metric, a distance function is defined for each metric that yields the distance between a work item and a resource according to that metric.

Definition 7.4 (Distance function). Let W and R be the sets of work items and resources. Let D be the set of available distance metrics.
For each distance metric d ∈ D, there exists a function distance_d : W × R → [0, 1] that returns a number in the range [0, 1] capturing the distance between work item w ∈ W and resource r ∈ R with respect to metric d.16

Given a certain metric d and a resource r, the next work item r should perform is a work item w for which the value distance_d(w, r) is the closest to 1 among all offered work items.

7.2.4 Available Metrics

In Table 7.2 a number of general-purpose distance metrics are informally explained. These are all provided with the current implementation. Later in this section, we formalise the notion of metrics. Let us denote by R the set of resources currently logged in.

16 Please note that the value 1 represents the minimum distance, while 0 is the maximum.

In order to make the explanations easier, some auxiliary functions are introduced.

past_execution(w, r) yields the weighted mean of the past execution times of the last (at most h) work items performed by r among the siblings of w. In this context, the past execution time of a work item w' is defined as the duration that elapsed between its assignment to r and its successful completion. Let time_i(w, r) be the execution time of the i-th last work item among w's siblings performed by r; then:

    past_execution(w, r) = (Σ_{i=1}^{j(w,r,h)} α^{i−1} · time_i(w, r)) / (Σ_{i=1}^{j(w,r,h)} α^{i−1})    (7.1)

where the constant α ∈ [0, 1] and the value j(w,r,h) is the minimum between a given constant h and the number of sibling work items of w performed by r. Both h and α have to be tuned through testing. If j(w,r,h) is equal to zero, past_execution(w, r) is assumed to take an arbitrarily large value.17 The intuition behind this definition stems from the fact that more recent executions should be given more consideration and hence weighted more, as they better reflect resources gaining experience in the execution of instances of a certain task.
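Equation 7.1 can be sketched in a few lines of Python (a minimal sketch: the values chosen for α and h are arbitrary here and, as noted above, have to be tuned through testing):

```python
def past_execution(times, alpha=0.5, h=5):
    """Weighted mean of Equation 7.1; times[0] is the most recent duration."""
    j = min(h, len(times))                      # j(w, r, h)
    if j == 0:
        return float("inf")                     # "an arbitrarily large value"
    weights = [alpha ** i for i in range(j)]    # alpha^(i-1) for i = 1..j
    return sum(w * t for w, t in zip(weights, times)) / sum(weights)

# Three past executions, most recent first: the result (~17.1) is pulled
# towards the most recent duration (10) compared with the plain mean (~23.3).
print(past_execution([10.0, 20.0, 40.0]))
```

The geometric weights make the estimate track a resource's current speed at a task rather than its whole history, which is exactly the intuition stated in the text.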
Res(w) returns all currently logged-in resources that have been offered w: Res(w) = {r ∈ R | w is offered to r}.

best_past_execution(w) denotes the smallest value of past_execution(w, r') computed among all logged-in resources r' qualified for w. Specifically:

    best_past_execution(w) = min_{r' ∈ Res(w)} past_execution(w, r')

bestDistance(w) returns the minimum geographic distance between a given work item w and all qualified resources:

    bestDistance(w) = min_{r' ∈ Res(w)} ‖w_l − r'_l‖

where ‖w_l − r'_l‖ stands for the Euclidean distance between the GPS coordinates where w should be executed and the GPS location of resource r'. The function bestDistance(w) is not total, since w_l may be undefined for certain work items w.

17 Technically, we set it to the maximum floating-point value.

Using these auxiliary functions, the following metrics can be defined:

1. Familiarity. How familiar is resource r with performing work item w. This can be measured through the number of sibling work items the resource has already performed:

    distance_Familiarity(w, r) = { 0,                                               if best_past_execution(w) → ∞
                                 { best_past_execution(w) / past_execution(w, r),   otherwise

The value best_past_execution(w) can tend to infinity if nobody has ever executed work items for task w_t. Otherwise, if someone has executed sibling work items of w but r has not, then past_execution(w, r) → ∞ and, hence, distance_Familiarity(w, r) → 0.

2. Popularity. The ratio of logged-in resources having been offered w to all logged-in resources. This metric is independent of the resource r making the request. The intuition is that if many resources can perform w, then it is quite distant from every resource: even if a resource does not pick w for execution, it is likely that someone else will. Therefore:

    distance_Popularity(w, r) = 1 − |Res(w)| / |R|

If every resource can perform w, then the distance is 0; if only a few resources can perform w, then the value is close to 1.

3. Urgency.
The ratio between the current timestamp and the latest timestamp at which work item w can start without being likely to expire. This second timestamp is obtained from w_e, the latest timestamp by which w has to be finished without expiring, and w's estimated duration. This estimation relies on the past executions by r of w's sibling work items. Specifically:

    distance_Urgency(w, r) = { t_now / (w_e − past_execution(w, r)),   if w_e is defined
                             { 0,                                      if w_e is undefined

where t_now stands for the current timestamp. If r has never performed work items for the same task w_t, then past_execution(w, r) → ∞ and, hence, distance_Urgency(w, r) → 0.

4. Relative Geographic Distance. How close is resource r to work item w compared to the closest resource that was offered w. For the closest resource this distance is 1. In case w does not have a specific GPS location where it should be executed, this metric returns 1 for all resources. Its definition is:

    distance_Relative_Geo(w, r) = { bestDistance(w) / ‖w_l − r_l‖,   if bestDistance(w) > 0
                                  { 0,                               if bestDistance(w) = 0
                                  { 1,                               if bestDistance(w) is undefined

5. Relative Past Execution. The metric chosen combines the familiarity of a resource with a certain work item and the familiarity of all other resources that are able to execute that work item:

    distance_Relative_Past_Execution(w, r) = (1 / past_execution(w, r)) / (Σ_{r' ∈ Res(w)} 1 / past_execution(w, r'))    (7.2)

Let us give an informal explanation. First observe that if exactly one resource r exists that is capable of performing work item w, then the equation yields 1. If n resources are available and they all have roughly the same familiarity with performing work item w, then for each of them the distance will be about 1/n. It is clear then that, as n increases in value, the value of the distance metric approaches zero.
If, on the other hand, many resources exist that are significantly more effective in performing w than a certain resource r, then the value of the denominator increases even more and the value of the metric for w and r will be closer to zero.

For instance, let us suppose that at time t̂ there are n resources capable of performing w. Let us assume that, on average, one of them, namely r1, is such that past_execution(w, r1) = d̃. Moreover, let us also assume that the other resources required twice this amount of time on average in the past, i.e. for each resource ri (with i > 1), past_execution(w, ri) = 2d̃. In this situation, the distance metric value for r1 is as follows:

    distance_Relative_Past_Execution(w, r1)
      = (1 / past_execution(w, r1)) / (1 / past_execution(w, r1) + Σ_{i=2}^{n} 1 / past_execution(w, ri))
      = (1 / d̃) / (1 / d̃ + Σ_{i=2}^{n} 1 / (2d̃))
      = 1 / (1 + (n − 1)/2)
      = 2 / (n + 1)

This value is greater than 1/n if n > 1 (i.e., if there are at least two resources that may perform w). If n = 1, then it is easy to see that the obtained value is 1.

Conversely, the value for any other resource ri (with i > 1) is as follows:

    distance_Relative_Past_Execution(w, ri)
      = (1 / past_execution(w, ri)) / (1 / past_execution(w, r1) + Σ_{i=2}^{n} 1 / past_execution(w, ri))
      = (1 / (2d̃)) / (1 / d̃ + Σ_{i=2}^{n} 1 / (2d̃))
      = (1/2) / (1 + (n − 1)/2)
      = 1 / (n + 1)

For all n > 0, this value is smaller than 2/(n + 1), the metric value for r1.

Work-item ageing. Some of the metrics above suffer from the fact that their values do not change over time. Hence, if some work items have a small value with respect to those metrics, it is likely that there will always be other work items with a greater value for those metrics. If resources behave "fairly", always picking the work items that provide more benefit for the organisation, some work items could remain on a work list for a very long time, or even indefinitely.
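The metric definitions above can be sketched compactly. This is a minimal illustration under the assumption that past_execution(w, r) is available as a lookup table; all resource names and durations below are invented:

```python
import math

# past_exec maps every resource in Res(w) to past_execution(w, r).
def d_familiarity(past_exec, r):
    best = min(past_exec.values())              # best_past_execution(w)
    return 0.0 if math.isinf(best) else best / past_exec[r]

def d_popularity(offered, logged_in):           # offered = Res(w), logged_in = R
    return 1.0 - len(offered) / len(logged_in)

def d_urgency(t_now, deadline, past_exec, r):   # deadline = w_e, or None
    if deadline is None:
        return 0.0
    return t_now / (deadline - past_exec[r])    # ratio to the latest start time

def d_relative_past_execution(past_exec, r):    # Equation 7.2
    return (1.0 / past_exec[r]) / sum(1.0 / v for v in past_exec.values())

# The worked example with n = 3: r1 is twice as fast as the others, so r1
# obtains 2/(n+1) = 0.5 while each other resource obtains 1/(n+1) = 0.25.
past_exec = {"r1": 10.0, "r2": 20.0, "r3": 20.0}
print(d_relative_past_execution(past_exec, "r1"))
print(d_relative_past_execution(past_exec, "r2"))
```

The two printed values reproduce the worked example for n = 3, confirming the closed forms 2/(n+1) and 1/(n+1).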
Therefore, we devised a technique for ageing the work items on work lists in such a way that they eventually become the least distant work items. Let d be any metric and χ_{t_en} = distance_d(w, r) be the distance value when work item w becomes enabled for resource r. The distance value with respect to metric d at time t_en + t ages as follows:

    χ_{t_en + t} = 1 − (1 − χ_{t_en}) · e^{−α·t}    (7.3)

If t = 0, then χ_{t_en + t} = χ_{t_en}, and if t → ∞ (i.e., time t increases indefinitely), then χ_{t_en + t} → 1. Please note that if α = 0, then work items do not age. The greater the value of α, the more quickly Equation 7.3 approaches 1 as t increases. Vice versa, smaller values of α make the growth of Equation 7.3 with t slower.

7.2.5 Implementation

The general framework described in the previous sections has been operationalised through the development of a component that can be plugged into the YAWL system. Section 7.2.6 gives an overview of YAWL18, an open source PMS developed by the Queensland University of Technology, Brisbane (Australia), in cooperation with the Technical University of Eindhoven, The Netherlands.

18 http://www.yawlfoundation.org

Section 7.2.7 illustrates some of the visualisation features provided by the implementation, whereas Section 7.2.8 focusses on how the component fits within the YAWL architecture.

7.2.6 The YAWL system

The YAWL environment is an open source PMS, based on the workflow patterns [120, 134], using a service-oriented architecture. The YAWL engine and all other services (work-list handler, web-service broker, exception handler, etc.) communicate through XML messages. YAWL offers the following distinctive features:

• YAWL offers comprehensive support for the control-flow patterns. It is the most powerful process specification language for capturing control-flow dependencies.
• The data perspective in YAWL is captured through the use of XML Schema, XPath and XQuery.
• YAWL offers comprehensive support for the resource patterns. It is the most powerful process specification language for capturing resourcing requirements.
• YAWL has a proper formal foundation. This makes its specifications unambiguous and enables automated verification (YAWL offers two distinct approaches to verification, one based on Reset nets, the other based on transition invariants through the WofYAWL editor plug-in).
• YAWL has been developed independently of any commercial interests. It simply aims to be the most powerful language for process specification.
• Despite its expressiveness, YAWL needs few constructs compared with other languages, such as BPMN.
• YAWL offers unique support for exception handling, covering both exceptions that were and those that were not anticipated at design time.
• YAWL offers unique support for dynamic workflow through the Worklets approach. Workflows can thus evolve over time to meet new and changing requirements.
• YAWL aims to be straightforward to deploy. It offers a number of automatic installers and an intuitive graphical design environment.
• Through the BPMN2YAWL component, BPMN models can be mapped to the YAWL environment for execution.
• The Declare component (released through declare.sf.net) provides unique support for specifying workflows in terms of constraints. This approach can be combined with the Worklet approach, thus providing very powerful flexibility support.
• YAWL's architecture is service-oriented; hence one can replace existing components with one's own or extend the environment with newly developed components.
• The YAWL environment supports the automated generation of forms. This is particularly useful for rapid prototyping purposes.
• Automated tasks in YAWL can be mapped to Web Services or to Java programs.
• Through the C-YAWL approach, a theory has been developed for the configuration of YAWL models.19
• Simulation support is offered through a link with the ProM environment.20 Through this environment it is also possible to conduct post-execution analysis of YAWL processes (e.g. in order to identify bottlenecks).

19 http://www.processconfiguration.com
20 http://www.processmining.org

The YAWL work-list handler is developed as a web application. Its graphical interface uses different tabs to show the various queues (e.g., started work items); see Figure 7.7. The visualisation framework can be accessed through a newly introduced tab and is implemented as a Java applet.

7.2.7 The User Interface

The position and distance functions represent orthogonal concepts that require joint visualisation for every map. The position function for a map determines where work items and resources will be placed as dots, while the distance function determines the colour of work items. Conceptually, work item information and resource information are split and represented in different layers. Users can choose which layers they wish to see and, in case they choose both layers, which of them should overlay the other.

Figure 7.7: The YAWL work-list handler

Work-item Layer. Distances can be mapped to colours for work items through a function colour : [0, 1] → C which associates every metric value with a different colour in the set C. In our implementation, colours range from white to red, with intermediate shades of yellow and orange. When a resource sees a red work item, this could for example indicate that the item is very urgent, that it is one of those most familiar to this resource, or that it is the closest work item in terms of geographical position. While the colour of a work item can depend on the resource viewing it, it can also depend on the state of the lifecycle it is in. Special colours are used to represent the various states of the work item lifecycle; Table 7.3 provides an overview.
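The colour function can be sketched as a linear interpolation between anchor colours. The white-yellow-orange-red range follows the text, but the concrete RGB anchors below are our choice for illustration; the implementation's exact palette is not specified:

```python
# colour : [0, 1] -> C, from white (distance 0, far) through yellow and
# orange to red (distance 1, close). The RGB anchors are assumptions.
ANCHORS = [(255, 255, 255), (255, 255, 0), (255, 165, 0), (255, 0, 0)]

def colour(value):
    """Return an (r, g, b) tuple for a distance metric value in [0, 1]."""
    scaled = value * (len(ANCHORS) - 1)
    i = min(int(scaled), len(ANCHORS) - 2)  # segment index
    f = scaled - i                          # position within the segment
    return tuple(round(a + (b - a) * f) for a, b in zip(ANCHORS[i], ANCHORS[i + 1]))

print(colour(0.0))  # white: a very distant work item
print(colour(1.0))  # red: the closest (most attractive) work item
```

A continuous gradient like this lets a resource rank work items at a glance, while the special state colours of Table 7.3 (purple, black, grey) stay visually distinct from the metric range.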
The various rows of Table 7.3 correspond to the various states and their visualisation. Resources can filter work items depending on the state of the items. This is achieved through the provision of a checkbox for each of the states in Table 7.3; several checkboxes can be ticked. There is an additional checkbox which allows resources to see work items that they cannot execute but are authorised to see.

Resources may be offered work items whose positions are the same or very close, in which case their visualisations may overlap. Such work items are grouped into a so-called "joint dot". The diameter of a joint dot is proportional to the number of work items involved. More precisely, the diameter D of a joint dot is determined by D = d(1 + lg n), where d is the standard diameter of a normal dot and n is the number of work items involved. Note that we use a logarithmic (lg) scaling for the relative size of a composite dot.

Work item state and the colour scheme used in the work-list handler:
• Created: the work item is not shown.
• Offered to single/multiple resource(s): the colour is determined by the distance to the resource with respect to the chosen metric, ranging from white through various shades of yellow and orange to red.
• Allocated to a single resource: purple.
• Started: black.
• Suspended: the same as for offered.
• Failed: grey.
• Completed: the work item is not shown.

Table 7.3: Visualisation of a work item depending on its state in the life cycle.

Combining several work items into a single dot raises the question of how the distance of this dot is determined. Four options are offered for defining the distance of a joint dot: one can take a) the maximum of all the distances of the work items involved, b) their minimum, c) their median, or d) their mean. When a resource clicks on a joint dot, all work items involved are enumerated in a list and coloured according to their value for the chosen distance metric.

Resource Layer.
When a resource clicks on a work item, the positions of the other resources to whom this work item is offered are shown. Naturally, this is governed by authorisation privileges and by the availability of location information for resources on the map involved. Resource visualisation can be customised so that a resource can choose to see a) only herself, b) all resources, or c) all resources that can perform a certain work item. The latter option supports the case where a resource clicks on a work item and wishes to see the locations of the other resources that can do this work item.

7.2.8 Architectural Considerations

Figure 7.8 shows the overall architecture of the visualisation framework and its connections with other YAWL components. Specifically, the visualisation framework comprises:

The Visualisation Applet is the client-side applet that allows resources to access the visualisation framework; it resides in a separate tab of the work-list handler.

The Visualisation Designer is used by special administrators to define and update maps, as well as to specify the positions of work items on the defined maps. Designers can define positions as fixed or as variable through the use of XQuery. In the latter case, an XQuery expression is defined that refers to case variables; this expression is evaluated at run time when required.

Figure 7.8: Position of the visualisation components in the YAWL architecture.

Services is the collective name for the modules providing the information used to depict maps and to place work items (e.g. URLs to locate map images, work item positions on various maps).

The YAWL engine is at the heart of the YAWL environment. It determines which work items are enabled and can thus be offered for execution, and it handles the data that is involved. While the YAWL engine offers a number of external interfaces, for the visualisation component interfaces B and E are relevant.
Interface B is used, for example, by the work-list handler to learn about work items that need to be offered for execution. This interface can also be used for starting new cases. Interface E provides an abstraction mechanism to access log information, and can thus, e.g., be used to learn about past executions of siblings of a work item. In particular, one can learn how long a certain work item remained in a certain state.

The work-list handler is used by resources to access their "to-do" list. The standard version of the work-list handler provides queues containing work items in a specific state. This component provides interface G, which allows other components to access information about the relationships between work items and resources: for example, which resources have been offered a certain work item, or which work items are in a certain state. Naturally, this component is vital to the Visualisation Applet.

In addition to interface G, the Visualisation Applet also connects to the Services modules through the following interfaces:

The Position Interface provides information about maps and the positioning of work items on these maps. Specifically, it returns an XQuery over the YAWL net variables that the Visualisation Applet has to compute. The work-list handler needs to be consulted to retrieve the current values of these variables.

The Metric Interface provides information about available metrics and their values for specific work item-resource combinations.

The Resource Interface is used to update and retrieve information concerning the positions of active resources on maps.

The visualisation framework was integrated into the standard work-list handler of YAWL through the addition of a JSP (Java Server Page).
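Since these interfaces exchange XML documents, a client such as the Visualisation Applet has to decode XML responses. A minimal client-side sketch follows; the XML layout and the values are invented for illustration, as the actual message format is not shown here:

```python
import xml.etree.ElementTree as ET

# A hypothetical response of the Metric Interface's getDistance() servlet:
# the work item, the metric name and the computed distance value.
response = '<distance workitem="take-photos-4" metric="Urgency">0.85</distance>'

def parse_distance(xml_text):
    """Decode a getDistance() response into (work item, metric, value)."""
    root = ET.fromstring(xml_text)
    return root.get("workitem"), root.get("metric"), float(root.text)

print(parse_distance(response))  # ('take-photos-4', 'Urgency', 0.85)
```

Exchanging plain XML over HTTP keeps the servlets usable from any client technology, which supports the later claim that the framework is easy to embed in another PAIS.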
All of the services of the visualisation framework share a repository, referred to as the Visualisation Repository in Figure 7.8, which stores, among other things, the XQueries used to compute positioning information, the locations of resources on the various maps, and the names and URLs of maps. Services periodically retrieve log data through Interface E in order to compute distance metric values for offered work items. For instance, to compute the metric Relative Past Execution (Equation 7.2) for a certain resource, one can see from Equation 7.1 that information is required about the h past executions of sibling work items performed by that resource.

To conclude this section, we would like to stress that the approach and implementation are highly generic, i.e., it is relatively easy to embed the visualisation framework in another PAIS.

Interface Details. The modules collectively named Services are implemented as Tomcat web applications. Specifically, each interface is implemented as a web application whose methods are provided as servlets, which take their inputs and return their outputs as XML documents. Figure 7.9 summarises the methods offered by all implemented interfaces. Although they are actually servlets and their parameters XML documents, we conceptualise them as methods of classes of an object-oriented programming language.

Figure 7.9: Details of the interfaces provided.

Interface Metric provides two methods: 1) getMetrics(), which returns the list of names of all available metrics, and 2) getDistance(), which takes a work item identifier and a metric name as input and returns the value of that metric for that work item. Interface Resource provides two methods, basically to get and set a resource's position with respect to a specified map. Finally, interface Position allows one to request information about all available maps through method getMaps().
In particular, it returns an array of objects Map, where each object defines two properties: 1) the map name and 2) the URL where the map can be found. Method getResourcePosition() takes a resource identifier and a given map as input, and returns the coordinates of that resource on the specified map. This information is mostly what resources themselves provide through method setResourceCoordinate() of interface Resource. Method getWorkitemPosition() of interface Position is very similar, but operates on work items instead of resources.

For modularity reasons, none of the interfaces accesses the Visualisation Repository database directly: a dedicated interface, namely the Visualisation Repository Interface, exists solely for the purpose of masking the interaction with the database. As the various methods are sufficiently self-explanatory, we do not provide more details. The only thing worth mentioning is that getLastPastExecutions() returns the durations of the last h sibling work items offered within the last limitDays days. This method is required for computing the function past_execution. In order to return the h most recent executions, the method needs to obtain all work items and then sort them in descending order by the timestamp at which they moved to the offered state (i.e., work item dimension y). Finally, the method considers the first h work items in the sorted list. We foresee an initial filtering step, discarding all work items that were offered more than limitDays days ago. If this filtering were not performed, the sorting operation could be computationally expensive, as it could involve thousands of work items; the filtering is meant to reduce the size of the set to be sorted.

7.2.9 Example: Emergency Management

In this section we illustrate a number of features of the visualisation framework by considering a potential scenario from emergency management.
This scenario stems from a user requirement analysis conducted in the context of WORKPAD [23]. Teams are sent to an area to assess the aftermath of an earthquake. Team members are equipped with a laptop, and their work is coordinated through the use of a PMS.

The main process of the workflow for assessing buildings is named Disaster Management. The first task, Assess the affected area, represents a quick on-the-spot inspection to determine damage to buildings, monuments and objects. For each object identified as worthy of further examination, an instance of the sub-process Assess every sensible object (whose actual decomposition we do not show for space reasons) is started, as part of which a questionnaire is filled in and photos are taken. This can be an iterative process, as an evaluation is conducted to determine whether the questionnaire requires further refinement or more photos need to be taken. After these assessments have finished, the task Send data to the headquarters can start, which involves the collection of all questionnaires and photos and their subsequent dispatch to headquarters. This information is used to determine whether these objects are in imminent danger of collapsing and, if so, whether this can be prevented and how that can be achieved. Depending on the outcome, a decision is made to destroy the object or to try to restore it.

For the purposes of illustrating our framework, we assume that an earthquake has occurred in the city of Brisbane. Hence, a number of cases are started by instantiating the Disaster Management workflow described above; each case deals with the activities of an inspection team in a specific zone. Figure 7.10 shows three diagrams; in each diagram, the dots refer to work items. Figure 7.10(a) shows the main process of the Disaster Management workflow, including eight work items.
Dots for work items which are instances of the tasks Assess the affected area and Send data to the headquarters are placed on top of these tasks in this figure. Figure 7.10(b) shows the decomposition of Assess every sensible object; here also eight work items are shown. No resources are shown in these diagrams. Note that the left-hand side shows a list of the work items that are not on the diagram. For example, the eight work items shown in the diagram in Figure 7.10(a) appear in the list of "other work items" in Figure 7.10(b).

Figure 7.10: Examples of Process and Timeline Diagrams for Disaster Management: (a) the Disaster Management process diagram showing 4+4=8 work items; (b) the Assess the affected area sub-net, also showing 8 work items; (c) an example of a timeline diagram showing 11 work items.

Figure 7.10(a) uses the urgency distance metric to highlight urgent cases, while Figure 7.10(b) uses the familiarity metric to highlight cases closer to the user in terms of earlier experiences. As another illustration, consider Figure 7.10(c), where work items are positioned according to their deadlines. This can be an important view in the context of disaster management, where saving minutes may save lives. In the diagram shown, the x-axis represents the time remaining before a work item expires, while the y-axis represents the case number of the case the work item belongs to. A work item is placed at location (100 + 2·x̃, 10 + 4·ỹ) on that diagram if x̃ minutes remain until the deadline of the work item and its case number is ỹ. In this example, work items are coloured in accordance with the popularity distance metric.

Figures 7.11 and 7.12 show some screenshots of a geographical map of the city of Brisbane. On these diagrams, work items are placed at the location where they should be executed.
If their locations are so close that their corresponding dots overlap, a larger dot (i.e., a joint-dot) is used to represent the work items involved, and the number inside corresponds to the number of these items. The green triangle is a representation of the resource whose work list is visualised here. Work items for the tasks Assess the affected area and Send data to the headquarters are not shown on the diagram, as they can be performed anywhere. In this example, dots are coloured according to the familiarity distance metric. A dot that is selected as focus is coloured blue, and further information about the corresponding work item is shown at the bottom of the screen (as is the case for work item Take Photos 4 in Figure 7.11(b)). One can click on a dot and see the positions of the other resources that have been offered the corresponding work item. For example, by clicking on the dot representing the work item Take photo 4, other resources, represented by triangles, are shown (see Figure 7.11(b)). As for work items, overlapping triangles representing resources are combined. For example, the larger triangle shown in Figure 7.11(b) represents two resources. Figure 7.12(a) shows the screenshot after clicking on the joint triangle. A resource can thus see the list of resources associated with this triangle. By selecting one of the resources shown in the list, the work items offered to that resource can be seen. The colour of these work items is determined by their value for the chosen distance metric. A zooming feature is also provided. Figure 7.12(b) shows the result of zooming in a bit further on the diagram of Figure 7.12(a). As can be seen, no dots or triangles overlap any longer. This run-time behaviour stems from design steps carried out, through the Visualisation Designer tool, by the people responsible for the work-list visualisation.
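Merging overlapping dots into a single labelled joint-dot can be sketched as a simple proximity clustering. This is an illustrative stand-in, not the component's actual algorithm; the merge radius is an assumed parameter:

```python
def cluster_dots(points, radius=10.0):
    """Greedily group points whose centres lie within `radius` pixels;
    each cluster becomes one joint-dot labelled with its item count."""
    clusters = []  # list of (centre, members)
    for p in points:
        for i, (centre, members) in enumerate(clusters):
            if ((centre[0] - p[0]) ** 2 + (centre[1] - p[1]) ** 2) ** 0.5 <= radius:
                members.append(p)
                # recompute the centre as the mean of the members
                cx = sum(m[0] for m in members) / len(members)
                cy = sum(m[1] for m in members) / len(members)
                clusters[i] = ((cx, cy), members)
                break
        else:
            clusters.append((p, [p]))
    return [(centre, len(members)) for centre, members in clusters]
```

With three dots at (0,0), (3,0) and (100,100), the first two collapse into one joint-dot labelled "2" while the third remains a single dot, mirroring the behaviour shown in Figure 7.11(a); zooming in effectively shrinks the radius, so the clusters separate again.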
Figure 7.11: Examples of Geographic Diagrams for Disaster Management. (a) Diagram showing the geographic locations of work items and resources: the triangle represents the resource, and most work items are shown as single dots, except for the two work items that are clustered into a single dot labelled "2"; (b) information about the selected dot (blue dot) is shown, together with other resources.

Figure 7.12: Further examples of Geographic Diagrams for Disaster Management. (a) When a triangle is selected, the corresponding resources and offered work items are shown; (b) when zooming in, clustered work items and resources are separated.

Figure 7.13: Use of the Visualisation Designer to specify the task positions on diagrams. (a) Assess the affected area sub-net, also showing 8 work items; (b) Disaster Management process diagram showing 4+4=8 work items.

As already said, this tool allows designers to add and remove diagrams valuable for participants, as well as to specify the position of tasks on such diagrams. Figure 7.13(b) shows an example of how to specify task positions dynamically. A responsible person opens the YAWL process specification for which she wants to specify the position of the tasks. As a result, a new window is opened (the Task List window in Figure 7.13(b)), which comprises all tasks existing in the specification. At this point, users can drag and drop tasks onto the defined maps; in this way, they specify static positions for tasks. Users can define dynamic positions through a specific window (the Insert Position for Task XYZ window in Figure 7.13(b)). It allows the designer to specify XQueries defining the X and Y components of the point where the corresponding task should be positioned. These XQueries are defined over the process instance variables and computed at run-time.
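The dynamic-position mechanism boils down to evaluating, at run-time, one designer-supplied expression per coordinate against the current process instance variables. The framework uses XQuery for these expressions; in the sketch below plain Python expressions stand in for XQuery, and the variable and task names are hypothetical:

```python
def dynamic_position(task_position_exprs, task, instance_vars):
    """Evaluate the X and Y expressions registered for `task` against the
    process instance variables, yielding the point where its dot is drawn."""
    x_expr, y_expr = task_position_exprs[task]
    env = dict(instance_vars)  # expressions may read any instance variable
    return (eval(x_expr, {}, env), eval(y_expr, {}, env))

# Hypothetical example: place 'Take Photos' at the coordinates stored in
# the case's 'target_x'/'target_y' variables, shifted by a fixed offset.
exprs = {"Take Photos": ("target_x + 10", "target_y")}
```

For a case whose variables are `{"target_x": 40, "target_y": 25}`, the dot for Take Photos lands at (50, 25); as the case variables change, the dot moves with them, which is exactly what static drag-and-drop positions cannot express.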
Figure 7.13(b) also depicts an example of defining the position of tasks on a process diagram: specifically, the user is dragging and dropping the desired (static) position for the task Assess the affected area.

7.2.10 Final Remarks

We have proposed a general visualisation framework that can aid users in selecting the "right" work item among a potentially large number of work items offered to them. The framework uses the "diagram metaphor" to show the locations of work items and resources. The "distance metaphor" is used to show which work items are "close" (e.g., urgent, similar to earlier work items, or geographically close). The two concepts are orthogonal, and this provides a great deal of flexibility when it comes to presenting work to people. For example, one can choose a geographical map to display work items and resources and use a distance metric capturing urgency. The proposed framework was operationalised as a component of the YAWL environment. Thanks to its well-defined interfaces, the component is generic, so that in principle it could be exploited by other PMSs as well, provided that they are sufficiently "open" and offer the required interface methods. The component is also highly configurable: e.g., it allows resources to choose how distances should be computed for dots representing a number of work items, and it provides customisable support for determining which resources should be visible. Future work on this topic may go in three directions:

1. Connecting this framework and its implementation to SmartPM. The current implementation works in concert with YAWL, and a porting to SmartPM is planned. Although that would require SmartPM to provide all the information needed, the framework is independent of any specific Process Management System.

2. Connecting the current framework to geographical information systems and process mining tools like ProM [135].

3.
Geographical information systems store data based on locations, and process mining can be used to extract data from event logs and visualise it on diagrams; e.g., it is possible to make a "movie" showing the evolution of work items based on historic data.

7.3 A summary

This chapter has introduced some research topics related to process management in pervasive scenarios. The first deals with the problem of synthesising a process schema according to the available services and of distributing the orchestration among all of them. The second touches on the topic of supporting process participants in choosing the next task to work on among the several that can be offered to them. The work on this second topic is fully available in a workflow product, specifically YAWL.

Chapter 8 Conclusion

This thesis work was directed to process management for highly dynamic and pervasive scenarios. Examples of pervasive scenarios include emergency management, health care and home automation. These scenarios are characterised by processes that are as complex as the traditional ones of business domains (e.g., loans, insurance). Therefore, the use of Process Management Systems is indicated and very helpful. Unfortunately, most existing PMSs are intended for business scenarios and are not completely appropriate for the settings in which we are interested. Indeed, pervasive scenarios are turbulent and subject to a higher frequency of unexpected contingencies than business settings, where the environment is mostly static and shows a foreseeable behaviour. Therefore, PMSs for pervasive scenarios should provide a very high degree of operational flexibility/adaptability. In this thesis, we have given a new definition of adaptability, suitable for our intents, in terms of gap.
Adaptability is the ability of the system to reduce the gap between the virtual reality, the model of the reality used to deliberate, and the physical reality, the real-world state with the actual values of conditions and outcomes. When the gap is so significant that the executing process cannot be carried out, the PMS should be able to build a proper recovery plan that reduces the gap so as to allow the process to complete. This thesis work proposes techniques and frameworks to devise a general recovery method able to handle any kind of exogenous event, including those that were unforeseen. In doing so, we encountered the main challenges in two directions: (i) conceiving an execution monitor able to determine when exogenous events occur and when they prevent running processes from terminating successfully; (ii) devising a recovery planner able to build a plan that allows the original process to terminate successfully. To this aim, we have "borrowed" techniques from AI, such as Situation Calculus and IndiGolog, as well as Execution Monitoring in agent and robot programming. We have applied these techniques to a different field, which required a significant conceptualisation and formalisation effort. In order to show the feasibility of these techniques, we have conceived and developed a proof-of-concept implementation called SmartPM, using an available IndiGolog platform. In order to make it usable in many pervasive scenarios, such as emergency management, SmartPM needs to work in settings based on Mobile Ad hoc Networks (manets). To make that possible, we had to do some research work on topics related to mobile networking: specifically, we developed a manet layer that enables multi-hop communication, and we conceived and developed a specific technique to predict device disconnections before their actual occurrence (so as to be able to recover in time).
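The monitor-and-repair cycle described above can be summarised in a few lines of Python. This is only a conceptual sketch of the gap-based adaptation idea, not the IndiGolog implementation; all helper names and the toy scenario (a disconnection repaired by a reconnect step) are ours:

```python
def adaptive_execution(process, virtual, sense_physical, plan_recovery):
    """Execute `process` step by step; after each step, compare the virtual
    reality (the model used to deliberate) against the sensed physical
    reality, and prepend a recovery plan whenever the gap is relevant."""
    steps = list(process)
    trace = []
    while steps:
        step = steps.pop(0)
        virtual = step(virtual)             # deliberation uses the model
        physical = sense_physical(virtual)  # exogenous events may diverge
        gap = {k: v for k, v in physical.items() if virtual.get(k) != v}
        if gap:                             # relevant deviation detected
            virtual.update(physical)        # realign the model...
            steps = plan_recovery(virtual) + steps  # ...and repair first
            trace.append(("adapt", dict(gap)))
        else:
            trace.append(("ok", None))
    return virtual, trace

# Toy scenario: the second step is hit by an exogenous disconnection.
def step_move(state):
    s = dict(state); s["pos"] = s.get("pos", 0) + 1; return s

events = [{}, {"connected": False}]  # exogenous changes, one per sensing
def sense(virtual):
    physical = dict(virtual)
    if events:
        physical.update(events.pop(0))
    return physical

def recover(virtual):
    def reconnect(s):
        s = dict(s); s["connected"] = True; return s
    return [reconnect]  # a one-step recovery plan
```

Running `adaptive_execution([step_move, step_move], {"pos": 0, "connected": True}, sense, recover)` executes the first step normally, detects the disconnection after the second, inserts the reconnect step, and terminates with the gap closed.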
The next step, on which we are currently working, is to overcome the intrinsic planning inefficiency of Prolog by making use of efficient state-of-the-art planners. There are a number of future research directions that arise from this thesis, and we have explained them in detail throughout the thesis itself. Here we summarise the most relevant ones:

1. Working on integrating SmartPM with state-of-the-art planners in order to overcome the intrinsic planning inefficiency of Prolog. This step is anything but easy, and a lot of research is still ongoing. The most challenging issue is to convert Action Theories and IndiGolog programs so that they can be given as input to planners (e.g., converting them to PDDL).

2. Operationalising the more efficient approach described in Chapter 6 and integrating it with the framework currently implemented. Indeed, the idea is that SmartPM should be able to understand, process by process, when the more efficient approach is applicable and, if it is not, continue using the currently implemented approach.

3. Providing SmartPM with full-fledged work-list handlers to facilitate task distribution to human participants. We envision two types of work-list handler: a version for ultra-mobile devices and a lighter version for PDAs, "compact" but providing fewer features. First steps have already been taken in these directions. The version for ultra-mobile devices has been operationalised for a different PMS (see Section 7.2). The same holds for the PDA version, which has been developed during this thesis work within the ROME4EU Process Management System, a previous valuable attempt to deal with unexpected deviations (see [7]).

4. Working on moving the central SmartPM engine towards a distributed approach, where every device contributes to the coordination of the process execution.
A first evaluation has already been done from a theoretical perspective (see Section 7.1); what remains is to concretely develop it and fit it with the adaptability approach of SmartPM.

Appendix A The IndiGolog Code of the Running Example

This appendix lists the code of the running example shown and discussed in Chapters 4 and 5.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% FILE: aPMS/pms.pl
%
% AUTHOR : Massimiliano de Leoni, Andrea Marrella,
%          Sebastian Sardina, Stefano Valentini
% TESTED : SWI Prolog 5.id_.1id_ http://www.swi-prolog.org
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

:- dynamic controller/1.

/* SOME DOMAIN-INDEPENDENT PREDICATES TO DENOTE THE VARIOUS
   OBJECTS OF INTEREST IN THE FRAMEWORK */

/* Available services */
services([1,2,3,4,5]).
service(Srvc) :- domain(Srvc,services).

/* Tasks defined in the process specification */
tasks([TakePhoto,EvaluatePhoto,CompileQuest,Go,SendByGPRS]).
task(Task) :- domain(Task,tasks).

/* Capabilities relevant for the process of interest */
capabilities([camera,compile,gprs,evaluation]).
capability(B) :- domain(B,capabilities).

/* The list of identifiers that may be used to distinguish different
   instances of the same task */
task_identifiers([id_1,id_2,id_3,id_4,id_5,id_6,id_7,id_8,id_9,id_10,
                  id_11,id_12,id_13]).
id(Id) :- domain(Id,task_identifiers).

/* The capabilities required for each task */
required(TakePhoto,camera).
required(EvaluatePhoto,evaluation).
required(CompileQuest,compile).
required(SendByGPRS,gprs).

/* The capabilities provided by each service */
provide(1,gprs).
provide(1,evaluation).
provide(2,compile).
provide(2,evaluation).
provide(2,camera).
provide(3,compile).
provide(4,evaluation).
provide(4,camera).
provide(5,compile).
/* There is nothing to do caching on (required because cache 1 is static) */
cache(_) :- fail.

/* Definition of predicate loc(I,J) identifying the current
   location of a service */
gridsize(10).
gridindex(V) :- gridsize(S), get_integer(0,V,S).
location(loc(I,J)) :- gridindex(I), gridindex(J).

/* The definition of integer numbers */
number(Srvc,M) :- get_integer(0,Srvc,M).

/* square(X,Y): Y is the square of X */
square(X,Y) :- Y is X * X.

/* member(ELEM,LIST): returns true if ELEM is contained in LIST */
member(ELEM,[HEAD|_]) :- ELEM=HEAD.
member(ELEM,[_|TAIL]) :- member(ELEM,TAIL).

listEqual(L1,L2) :- subset(L1,L2), subset(L2,L1).

/* Definition of predicate workitem(Task,Id,I). It identifies a task
   Task with id Id and input I */
listelem(workitem(Task,Id,I)) :- id(Id), location(I),
    member(Task,[Go,CompileQuest,EvaluatePhoto,TakePhoto]).
listelem(workitem(SendByGPRS,Id,input)) :- id(Id).

worklist([]).
worklist([ELEM | TAIL]) :- worklist(TAIL), listelem(ELEM).

/* DOMAIN-INDEPENDENT FLUENTS */

/* Basically, there has to be some definition for predicates causes_true
   and causes_false, at least one for each. We have added the following
   dummy code: */
causes_true(_,_,_) :- false.
causes_false(_,_,_) :- false.

/* Indicates that list LWrk of workitems has been assigned to
   service Srvc */
rel_fluent(assigned(LWrk,Srvc)) :- worklist(LWrk), service(Srvc).

/* assigned(LWrk,Srvc) holds after action assign(LWrk,Srvc) */
causes_val(assign(LWrk,Srvc),assigned(LWrk,Srvc),true,true).

/* assigned(LWrk,Srvc) holds no longer after action release(LWrk,Srvc) */
causes_val(release(LWrk,Srvc),assigned(LWrk,Srvc),false,true).

/* Indicates that task Task with id Id has been begun by service Srvc */
rel_fluent(enabled(Task,Id,Srvc)) :- task(Task), service(Srvc), id(Id).
/* enabled(Task,Id,Srvc) becomes true if the service Srvc calls the
   exogenous action readyToStart(Task,Id,Srvc), indicating the starting
   of the task Task with id Id */
causes_val(readyToStart(Task,Id,Srvc),enabled(Task,Id,Srvc),true,true).

/* enabled(Task,Id,Srvc) holds no longer after service Srvc calls
   exogenous action finishedTask(Task,Id,Srvc,V) */
causes_val(finishedTask(Task,Id,Srvc,_),
           enabled(Task,Id,Srvc),false,true).

/* free(Srvc) indicates that service Srvc has no task currently
   assigned */
rel_fluent(free(Srvc)) :- service(Srvc).

/* free(Srvc) holds after action release(LWrk,Srvc) */
causes_val(release(_X,Srvc),free(Srvc),true,true).

/* free(Srvc) holds no longer after action assign(LWrk,Srvc) */
causes_val(assign(_LWrk,Srvc),free(Srvc),false,true).

/* ACTIONS and PRECONDITIONS */

prim_action(assign(LWrk,Srvc)) :- worklist(LWrk), service(Srvc).
poss(assign(LWrk,Srvc), true).

prim_action(ackTaskCompletion(Task,Id,Srvc)) :-
    task(Task), service(Srvc), id(Id).
poss(ackTaskCompletion(Task,Id,Srvc), neg(enabled(Task,Id,Srvc))).

prim_action(start(Task,Id,Srvc,I)) :-
    listelem(workitem(Task,Id,I)), service(Srvc).
poss(start(Task,Id,Srvc,I),
    and(enabled(Task,Id,Srvc),
        and(assigned(LWrk,Srvc), member(workitem(Task,Id,I),LWrk)))).

prim_action(release(LWrk,Srvc)) :- worklist(LWrk), service(Srvc).
poss(release(LWrk,Srvc), true).

/* DOMAIN-DEPENDENT FLUENTS */

/* at(Srvc) indicates that service Srvc is in position P */
fun_fluent(at(Srvc)) :- service(Srvc).
causes_val(finishedTask(Task,Id,Srvc,V),at(Srvc),
    loc(I,J), and(Task=Go,V=loc(I,J))).

rel_fluent(evaluationOK(Loc)) :- location(Loc).
causes_val(finishedTask(Task,Id,Srvc,V),evaluationOK(loc(I,J)), true,
    and(Task=EvaluatePhoto,
        and(V=(loc(I,J),OK),
            and(photoBuild(loc(I,J),N), N>3)))).

fun_fluent(photoBuild(Loc)) :- location(Loc).
causes_val(finishedTask(Task,Id,Srvc,V),photoBuild(Loc),N,
    and(Task=TakePhoto,
        and(V=(loc(I,J),Nadd),
            and(Nold=photoBuild(Loc), N is Nold+Nadd)))).

rel_fluent(infoSent()).
causes_val(finishedTask(Task,Id,Srvc,V),infoSent, true,
    and(Task=SendByGPRS,V=OK)).

proc(hasConnection(Srvc),hasConnectionHelper(Srvc,[Srvc])).

proc(hasConnectionHelper(Srvc,M),
    or(neigh(Srvc,1),
       some(n, and(service(n),
                   and(neg(member(n,M)),
                       and(neigh(n,Srvc),
                           hasConnectionHelper(n,[n|M]))))))).

proc(neigh(Srvc1,Srvc2),
    some(x1, some(x2, some(y1, some(y2, some(k1, some(k2,
        and(at(Srvc1)=loc(x1,y1),
            and(at(Srvc2)=loc(x2,y2),
                and(square(x1-x2,k1),
                    and(square(y1-y2,k2), sqrt(k1+k2)<7))))))))))).

/* INITIAL STATE: */

initially(free(Srvc),true) :- service(Srvc).

/* All services are at coordinate (0,0) */
initially(at(Srvc),loc(0,0)) :- service(Srvc).
initially(at_prev(Srvc),0) :- service(Srvc).

initially(photoBuild(Loc),0) :- location(Loc).
initially(photoBuild_prev(Loc),0) :- location(Loc).

initially(evaluationOK(Loc),false) :- location(Loc).
initially(evaluationOK_prev(Loc),false) :- location(Loc).

initially(infoSent(),false).
initially(infoSent_prev(),false).

initially(enabled(X,Id,Srvc),false) :- task(X), service(Srvc), id(Id).
initially(assigned(X,Srvc),false) :- task(X), service(Srvc), id(Id).

initially(evaluate,false).
initially(finished,false).

/* ACTIONS EXECUTED BY SERVICES */

exog_action(readyToStart(T,Id,Srvc)) :- task(T), service(Srvc), id(Id).
exog_action(finishedTask(T,Id,Srvc,_V)) :- task(T), service(Srvc), id(Id).

/* PREDICATES AND ACTIONS FOR MONITORING ADAPTATION */

exog_action(disconnect(Srvc,loc(I,J))) :- service(Srvc),
    gridindex(I), gridindex(J).

/* at(Srvc) assumes the value loc(I,J) after exogenous action
   disconnect(Srvc,loc(I,J)) */
causes_val(disconnect(Srvc,loc(I,J)),at(Srvc),loc(I,J),true).

prim_action(A) :- exog_action(A).
poss(A,true) :- exog_action(A).

causes_val(disconnect(Srvc,L),exogenous,true,true).

/* Fluents in the previous situation */
fun_fluent(at_prev(Srvc)) :- service(Srvc).

fun_fluent(photoBuild_prev(Loc)) :- location(Loc).

fun_fluent(evaluationOK_prev(Loc)) :- location(Loc).

fun_fluent(infoSent_prev()).

causes_val(disconnect(_,_),at_prev(Srvc),X,at(Srvc)=X) :- service(Srvc).
causes_val(disconnect(_,_),
    photoBuild_prev(Loc),X,photoBuild(Loc)=X) :- location(Loc).
causes_val(disconnect(_,_),
    evaluationOK_prev(Loc),X,evaluationOK(Loc)=X) :- location(Loc).
causes_val(disconnect(_,_),
    infoSent_prev(),X,infoSent()=X).

proc(hasConnection_prev(Srvc),hasConnectionHelper_prev(Srvc,[Srvc])).

proc(hasConnectionHelper_prev(Srvc,M),
    or(neigh_prev(Srvc,1),
       some(n, and(service(n),
                   and(neg(member(n,M)),
                       and(neigh_prev(n,Srvc),
                           hasConnectionHelper_prev(n,[n|M]))))))).

proc(neigh_prev(Srvc1,Srvc2),
    some(x1, some(x2, some(y1, some(y2, some(k1, some(k2,
        and(at_prev(Srvc1)=loc(x1,y1),
            and(at_prev(Srvc2)=loc(x2,y2),
                and(square(x1-x2,k1),
                    and(square(y1-y2,k2), sqrt(k1+k2)<7))))))))))).

/* ADAPTATION DOMAIN-INDEPENDENT FEATURES */

prim_action(finish).
poss(finish,true).

rel_fluent(finished).
causes_val(finish,finished,true,true).

rel_fluent(exogenous).
initially(exogenous,false).

rel_fluent(adapted).
prim_action(resetExo).
poss(resetExo,true).

causes_val(resetExo,exogenous,false,true).
causes_val(adaptStart,adapted,false,true).
causes_val(adaptFinish,adapted,true,true).

prim_action(adaptFinish).
poss(adaptFinish,true).
prim_action(adaptStart).
poss(adaptStart,true).

fun_fluent(photoBuild_prev(Loc)) :- location(Loc).

fun_fluent(evaluationOK_prev(Loc)) :- location(Loc).

fun_fluent(infoSent_prev()).

proc(relevant,
    and(some(Srvc, and(service(Srvc),
                       and(hasConnection_prev(Srvc),
                           neg(hasConnection(Srvc))))),
        and(some(Loc, and(location(Loc),
                          and(photoBuild_prev(Loc)=Y,
                              neg(photoBuild(Loc)=Y)))),
            and(some(Loc, and(location(Loc),
                              and(evaluationOK_prev(Loc)=Z,
                                  neg(evaluationOK(Loc)=Z)))),
                and(infoSent=W, neg(infoSent=W)))))
).

proc(goalReached, neg(relevant)).

proc(adapt,
    [adaptStart,
     ?(writeln('about to adapt')),
     pconc([adaptingProgram, adaptFinish],
           while(neg(adapted), [?(writeln('waiting')), wait]))
    ]).

proc(adaptingProgram,
    searchn([?(true), searchProgram],
            [assumptions([
                [assign([workitem(Task,Id,_I)],Srvc),
                 readyToStart(Task,Id,Srvc)],
                [start(Task,Id,Srvc,I),
                 finishedTask(Task,Id,Srvc,I)]
            ])])
).

proc(searchProgram,
    [star(pi([t,i,n],
        [ ?(isPickable([workitem(t,id_30,i)],n)),
          assign([workitem(t,id_30,i)],n),
          start(t,id_30,n,i),
          ackTaskCompletion(t,id_30,n),
          release([workitem(t,id_30,i)],n)
        ]), 10),
     ?(goalReached)]
).

/* ABBREVIATIONS - BOOLEAN FUNCTIONS */

proc(isPickable(WrkList,Srvc),
    or(WrkList=[],
       and(free(Srvc),
           and(WrkList=[A|TAIL],
               and(listelem(A),
                   and(A=workitem(Task,_Id,_I),
                       and(isExecutable(Task,Srvc),
                           isPickable(TAIL,Srvc))))))
    )
).
proc(isExecutable(Task,Srvc),
    and(findall(Capability,required(Task,Capability),A),
        and(findall(Capability,provide(Srvc,Capability),C), subset(A,C)))).

% Translations of domain actions to real actions (one-to-one)
actionNum(X,X).

/* PROCEDURES FOR HANDLING THE TASK LIFE CYCLES */

proc(manageAssignment(WrkList),
    [atomic([pi(n,[?(isPickable(WrkList,n)), assign(WrkList,n)])])]).

proc(manageExecution(WrkList),
    pi(n,[?(assigned(WrkList,n)=true), manageExecutionHelper(WrkList,n)])).

proc(manageExecutionHelper([],Srvc),[]).

proc(manageExecutionHelper([workitem(Task,Id,I)|TAIL],Srvc),
    [start(Task,Id,Srvc,I),
     ackTaskCompletion(Task,Id,Srvc),
     manageExecutionHelper(TAIL,Srvc)]).

proc(manageTermination(WrkList),
    [atomic([pi(n,[?(assigned(WrkList,n)=true), release(WrkList,n)])])]).

proc(manageTask(WrkList),
    [manageAssignment(WrkList),
     manageExecution(WrkList),
     manageTermination(WrkList)]).

/* MAIN PROCEDURE FOR INDIGOLOG */

proc(main, mainControl(N)) :- controller(N), !.
proc(main, mainControl(3)). % default one

proc(mainControl(5),
    prioritized_interrupts(
        [interrupt(and(neg(finished),exogenous), monitor),
         interrupt(true, [process,finish]),
         interrupt(neg(finished), wait)
        ])).

proc(monitor,
    [?(writeln('Monitor')),
     ndet(
         [?(neg(relevant)), ?(writeln('NonRelevant'))],
         [?(relevant), ?(writeln('Relevant')), adapt]
     ),
     resetExo
    ]).

proc(branch(Loc),
    while(neg(evaluationOK(Loc)),
        [ manageTask([workitem(CompileQuest,id_1,Loc)]),
          manageTask([workitem(Go,id_1,Loc), workitem(TakePhoto,id_2,Loc)]),
          manageTask([workitem(EvaluatePhoto,id_1,Loc)])
        ]
    )
).

proc(process,
    [rrobin([branch(loc(2,2)), branch(loc(3,5)), branch(loc(4,4))]),
     manageTask([workitem(SendByGPRS,id_29,input)])
    ]
).
% Translations of domain actions to real actions (one-to-one)
actionNum(X,X).

Bibliography

[1] M. Adams, A. H. M. ter Hofstede, W. M. P. van der Aalst, and D. Edmond. Dynamic, extensible and context-aware exception handling for workflows. In On the Move to Meaningful Internet Systems 2007: CoopIS, DOA, ODBASE, GADA, and IS, Proceedings, Part I, volume 4803 of Lecture Notes in Computer Science, pages 95–112. Springer, 2007.

[2] I. Akyildiz, J. S. M. Ho, and Y. B. Lin. Movement-based Location Update and Selective Paging for PCS Networks. IEEE/ACM Transactions on Networking, 4(4):629–638, 1996.

[3] K. Andresen and N. Gronau. An Approach to Increase Adaptability in ERP Systems. In Managing Modern Organizations with Information Technology: Proceedings of the 2005 Information Resources Management Association International Conference, pages 883–885. Idea Group Publishing, May 2005.

[4] J. Baier and S. McIlraith. On Planning with Programs that Sense. In KR'06: Proceedings of the 10th International Conference on Principles of Knowledge Representation and Reasoning, pages 492–502, Lake District, UK, June 2006. AAAI Press.

[5] J. A. Baier, C. Fritz, and S. A. McIlraith. Exploiting Procedural Domain Control Knowledge in State-of-the-Art Planners. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), pages 26–33. AAAI Press, 2007.

[6] S. Basagni, I. Chlamtac, V. R. Syrotiuk, and B. A. Woodward. A distance routing effect algorithm for mobility (DREAM). In MobiCom '98: Proceedings of the 4th Annual ACM/IEEE International Conference on Mobile Computing and Networking, pages 76–84. ACM, 1998.

[7] D. Battista, M. de Leoni, A. Gaetanis, M. Mecella, A. Pezzullo, A. Russo, and C. Saponaro. ROME4EU: A Web Service-Based Process-Aware System for Smart Devices. In ICSOC '08: Proceedings of the 6th International Conference on Service-Oriented Computing, pages 726–727. Springer-Verlag, 2008.

[8] B. Benatallah, M.
Dumas, and Q. Sheng. Facilitating the rapid development and scalable orchestration of composite web services. Distributed and Parallel Databases, 17, 2005.

[9] D. Berardi, D. Calvanese, G. De Giacomo, R. Hull, and M. Mecella. Automatic composition of transition-based semantic web services with messaging. In Proc. VLDB 2005, 2005.

[10] D. Berardi, D. Calvanese, G. De Giacomo, M. Lenzerini, and M. Mecella. Automatic Service Composition Based on Behavioural Descriptions. International Journal of Cooperative Information Systems, 14(4):333–376, 2005.

[11] D. Berardi, D. Calvanese, G. De Giacomo, and M. Mecella. Composing web services with nondeterministic behavior. In Proc. ICWS 2006, 2006.

[12] P. Berens. Case handling with FLOWer: Beyond workflow. Chapter in Process-Aware Information Systems. John Wiley & Sons, 2005.

[13] J. O. Berger. Statistical Decision Theory and Bayesian Analysis. Springer, 1985.

[14] G. Bertelli, M. de Leoni, M. Mecella, and J. Dean. Mobile Ad hoc Networks for Collaborative and Mission-critical Mobile Scenarios: a Practical Study. In WETICE '08: Proceedings of the 17th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, pages 157–152. IEEE Publisher, 2008.

[15] R. Bobrik, M. Reichert, and T. Bauer. View-based process visualization. In Proceedings of the 5th International Conference on Business Process Management (BPM 2007), volume 4714 of LNCS, pages 88–95. Springer, 2007.

[16] A. Borgida and T. Murata. Tolerating exceptions in workflows: a unified framework for data and processes. In WACC '99: Proceedings of the International Joint Conference on Work Activities Coordination and Collaboration, pages 59–68. ACM, 1999.

[17] R. Brown and H.-Y. Paik. Resource-centric worklist visualisation. In Proceedings of OTM Confederated International Conferences, CoopIS, DOA, and ODBASE 2005, volume 3760 of LNCS, pages 94–111. Springer, 2005.

[18] D. Calvanese, G. De Giacomo, M. Lenzerini, M.
Mecella, and F. Patrizi. Automatic Service Composition and Synthesis: the Roman Model. IEEE Data Engineering Bulletin, 31(3):18–22, 2008.

[19] F. Casati, S. Ceri, B. Pernici, and G. Pozzi. Workflow Evolution. Data & Knowledge Engineering, 24(3):211–238, 1998.

[20] F. Casati, S. Ilnicki, L.-J. Jin, V. Krishnamoorthy, and M.-C. Shan. Adaptive and Dynamic Service Composition in eFlow. In CAiSE 2000: Proceedings of the 12th International Conference on Advanced Information Systems Engineering, volume 1789 of Lecture Notes in Computer Science, pages 13–31. Springer, 2000.

[21] T. Catarci, F. Cincotti, M. de Leoni, M. Mecella, and G. Santucci. Smart homes for all: Collaborating services in a for-all architecture for domotics. In CollaborateCom '08: Proceedings of the 4th International Conference on Collaborative Computing: Networking, Applications and Worksharing. ACM Press, 2009. To appear.

[22] T. Catarci, M. de Leoni, F. De Rosa, M. Mecella, A. Poggi, S. Dustdar, L. Juszczyk, H. Truong, and G. Vetere. The WORKPAD P2P Service-Oriented Infrastructure for Emergency Management. In WETICE '07: Proceedings of the 16th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, Washington, DC, USA, 2007. IEEE Computer Society.

[23] T. Catarci, M. de Leoni, A. Marrella, M. Mecella, G. Vetere, B. Salvatore, S. Dustdar, L. Juszczyk, A. Manzoor, and H.-L. Truong. Pervasive Software Environments for Supporting Disaster Responses. IEEE Internet Computing, 12(1):26–37, 2008.

[24] T. Catarci, F. De Rosa, M. de Leoni, M. Mecella, M. Angelaccio, S. Dustdar, A. Krek, G. Vetere, Z. M. Zalis, B. Gonzalvez, and G. Iiritano. WORKPAD: 2-Layered Peer-to-Peer for Emergency Management through Adaptive Processes. In CollaborateCom 2006: Proceedings of the 2nd International Conference on Collaborative Computing: Networking, Applications and Worksharing. IEEE Computer Society, 2006.

[25] G. Chafle, S. Chandra, V. Mann, and M. G. Nanda.
Decentralized orchestration of composite web services. In Proc. WWW 2004 – Alternate Track Papers & Posters, 2004.

[26] D. Chiu, Q. Li, and K. Karlapalem. A logical framework for exception handling in ADOME workflow management system. In CAiSE 2000: Proceedings of the 12th International Conference on Advanced Information Systems Engineering, volume 1789 of Lecture Notes in Computer Science, pages 110–125. Springer, 2000.

[27] Cosa GmbH. COSA BPM product description. http://www.cosa.de/project/docs/en/COSA57-Productdescription.pdf, July 2008. Accessed on 1 February 2009.

[28] F. D'Aprano, M. de Leoni, and M. Mecella. Emulating mobile ad-hoc networks of hand-held devices: the octopus virtual environment. In MobiEval '07: Proceedings of the 1st International Workshop on System Evaluation for Mobile Platforms, pages 35–40, New York, NY, USA, 2007. ACM.

[29] G. De Giacomo, Y. Lespérance, H. J. Levesque, and S. Sardina. IndiGolog: A High-Level Programming Language for Embedded Reasoning Agents. Chapter in Multi-Agent Programming: Languages, Platforms and Applications, Rafael H. Bordini, Mehdi Dastani, Jürgen Dix, Amal El Fallah-Seghrouchni (Eds.). Springer, 2009. To appear.

[30] G. De Giacomo and H. J. Levesque. An incremental interpreter for high-level programs with sensing. In H. J. Levesque and F. Pirri, editors, Logical Foundations for Cognitive Agents: Contributions in Honor of Ray Reiter, pages 86–102. Springer, Berlin, 1999.

[31] G. De Giacomo, R. Reiter, and M. Soutchanski. Execution Monitoring of High-Level Robot Programs. In KR'98: Proceedings of the Sixth International Conference on Principles of Knowledge Representation and Reasoning, pages 453–465, 1998.

[32] G. De Giacomo and S. Sardina. Automatic synthesis of new behaviors from a library of available behaviors. In IJCAI'07: Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 1866–1871, Hyderabad, India, 2007.

[33] M. de Leoni, G. De Giacomo, Y.
Lespérance, and M. Mecella. On-line Adaptation of Sequential Mobile Processes Running Concurrently. In SAC ’09: Proceedings of the 2009 ACM Symposium on Applied Computing. ACM Press, 2009. To appear.
[34] M. de Leoni, F. De Rosa, S. Dustdar, and M. Mecella. Resource disconnection management in MANET driven by process time plan. In Autonomics ’07: Proceedings of the 1st ACM/ICST International Conference on Autonomic Computing and Communication Systems. ACM, 2007.
[35] M. de Leoni, F. De Rosa, A. Marrella, A. Poggi, A. Krek, and F. Manti. Emergency Management: from User Requirements to a Flexible P2P Architecture. In B. Van de Walle, P. Burghardt, and C. Nieuwenhuis, editors, Proceedings of the 4th International Conference on Information Systems for Crisis Response and Management (ISCRAM 2007), 2007.
[36] M. de Leoni, F. De Rosa, and M. Mecella. MOBIDIS: A Pervasive Architecture for Emergency Management. In WETICE ’06: Proceedings of the 15th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, pages 107–112. IEEE Computer Society, 2006.
[37] M. de Leoni, S. Dustdar, and A. H. M. ter Hofstede. Introduction to the 1st International Workshop on Process Management for Highly Dynamic and Pervasive Scenarios (PM4HDPS ’08). In BPM 2008 Workshops, volume 17 of LNBIP, pages 241–243. Springer-Verlag, 2009.
[38] M. de Leoni, S. R. Humayoun, M. Mecella, and R. Russo. A Bayesian Approach for Disconnection Management in Mobile Ad Hoc Networks. Ubiquitous Computing and Communication Journal, CPE, March 2008.
[39] M. de Leoni, A. Marrella, M. Mecella, S. Valentini, and S. Sardina. Coordinating Mobile Actors in Pervasive and Mobile Scenarios: An AI-based Approach. In WETICE ’08: Proceedings of the 17th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises, pages 82–88. IEEE Computer Society, 2008.
[40] M. de Leoni, M. Mecella, and G. De Giacomo.
Highly Dynamic Adaptation in Process Management Systems Through Execution Monitoring. In BPM ’07: Proceedings of the 5th International Conference on Business Process Management, volume 4714 of Lecture Notes in Computer Science, pages 182–197. Springer, 2007.
[41] M. de Leoni, M. Mecella, and R. Russo. A Bayesian approach for disconnection management in mobile ad hoc networks. In WETICE ’07: Proceedings of the 16th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, pages 62–67, Washington, DC, USA, 2007. IEEE Computer Society.
[42] M. de Leoni, W. M. P. van der Aalst, and A. H. M. ter Hofstede. Visual Support for Work Assignment in Process-Aware Information Systems. In Proceedings of the 6th International Conference on Business Process Management (BPM ’08), Milan, Italy, September 2–4, volume 5240 of Lecture Notes in Computer Science. Springer, 2008.
[43] J. Dehnert and P. Rittgen. Relaxed Soundness of Business Processes. In Proceedings of the 13th International Conference on Advanced Information Systems Engineering (CAiSE 2001), volume 2068 of Lecture Notes in Computer Science, pages 157–170. Springer, 2001.
[44] Y. Dong and Z. Shen-sheng. Approach for workflow modeling using pi-calculus. Journal of Zhejiang University SCIENCE, 4(6):643–650, November 2003.
[45] O. V. Drugan, T. Plagemann, and E. Munthe-Kaas. Non-intrusive neighbor prediction in sparse MANETs. In SECON 2007: Proceedings of the Fourth Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks, pages 172–182. IEEE, 2007.
[46] M. Dumas, W. M. P. van der Aalst, and A. H. M. ter Hofstede. Process-Aware Information Systems: Bridging People and Software Through Process Technology. Wiley, September 2005.
[47] W. Feller. An Introduction to Probability Theory and its Applications. Wiley, 2nd edition, 1971.
[48] J. Flynn, H. Tewari, and D. O’Mahony.
Jemu: A Real Time Emulation System for Mobile Ad Hoc Networks. In Proceedings of the 1st Joint IEI/IEE Symposium on Telecommunications Systems Research, 2001.
[49] D. Fox, J. Hightower, L. Lao, D. Schulz, and G. Borriello. Bayesian Filters for Location Estimation. IEEE Pervasive Computing, 2(3):24–33, 2003.
[50] M. Fox and D. Long. PDDL2.1: An Extension to PDDL for Expressing Temporal Planning Domains. Journal of Artificial Intelligence Research (JAIR), 20:61–124, 2003.
[51] C. Fritz, J. A. Baier, and S. A. McIlraith. ConGolog, Sin Trans: Compiling ConGolog into Basic Action Theories for Planning and Beyond. In KR 2008: Proceedings of the Eleventh International Conference on Principles of Knowledge Representation and Reasoning, pages 600–610. AAAI Press, 2008.
[52] M. Ghallab, D. Nau, and P. Traverso. Automated Planning: Theory and Practice. Morgan Kaufmann Publishers, May 2004.
[53] G. De Giacomo, M. de Leoni, M. Mecella, and F. Patrizi. Automatic Workflows Composition of Mobile Services. In ICWS ’07: Proceedings of the 2007 IEEE International Conference on Web Services, pages 823–830. IEEE Computer Society, 2007.
[54] K. Göser, M. Jurisch, H. Acker, U. Kreher, M. Lauer, S. Rinderle, M. Reichert, and P. Dadam. Next-generation Process Management with ADEPT2. In Proceedings of the BPM Demonstration Program at the Fifth International Conference on Business Process Management (BPM ’07), volume 272 of CEUR Workshop Proceedings. CEUR-WS.org, 2007.
[55] L. Guibas and J. Stolfi. Primitives for the manipulation of general subdivisions and the computation of Voronoi diagrams. ACM Transactions on Graphics, 4(2):74–123, 1985.
[56] C. W. Günther, M. Reichert, and W. M. P. van der Aalst. Supporting Flexible Processes with Adaptive Workflow and Case Handling. In WETICE ’08: Proceedings of the 17th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises, 2008.
[57] P. Gupta and P. R. Kumar. The capacity of wireless networks.
IEEE Transactions on Information Theory, IT-46(2):388–404, March 2000.
[58] D. Hadaller, S. Keshav, T. Brecht, and S. Agarwal. Vehicular opportunistic communication under the microscope. In MobiSys ’07: Proceedings of the 5th international conference on Mobile systems, applications and services, pages 206–219. ACM Press, 2007.
[59] C. Hagen and G. Alonso. Exception handling in workflow management systems. IEEE Transactions on Software Engineering, 26(10):943–958, October 2000.
[60] G. Hansen. Automated Business Process Reengineering: Using the Power of Visual Simulation Strategies to Improve Performance and Profit. Prentice-Hall, Englewood Cliffs, 1997.
[61] D. Harel, D. Kozen, and J. Tiuryn. Dynamic Logic. The MIT Press, 2000.
[62] G. Harik, E. Cantú-Paz, D. E. Goldberg, and B. L. Miller. The gambler’s ruin problem, genetic algorithms, and the sizing of populations. Evolutionary Computation, 7(3):231–253, 1999.
[63] S. L. Hickmott, J. Rintanen, S. Thiébaux, and L. B. White. Planning via Petri net unfolding. In IJCAI 2007: Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 1904–1911. AAAI Press, 2007.
[64] J. Hidders, M. Dumas, W. M. P. van der Aalst, A. H. M. ter Hofstede, and J. Verelst. When are two workflows the same? In CATS ’05: Proceedings of the 2005 Australasian symposium on Theory of computing, pages 3–11. Australian Computer Society, Inc., 2005.
[65] R. Hull and J. Su. Tools for design of composite web services. In Proc. SIGMOD 2004, pages 958–961, 2004.
[66] S. R. Humayoun, T. Catarci, M. de Leoni, A. Marrella, M. Mecella, M. Bortenschlager, and R. Steinmann. Designing Mobile Systems in Highly Dynamic Scenarios: The WORKPAD Methodology. Journal on Knowledge, Technology & Policy, 2009. To appear.
[67] S. R. Humayoun, T. Catarci, M. de Leoni, A. Marrella, M. Mecella, M. Bortenschlager, and R. Steinmann.
The WORKPAD user interface and methodology: Developing smart and effective mobile applications for emergency operators. In HCI International 2009: Proceedings of the 13th International Conference on Human-Computer Interaction, volume 5616 of Lecture Notes in Computer Science. Springer, 2009.
[68] IBM Inc. An introduction to WebSphere Process Server and WebSphere Integration Developer. ftp://ftp.software.ibm.com/software/integration/wps/library/WSW14021-USEN-01.pdf, May 2008. Accessed on 1 February, 2009.
[69] A. Jardosh, E. M. Belding-Royer, K. C. Almeroth, and S. Suri. Towards realistic mobility models for mobile ad hoc networks. In MobiCom ’03: Proceedings of the 9th annual international conference on Mobile computing and networking, pages 217–229. ACM Press, 2003.
[70] K. Jensen. Coloured Petri Nets: Basic Concepts, Analysis Methods and Practical Use. Springer, 2nd edition, 1997.
[71] D. B. Johnson, D. A. Maltz, and J. Broch. DSR: The Dynamic Source Routing Protocol for Multi-Hop Wireless Ad Hoc Networks. In C. E. Perkins, editor, Ad Hoc Networking, pages 139–172. Addison-Wesley, 2001.
[72] B. Kiepuszewski, A. H. M. ter Hofstede, and C. Bussler. On Structured Workflow Modelling. In CAiSE ’00: Proceedings of the 12th International Conference on Advanced Information Systems Engineering, pages 431–445, London, UK, 2000. Springer-Verlag.
[73] M. Kinateder. SAP advanced workflow techniques. https://www.sdn.sap.com/irj/servlet/prt/portal/prtroot/docs/library/uuid/82d03e23-0a01-0010-b482-dccfe1c877c4, 2006. Accessed on 1 February, 2009.
[74] E. Kindler. On the semantics of EPCs: Resolving the vicious circle. Data & Knowledge Engineering, 56(1):23–40, 2006.
[75] Y.-B. Ko and N. H. Vaidya. Location-Aided Routing (LAR) in Mobile Ad hoc Networks. Wireless Networks, 6:307–321, 2000.
[76] R. A. Kowalski. Using meta-logic to reconcile reactive with rational agents. In Meta-Logics and Logic Programming, pages 227–242, 1995.
[77] A. Kumar, W. M. P. van der Aalst, and H. M. W.
Verbeek. Dynamic Work Distribution in Workflow Management Systems: How to Balance Quality and Performance? Journal of Management Information Systems, 18(3):157–193, 2002.
[78] O. Kupferman and M. Y. Vardi. Synthesizing distributed systems. In Proc. of LICS 2001, page 389, 2001.
[79] B. Kusy, J. Sallai, G. Balogh, A. Ledeczi, V. Protopopescu, J. Tolliver, F. DeNap, and M. Parang. Radio interferometric tracking of mobile wireless nodes. In MobiSys ’07: Proceedings of the 5th international conference on Mobile systems, applications and services, pages 139–151. ACM Press, 2007.
[80] U. Kuter, E. Sirin, D. Nau, B. Parsia, and J. Hendler. Information gathering during planning for web service composition. In Proc. Workshop on Planning and Scheduling for Web and Grid Services, 2004.
[81] M. La Rosa, M. Dumas, A. H. M. ter Hofstede, J. Mendling, and F. Gottschalk. Beyond Control-Flow: Extending Business Process Configuration to Roles and Objects. In ER 2008: Proceedings of the 27th International Conference on Conceptual Modeling, volume 5231 of Lecture Notes in Computer Science, pages 199–215, 2008.
[82] M. Lankhorst. Enterprise Architecture at Work: Modelling, Communication, and Analysis. Springer, 2005.
[83] Y. Lespérance and H.-K. Ng. Integrating Planning into Reactive High-Level Robot Programs. In Proceedings of the Second International Cognitive Robotics Workshop (in conjunction with ECAI 2000), pages 49–54, August 2000.
[84] J. Li, C. Blake, D. S. J. De Couto, H. I. Lee, and R. Morris. Capacity of Ad Hoc Wireless Networks. In Proc. 7th International Conference on Mobile Computing and Networking (MOBICOM 2001), pages 61–69, 2001.
[85] B. Liang and Z. J. Haas. Predictive Distance-based Mobility Management for Multidimensional PCS Networks. IEEE/ACM Transactions on Networking, 11(5):718–732, 2003.
[86] P. Luttighuis, M. Lankhorst, R. Wetering, R. Bal, and H. Berg. Visualising business processes. Computer Languages, 27(1/3):39–59, 2001.
[87] L. T. Ly, S.
Rinderle, and P. Dadam. Integration and verification of semantic constraints in adaptive process management systems. Data & Knowledge Engineering, 64(1):3–23, 2008.
[88] J. Macker, W. Chao, and J. Weston. A low-cost, IP-based Mobile Network Emulator (MNE). In MILCOM ’03: Proceedings of the Military Communications Conference, volume 1, pages 481–486. IEEE Press, 2003.
[89] P. Mahadevan, A. Rodriguez, D. Becker, and A. Vahdat. Mobinet: a scalable emulation infrastructure for ad hoc and wireless networks. In WiTMeMo ’05: The 2005 workshop on Wireless traffic measurements and modeling, pages 7–12, Berkeley, CA, USA, 2005. USENIX Association.
[90] D. Mahrenholz and S. Ivanov. Real-time network emulation with ns-2. In DS-RT ’04: Proceedings of the 8th IEEE International Symposium on Distributed Simulation and Real-Time Applications, pages 29–36. IEEE Computer Society, 2004.
[91] B. S. Manoj and A. Hubenko Baker. Communication Challenges in Emergency Response. Communications of the ACM, 50(3):51–53, 2007.
[92] D. V. McDermott. The 1998 AI Planning Systems Competition. AI Magazine, 21(2):35–55, 2000.
[93] S. A. McIlraith and T. C. Son. Adapting Golog for composition of semantic web services. In Proc. KR 2002, pages 482–496, 2002.
[94] M. Mecella and B. Pernici. Cooperative information systems based on a service oriented approach. Journal of Interoperability in Business Information Systems, 1(3), 2006.
[95] B. Medjahed, A. Bouguettaya, and A. K. Elmagarmid. Composing web services on the semantic web. VLDB Journal, 12(4):333–351, 2003.
[96] J. Mendling and W. M. P. van der Aalst. Formalization and Verification of EPCs with OR-Joins Based on State and Context. In Proceedings of the 19th International Conference on Advanced Information Systems Engineering (CAiSE 2007), volume 4495 of Lecture Notes in Computer Science. Springer, 2007.
[97] J. Mendling, H. M. W. Verbeek, B. F. van Dongen, W. M. P. van der Aalst, and G. Neumann.
Detection and prediction of errors in EPCs of the SAP reference model. Data & Knowledge Engineering, 64(1):312–329, 2008.
[98] S. Menotti. SPIDE: A Smart Process IDE for Emergency Operators. Master’s thesis, Faculty of Computer Engineering, SAPIENZA Università di Roma, 2008. Supervisor: Dr. Massimo Mecella. In English.
[99] R. Milner. A Calculus of Communicating Systems, volume 92 of Lecture Notes in Computer Science. Springer, 1980.
[100] R. Milner. Communication and Concurrency. Prentice Hall, Inc., Upper Saddle River, NJ, USA, 1989.
[101] R. Müller, U. Greiner, and E. Rahm. AGENTWORK: a workflow system supporting rule-based workflow adaptation. Data & Knowledge Engineering, 51(2):223–256, 2004.
[102] A. L. Murphy, G. P. Picco, and G. C. Roman. LIME: A Coordination Model and Middleware Supporting Mobility of Hosts and Agents. ACM Transactions on Software Engineering and Methodology, 15(3):279–328, 2006.
[103] D. Niculescu and B. Nath. Position and Orientation in ad hoc Networks. Elsevier Journal of Ad Hoc Networks, 2(2):133–151, April 2004.
[104] R. Punnoose, P. Nikitin, and D. Stancil. Efficient Simulation of Ricean Fading within a Packet Simulator. In Proceedings of the 51st Vehicular Technology Conference, pages 764–767. IEEE, 2000.
[105] Object Management Group. Business Process Modeling Notation. http://www.bpmn.org/Documents/OMG%20Final%20Adopted%20BPMN%201-0%20Spec%2006-02-01.pdf, February 2006. Accessed on 16 February, 2009.
[106] S. Papanastasiou, L. M. Mackenzie, M. Ould-Khaoua, and V. Charissis. On the interaction of TCP and Routing Protocols in MANETs. In AICT-ICIW ’06: Proceedings of the Advanced Int’l Conference on Telecommunications and Int’l Conference on Internet and Web Applications and Services, page 62, Washington, DC, USA, 2006. IEEE Computer Society.
[107] C. E. Perkins and P. Bhagwat. Highly Dynamic Destination-Sequenced Distance-Vector Routing (DSDV) for Mobile Computers. In Proc. SIGCOMM ’94, 1994.
[108] C. Petri.
Communication with Automata. PhD thesis, Institut für Instrumentelle Mathematik, Universität Bonn, 1962.
[109] A. Pnueli and R. Rosner. On the synthesis of a reactive module. In Proc. POPL 1989, pages 179–190, 1989.
[110] A. Pnueli and R. Rosner. Distributed reactive systems are hard to synthesize. In Proc. of FOCS 1990, pages 746–757, 1990.
[111] J. L. Pollock. The logical foundations of goal-regression planning in autonomous agents. Artificial Intelligence, 106(2):267–334, 1998.
[112] The VINT Project. The ns Manual. http://isi.edu/nsnam/ns/ns-documentation.html, January 2009.
[113] F. Puhlmann. Soundness Verification of Business Processes Specified in the Pi-Calculus. In On the Move to Meaningful Internet Systems 2007: CoopIS, DOA, ODBASE, GADA, and IS, Proceedings, Part I, volume 4803 of Lecture Notes in Computer Science, pages 6–23. Springer, 2007.
[114] F. Puhlmann and M. Weske. Using the π-calculus for Formalizing Workflow Patterns. In Proceedings of the 3rd International Conference on Business Process Management, BPM 2005, volume 3649 of Lecture Notes in Computer Science, pages 153–168. Springer, 2005.
[115] M. Puzar and T. Plagemann. NEMAN: A Network Emulator for Mobile Ad-Hoc Networks. In ConTel 2005: Proceedings of the 8th International Conference on Telecommunications, pages 155–161. IEEE Press, June 2005.
[116] M. Qin, R. Zimmermann, and L. S. Liu. Supporting multimedia streaming between mobile peers with link availability prediction. In MULTIMEDIA ’05: Proceedings of the 13th annual ACM international conference on Multimedia, pages 956–965, New York, NY, USA, 2005. ACM.
[117] M. Reichert and P. Dadam. ADEPTflex – Supporting Dynamic Changes of Workflows Without Losing Control. Journal of Intelligent Information Systems (JIIS), 10(2):93–129, 1998.
[118] M. Reichert, S. Rinderle, U. Kreher, and P. Dadam. Adaptive Process Management with ADEPT2. In ICDE ’05: Proceedings of the 21st International Conference on Data Engineering, pages 1113–1114.
IEEE Computer Society, 2005.
[119] R. Reiter. Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. MIT Press, September 2001.
[120] N. Russell, W. M. P. van der Aalst, A. H. M. ter Hofstede, and D. Edmond. Workflow resource patterns: Identification, representation and tool support. In Proceedings of the 17th International Conference CAiSE 2005, volume 3520 of LNCS, pages 216–232. Springer, 2005.
[121] S. Sardina, G. De Giacomo, Y. Lespérance, and H. J. Levesque. On the Semantics of Deliberation in IndiGolog—from Theory to Implementation. Annals of Mathematics and Artificial Intelligence, 41(2-4):259–299, 2004.
[122] S. Sardina, F. Patrizi, and G. De Giacomo. Automatic synthesis of a global behavior from multiple distributed behaviors. In AAAI 2007: Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, pages 1063–1069. AAAI Press, 2007.
[123] H. Schonenberg, R. Mans, N. Russell, N. Mulyar, and W. M. P. van der Aalst. Towards a Taxonomy of Process Flexibility. In Proceedings of the Forum at the CAiSE ’08 Conference, volume 344 of CEUR Workshop Proceedings, pages 81–84. CEUR-WS.org, 2008.
[124] A. Streit, B. Pham, and R. Brown. Visualization support for managing large business process specifications. In Proceedings of the 3rd International Conference on Business Process Management, BPM 2005, volume 3649 of LNCS, pages 205–219. Springer, 2005.
[125] G. Tagni, A. ten Teije, and F. van Harmelen. Reasoning about repairability of workflows at design time. In Proceedings of the 1st International Workshop on QoS in Self-healing Web Services (QSWS-08), in conjunction with the 6th International Conference on Business Process Management (BPM 2008), Lecture Notes in Computer Science. Springer, 2009.
[126] The OMNeT++ Team. OMNeT++ User Manual. http://www.omnetpp.org/doc/manual/usman.html, 2006.
[127] Tibco Software Inc. Introduction to TIBCO iProcess Suite.
www.tibco.com/resources/software/bpm/tibco_iprocess_suite_whitepaper.pdf, 2008. Accessed on 1 February, 2009.
[128] P. Traverso and M. Pistore. Automated composition of semantic web services into executable processes. In Proc. ISWC 2004, volume 3298 of LNCS, pages 380–394. Springer, 2004.
[129] W. M. P. van der Aalst and P. Berens. Beyond Workflow Management: Product-Driven Case Handling. In GROUP 2001: Proceedings of the International ACM SIGGROUP Conference on Supporting Group Work, pages 42–51. ACM Press, 2001.
[130] W. M. P. van der Aalst, M. Weske, and D. Grünbauer. Case Handling: A New Paradigm for Business Process Support. Data & Knowledge Engineering, 53:129–162, 2005.
[131] W. M. P. van der Aalst. The application of Petri nets to workflow management. Journal of Circuits, Systems, and Computers, 8(1):21–66, 1998.
[132] W. M. P. van der Aalst. Workflow verification: Finding control-flow errors using Petri-net-based techniques. In Proceedings of Business Process Management, Models, Techniques, and Empirical Studies, pages 161–183, London, UK, 2000. Springer-Verlag.
[133] W. M. P. van der Aalst and A. H. M. ter Hofstede. YAWL: yet another workflow language. Information Systems, 30(4):245–275, 2005.
[134] W. M. P. van der Aalst, A. H. M. ter Hofstede, B. Kiepuszewski, and A. P. Barros. Workflow Patterns. Distributed and Parallel Databases, 14(1):5–51, 2003.
[135] W. M. P. van der Aalst, B. F. van Dongen, C. W. Günther, R. S. Mans, A. K. Alves de Medeiros, A. Rozinat, V. Rubin, M. Song, H. M. W. Verbeek, and A. J. M. M. Weijters. ProM 4.0: Comprehensive support for real process analysis. In Proceedings of the 28th International Conference on Applications and Theory of Petri Nets and Other Models of Concurrency (ICATPN 2007), volume 4546 of LNCS, pages 484–494. Springer, 2007.
[136] W. M. P. van der Aalst and K. van Hee. Workflow Management: Models, Methods, and Systems. The MIT Press, 2002.
[137] A. Venkateswaran, V. Sarangan, N. Gautam, and R. Acharya.
Impact of mobility prediction on the temporal stability of MANET clustering algorithms. In PE-WASUN ’05: Proceedings of the 2nd ACM international workshop on Performance evaluation of wireless ad hoc, sensor, and ubiquitous networks, pages 144–151, New York, NY, USA, 2005. ACM Press.
[138] B. Victor and F. Moller. The Mobility Workbench - a tool for the pi-calculus. In CAV ’94: Proceedings of the 6th International Conference on Computer Aided Verification, pages 428–440, London, UK, 1994. Springer-Verlag.
[139] G. Vossen and M. Weske. The WASA2 Object-Oriented Workflow Management System. In SIGMOD 1999: Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 587–589. ACM Press, 1999.
[140] B. Weber, S. Rinderle, and M. Reichert. Change Patterns and Change Support Features in Process-Aware Information Systems. In Proceedings of the 19th International Conference on Advanced Information Systems Engineering (CAiSE 2007), volume 4495 of Lecture Notes in Computer Science, pages 574–588. Springer, 2007.
[141] B. Weber, W. Wild, and R. Breu. CBRFlow: Enabling Adaptive Workflow Management Through Conversational Case-Based Reasoning. In ECCBR 2004: Proceedings of the 7th European Conference on Advances in Case-Based Reasoning, volume 3155 of Lecture Notes in Computer Science, pages 434–448. Springer, 2004.
[142] M. Weske. Formal Foundation and Conceptual Design of Dynamic Adaptations in a Workflow Management System. In HICSS ’01: Proceedings of the 34th Annual Hawaii International Conference on System Sciences. IEEE Computer Society, 2001.
[143] D. West. An Implementation and Evaluation of the Ad-Hoc On-Demand Distance Vector Routing Protocol for Windows CE. M.Sc. thesis in computer science, University of Dublin, September 2003.
[144] J. Wielemaker. An Overview of the SWI-Prolog Programming Environment. In WLPE: Proceedings of the 13th International Workshop on Logic Programming Environments, volume CW371 of Report, pages 1–16, 2003.
[145] W. Wright. Business Visualization Adds Value. IEEE Computer Graphics and Applications, 18(4):39, 1998.
[146] M. T. Wynn, W. M. P. van der Aalst, A. H. M. ter Hofstede, and D. Edmond. Verifying Workflows with Cancellation Regions and OR-Joins: An Approach Based on Reset Nets and Reachability Analysis. In Proceedings of the 4th International Conference on Business Process Management, BPM 2006, volume 4102 of Lecture Notes in Computer Science, pages 389–394. Springer, 2006.
[147] X. Zeng, R. Bagrodia, and M. Gerla. GloMoSim: a library for parallel simulation of large-scale wireless networks. SIGSIM Simulation Digest, 28(1):154–161, 1998.
[148] Y. Zhang and W. Li. An integrated environment for testing mobile ad-hoc networks. In MobiHoc ’02: Proceedings of the 3rd ACM international symposium on Mobile ad hoc networking & computing, pages 104–111. ACM, 2002.
[149] P. Zheng and L. M. Ni. EMWIN: emulating a mobile wireless network using a wired network. In WOWMOM ’02: Proceedings of the 5th ACM international workshop on Wireless mobile multimedia, pages 64–71. ACM, 2002.