Extending Algorithms for Mobile Robot Patrolling in the Presence of Adversaries to More Realistic Settings Nicola Basilico, Nicola Gatti, Thomas Rossi, Sofia Ceppi, and Francesco Amigoni {basilico,ngatti,ceppi,amigoni}@elet.polimi.it, [email protected] DEI, Politecnico di Milano Outline • Background – State of the art – Basic model – Solving algorithm • Contributions – Modeling intruder’s movements – Modeling intruder’s visibility limitations – Complexity reduction techniques – Experimental results • Conclusions and Future Works DEI, Politecnico di Milano Part 1: Background DEI, Politecnico di Milano Related Works • The patrolling strategy problem: • The patrolling strategy drives the robot in the patrolling task • Problem: given an environment, compute the best patrolling strategy • Approaches: • Not considering a model of the adversary (the intruder) • Frequency/coverage based approaches • Explicitly considering a model of the adversary (the intruder) • It can provide better strategies (Amigoni et a, IAT 2008) • Model of the adversary • Without preferences (Agmon et al., AAMAS 2008, perimeter-like environments) • With preferences (Paruchuri et al., AAMAS 08, fully connected environments) DEI, Politecnico di Milano Patrolling Setting • • • Time is discretized in turns • Grid map composed by free cells (white), obstacle cells (black) and targets (green circles) • Targets: cells with some value for both players Patroller: • Equipped with sensors to detect intrusions in the patrolling setting • It can move between adjacent vertexes in one time unit Intruder: • It observes the patroller remaining hidden outside the environment • It can decide to enter the environment at any turn • For each target T, the intruder must spend time dT to successfully attack the target • When attempting to attack target T at time t, the intruder can be detected during [t, t+dT) DEI, Politecnico di Milano Patrolling Strategy • Patrolling strategy: it specifies the next move of the patroller at each turn • Randomized strategy: a probability distribution over the next move, it can be the only effective strategy against an observing intruder • Objective: finding the optimal randomized patrolling strategy while considering a model of the adversary (the intruder) • Strongest intruder: a rational agent that knows the patrolling strategy and considers it when deciding its action • Approach: to study the interactions between patroller and intruder agents within a game-theoretical framework DEI, Politecnico di Milano The Patrolling Game P move(7) P … P … I … enter(1) I … I 1 … 6 9 P … 2 3 4 7 10 8 12 1 turn Game Outcomes • • At turn k the indruder enters cell T when the patroller is in cell G: enter-when(T,G) • If the patroller does not sense cell T in the interval [k, k+ dT) the intruder wins • Otherwise the intruder is captured and the patroller wins The intruder never enters: stay-out DEI, Politecnico di Milano 5 13 Solving the Game • The patrolling problem can be modeled as a leader-follower game • Two players • The leader commits to a strategy • The follower observes such commitment and acts as a best responder • Patroller’s strategy: A = {αi,j}, where αi,j is the probability of doing move(j) when i is the current node • Intruder’s strategy: enter-when(T,G), enter in target T when the patroller is in cell G • The optimal A can be derived by computing the equilibrium of the leaderfollower game resorting to a bilevel optimization problem (Conitzer and Sandholm, 2006) DEI, Politecnico di Milano Solving the Game For very intruder’s action ai Find A’ such that EUp is maximum s.t. ai is best response to A’ If the intruder’s action is to attack target T, the patroller’s expected utility is computed as: A’1,a1 A’2,a2 A’3,a3 … A’n,an max EUp A*,a* Leader-Follower Equilibrium Optimal Patrolling Strategy DEI, Politecnico di Milano P(intrusion T) *XT + (1 - P(intrusion T)) * X0 P(intrusion T) depends on • the attacked target • the position of the patroller • the patrolling strategy Part 2: Contributions DEI, Politecnico di Milano Objective • The basic model is general but it makes a lot of simplifying assumption • E.g., the intruder can directly enter in any target • We introduce two different extensions in order to model a more realistic patrolling scenario • We refine the intruder’s model considering aspects that are not addressed in game theoretical patrolling literature • We experimentally evaluate the computational complexity of the extended model and provide techniques to reduce it DEI, Politecnico di Milano Intruder’s Movements Basic model assumption: the intruder can directly enter in any target T Nowintruder’s The the intruder’s strategy strategy is represented is represented as:as: enter-when(T,C) enter-when(P,C) C • The environment can be accessed by access areas • As soon as the patroller is in C, the intruder: • • Enters from an access area • Follows a path P from the access area to a target T, and then stays there for dT turns to complete the intrusion attempt The intrusion probability of an intruder’s strategy has to be computed in a different way with respect to the basic model DEI, Politecnico di Milano Reduction • We can reduce the computational burden by discarding players’ dominated actions • An action a is dominated by an action b if the player prefers to undertake b independently of the opponent‘s strategy • • Patroller’s actions reduction: • Smaller setting, less variables • Forcing the patroller to cover shortest paths between targets Intruder’s actions reduction: • Less optimization problems, less constraints for each optimization problem DEI, Politecnico di Milano Reduction • Indentify the minimal set of paths that a rational intruder would consider in its actions enter-when(P,C)s • Obiously enter-when(P1, *) dominates enter-when(P2, *) • P3 is not dominated: there can be a patrolling strategy such that P3 is better than P1 • We select all irreducible paths, i.e., those paths that do not strictly contain any other path DEI, Politecnico di Milano Reduction • Indentify the minimal set of cells {C} that a rational intruder would consider in its actions enter-when(P,C)s C C2 C1 • Obiously enter-when(P1, C) is dominated by stay-out • enter-when(P1, C1) is dominated by enter-when(P1, C2): from C2 the patroller should always cover a longer distance to reach the target within dT turns than from C1 • For every irreducible path we find the set {C} resorting to a tree based search technique DEI, Politecnico di Milano Intruder’s Limited Observation Capabilities • Basic model: the intruder can observe the patroller and derive a correct belief on the patrolling strategy • Limited visibility: when acting the intruder has a limited knowledge about the current position of the patroller • Hi is the set of hidden cells when entering from access area i ? • Actions enter-when(T,G) cannot be performed if G is an hidden cell belonging to Hi • We introduce a state of the game s = <G,O> where: ? • G is the last cell where the intruder saw the patroller ? G • O is the number of turns from such last observation • Example s = <G,3> • The intruder can compute a probability distribution over the patroller’s position using the strategy it knows: DEI, Politecnico di Milano Intruder’s Limited Visibility • Now the intruder’s strategy is represented as : enter-when(T,s) where s is a state • To determine non dominated actions we have to compute the minimal set of states {s} • Compute the minimal set of cells {c} to consider as patroller positions (like in the previous case) • For every c of {c}: • If c is not hidden then s = <c,0> has to be considered • If c is hidden, we consider c’ from which the patroller can reach c without passing from any non hidden cell c c’ • We consider state s = <c’,k> such that the probability for the patroller of being in c starting from c’ is maximum after k turns since it disappeared • We can find it by resorting to Markov chains properties, in the example s = <c’,3> DEI, Politecnico di Milano 2500 2000 partially reduced 1500 1000 fully reduced 500 0 Optimization problems Total time (seconds) Experimental Results 350 300 250 200 150 100 50 0 intruder's paths partially reduced 400 fully reduced 200 0 intruder's limited visibility DEI, Politecnico di Milano Optimization problems Total time (seconds) 1000 600 partially reduced fully reduced original 1200 800 without reduction intruder's path 100 80 60 40 partially reduction 20 fully reduced 0 original intruder's limited visibility Conclusions and Future Works • Conclusions: • We presented a game theoretical model to find the best patrolling strategy in a patrolling setting, together with some extensions to capture more realistic situations • Future Works: • Further extensions to refine the model of the patroller • Real / simulated robot implementation • Multi-patroller scenarios DEI, Politecnico di Milano