Leader-Follower Strategies for Robotic Patrolling in Environments with Arbitrary Topology
Nicola Basilico, Nicola Gatti, Francesco Amigoni
{basilico,ngatti,amigoni}@elet.polimi.it
DEI, Politecnico di Milano

Mobile Robot Patrolling
• A robotic patrolling situation is characterized by one or more mobile robotic patrollers and by some targets to be patrolled (Agmon et al., 2008), (Amigoni et al., 2008), (Gatti, 2008), (Paruchuri et al., 2008)
• Usually, due to the characteristics of the setting, the patrollers cannot employ a deterministic strategy, otherwise the intruder will surely succeed in attacking a target
• Patrollers should adopt an unpredictable patrolling strategy, randomizing over the targets and trying to reduce the intrusion probability (Pita et al., 2008)
• The scientific challenge is the development of effective patrolling strategies, i.e., the optimal probabilities with which the patroller moves in the environment

Example: Deterministic Strategy
[Figure omitted: example of a deterministic patrolling strategy]

Example: Non-Deterministic Strategy
[Figure omitted: example of a non-deterministic patrolling strategy]

Patrolling with Adversaries
• Considering adversaries can provide the patrolling robots with a larger expected utility
• (Agmon et al., 2008) considers non-explicit adversary models in perimeter settings: the optimal patroller's strategy is its max-min strategy
• (Paruchuri et al., 2008) considers explicit adversary models in settings without topologies: the optimal patroller's strategy is its leader-follower strategy (von Stengel and Zamir, 2004)
• Knowing the adversary's preferences (even with uncertainty) provides the patrollers with a larger expected utility than not knowing them

Assumptions
• Time is discretized in turns
• There is a single patrolling robot equipped with a sensor (e.g., a camera) able to detect intruders
• The environment is discretized in cells and its topology is represented by a directed graph
• Once the intruder has attempted to enter a cell, it cannot do anything else for some turns (this amounts to saying that penetration takes some time)

The Environment Model
• C is the set of n cells to be patrolled
• Matrix T (n×n) represents the topology, where t_{i,j} = 1 means that cells i and j are adjacent (the patroller can go from i to j with one action) and t_{i,j} = 0 means that they are not
• Matrix V (n×n) represents the patroller's sensing capabilities, where v_{i,j} = 1 if cell j can be sensed by the patroller from cell i and v_{i,j} = 0 otherwise
• Each cell i requires the intruder d_i > 0 turns to enter
• X_i and Y_i (with i ∈ {1, 2, …, n}) are the values to the patroller and to the intruder, respectively, when the intruder successfully strikes cell i
• X_0 and Y_0 are the values to the patroller and to the intruder, respectively, when the intruder is captured
(An encoding sketch of these data structures is given after the example below)

Example: Environment
[Figure: an example environment with three targets, with d = 6, X = .8, Y = .4; d = 4, X = .7, Y = .5; d = 5, X = .8, Y = .4; capture values X_0 = 1, Y_0 = -1]
The shortest cycle is 12 cells long, so no deterministic strategy is effective
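As a side note, here is a minimal sketch (not from the paper) of how the environment model above could be encoded. The 3-cell topology and sensing matrix are invented placeholders; the d, X, Y, X_0, Y_0 numbers are borrowed from the example slide purely as sample values (the paper's example environment is larger).

    import numpy as np

    # Hypothetical toy environment illustrating the data layout of (T, V, d, X, Y).
    n = 3

    # T[i][j] = 1 iff the patroller can move from cell i to cell j in one turn
    # (invented 3-cell corridor; not the environment of the paper's example)
    T = np.array([[1, 1, 0],
                  [1, 1, 1],
                  [0, 1, 1]])

    # V[i][j] = 1 iff cell j is sensed by the patroller when it is in cell i
    V = np.eye(n, dtype=int)        # assumption: the patroller senses only its own cell

    d = np.array([6, 4, 5])         # turns the intruder needs to penetrate each cell
    X = np.array([0.8, 0.7, 0.8])   # patroller's value if cell i is successfully struck
    Y = np.array([0.4, 0.5, 0.4])   # intruder's value if cell i is successfully struck
    X0, Y0 = 1.0, -1.0              # values to patroller and intruder on capture

    # Sanity check: from every cell the patroller has at least one available move
    assert (T.sum(axis=1) > 0).all()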
The Game Model
• At each turn, a strategic-form game is repeated in which the players act simultaneously
• The patroller chooses the next cell to move to among those directly connected to its current cell: letting i be the robot's current cell, its actions are move(j) such that t_{i,j} = 1
• If the intruder has not previously attempted to enter any cell, it chooses whether or not to enter a cell and, in the former case, which cell to enter: its actions are wait and enter(i)
• If, instead, the intruder has previously attempted to enter a cell i, it cannot take any action for d_i turns

Example
[Figure omitted: an example turn of the game]

Markov Hypothesis
• Since the game is an extensive-form game with an infinite horizon, its study is commonly conducted by introducing symmetries: the patroller's strategy depends only on the last |H| actions (i.e., the last |H| visited cells)
• The patroller's strategy is denoted by α_{H,j}, the probability of taking action move(j) after history H
• When |H| = 0, the patroller's strategy does not depend on any last visited cell, i.e., α_j
• When |H| = 1, the patroller's strategy depends only on the last visited cell, i.e., α_{i,j}
• The intruder's action space can be reduced to:
  • stay-out: the intruder never enters the environment
  • enter-when(i,j): the intruder enters cell i when the patroller is in cell j

Leader-Follower Equilibrium
• In a leader-follower scenario, the follower chooses its strategy after having observed the leader's strategy
• The patrolling scenario is a leader-follower scenario where the patroller is the leader and the intruder is the follower
• A leader-follower equilibrium is the appropriate solution concept for a game with a leader-follower structure: it assures the leader an expected utility no smaller than that of any Nash equilibrium (von Stengel and Zamir, 2004)
• The follower's strategy is pure (von Stengel and Zamir, 2004)

Solving Algorithm
• Computing a leader-follower equilibrium is essentially a bilevel optimization problem (Conitzer and Sandholm, 2006); in our case the second optimization level is bilinear
• Our algorithm proceeds in two steps (a sketch of the linear special case is given after these slides)
  • Step 1: stay-out as the intruder's best response
    • Checking whether there exists a patroller's strategy such that stay-out is an intruder's best response is a bilinear feasibility problem
  • Step 2: enter-when as the intruder's best response
    • For each possible enter-when(i,j), we compute the optimal patroller strategy as a bilinear optimization problem
• The leader-follower strategy is the one that assures the patroller the largest expected utility among all the strategies computed above

Example: Solution
[Figure: the example environment annotated with the computed patrolling probabilities on its edges]

Action Space Reduction
• We developed two reduction algorithms
• We reduce the patroller's action space
• We reduce the intruder's action space
• Our algorithms save about 96–97% of the computation time
• Example:
  • Without reduction: 132 bilinear optimization problems
  • With reduction: 10 bilinear optimization problems
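To give intuition for the leader-follower computation cited above (Conitzer and Sandholm, 2006), the following sketch solves a plain strategic-form game by enumerating follower pure strategies and solving one linear program per candidate best response. The function name leader_follower and the 2×2 payoff matrices are ours and purely illustrative; this linear special case does not capture the bilinear patrolling formulation of the slides.

    import numpy as np
    from scipy.optimize import linprog

    def leader_follower(A, B):
        """Leader-follower (Stackelberg) equilibrium of a strategic-form game.

        A[i, j]: leader payoff, B[i, j]: follower payoff, when the leader plays
        row i and the follower plays column j.  One LP per follower pure
        strategy j: maximize the leader's expected payoff over mixed strategies
        x that keep column j a follower best response.
        """
        m, k = A.shape
        best_value, best_x, best_j = -np.inf, None, None
        for j in range(k):
            # Maximize x . A[:, j]  <=>  minimize -x . A[:, j]
            c = -A[:, j]
            # Incentive constraints: x . B[:, j] >= x . B[:, j'] for every j' != j
            rows = [B[:, jp] - B[:, j] for jp in range(k) if jp != j]
            A_ub = np.array(rows) if rows else None
            b_ub = np.zeros(len(rows)) if rows else None
            res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                          A_eq=np.ones((1, m)), b_eq=[1.0],
                          bounds=[(0.0, 1.0)] * m)
            if res.success and -res.fun > best_value:
                best_value, best_x, best_j = -res.fun, res.x, j
        return best_x, best_j, best_value

    # Toy 2x2 game (hypothetical numbers): the leader commits to a mixed strategy,
    # the follower observes it and best-responds with a pure strategy.
    A = np.array([[2.0, 4.0], [1.0, 3.0]])   # leader payoffs
    B = np.array([[1.0, 0.0], [0.0, 2.0]])   # follower payoffs
    x, j, v = leader_follower(A, B)
    print("leader strategy:", x, "follower response:", j, "leader value:", v)

The pattern mirrors Step 2 of the solving algorithm above: fix a candidate intruder best response, optimize the patroller's commitment subject to the constraints that keep that response a best response, and keep the candidate yielding the largest patroller expected utility.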
Conclusions and Future Work
• Achievements
  • A satisfactory patrolling model that generalizes models from the state of the art
  • An exact algorithm for leader-follower equilibrium computation
  • Effective algorithms for removing dominated actions and improving computational efficiency
• Future Work
  • Introducing uncertainty over the intruder's preferences and the patroller's sensing capabilities
  • Developing an algorithm for computing the deterministic patrolling strategy in general settings
  • Extending our framework to multiple patrollers