Leader-Follower Strategies for Robotic Patrolling in
Environments with Arbitrary Topology
Nicola Basilico, Nicola Gatti, Francesco Amigoni
DEI, Politecnico di Milano
{basilico,ngatti,amigoni}@elet.polimi.it
Mobile Robot Patrolling
• A robotic patrolling situation is characterized by one or more mobile robotic patrollers and by some targets to be patrolled (Agmon et al., 2008; Amigoni et al., 2008; Gatti, 2008; Paruchuri et al., 2008)
• Usually, due to the characteristics of the setting, the patrollers cannot employ a deterministic strategy; otherwise, the intruder would surely succeed in attacking a target
• Patrollers should adopt an unpredictable patrolling strategy, randomizing over the targets and trying to reduce the intrusion probability (Pita et al., 2008)
• The scientific challenge is the development of effective patrolling
strategies, i.e., the optimal probabilities with which the patroller
moves in the environment
Example: Deterministic Strategy
[Figure: an example of a deterministic patrolling strategy]
Example: Non-Deterministic Strategy
[Figure: an example of a non-deterministic patrolling strategy]
Patrolling with Adversaries
• Considering adversaries can provide the patrolling robots with a larger expected utility
• (Agmon et al., 2008) considers non-explicit adversary models in
perimeter settings: the optimal patroller’s strategy is its max-min
strategy
• (Paruchuri et al., 2008) considers explicit adversary models in
settings without topologies: the optimal patroller’s strategy is its
leader-follower strategy (von Stengel and Zamir, 2004)
• Knowing the adversary's preferences (even with uncertainty) provides the patrollers with a larger expected utility than not knowing them
Assumptions
• Time is discretized in turns
• There is a single patrolling robot equipped with a sensor (e.g., a
camera) able to detect intruders
• The environment is discretized in cells and its topology is represented
by a directed graph
• The intruder cannot do anything else for some turns once it has attempted to enter a cell (this amounts to saying that penetration takes some time)
The Environment Model
• C is the set of n cells to be patrolled
• Matrix T (n×n) represents the topology, where t_{i,j} = 1 means that cells i and j are adjacent (the patroller can go from i to j with one action) and t_{i,j} = 0 means that they are not
• Matrix V (n×n) represents the patroller's sensing capabilities, where v_{i,j} = 1 if cell j can be sensed by the patroller from cell i and v_{i,j} = 0 otherwise
• Each cell i requires the intruder d_i > 0 turns to enter
• X_i and Y_i (with i ∈ {1, 2, …, n}) are the values to the patroller and to the intruder, respectively, when the intruder successfully strikes cell i
• X_0 and Y_0 are the values to the patroller and to the intruder, respectively, when the intruder is captured
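As a purely illustrative encoding of the model above (the class and field names are ours, not the paper's), the environment can be captured with plain data structures:

```python
import numpy as np

# Illustrative encoding of the environment model; all names are hypothetical.
class Environment:
    def __init__(self, n, T, V, d, X, Y, X0, Y0):
        self.n = n              # number of cells in C
        self.T = np.asarray(T)  # n x n adjacency matrix: T[i, j] = 1 iff the patroller can move from i to j
        self.V = np.asarray(V)  # n x n visibility matrix: V[i, j] = 1 iff cell j is sensed from cell i
        self.d = list(d)        # d[i] > 0: number of turns the intruder needs to enter cell i
        self.X = list(X)        # X[i]: patroller's value when cell i is successfully struck
        self.Y = list(Y)        # Y[i]: intruder's value when cell i is successfully struck
        self.X0 = X0            # patroller's value when the intruder is captured
        self.Y0 = Y0            # intruder's value when the intruder is captured
```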
Example: Environment
[Figure: an example environment with three targets, labeled (d=6, X=.8, Y=.4), (d=4, X=.7, Y=.5), and (d=5, X=.8, Y=.4); the capture values are X_0=1 and Y_0=-1]
The shortest patrolling cycle is 12 cells long, longer than every penetration time d_i, so no deterministic strategy is effective
The Game Model
• At each turn, the players play a strategic-form game in which they act simultaneously
• The patroller chooses the next cell to move to among those directly connected to its current cell: denoting by i the robot's current cell, its actions are move(j) for all j such that t_{i,j} = 1
• If the intruder has not previously attempted to enter any cell, it chooses whether or not to enter a cell and, in the former case, which cell to enter: its actions are wait and enter(i)
• If, instead, the intruder has previously attempted to enter a cell i, it cannot take any action for d_i turns
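As a sketch only (function names are ours; this reuses the illustrative Environment encoding introduced above), the per-turn action spaces can be enumerated as follows:

```python
# Illustrative enumeration of the per-turn action spaces; names are hypothetical.

def patroller_actions(env, current_cell):
    # move(j) for every cell j directly connected to the patroller's current cell
    return [("move", j) for j in range(env.n) if env.T[current_cell, j] == 1]

def intruder_actions(env, already_entering=False):
    # While an entering attempt is in progress, the intruder cannot act
    if already_entering:
        return []
    # Otherwise it can wait or attempt to enter any cell
    return [("wait",)] + [("enter", i) for i in range(env.n)]
```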
Markov Hypothesis
• Since the game is an extensive-form game with an infinite horizon, its study is commonly conducted by introducing symmetries: the patroller's strategy depends only on the last |H| actions (i.e., the last |H| visited cells)
• The patroller's strategy is denoted by α_{H,j}, i.e., the probability of taking action move(j) after history H
• When |H| = 0, the patroller's strategy does not depend on any last visited cell, i.e., α_j
• When |H| = 1, the patroller's strategy depends only on the last visited cell, i.e., α_{i,j}
• The intruder’s action space can be reduced to:
• stay-out: the intruder never enters the environment
• enter-when(i,j): the intruder enters cell i when the patroller is in
cell j
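As an illustration (data layout and names are ours, not the paper's notation), a Markovian strategy with |H| = 1 and the reduced intruder action space can be represented as follows:

```python
import itertools

# Illustrative representation of a Markovian patrolling strategy with |H| = 1;
# names and layout are hypothetical.

def uniform_markov_strategy(env):
    # alpha[i][j]: probability of move(j) when the last visited cell is i
    # (assumes every cell has at least one outgoing edge)
    alpha = {}
    for i in range(env.n):
        neighbors = [j for j in range(env.n) if env.T[i, j] == 1]
        alpha[i] = {j: 1.0 / len(neighbors) for j in neighbors}
    return alpha

def reduced_intruder_actions(env):
    # stay-out plus enter-when(i, j) for every target cell i and patroller position j
    return [("stay-out",)] + [("enter-when", i, j)
                              for i, j in itertools.product(range(env.n), repeat=2)]
```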
Leader-Follower Equilibrium
• In a leader-follower scenario, the follower chooses its strategy after
having observed the leader’s strategy
• The patrolling scenario is a leader-follower scenario where the patroller is the leader and the intruder is the follower
• A leader-follower equilibrium is the appropriate solution concept for a game with a leader-follower structure: it assures the leader an expected utility no smaller than that of any Nash equilibrium (von Stengel and Zamir, 2004)
• The follower’s strategy is pure (von Stengel and Zamir, 2004)
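In standard game-theoretic notation (our formulation, not copied from the slides), a leader-follower equilibrium can be written as the patroller committing to a mixed strategy to which the intruder best responds:

\[
\sigma_P^{*} \in \arg\max_{\sigma_P} \; U_P\bigl(\sigma_P,\, \mathrm{BR}(\sigma_P)\bigr),
\qquad
\mathrm{BR}(\sigma_P) \in \arg\max_{a_I} \; U_I(\sigma_P, a_I)
\]

where σ_P is the patroller's (leader's) strategy, a_I ranges over the intruder's pure actions (stay-out and enter-when(i,j)), and U_P, U_I are the expected utilities of the patroller and the intruder, respectively.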
Solving Algorithm
• Computing a leader-follower equilibrium is essentially a bilevel
optimization problem (Conitzer and Sandholm, 2006)
• In our case the second optimization level is bilinear
• Our algorithm develops in two steps
• Step 1: stay-out as the intruder's best response
• Checking whether or not there exists a patroller's strategy such that stay-out is the intruder's best response is a bilinear feasibility problem
• Step 2: enter-when as the intruder's best response
• For each possible enter-when(i,j), we compute the optimal patroller's strategy by solving a bilinear optimization problem
• The leader-follower strategy is the strategy that assures the
largest expected utility to the patroller among all the ones
computed above
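The two-step structure above can be sketched as follows; the two solver routines are passed in as hypothetical placeholders for the bilinear feasibility and optimization subproblems and are not part of the original work:

```python
import itertools

def compute_leader_follower_strategy(env, solve_stayout_feasibility, solve_enter_when):
    """Structural sketch of the two-step algorithm (solvers are injected placeholders)."""
    # Step 1: look for a patrolling strategy making stay-out the intruder's
    # best response (a bilinear feasibility problem); return it if one exists.
    alpha = solve_stayout_feasibility(env)
    if alpha is not None:
        return alpha

    # Step 2: otherwise, for each enter-when(i, j), compute the optimal
    # patroller's strategy under the constraint that enter-when(i, j) is the
    # intruder's best response, and keep the one with the largest expected utility.
    best_alpha, best_utility = None, float("-inf")
    for i, j in itertools.product(range(env.n), repeat=2):
        alpha, utility = solve_enter_when(env, i, j)
        if alpha is not None and utility > best_utility:
            best_alpha, best_utility = alpha, utility
    return best_alpha
```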
Example: Solution
[Figure: the computed leader-follower solution on the example environment, with the patroller's movement probabilities labeled on the edges of the graph]
Action Space Reduction
• We developed two reduction algorithms
• We reduce the patroller's action space
• We reduce the intruder's action space
• Our algorithms allow one to save about 96%-97% of the computational time
• Example:
• Without reduction: 132 bilinear optimization problems
• With reduction: 10 bilinear optimization problems
Conclusions and Future Work
• Achievements
• A satisfactory patrolling model that generalizes models from
the state of the art
• An exact algorithm for leader-follower equilibrium computation
• Effective algorithms for removing dominated actions and
improving computational efficiency
• Future Work
• Introducing uncertainty over intruder’s preferences and patroller’s
sensing capabilities
• Developing an algorithm for computing the deterministic patrolling
strategy in general settings
• Extending our framework to multiple patrollers