Extending Algorithms for Mobile Robot Patrolling
in the Presence of Adversaries
to More Realistic Settings
Nicola Basilico, Nicola Gatti, Thomas Rossi, Sofia Ceppi, and Francesco Amigoni
{basilico,ngatti,ceppi,amigoni}@elet.polimi.it, [email protected]
DEI, Politecnico di Milano
Outline
• Background
– State of the art
– Basic model
– Solving algorithm
• Contributions
– Modeling intruder’s movements
– Modeling intruder’s visibility limitations
– Complexity reduction techniques
– Experimental results
• Conclusions and Future Works
DEI, Politecnico di Milano
Part 1:
Background
DEI, Politecnico di Milano
Related Works
• The patrolling strategy problem:
• The patrolling strategy drives the robot in the patrolling task
• Problem: given an environment, compute the best patrolling strategy
• Approaches:
• Not considering a model of the adversary (the intruder)
•
Frequency/coverage based approaches
• Explicitly considering a model of the adversary (the intruder)
•
It can provide better strategies (Amigoni et a, IAT 2008)
• Model of the adversary
•
Without preferences (Agmon et al., AAMAS 2008, perimeter-like environments)
•
With preferences (Paruchuri et al., AAMAS 08, fully connected environments)
DEI, Politecnico di Milano
Patrolling Setting
•
•
•
Time is discretized in turns
•
Grid map composed by free cells (white), obstacle cells
(black) and targets (green circles)
•
Targets: cells with some value for both players
Patroller:
•
Equipped with sensors to detect intrusions in the patrolling setting
•
It can move between adjacent vertexes in one time unit
Intruder:
•
It observes the patroller remaining hidden outside the environment
•
It can decide to enter the environment at any turn
•
For each target T, the intruder must spend time dT to successfully attack the target
•
When attempting to attack target T at time t, the intruder can be detected during [t, t+dT)
DEI, Politecnico di Milano
Patrolling Strategy
• Patrolling strategy:
it specifies the next move of the patroller at each turn
• Randomized strategy:
a probability distribution over the next move, it can be the only
effective strategy against an observing intruder
• Objective: finding the optimal randomized patrolling strategy while
considering a model of the adversary (the intruder)
• Strongest intruder: a rational agent that knows the patrolling strategy
and considers it when deciding its action
• Approach: to study the interactions between patroller and intruder
agents within a game-theoretical framework
DEI, Politecnico di Milano
The Patrolling Game
P
move(7)
P
…
P
…
I
…
enter(1)
I
…
I
1
…
6
9
P
…
2
3
4
7
10
8
12
1 turn
Game Outcomes
•
•
At turn k the indruder enters cell T when the patroller is in cell G: enter-when(T,G)
•
If the patroller does not sense cell T in the interval [k, k+ dT) the intruder wins
•
Otherwise the intruder is captured and the patroller wins
The intruder never enters: stay-out
DEI, Politecnico di Milano
5
13
Solving the Game
• The patrolling problem can be modeled as a leader-follower game
• Two players
• The leader commits to a strategy
• The follower observes such commitment and acts as a best responder
• Patroller’s strategy: A = {αi,j}, where αi,j is the probability of doing move(j) when
i is the current node
• Intruder’s strategy: enter-when(T,G), enter in target T when the patroller is in
cell G
• The optimal A can be derived by computing the equilibrium of the leaderfollower game resorting to a bilevel optimization problem (Conitzer and
Sandholm, 2006)
DEI, Politecnico di Milano
Solving the Game
For very intruder’s action ai
Find A’ such that EUp is maximum
s.t. ai is best response to A’
If the intruder’s action is to attack target T, the
patroller’s expected utility is computed as:
A’1,a1
A’2,a2
A’3,a3
…
A’n,an
max EUp
A*,a*
Leader-Follower
Equilibrium
Optimal Patrolling Strategy
DEI, Politecnico di Milano
P(intrusion T) *XT + (1 - P(intrusion T)) * X0
P(intrusion T) depends on
• the attacked target
• the position of the patroller
• the patrolling strategy
Part 2:
Contributions
DEI, Politecnico di Milano
Objective
• The basic model is general but it makes a lot of simplifying assumption
• E.g., the intruder can directly enter in any target
• We introduce two different extensions in order to model a more realistic
patrolling scenario
• We refine the intruder’s model considering aspects that are not addressed in
game theoretical patrolling literature
• We experimentally evaluate the computational complexity of the extended model
and provide techniques to reduce it
DEI, Politecnico di Milano
Intruder’s Movements
Basic model assumption: the intruder can directly enter in any target
T
Nowintruder’s
The
the intruder’s
strategy
strategy
is
represented
is
represented
as:as:
enter-when(T,C)
enter-when(P,C)
C
•
The environment can be accessed by access areas
•
As soon as the patroller is in C, the intruder:
•
•
Enters from an access area
•
Follows a path P from the access area to a target T, and then stays there for dT turns to
complete the intrusion attempt
The intrusion probability of an intruder’s strategy has to be computed in a different way with
respect to the basic model
DEI, Politecnico di Milano
Reduction
•
We can reduce the computational burden by discarding players’ dominated actions
•
An action a is dominated by an action b if the player prefers to undertake b independently
of the opponent‘s strategy
•
•
Patroller’s actions reduction:
•
Smaller setting, less variables
•
Forcing the patroller to cover shortest paths between targets
Intruder’s actions reduction:
•
Less optimization problems, less constraints for each optimization problem
DEI, Politecnico di Milano
Reduction
•
Indentify the minimal set of paths that a rational intruder would consider
in its actions enter-when(P,C)s
•
Obiously enter-when(P1, *) dominates enter-when(P2, *)
•
P3 is not dominated: there can be a patrolling strategy such that P3 is better than P1
•
We select all irreducible paths, i.e., those paths that do not strictly contain any other path
DEI, Politecnico di Milano
Reduction
•
Indentify the minimal set of cells {C} that a rational intruder would consider
in its actions enter-when(P,C)s
C
C2
C1
•
Obiously enter-when(P1, C) is dominated by stay-out
•
enter-when(P1, C1) is dominated by enter-when(P1, C2): from C2 the patroller should
always cover a longer distance to reach the target within dT turns than from C1
•
For every irreducible path we find the set {C} resorting to a tree based search technique
DEI, Politecnico di Milano
Intruder’s Limited Observation Capabilities
•
Basic model: the intruder can observe the patroller and derive a correct belief on the
patrolling strategy
•
Limited visibility: when acting the intruder has a limited knowledge about the current position
of the patroller
• Hi is the set of hidden cells when entering from access area i
?
• Actions enter-when(T,G) cannot be performed if G is an hidden
cell belonging to Hi
• We introduce a state of the game s = <G,O> where:
?
• G is the last cell where the intruder saw the patroller
?
G
• O is the number of turns from such last observation
• Example s = <G,3>
• The intruder can compute a probability distribution over the patroller’s position using the strategy it knows:
DEI, Politecnico di Milano
Intruder’s Limited Visibility
•
Now the intruder’s strategy is represented as : enter-when(T,s) where s is a state
•
To determine non dominated actions we have to compute the minimal set of states {s}
•
Compute the minimal set of cells {c} to consider as patroller positions (like in the previous
case)
•
For every c of {c}:
•
If c is not hidden then s = <c,0> has to be considered
• If c is hidden, we consider c’ from which the
patroller can reach c without passing from any non
hidden cell
c
c’
• We consider state s = <c’,k> such that the
probability for the patroller of being in c starting
from c’ is maximum after k turns since it
disappeared
• We can find it by resorting to Markov chains
properties, in the example s = <c’,3>
DEI, Politecnico di Milano
2500
2000
partially
reduced
1500
1000
fully reduced
500
0
Optimization problems
Total time (seconds)
Experimental Results
350
300
250
200
150
100
50
0
intruder's paths
partially
reduced
400
fully reduced
200
0
intruder's limited
visibility
DEI, Politecnico di Milano
Optimization problems
Total time (seconds)
1000
600
partially
reduced
fully reduced
original
1200
800
without
reduction
intruder's
path
100
80
60
40
partially
reduction
20
fully reduced
0
original
intruder's
limited
visibility
Conclusions and Future Works
• Conclusions:
• We presented a game theoretical model to find the best patrolling
strategy in a patrolling setting, together with some extensions to
capture more realistic situations
• Future Works:
• Further extensions to refine the model of the patroller
• Real / simulated robot implementation
• Multi-patroller scenarios
DEI, Politecnico di Milano
Scarica
Slides - Politecnico di Milano

Slides - Politecnico di Milano

1 - Politecnico di Milano

Slides

politecnico di milano - LeHo | Learning at Home and in the Hospital