Agenda
Roberto Vezzani - Imagelab – Università di Modena e Reggio Emilia
Presentation of ImageLab
Digital Library content-based retrieval
Computer Vision for robotic automation
Multimedia: video annotation
Medical Imaging
Video analysis for indoor/outdoor surveillance
People and vehicle surveillance
Off-line Video analysis for telemetry and forensics
Imagelab-Softech
Lab of Computer Vision, Pattern Recognition and Multimedia
Dipartimento di Ingegneria dell’Informazione
Università di Modena e Reggio Emilia, Italy
http://imagelab.ing.unimore.it
Imagelab: recent projects in surveillance
Projects:
European
• THIS: Transport hubs intelligent surveillance, EU JLS/CHIPS Project 2009-2010
• VIDI-Video: STREP VI FP EU (VISOR: Video Surveillance Online Repository) 2007-2009
International
• BE SAFE: NATO Science for Peace project 2007-2009
• Detection of infiltrated objects for security: Australian Council 2006-2008
Italian & Regional
• Behave_Lib: Regione Emilia Romagna, Tecnopolo Softech 2010-2013
• LAICA: Regione Emilia Romagna 2005-2007
• FREE_SURF: MIUR PRIN Project 2006-2008
With Companies
• Building site surveillance: with Bridge-129 Italia 2009-2010
• Stopped Vehicles: with Digitek Srl 2007-2008
• SmokeWave: with Bridge-129 Italia 2007-2010
• Sakbot for Traffic Analysis: with Traficon 2004-2006
• Mobile surveillance: with Sistemi Integrati 2007
• Domotica per disabili (home automation for disabled people): posture detection, FCRM 2004-2005
AD-HOC: Appearance Driven Human tracking with Occlusion Handling
Key aspects
• Based on the SAKBOT system
– Background estimation and updating
– Shadow removal
• Appearance-based tracking
– Aim: recover a pixel-based foreground mask, even during an occlusion
– Recovery of the parts missed by background subtraction
– Management of split and merge situations
• Occlusion detection and classification
– Classification of shape differences as real shape changes or occlusions
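The background subtraction and shadow removal steps can be illustrated with an HSV shadow test in the spirit of SAKBOT: a changed pixel is labelled as shadow when its brightness (V) is a darkened version of the background while hue and saturation stay close. This is a minimal sketch, not the SAKBOT implementation: it assumes HSV images as NumPy float arrays (H in degrees, S and V in [0, 1]), all threshold values are illustrative assumptions, and the background update is a simple selective running average used as a stand-in.

```python
import numpy as np

def foreground_and_shadow(frame_hsv, bg_hsv, diff_thr=0.15,
                          alpha=0.4, beta=0.9, tau_s=0.1, tau_h=30.0):
    """Return (foreground, shadow) boolean masks.

    frame_hsv, bg_hsv: float arrays (H, W, 3); H in [0, 360), S, V in [0, 1].
    All thresholds are illustrative assumptions.
    """
    h, s, v = frame_hsv[..., 0], frame_hsv[..., 1], frame_hsv[..., 2]
    bh, bs, bv = bg_hsv[..., 0], bg_hsv[..., 1], bg_hsv[..., 2]

    # Background subtraction: a large change in brightness or saturation
    # marks the pixel as a candidate moving point.
    candidate = (np.abs(v - bv) > diff_thr) | (np.abs(s - bs) > diff_thr)

    # Shadow test: darker than the background by a bounded ratio,
    # with similar saturation and hue (hue distance is circular).
    ratio = v / np.maximum(bv, 1e-6)
    dh = np.abs(h - bh)
    dh = np.minimum(dh, 360.0 - dh)
    shadow = (candidate & (ratio >= alpha) & (ratio <= beta)
              & (np.abs(s - bs) <= tau_s) & (dh <= tau_h))

    # Shadow pixels are removed from the foreground mask.
    foreground = candidate & ~shadow
    return foreground, shadow


def update_background(bg_hsv, frame_hsv, foreground, rate=0.05):
    """Selective running average: only non-foreground pixels are blended.
    (Linear blending of hue ignores its circularity; fine for a sketch.)"""
    new_bg = bg_hsv.copy()
    keep = ~foreground
    new_bg[keep] = (1 - rate) * bg_hsv[keep] + rate * frame_hsv[keep]
    return new_bg
```

Only pixels that pass the change test but fail the shadow test end up in the foreground mask, which is the mask the appearance-based tracker then works on.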
Example 1 (from ViSOR)
Example 2 (from PETS 2002)
Example 3
Other experimental results
Imagelab videos (available on ViSOR)
PETS series
Results on the PETS2006 dataset
Working in real time at 10 fps!
Posture classification
Distributed surveillance with non-overlapping fields of view
Exploit the knowledge about the scene
• To avoid all-to-all matches, the tracking system can exploit knowledge about the scene:
– Preferential paths -> pathnodes
– Border lines / exit zones
– Physical constraints & forbidden zones (NVR)
– Temporal constraints
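As an illustration of how these constraints prune the association problem, the hypothetical sketch below keeps only the (exited track, new track) pairs that are connected by a known path and whose transit time is plausible. The topology table, zone names and time windows are made-up assumptions, not data from the Imagelab system.

```python
from dataclasses import dataclass

@dataclass
class Track:
    track_id: int
    camera: str
    zone: str    # exit zone (leaving tracks) or entry zone (new tracks)
    time: float  # seconds: time of leaving / time of appearing

# Hypothetical scene topology:
# (camera, exit zone) -> {(camera, entry zone): (min transit s, max transit s)}
TRANSITIONS = {
    ("cam1", "east_door"): {("cam4", "west_door"): (5.0, 40.0)},
    ("cam4", "west_door"): {("cam1", "east_door"): (5.0, 40.0)},
}

def candidate_matches(exited, appeared, transitions=TRANSITIONS):
    """Return (old_id, new_id) pairs allowed by topology and timing.

    Instead of comparing every exited track with every new track
    (all-to-all), only pairs linked by a path whose transit time falls
    inside the allowed window are kept for appearance matching.
    """
    pairs = []
    for old in exited:
        reachable = transitions.get((old.camera, old.zone), {})
        for new in appeared:
            window = reachable.get((new.camera, new.zone))
            if window is None:
                continue  # no path between these zones
            dt = new.time - old.time
            if window[0] <= dt <= window[1]:
                pairs.append((old.track_id, new.track_id))
    return pairs

exited = [Track(1, "cam1", "east_door", 100.0)]
appeared = [Track(7, "cam4", "west_door", 120.0),
            Track(8, "cam4", "west_door", 300.0),   # too late
            Track(9, "cam2", "north_gate", 110.0)]  # no path
print(candidate_matches(exited, appeared))
```

Only track 7 survives as a candidate for track 1; the expensive appearance comparison is then run on that pair alone.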
Tracking with pathnodes
A possible path between Camera 1 and Camera 4
Pathnodes lead particle diffusion
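One way to read "pathnodes lead particle diffusion" is that, instead of spreading particles with isotropic noise only, each particle is pushed toward the next pathnode on its path. The sketch below shows only the prediction (diffusion) step of a particle filter under that hypothetical reading: pathnodes are 2-D points in a common ground plane, `successors` is an assumed adjacency list, and `speed` and `noise` are illustrative values. A particle that reaches its target pathnode picks one of that node's successors at random.

```python
import numpy as np

def diffuse(particles, targets, nodes, successors,
            speed=1.0, noise=0.2, rng=None):
    """One pathnode-guided prediction step.

    particles : (N, 2) positions
    targets   : (N,) index of the pathnode each particle is heading to
    nodes     : (M, 2) pathnode positions
    successors: dict node index -> list of next node indices (assumed)
    """
    rng = np.random.default_rng() if rng is None else rng
    particles = np.asarray(particles, dtype=float).copy()
    targets = np.asarray(targets, dtype=int).copy()
    for i in range(len(particles)):
        step = nodes[targets[i]] - particles[i]
        dist = np.linalg.norm(step)
        if dist <= speed:
            # Target reached: snap to it and choose the next pathnode.
            particles[i] = nodes[targets[i]]
            nxt = successors.get(int(targets[i]), [int(targets[i])])
            targets[i] = rng.choice(nxt)
        else:
            # Move a fixed distance toward the target pathnode.
            particles[i] += speed * step / dist
        # Gaussian jitter keeps the particle set diverse.
        particles[i] += rng.normal(0.0, noise, size=2)
    return particles, targets
```

The weighting and resampling steps of the filter are unchanged; only the proposal is shaped by the scene paths.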
Results with particle filter (PF) and pathnodes
Single-camera tracking: Recall = 90.27%, Precision = 88.64%
Multicamera tracking: Recall = 84.16%, Precision = 80.00%
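Precision and recall can be summarized by the F-measure (their harmonic mean). The F-measure is not reported on the slide; the short computation below simply derives it from the values above.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1 score)."""
    return 2 * precision * recall / (precision + recall)

# Values from the slide, in percent.
single = f_measure(88.64, 90.27)
multi = f_measure(80.00, 84.16)
print(f"single camera F1 = {single:.2f}%")  # 89.45%
print(f"multicamera F1 = {multi:.2f}%")     # 82.03%
```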
“VIP: Vision tool for comparing Images of People”
Lantagne et al., Vision Interface 2003
Each extracted silhouette is segmented into significant regions using the JSEG algorithm (Y. Deng, B.S. Manjunath: “Unsupervised segmentation of color-texture regions in images and video”).
Colour and texture descriptors are calculated for each region:
• The colour descriptor is a modified version of the descriptor presented in Y. Deng et al.: “Efficient color representation for image retrieval”. Basically, it is an HSV histogram of the dominant colours.
• The texture descriptor is based on D.K. Park et al.: “Efficient use of local edge histogram descriptor”. Essentially, this descriptor characterizes the edge density inside a region along different orientations (0°, 45°, 90° and 135°).
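The idea behind the texture descriptor (edge density per orientation) can be sketched as follows. This is a simplified stand-in, not the descriptor of Park et al. (the MPEG-7 edge histogram it builds on uses block-based edge filters and also a non-directional bin): here NumPy gradient orientations are quantized to the nearest of the four orientations above, and the magnitude threshold is an assumption.

```python
import numpy as np

def edge_orientation_density(gray, mask, mag_thr=0.1):
    """Fraction of region pixels that are edges at 0, 45, 90, 135 degrees.

    gray : 2-D float image in [0, 1]
    mask : boolean array of the same shape selecting the region
    """
    gy, gx = np.gradient(gray.astype(float))  # derivatives along rows, cols
    mag = np.hypot(gx, gy)
    # The edge direction is perpendicular to the gradient; fold to [0, 180).
    theta = (np.degrees(np.arctan2(gy, gx)) + 90.0) % 180.0
    # Quantize to the nearest of 0, 45, 90, 135 degrees (180 wraps to 0).
    bins = np.round(theta / 45.0).astype(int) % 4
    edges = (mag > mag_thr) & mask
    counts = np.bincount(bins[edges], minlength=4)
    return counts / max(int(mask.sum()), 1)
```

For an image that is dark on the left and bright on the right, all edge pixels fall in the 90° (vertical) bin.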
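The similarity between two regions R_a and R_b is the weighted sum of the two descriptor similarities:
$$S(R_a, R_b) = w_c \, S_c(R_a, R_b) + w_t \, S_t(R_a, R_b)$$
where S_c and S_t are the colour and texture similarities and w_c, w_t are their weights.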
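A minimal sketch of the region similarity, assuming histogram intersection as the per-descriptor similarity and equal weights: both choices are assumptions, since the slide specifies neither. Descriptors are treated as histograms and L1-normalized before comparison.

```python
import numpy as np

def hist_intersection(h1, h2):
    """Similarity in [0, 1] between two histograms (L1-normalized here)."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    return float(np.minimum(h1, h2).sum())

def region_similarity(color_a, texture_a, color_b, texture_b,
                      w_c=0.5, w_t=0.5):
    """S = w_c * S_c + w_t * S_t (weight values are assumptions)."""
    s_c = hist_intersection(color_a, color_b)
    s_t = hist_intersection(texture_a, texture_b)
    return w_c * s_c + w_t * s_t
```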
To compare the regions inside two silhouettes, a region matching scheme is used, based on a modified version of the IRM (Integrated Region Matching) algorithm presented in J.Z. Wang et al., “SIMPLIcity: Semantics-sensitive integrated matching for picture libraries”.
The IRM algorithm is simple and works as follows:
1) First, the similarities between all pairs of regions are computed.
2) The similarities are sorted in decreasing order; the first one is selected and the areas of the corresponding pair of regions are compared. A weight equal to the smaller of the two percentage areas is assigned to that similarity measure.
3) The percentage area of the larger region is then reduced by the percentage area of the smaller region, so that the larger region can be matched again. The smaller region will not be matched with any other region.
4) The process continues in decreasing order over all the remaining similarities.
In the end, the overall similarity between the two region sets is computed as the weighted sum of the matched similarities:
$$S_{tot} = \sum_{a,b} w_{ab} \, S(R_a, R_b)$$
where R_a and R_b are regions of the first and second silhouette, and w_{ab} is the weight assigned in step 2 (zero for pairs that are never matched).
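The four steps above amount to a greedy allocation of region area. The sketch follows them literally; `sim[a][b]` is the similarity between region a of the first silhouette and region b of the second, and `area1`, `area2` are the region areas as fractions of each silhouette (each summing to 1). The names are illustrative, and the specific modifications made in the VIP tool are not reproduced.

```python
def irm_similarity(sim, area1, area2):
    """Greedy Integrated Region Matching between two region sets.

    sim   : sim[a][b] = similarity of region a (set 1) and region b (set 2)
    area1 : area fractions of the regions in set 1 (sum to 1)
    area2 : area fractions of the regions in set 2 (sum to 1)
    """
    left1, left2 = list(area1), list(area2)
    # Steps 1-2: all pairwise similarities, visited in decreasing order.
    pairs = sorted(
        ((sim[a][b], a, b) for a in range(len(left1)) for b in range(len(left2))),
        reverse=True,
    )
    total = 0.0
    for s, a, b in pairs:
        w = min(left1[a], left2[b])  # weight = smaller remaining area
        if w <= 0:
            continue  # one of the two regions is already fully matched
        total += w * s
        # Step 3: the smaller region is exhausted, the larger keeps the rest.
        left1[a] -= w
        left2[b] -= w
    # Overall similarity: sum of weight * similarity over matched pairs.
    return total

# Two half-silhouette regions vs. one region covering the whole silhouette.
print(irm_similarity([[0.9], [0.4]], [0.5, 0.5], [1.0]))  # 0.5*0.9 + 0.5*0.4
```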
ViSOR: Video Surveillance Online Repository
The ViSOR video repository
Aims of ViSOR
• Gather and make freely available a
repository of surveillance videos
• Store metadata annotations, both manually
provided as ground-truth and automatically
generated by video surveillance tools and
systems
• Perform online performance evaluation and comparison
• Create an open forum to exchange, compare
and discuss problems and results on video
surveillance
Different types of annotation
• Structural Annotation: video size, authors, keywords,…
• Base Annotation: ground-truth, with concepts referring to the whole video. Annotation tool: online!
• GT Annotation: ground-truth with frame-level annotation; concepts can refer to the whole video, to a frame interval or to a single frame. Annotation tool: ViPER-GT (offline)
• Automatic Annotation: output of automatic systems shared by ViSOR users.
Video corpus set: the 14 categories
Outdoor multicamera: synchronized views
Surveillance of the entrance door of a building
• About 10 hours of video!
Videos for smoke detection with GT
Videos for shadow detection
• Already used by many researchers working on shadow detection
• Some videos with GT
A. Prati, I. Mikic, M.M. Trivedi, R. Cucchiara, "Detecting Moving Shadows: Algorithms and Evaluation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 7, pp. 918-923, July 2003
Some statistics
We need videos and annotations!
Action recognition
SIMULTANEOUS HMM ACTION SEGMENTATION AND RECOGNITION
Probabilistic Action Classification
• Classical approach:
– Given a set of training videos, each containing a single atomic action (manually labelled)
– Given a new video containing a single action
• … find the most likely action
Dataset: “Actions as Space-Time Shapes” (ICCV ’05), M. Blank, L. Gorelick, E. Shechtman, M. Irani, R. Basri
Classical HMM Framework
• Definition of a feature set
• For each frame t, computation of the feature set O_t (observations)
• Given a set of training observations O = {O_1 … O_T} for each action, training of an HMM λ_k for each action k
• Given a new set of observations O = {O_1 … O_T}
• Find the model λ_k which maximises P(λ_k | O)
A sample 17-dim feature set
• Computed on the extracted blob after the foreground
segmentation and people tracking:
From the Rabiner tutorial
Online action recognition
• Given a video with a sequence of actions:
– Which is the current action? Frame-by-frame action classification (online: action recognition)
– When does an action finish and the next one start? (offline: action segmentation)
R. Vezzani, M. Piccardi, R. Cucchiara, “An efficient Bayesian framework for on-line action recognition”, in Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, November 7-11, 2009 (in press)
Main problem of this approach
• I do not know when the action starts and when it finishes.
• Using all the observations, only the first action is recognized.
• A possible solution is “brute force”: for each action, for each starting frame and for each ending frame, compute the model likelihood and select the maximum. This is INFEASIBLE: the number of (start, end) pairs grows quadratically with the video length, and each pair needs a full likelihood evaluation.
Our approach
• Subsampling of the starting frames (one every 10 frames)
• Adoption of recursive formulas
• Computation of the emission probabilities only once for each model (action)
• The current frame is taken as the ending frame
• A maximum length for each action
• The resulting computational complexity meets real-time requirements
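The ingredients above can be combined into a frame-by-frame loop: one hypothesis per action model is started every 10 frames, the forward variables of all live hypotheses are updated recursively with the current frame's emissions (computed once per model and shared by all hypotheses of that model), hypotheses longer than their action's maximum length are dropped, and the current frame is the ending frame. This is a hypothetical reconstruction, not the published method: the class name is made up, emissions are assumed strictly positive, and hypotheses are scored by their per-frame average log-likelihood, a simple stand-in for the normalization by sequence length and mean action duration described in the paper.

```python
import numpy as np

class OnlineActionRecognizer:
    """Hypothetical sketch of frame-by-frame HMM action recognition."""

    def __init__(self, models, max_len, start_step=10):
        self.models = models          # list of (pi, A), one per action
        self.max_len = max_len        # max length in frames, one per action
        self.start_step = start_step  # a new starting frame every start_step
        self.t = 0
        self.hyps = []  # each: [action k, start frame, scaled alpha, log-lik]

    def step(self, emissions):
        """emissions[k]: (N_k,) array of b_i(O_t) for model k, current frame.
        Returns (action, start frame) of the best hypothesis, or None."""
        if self.t % self.start_step == 0:
            for k in range(len(self.models)):
                self.hyps.append([k, self.t, None, 0.0])
        for h in self.hyps:
            k, _, alpha, _ = h
            pi, A = self.models[k]
            # Recursive forward update: reuse alpha from the previous frame.
            if alpha is None:
                alpha = pi * emissions[k]
            else:
                alpha = (alpha @ A) * emissions[k]
            c = alpha.sum()
            h[2] = alpha / c
            h[3] += np.log(c)
        # Drop hypotheses longer than the maximum length of their action.
        self.hyps = [h for h in self.hyps
                     if self.t - h[1] + 1 <= self.max_len[h[0]]]
        self.t += 1
        if not self.hyps:
            return None  # only possible if a max length < start_step
        # Length-normalized score (stand-in for the paper's normalization).
        best = max(self.hyps, key=lambda h: h[3] / (self.t - h[1]))
        return best[0], best[1]
```

With this design each new frame costs one forward update per live hypothesis, instead of re-running the forward pass over every (start, end) pair.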
Sequences of different lengths
• Sequences with different starting frames have different lengths
• Comparing them with the traditional HMM scheme is unfair, since the likelihood depends on the number of observations
• The output of each HMM is normalized using the sequence length and a term related to the mean duration of the considered action
• This makes it possible to classify the current action and, at the same time, perform an online action segmentation