Group and Crowd Behavior for Computer Vision

Author: Murino, Vittorio; Cristani, Marco; Shah, Shishir

Publisher: Elsevier Science

Publication year: 2017

E-ISBN: 9780128092804

P-ISBN(Paperback): 9780128092767

Subject: TP317.4 Image processing software; TP39 Computer applications

Keyword: Technology: general issues; computer applications; image processing software

Language: ENG



Description

Group and Crowd Behavior for Computer Vision provides a multidisciplinary perspective on how to solve the problem of group and crowd analysis and modeling, combining insights from the social sciences with technological ideas in computer vision and pattern recognition.

The book answers many unresolved issues in group and crowd behavior, with Part One providing an introduction to the problems of analyzing groups and crowds that stresses that they should not be considered as completely diverse entities, but as an aggregation of people.

Part Two focuses on features and representations with the aim of recognizing the presence of groups and crowds in image and video data. It discusses low level processing methods to individuate when and where a group or crowd is placed in the scene, spanning from the use of people detectors toward more ad-hoc strategies to individuate group and crowd formations.

Part Three discusses methods for analyzing the behavior of groups and the crowd once they have been detected, showing how to extract semantic information, predicting/tracking the movement of a group, the formation or disaggregation of a group/crowd and the identification of different kinds of groups/crowds depending on their behavior.

The final section focuses on identifying and promoting datasets for group/crowd analysis and modeling, presenting and discussing metrics for evaluating the pros and cons of the various models and methods.

Chapter

1.3 Summary of Important Points

References

Part 1 Features and Representations

2 Social Interaction in Temporary Gatherings

2.1 Introduction: Group and Crowd Behavior in Context

2.2 Social Interaction: A Typology and Some Definitions

2.2.1 Unfocused Interaction

2.2.2 Common-Focused Interaction

2.2.3 Jointly-Focused Interaction

2.3 Temporary Gatherings: A Taxonomy and Some Examples

2.3.1 Small Gatherings - Semi/Private Encounters and Group Life

2.3.2 Medium Gatherings - Semi/Public Occasions and Community Life

2.3.3 Large Gatherings - Public Events and Collective Life

2.4 Conclusion: Microsociology Applied to Computer Vision

2.5 Further Reading

References

3 Group Detection and Tracking Using Sociological Features

3.1 Introduction

3.2 State-of-the-Art

3.3 Sociological Features

3.3.1 Low-Level Features

3.3.1.1 Person Detection

3.3.1.2 Person Velocity and Direction

3.3.1.3 Head & Body Orientation

3.3.2 High-Level Features

3.3.2.1 3D Subjective View Frustum

3.3.2.2 Transactional Segment-Based Frustum

3.3.2.3 External Factors

3.4 Detection Models

3.4.1 Game-Theoretic Conversational Grouping Model

3.4.2 The Dirichlet Process Mixture Model

3.5 Group Tracking

3.5.1 DPF for Group Tracking

Individual Proposal p(X_{t+1} | X_{0:t}, y_{0:t+1})

Joint Observation Distribution p(y_t | X_t, T_t)

Joint Individual Distribution p(X_{t+1} | X_{0:t}, T_t)

Joint Group Proposal p(T_{t+1} | X_{0:t+1}, T_t)
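The proposal/observation structure named in the distributions above follows the standard particle-filtering pattern. As a minimal illustration only (a toy 1D bootstrap filter tracking a single centroid, not the chapter's DPF group tracker; all function names and noise parameters here are hypothetical), the roles of the individual proposal and the observation distribution can be sketched as:

```python
import random
import math

def propose(x_prev, motion_noise=0.5):
    """Individual proposal: sample x_{t+1} given the previous state."""
    return x_prev + random.gauss(0.0, motion_noise)

def observation_likelihood(y, x, obs_noise=1.0):
    """Observation distribution p(y_t | x_t): unnormalized Gaussian."""
    return math.exp(-0.5 * ((y - x) / obs_noise) ** 2)

def particle_filter(observations, n_particles=200, seed=0):
    random.seed(seed)
    particles = [random.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Propagate particles through the proposal.
        particles = [propose(x) for x in particles]
        # Weight each particle by the observation likelihood.
        weights = [observation_likelihood(y, x) for x in particles]
        total = sum(weights) or 1.0
        weights = [w / total for w in weights]
        # Posterior-mean estimate of the tracked state.
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Multinomial resampling.
        particles = random.choices(particles, weights=weights, k=n_particles)
    return estimates

est = particle_filter([0.0, 0.5, 1.0, 1.5, 2.0])
```

The chapter's DPF additionally maintains a group-assignment variable T_t, sampled from the joint group proposal at each step, which this single-state sketch omits.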

3.6 Experiments

3.6.1 Results of Group Detection

3.6.1.1 Datasets

3.6.1.2 Evaluation Metrics

3.6.1.3 Comparing Methods

3.6.1.4 Performance Evaluation

3.6.2 Results of Group Tracking

3.6.2.1 Datasets

3.6.2.2 Evaluation Metrics

3.6.2.3 Comparing Methods

3.6.2.4 Performance Analysis

3.7 Discussion

3.8 Conclusions

References

4 Exploring Multitask and Transfer Learning Algorithms for Head Pose Estimation in Dynamic Multiview Scenarios

4.1 Introduction

4.2 Related Work

4.2.1 Head Pose Estimation from Low-Resolution Images

4.2.2 Transfer Learning

4.2.3 Multitask Learning

4.3 TL and MTL for Multiview Head Pose Estimation

4.3.1 Preprocessing

4.3.2 Transfer Learning for HPE

4.3.2.1 Head-Pan Classification Under Varying Head-Tilt

Experimental Results

4.3.2.2 Head-Pan Classification Under Target Motion

Experimental Results

4.3.3 Multitask Learning for HPE

4.3.3.1 FEGA-MTL

Experimental Results

4.4 Conclusions

References

5 The Analysis of High Density Crowds in Videos

5.1 Introduction

5.2 Literature Review

5.2.1 Crowd Motion Modeling and Segmentation

5.2.2 Estimating Density of People in a Crowded Scene

5.2.3 Crowd Event Modeling and Recognition

5.2.4 Detecting and Tracking in a Crowded Scene

5.3 Data-Driven Crowd Analysis in Videos

5.3.1 Off-Line Analysis of Crowd Video Database

5.3.1.1 Low-Level Representation

5.3.1.2 Mid-Level Representation

5.3.2 Matching

5.3.2.1 Global Crowded Scene Matching

5.3.2.2 Local Crowd Patch Matching

5.3.3 Transferring Learned Crowd Behaviors

5.3.4 Experiments and Results

5.4 Density-Aware Person Detection and Tracking in Crowds

5.4.1 Crowd Model

5.4.1.1 Tracking Detections

5.4.2 Evaluation

5.4.2.1 Tracking

5.5 CrowdNet: Learning a Representation for High Density Crowds in Videos

5.5.1 Introduction

5.5.2 Overview of the Approach

5.5.3 Crowd Patch Mining in Videos

5.5.4 Tracking

5.5.5 Learning a Representation for High Density Crowds

5.5.6 Evaluation

5.6 Conclusions and Directions for Future Research

References

6 Tracking Millions of Humans in Crowded Spaces

6.1 Introduction

6.2 Related Work

6.3 System Overview

6.4 Human Detection in 3D

6.4.1 Method

6.4.2 Evaluation

6.5 Tracklet Generation

6.6 Tracklet Association

6.6.1 Social Affinity Map - SAM

6.6.2 The SAM Feature

6.6.3 Tracklet Association Method

6.6.4 Optimization

6.6.5 Coarse-to-Fine Data Association

6.7 Experiments

6.7.1 Large-Scale Evaluation

6.7.2 OD Forecasting

6.8 Conclusions

References

7 Subject-Centric Group Feature for Person Reidentification

7.1 Introduction

7.2 Related Works

7.3 Methodology

7.3.1 Group Extraction

7.3.2 Person-Group Feature

7.3.2.1 In-Group Position Signature

7.3.2.2 Metric of Person-Group Feature

7.3.3 Person Reidentification with Person-Group Feature

7.4 Results

7.4.1 Features Evaluation

7.4.1.1 Group Extraction Evaluation

7.4.1.2 Group Features Evaluation

7.4.2 Comparison with Baseline Approaches

7.4.3 Comparison with Group-Based Approaches

7.5 Conclusion

Acknowledgments

References

Part 2 Group and Crowd Behavior Modeling

8 From Groups to Leaders and Back

8.1 Introduction

8.2 Modeling and Observing Groups and Their Leaders in Literature

8.2.1 Sociological Perspective

8.2.2 Computational Approaches

8.3 Technical Preliminaries and Structured Output Prediction

8.3.1 Problem Statement

8.3.2 Stochastic Optimization

8.4 The Tools of the Trade in Social and Structured Crowd Analysis

8.4.1 Socially Constrained Structural Learning for Groups Detection in Crowd

8.4.1.1 Task Formulation

8.4.1.2 SSVM Adaptation to Group Detection

Inference and Max Oracle

Loss Function

8.4.2 Learning to Identify Group Leaders in Crowd

8.4.2.1 Task Formulation

8.4.2.2 SSVM Adaptation to Leader Identification

Inference and Max Oracle

Loss Function

8.5 Results on Visual Localization of Groups and Leaders

8.6 The Predictive Power of Leaders in Social Groups

8.6.1 Experimental Settings

8.6.2 Leader Centrality in Feature Space

8.6.2.1 Group Recovery Guarantees

8.6.2.2 Validation and Results

8.7 Conclusion

References

9 Learning to Predict Human Behavior in Crowded Scenes

9.1 Introduction

9.2 Related Work

9.2.1 Human-Human Interactions

9.2.2 Activity Forecasting

9.2.3 RNN Models for Sequence Prediction

9.3 Forecasting with Social Forces Model

9.3.1 Basic Theory

9.3.2 Modeling Social Sensitivity

9.3.2.1 Social Sensitivity Feature

9.3.2.2 Training

9.3.2.3 Testing

9.3.3 Forecasting with Social Sensitivity

9.4 Forecasting with Recurrent Neural Network

9.4.1 Social LSTM

9.4.1.1 Social Pooling of Hidden States

9.4.1.2 Position Estimation

9.4.1.3 Occupancy Map Pooling

9.4.1.4 Inference for Path Prediction

9.4.2 Implementation Details

9.5 Experiments

9.5.1 Analyzing the Predicted Paths

9.5.2 Discussions and Limitations

9.6 Conclusions

References

10 Deep Learning for Scene-Independent Crowd Analysis

10.1 Introduction

10.2 Large Scale Crowd Datasets

10.2.1 Shanghai World Expo'10 Crowd Dataset

10.2.1.1 Data Collection

10.2.1.2 Annotation

10.2.2 WWW Crowd Dataset

10.2.2.1 Crowd Video Construction

Collecting Keywords

Collecting Crowd Videos

10.2.2.2 Crowd Attribute Annotation

Collecting Crowd Attributes from Web Tags

Crowd Attribute Annotation

10.2.3 User Study on Crowd Attribute

10.3 Crowd Counting and Density Estimation

10.3.1 Method

10.3.1.1 Normalized Crowd Density Map for Training

10.3.1.2 Crowd CNN Model

10.3.2 Nonparametric Fine-Tuning Method for Target Scene

10.3.2.1 Candidate Fine-Tuning Scene Retrieval

10.3.2.2 Local Patch Retrieval

10.3.2.3 Experimental Results

10.4 Attributes for Crowded Scene Understanding

10.4.1 Related Work

10.4.2 Slicing Convolutional Neural Network

10.4.2.1 Semantic Selectiveness of Feature Maps

10.4.2.2 Feature Map Pruning

Affinity Score

Conspicuous Score

10.4.2.3 Semantic Temporal Slices

10.4.3 S-CNN Deep Architecture

10.4.3.1 Single Branch of S-CNN Model

S-CNN-xy Branch

S-CNN-xt/-yt Branch

10.4.3.2 Combined S-CNN Model

10.4.4 Experiments

10.4.4.1 Experimental Setting

Dataset

Evaluation Metrics

Model Pre-Training

10.4.4.2 Ablation Study of S-CNN

Level of Semantics and Temporal Range

Pruning of Features

Single Branch Model vs. Combined Model

10.4.4.3 Comparison with State-of-the-Art Methods

Quantitative Evaluation

Qualitative Evaluation

10.5 Conclusion

References

11 Physics-Inspired Models for Detecting Abnormal Behaviors in Crowded Scenes

11.1 Introduction

11.2 Crowd Anomaly Detection: A General Review

11.3 Physics-Inspired Crowd Models

11.3.1 Social Force Models

11.3.2 Flow Field Models

11.3.3 Crowd Energy Models

11.3.4 Substantial Derivative

11.4 Violence Detection

11.4.1 The Substantial Derivative Model

11.4.1.1 Substantial Derivative in Fluid Mechanics

11.4.1.2 Modeling Pedestrian Motion Dynamics

11.4.1.3 Estimation of Local and Convective Forces from Videos

11.5 Experimental Results

11.5.1 Datasets

11.5.2 Effect of Sampled Patches

11.5.3 Comparison to State-of-the-Art

11.6 Conclusions

References

12 Activity Forecasting

12.1 Introduction

12.2 Overview

12.3 Activity Forecasting as Optimal Control

12.3.1 Toward Decision-Theoretic Models

12.3.2 Markov Decision Processes and Optimal Control

12.3.3 Maximum Entropy Inverse Optimal Control (MaxEnt IOC)

12.4 Single Agent Trajectory Forecasting in Static Environment

12.5 Multiagent Trajectory Forecasting

12.6 Dual-Agent Interaction Forecasting

12.7 Final Remarks

References

Part 3 Metrics, Benchmarks and Systems

13 Integrating Computer Vision Algorithms and Ontologies for Spectator Crowd Behavior Analysis

13.1 Introduction

13.2 Computer Vision and Ontology

13.3 An Extension of the dolce Ontology for Spectator Crowd

13.3.1 Modeling the Spectator Crowd and the Playground in dolce

13.3.2 A Tractable Fragment of dolce

13.4 Reasoning on the Temporal Alignment of Stands and Playground

13.4.1 A New Description Logic for Video Interpretation

13.4.1.1 ALCTemp: Syntax and Semantics

13.4.1.2 Reasoning Services for Video Interpretation

13.4.2 An Example of Application of the Integrated Approach

13.5 Concluding Remarks

Acknowledgments

References

14 SALSA: A Multimodal Dataset for the Automated Analysis of Free-Standing Social Interactions

14.1 Introduction

14.2 Literature Review

14.2.1 Unimodal Approaches

14.2.1.1 Vision-Based Approaches

14.2.1.2 Audio-Based Approaches

14.2.1.3 Wearable-Sensor Based Approaches

14.2.2 Multimodal Approaches

14.3 Spotting the Research Gap

14.3.1 Datasets

14.3.2 ASIA Methodologies

14.3.2.1 Human Tracking and Pose Estimation

14.3.2.2 Speech Processing

14.3.2.3 F-Formation Detection

14.3.3 Why SALSA?

14.4 The SALSA Dataset

14.4.1 Scenario and Roles

14.4.2 Sensors

14.4.3 Ground Truth Data

14.4.3.1 Annotations

14.4.3.2 Personality Data

14.5 Experiments on SALSA

14.5.1 Visual Tracking of Multiple Targets

14.5.2 Head and Body Pose Estimation from Visual Data

14.5.3 F-Formation Detection

14.6 Conclusions and Future Work

References

15 Zero-Shot Crowd Behavior Recognition

15.1 Introduction

15.2 Related Work

15.2.1 Crowd Analysis

15.2.2 Zero-Shot Learning

15.2.2.1 Attributes

15.2.2.2 WordNet

15.2.2.3 Cooccurrence

15.2.2.4 Word-Vector

15.2.3 Multilabel Learning

15.2.4 Multilabel Zero-Shot Learning

15.3 Methodology

15.3.1 Probabilistic Zero-Shot Prediction

15.3.2 Modeling Attribute Relation from Context

15.3.2.1 Learning Attribute Relatedness from Text Corpora

15.3.2.2 Context Learning from Visual Cooccurrence

15.4 Experiments

15.4.1 Zero-Shot Multilabel Behavior Inference

15.4.1.1 Experimental Settings

Dataset

Data Split

Visual Features

Evaluation Metrics

Parameter Selection

15.4.1.2 Comparative Evaluation

State-of-the-Art ZSL Models

Context-Aware Multilabel ZSL Models

Quantitative Comparison

Qualitative Analysis

15.4.2 Transfer Zero-Shot Recognition in Violence Detection

15.4.2.1 Experiment Settings

Dataset

Data Split

Zero-Shot Recognition Models

Fully Supervised Model

15.4.2.2 Results and Analysis

15.5 Further Analysis

15.5.1 Feature Analysis

15.5.1.1 Static Features

15.5.1.2 Motion Features

15.5.1.3 Analysis

15.5.2 Qualitative Illustration of Contextual Cooccurrence Prediction

15.6 Conclusions

References

16 The GRODE Metrics

16.1 Introduction

16.2 Metrics in the Literature

16.3 The GRODE Metrics

16.3.1 Detection Accuracy Measures

16.3.2 Cardinality Driven Measures

16.4 Experiments

16.4.1 Datasets

16.4.2 Detection Methods

16.4.3 Detection Accuracy Measures

16.4.4 Cardinality Driven Measures

16.5 Conclusions

References

17 Realtime Pedestrian Tracking and Prediction in Dense Crowds

17.1 Introduction

17.2 Related Work

17.2.1 Motion Models

17.2.2 Pedestrian Tracking with Motion Models

17.2.3 Path Prediction and Robot Navigation

17.3 Pedestrian State

17.3.1 Realtime Multiperson Tracking

17.4 Mixture Motion Model

17.4.1 Overview and Notations

17.4.2 Particle Filter for Tracking

17.4.3 Parametrized Motion Model

17.4.4 Mixture of Motion Models

17.4.5 Formalization

17.5 Realtime Pedestrian Path Prediction

17.5.1 Global Movement Pattern

17.5.2 Local Movement Pattern

17.5.3 Prediction Output

17.6 Implementation and Results

17.6.1 Pedestrian Tracking

17.6.2 Evaluation

17.6.3 Tracking Results

17.6.4 Pedestrian Prediction

17.6.5 Noisy Data

17.6.6 Long-Term Prediction Accuracy

17.6.7 Varying the Pedestrian Density

17.6.8 Comparison with Prior Methods

17.7 Conclusion

Acknowledgments

References

Subject Index

Back Cover
