Abstract: For visual surveillance, the semantic content of video was modelled by the states of moving targets. Feature vectors were clustered with DBSCAN to obtain activity patterns that represent the latent structure of the training set. A complex-event detection model was then built using high-level Petri nets to model the temporal dependencies between activity patterns, such as continuity and concurrence. The unsupervised learning of activity patterns was robust to low-level noise, and combining it with the Petri net representation made the inference of semantic events more flexible. The experiments demonstrated the Petri net modelling process built on the clustering results, and the validity of the approach was shown by detecting two semantic events of interest, ‘staying’ and ‘stealing’.
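The two stages named above, DBSCAN clustering of feature vectors into activity patterns followed by a Petri net over those patterns, can be illustrated with a minimal Python sketch. It is not the authors' model. The feature vectors, the `eps`/`min_pts` values, the helper names `dbscan` and `PetriNet`, and the place and transition names (`A`, `B`, `C`, `see_A`, `A_then_B`, `B_with_C`) are all hypothetical. The sketch makes one further assumption: a token for an observed pattern lives for a single frame. Under that rule, a sequence is encoded by carrying a token in a state place (`after_A`), and concurrence is encoded by a transition that needs two patterns observed in the same frame. Points that DBSCAN labels `-1` (noise) would simply produce no observation.

```python
import math
from collections import deque


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: return a cluster id (0, 1, ...) per point, or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None marks a point not yet visited

    def neighbors(i):
        # Brute-force eps-neighbourhood (includes the point itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_pts:
                queue.extend(j_seeds)  # j is a core point: keep expanding
    return labels


class PetriNet:
    """Tiny place/transition net (hypothetical sketch, not a high-level Petri net library).

    transitions: name -> (input places, output places). Assumes every transition
    has at least one input place and the net is acyclic, so firing terminates.
    """

    def __init__(self, transitions, observation_places):
        self.transitions = transitions
        self.obs = set(observation_places)
        self.marking = {}

    def _enabled(self, t):
        inputs, _ = self.transitions[t]
        return all(self.marking.get(p, 0) > 0 for p in inputs)

    def step(self, observed):
        """Process one frame: add one token per observed pattern, fire enabled
        transitions until none remain, then drop unconsumed observation tokens."""
        for p in observed:
            self.marking[p] = self.marking.get(p, 0) + 1
        fired = []
        progress = True
        while progress:
            progress = False
            for t, (inputs, outputs) in self.transitions.items():
                if self._enabled(t):
                    for p in inputs:
                        self.marking[p] -= 1
                    for p in outputs:
                        self.marking[p] = self.marking.get(p, 0) + 1
                    fired.append(t)
                    progress = True
        for p in self.obs:
            self.marking.pop(p, None)  # observations last only one frame
        return fired


if __name__ == "__main__":
    feats = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 0)]
    print(dbscan(feats, eps=0.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]

    net = PetriNet(
        {
            "see_A": (["A"], ["after_A"]),                  # remember that A occurred
            "A_then_B": (["after_A", "B"], ["seq_event"]),  # sequence: A, later B
            "B_with_C": (["B", "C"], ["conc_event"]),       # concurrence: B and C together
        },
        observation_places=["A", "B", "C"],
    )
    for frame in [{"B"}, {"A"}, {"B"}, {"B", "C"}]:
        print(sorted(frame), net.step(frame))
```

In the demo, the first `B` fires nothing because no `A` preceded it. The later `B` completes the sequence, and `B` and `C` in the same frame trigger the concurrence transition. Dropping unconsumed observation tokens each frame is a design choice made here so that token order matters. A real high-level Petri net would more likely attach time stamps or colours to tokens.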