Abstract: Possible safety-critical events (SCEs) were identified from naturalistic driving data using a threshold method. Random forest (RF) and support vector machine (SVM) models were then employed to screen these possible events further, overcoming the high false positive rate that arises when threshold methods are applied alone. A set of threshold criteria was established, and 3,623 possible SCEs were extracted from naturalistic driving data collected in Shanghai. The RF method was used to select important features as input variables, and the RF and SVM models were each trained and tested on the same dataset. The results indicate that the mean and minimum values of longitudinal acceleration, the minimum distance to the leading vehicle, and the standard deviation of the subject vehicle's speed can effectively determine whether a possible event is valid. Compared with RF, SVM performs better in prediction: it filters out 85.9% of invalid events while keeping the false negative rate under control.
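As an illustration of the first two steps, the following minimal sketch flags a possible SCE with a single deceleration threshold and then summarises the event window with the four features named above. The threshold value, the function and field names, the sample window, and the use of the population standard deviation are all assumptions made for illustration; the abstract does not state the actual threshold criteria or how the features were computed.

```python
from statistics import mean, pstdev

# Hypothetical threshold: the abstract does not give the actual criteria.
DECEL_THRESHOLD = -4.0  # m/s^2, assumed value for illustration only


def is_possible_sce(long_accel, threshold=DECEL_THRESHOLD):
    """Flag a window as a possible SCE if any sample is at or below the threshold."""
    return min(long_accel) <= threshold


def event_features(long_accel, lead_distance, speed):
    """Summarise one event window with the four features named in the abstract."""
    return {
        "accel_mean": mean(long_accel),       # mean longitudinal acceleration
        "accel_min": min(long_accel),         # minimum longitudinal acceleration
        "lead_dist_min": min(lead_distance),  # minimum distance to the leading vehicle
        "speed_std": pstdev(speed),           # population std of speed (assumed variant)
    }


# Made-up sample window, not data from the study.
window = {
    "long_accel": [-1.0, -3.0, -5.0, -3.0],   # m/s^2
    "lead_distance": [20.0, 15.0, 9.0, 8.0],  # m
    "speed": [14.0, 12.0, 10.0, 8.0],         # m/s
}

flagged = is_possible_sce(window["long_accel"])
features = event_features(**window) if flagged else None
```

In the pipeline described, feature vectors of this kind would then be passed to the RF and SVM classifiers, which decide whether each flagged event is valid.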