SHEN Jifeng, SHENG Changbao, CHEN Yifei, ZUO Xin
To address the severe missed-detection problem in pedestrian detection, which is caused by the insufficient pixel information of distant targets and the occlusion-induced loss of human pattern information, a pedestrian detection method based on dual keypoint combinations is proposed. Keypoints of the head and center regions are used to extract and fuse discriminative semantic features of pedestrians, which significantly reduces the miss rate. Deformable convolution is introduced into the deep layer aggregation backbone network to enlarge the receptive field and enhance the semantic information of the human pattern. A dual-branch joint detection module based on keypoint combinations is designed, and the positive samples for each branch are redefined to strengthen the semantic information of small-scale and occluded targets. The detection results of the two branches are fused using the non-maximum suppression (NMS) algorithm. On the CityPersons validation set, the average miss rates on the normal, small-scale and heavily occluded subsets are 8.24%, 11.81% and 30.59%, respectively. In particular, on the heavily occluded subset, the miss rate is reduced by 15.71% compared with the existing method ACSP. The proposed method runs at 16 frames per second. On the CrowdHuman dataset, the average precision and average miss rate are 86.30% and 45.52%, respectively. Compared with other state-of-the-art methods, the proposed method performs better in average precision, miss rate and detection speed, which indicates significant application value in complex scenes with dense pedestrian crowds.
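The NMS-based fusion of the two branches can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that both the center branch and the head branch output full-body boxes in the same `(x1, y1, x2, y2, score)` format, and the names `fuse_branches`, `center_dets`, `head_dets` and the IoU threshold of 0.5 are hypothetical choices made only for illustration.

```python
from typing import List, Tuple

# Assumed detection format: (x1, y1, x2, y2, score)
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_branches(
    center_dets: List[Box],
    head_dets: List[Box],
    iou_thresh: float = 0.5,  # hypothetical threshold, not taken from the paper
) -> List[Box]:
    """Pool the detections of both branches and apply greedy NMS."""
    # Visit candidates from the highest score to the lowest.
    candidates = sorted(center_dets + head_dets, key=lambda d: d[4], reverse=True)
    kept: List[Box] = []
    for det in candidates:
        # Keep a box only if it does not overlap a kept box too strongly.
        if all(iou(det, k) <= iou_thresh for k in kept):
            kept.append(det)
    return kept


# Toy example: both branches find the same pedestrian,
# and the head branch finds one more that the center branch missed.
center_dets = [(10.0, 10.0, 50.0, 110.0, 0.9)]
head_dets = [(12.0, 11.0, 51.0, 112.0, 0.8), (200.0, 20.0, 230.0, 100.0, 0.7)]
fused = fuse_branches(center_dets, head_dets)
```

In this toy example the duplicate box from the head branch is suppressed, while the pedestrian found only by the head branch is kept, which is the mechanism by which a second branch can lower the miss rate.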