Real-Time Crowd Counting for Embedded Systems with Lightweight Architecture
By: Zhiyuan Zhao, Yubin Wen, Siyu Yang and more
Potential Business Impact:
Counts people in pictures super fast.
Crowd counting is the task of estimating the number of people in an image, and it is valuable in intelligent security, urban planning, public safety management, and related fields. However, existing counting methods are difficult to deploy on embedded systems in these fields because of problems such as excessive model parameters and heavy computation. Practical deployment on embedded systems requires real-time inference, meaning the model must be fast enough. To address these problems, we design a super real-time model with a stem-encoder-decoder structure for crowd counting, which achieves the fastest inference among the state-of-the-art methods compared. First, large convolution kernels in the stem network enlarge the receptive field, which effectively extracts detailed head information. Then, in the encoder, we use conditional channel weighting and a multi-branch local fusion block to merge multi-scale features at low computational cost. This part is crucial to the super real-time performance of the model. Finally, a feature pyramid network is added on top of the encoder to alleviate its incomplete fusion problem. Experiments on three benchmarks show that our network is suitable for super real-time crowd counting on embedded systems while maintaining competitive accuracy. It also has the fastest inference speed of the compared networks: it achieves 381.7 FPS on an NVIDIA GTX 1080Ti and 71.9 FPS on an NVIDIA Jetson TX1.
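To make two of the abstract's claims concrete, the sketch below shows how kernel size affects the receptive field of a stem, and how the reported frame rates translate into per-frame latency. It is only an illustration: the abstract does not give the stem's kernel sizes or strides, so the layer configurations here are hypothetical, and the function names `receptive_field` and `frame_latency_ms` are made up for this example.

```python
def receptive_field(layers):
    """Receptive field, in input pixels, of a stack of convolution layers.

    layers: list of (kernel_size, stride) tuples ordered from input to output.
    """
    rf = 1    # receptive field of a single input pixel
    jump = 1  # distance in input pixels between adjacent outputs so far
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


def frame_latency_ms(fps):
    """Average time per frame in milliseconds for a given frame rate."""
    return 1000.0 / fps


# Hypothetical stems (NOT the paper's configuration): two stride-2 layers each.
small_kernel_stem = [(3, 2), (3, 2)]
large_kernel_stem = [(7, 2), (7, 2)]

print(receptive_field(small_kernel_stem))  # 7
print(receptive_field(large_kernel_stem))  # 19

# Frame rates reported in the abstract, converted to per-frame latency.
print(round(frame_latency_ms(381.7), 2))  # GTX 1080Ti
print(round(frame_latency_ms(71.9), 2))   # Jetson TX1
```

With the same depth and strides, the larger kernels cover a 19-pixel window instead of 7, which is the effect the abstract attributes to the stem. The reported frame rates correspond to roughly 2.6 ms per frame on the GTX 1080Ti and 13.9 ms per frame on the Jetson TX1.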
Similar Papers
Density Estimation and Crowd Counting
CV and Pattern Recognition
Counts people in videos more accurately and faster.
A Transformer-based Multimodal Fusion Model for Efficient Crowd Counting Using Visual and Wireless Signals
CV and Pattern Recognition
Counts crowds better using pictures and signals.
FusionCounting: Robust visible-infrared image fusion guided by crowd counting via multi-task learning
CV and Pattern Recognition
Helps cameras count people better in crowded places.