Video object segmentation (VOS) is an important yet challenging task in computer vision. Two difficult problems in VOS, occlusions and distractions, must be handled, especially in multi-object videos. However, most existing methods struggle to tackle both factors efficiently. To address this, this paper proposes a new semi-supervised VOS model, the Distance-Guided Mask Propagation Model (DGMPM). Specifically, a novel embedding distance module generates a soft cue for handling occlusions by computing the difference between the distances from the target features to the centers of the foreground and background features. Because this non-parametric module relies on the global contrast between the target and reference features, it can detect target object regions even when occlusions are present, and it is less sensitive to the feature scale. In the decoder, prior knowledge from the previous frame serves as spatial guidance to reduce the effect of distractions. In addition, spatial attention blocks are designed to help the network focus on the target object and rectify the prediction results. Extensive experiments demonstrate that DGMPM achieves competitive accuracy and runtime compared with state-of-the-art methods.
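The distance-difference idea behind the embedding distance module can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so several choices here are assumptions rather than the authors' method: the foreground/background centers are taken as masked means of the reference-frame features, distances are Euclidean, and the cue is normalized as (d_bg - d_fg) / (d_bg + d_fg), which makes it invariant to a uniform scaling of the features. The function name `embedding_distance_cue` is hypothetical.

```python
import numpy as np


def embedding_distance_cue(target_feat, ref_feat, ref_mask, eps=1e-6):
    """Hypothetical sketch of a distance-difference soft cue (assumed formulation).

    target_feat: (C, H, W) features of the current frame.
    ref_feat:    (C, H, W) features of a reference frame.
    ref_mask:    (H, W) binary mask of the target object in the reference frame.
    Returns an (H, W) cue in [-1, 1]; positive values mean a pixel is closer
    to the foreground center than to the background center.
    """
    c = ref_feat.shape[0]
    ref = ref_feat.reshape(c, -1)                       # (C, H*W)
    m = ref_mask.reshape(-1).astype(ref.dtype)          # (H*W,)

    # Assumption: centers are the masked mean feature vectors.
    fg_center = (ref * m).sum(axis=1) / (m.sum() + eps)
    bg_center = (ref * (1 - m)).sum(axis=1) / ((1 - m).sum() + eps)

    tgt = target_feat.reshape(c, -1)                    # (C, H*W)
    # Euclidean distance of every target pixel to each center.
    d_fg = np.linalg.norm(tgt - fg_center[:, None], axis=0)
    d_bg = np.linalg.norm(tgt - bg_center[:, None], axis=0)

    # Assumption: normalized difference, unchanged if all features are scaled.
    cue = (d_bg - d_fg) / (d_bg + d_fg + eps)
    return cue.reshape(target_feat.shape[1:])


# Usage: a toy 2-channel, 2x2 frame where the object occupies the top-left pixel.
mask = np.array([[1.0, 0.0], [0.0, 0.0]])
feat = np.stack([mask, np.zeros_like(mask)])            # (2, 2, 2)
cue = embedding_distance_cue(feat, feat, mask)
```

A positive cue marks pixels whose features lie nearer the foreground center than the background center. Since nothing in this sketch is learned, it only needs the reference features and mask, which is consistent with the abstract's description of the module as non-parametric.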