DETR-based methods, which use multi-layer transformer decoders to iteratively refine object queries, have shown promising performance in 3D indoor object detection. However, the scene point features ...
Abstract: To mitigate the heavy reliance on semantic information and the unreliability of manual feature extraction in dynamic simultaneous localization and mapping (SLAM) and object tracking systems, ...