We introduce TASTE-Rob: 1) a dataset with 100,856 task-oriented hand-object interaction videos, 2) a three-stage pose-refinement video generation pipeline. With the above contributions, TASTE-Rob is ...
Abstract: Nowadays, high-resolution remote sensing images provide rich data sources and deep learning models show powerful feature representation capability for remote sensing object detection.