This is the official implementation of the paper "VideoINSTA: Zero-shot Long-Form Video Understanding via Informative Spatial-Temporal Reasoning". Each experiment is configured in a YAML ...