This is the official implementation of the paper "VideoINSTA: Zero-shot Long-Form Video Understanding via Informative Spatial-Temporal Reasoning". Each experiment is configured in a YAML ...