This repository contains the official implementation of Robust Residual Finite Scalar Quantization (RFSQ), a novel quantization framework that addresses the residual magnitude decay problem in naive ...
We introduce the Progressive Visual Token Compression (PVC) in large vision-language models (VLMs), which unifies the visual inputs as videos and progressively compresses vision tokens across video ...