Family of tunable vision-language models based on Gemma 2 generate long captions for images that describe actions, emotions, and narratives of the scene. Google has introduced a new family of ...