A family of tunable vision-language models based on Gemma 2 generates long captions for images that describe the actions, emotions, and narrative of the scene. Google has introduced a new family of ...