A family of tunable vision-language models based on Gemma 2 generates long image captions that describe the actions, emotions, and narrative of a scene. Google has introduced a new family of ...