MiniCPM-V is a series of end-side multimodal LLMs (MLLMs) designed for vision-language understanding. The models take images, video, and text as input and produce high-quality text output. Since ...
Abstract: 3D face reconstruction from multiple in-the-wild images in an unsupervised manner poses a significant challenge, primarily due to the pervasive presence of Intrinsic and Extrinsic ...
AI-powered question-answering system that grounds its responses in YouTube video transcripts using hybrid retrieval and LLM synthesis. The system combines multiple retrieval methods (BM25 keyword search, ...
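
The BM25 keyword search named above can be sketched as follows. This is a minimal illustration of standard Okapi BM25 scoring over whitespace-tokenized text, not the project's actual code: the function name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, and the sample transcript chunks are all assumptions. The other retrieval methods and the way their results are combined are cut off in the description, so they are not shown.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document in `docs` for `query`.

    Hypothetical helper: tokenization is a plain lowercase whitespace split,
    and k1 / b are common defaults, not values taken from the project.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs if n_docs else 0.0

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    query_terms = query.lower().split()
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue  # absent terms contribute nothing
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            denom = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Made-up transcript chunks, for illustration only.
chunks = [
    "welcome back to the channel today we bake bread",
    "in this video we train neural networks from scratch",
    "the park was full of dogs and cats",
]
scores = bm25_scores("neural networks", chunks)
best_chunk = chunks[scores.index(max(scores))]
```

In a retrieval pipeline of this kind, the top-scoring chunks would typically be passed to the LLM as context; how that is done here is not stated in the description.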