MiniCPM-V is a series of end-side multimodal LLMs (MLLMs) designed for vision-language understanding. The models take image, video and text as inputs and provide high-quality text outputs. Since ...
Abstract: 3D face reconstruction from multiple in-the-wild images in an unsupervised manner poses a significant challenge, primarily due to the pervasive presence of Intrinsic and Extrinsic ...
AI-powered question answering system that grounds responses in YouTube video transcripts using hybrid retrieval and LLM synthesis. This system combines multiple retrieval methods (BM25 keyword search, ...