Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Abstract and Introduction
Paper title: GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Paper title: Visual Instruction Tuning
Paper title: Shikra: Unleashing Multimodal LLM’s Referential Dialogue Magic
Paper title: VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks