
LLaVA
LLaVA Model. We introduce LLaVA (Large Language-and-Vision Assistant), an end-to-end trained large multimodal model that connects a vision encoder and an LLM for general-purpose visual and language understanding.
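The connection LLaVA describes is, at its core, a projection from vision-encoder patch features into the LLM's token-embedding space. Below is a minimal PyTorch sketch of that idea; the class name and dimensions are illustrative, not the actual LLaVA code (LLaVA-1.5 uses a two-layer MLP projector, the original LLaVA a single linear layer).

```python
import torch
import torch.nn as nn

class VisionLanguageConnector(nn.Module):
    """Projects vision-encoder patch features into the LLM's
    token-embedding space so they can be prepended to text embeddings."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # Two-layer MLP projector in the style of LLaVA-1.5.
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim), e.g. from CLIP ViT-L/14
        return self.proj(patch_features)  # (batch, num_patches, llm_dim)

# Usage: concatenate projected image tokens with embedded text tokens
# before the LLM's transformer layers.
connector = VisionLanguageConnector()
image_tokens = connector(torch.randn(1, 576, 1024))  # 24x24 patches at 336px
text_tokens = torch.randn(1, 32, 4096)               # embedded prompt tokens
llm_input = torch.cat([image_tokens, text_tokens], dim=1)
```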
LLaVA-Plus - GitHub Pages
🌋 LLaVA-Plus Model. We have developed LLaVA-Plus, a general-purpose multimodal assistant that extends LLaVA by incorporating a large and diverse set of external tools that can be activated as needed to complete real-world tasks.
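A minimal sketch of the tool-use loop this implies: the assistant emits a structured tool call, a dispatcher runs the tool, and the result is fed back for the final response. The JSON format and the tool names below are assumptions for illustration, not the actual LLaVA-Plus protocol.

```python
import json
from typing import Callable

# Hypothetical tools standing in for the skill repository
# (detection, segmentation, OCR, image generation, ...).
TOOLS: dict[str, Callable[[dict], str]] = {
    "ocr": lambda args: f"recognized text in {args['image']}",
    "detect": lambda args: f"boxes for '{args['query']}' in {args['image']}",
}

def run_turn(model_output: str) -> str:
    """If the model output is a tool call, execute it; otherwise pass through."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # plain answer, no tool needed
    result = TOOLS[call["tool"]](call["args"])
    # In a full system the tool result would be appended to the dialogue
    # and the LMM queried again to compose the final response.
    return result

print(run_turn('{"tool": "detect", "args": {"image": "img.png", "query": "dog"}}'))
```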
LLaVA-NeXT: A Strong Zero-shot Video Understanding Model
Jan 30, 2024 · It is natural to further tune the model on video data for a performance boost. Our analysis reveals that a mixed training regimen of video and image data is essential for strong video understanding.
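One simple way to realize such a mixed regimen is to interleave image and video samples at a fixed ratio so every batch covers both modalities. The sketch below is an assumption about how this could be wired up, not the project's actual data pipeline; the 30% video fraction is arbitrary.

```python
import random
from torch.utils.data import Dataset, DataLoader

class MixedDataset(Dataset):
    """Interleaves an image dataset and a video dataset at a fixed ratio,
    so batches mix both modalities on average."""
    def __init__(self, image_ds, video_ds, video_fraction: float = 0.3):
        self.image_ds, self.video_ds = image_ds, video_ds
        self.video_fraction = video_fraction

    def __len__(self):
        return len(self.image_ds) + len(self.video_ds)

    def __getitem__(self, idx):
        # Sample the modality per step rather than concatenating datasets.
        if random.random() < self.video_fraction:
            return self.video_ds[idx % len(self.video_ds)]
        return self.image_ds[idx % len(self.image_ds)]

# Stand-in datasets (plain lists work with torch's default collate):
loader = DataLoader(MixedDataset(list(range(100)), list(range(20))), batch_size=8)
batch = next(iter(loader))  # a mix of "image" and "video" samples
```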
LLaVA-NeXT: Tackling Multi-image, Video, and 3D in Large …
May 25, 2024 · This task enables the model to interact with a 3D environment to solve problems or answer questions by navigating and manipulating its surroundings, capabilities that are essential for embodied AI.
LLaVA-Interactive
Jun 10, 2024 · LLaVA-Interactive is an all-in-one demo that connects three vision-language models in one interactive session for image chat, segmentation, and generation/editing, which can complete multimodal tasks beyond what any single model supports.
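Architecturally, an all-in-one session like this amounts to routing each user action to one of the three backends over a shared image canvas. The sketch below illustrates that routing under assumptions of my own; the backend interfaces and action names are placeholders, not LLaVA-Interactive's API.

```python
class InteractiveSession:
    """Routes each user turn to a chat, segmentation, or editing backend,
    all operating on one shared image."""
    def __init__(self, chat_model, seg_model, edit_model):
        self.backends = {"chat": chat_model, "segment": seg_model, "edit": edit_model}
        self.image = None  # shared canvas all three backends operate on

    def step(self, action: str, payload):
        out = self.backends[action](self.image, payload)
        if action in ("segment", "edit"):
            self.image = out  # keep edits visible to subsequent chat turns
        return out

# Usage with stand-in backends:
session = InteractiveSession(
    chat_model=lambda img, q: f"answer about the image: {q}",
    seg_model=lambda img, mask: "segmented image",
    edit_model=lambda img, prompt: "edited image",
)
print(session.step("chat", "What is in the scene?"))
```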
LLaVA-NeXT: Improved reasoning, OCR, and world knowledge
Jan 30, 2024 · LLaVA-NeXT improves over LLaVA-1.5 by increasing the input image resolution to up to 4x more pixels, using an improved visual instruction tuning data mixture, and delivering better reasoning, OCR, and world knowledge.
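The higher resolution is achieved by the "AnyRes" scheme: the image is split into base-resolution tiles using the grid that best matches its aspect ratio, and each tile plus a downsampled overview is encoded separately. The sketch below illustrates the grid selection and the resulting visual-token count; the candidate grid list is an assumption for illustration.

```python
BASE = 336  # vision encoder input size (CLIP ViT-L/14 at 336px)
GRIDS = [(1, 2), (2, 1), (2, 2), (1, 3), (3, 1)]  # (cols, rows) tile layouts

def select_grid(width: int, height: int) -> tuple[int, int]:
    """Choose the tile grid whose aspect ratio is closest to the image's."""
    aspect = width / height
    return min(GRIDS, key=lambda g: abs(g[0] / g[1] - aspect))

def num_visual_tokens(grid: tuple[int, int], tokens_per_tile: int = 576) -> int:
    # Each 336px tile yields 24x24 = 576 patch tokens; add one overview tile.
    return (grid[0] * grid[1] + 1) * tokens_per_tile

g = select_grid(1024, 512)       # wide image -> (2, 1)
print(g, num_visual_tokens(g))   # (2, 1) 1728
```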
LLaVA-NeXT: Stronger LLMs Supercharge Multimodal Capabilities …
May 10, 2024 · On January 30, 2024, we unveiled LLaVA-NeXT, a state-of-the-art Large Multimodal Model (LMM) developed using a cost-effective training method leveraging open resources.
LLaVA-OneVision: Easy Visual Task Transfer
Aug 5, 2024 · Our experimental results demonstrate that LLaVA-OneVision is the first single model that can simultaneously push the performance boundaries of open LMMs in three important computer vision scenarios: single-image, multi-image, and video.
LLaVA-NeXT: What Else Influences Visual Instruction Tuning …
May 25, 2024 · Scaling up the LLM is more effective than scaling up the image encoder in yielding improved performance. The success of the latter is more related to its visual input configuration (resolution, number of visual tokens) than to its model size.
LLaVA-Grounding
We present an end-to-end model, which connects a Large Multimodal Model (LMM) with a grounding model to facilitate grounded visual chat. Our model supports both object-level and pixel-level grounding.
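One way to connect an LMM with a grounding model is to have the LMM mark the phrases it refers to, then hand each phrase to a separate grounding model that returns boxes or masks. The tagging format and interfaces below are assumptions for illustration, not LLaVA-Grounding's actual API.

```python
import re

def ground_response(response: str, image, grounding_model) -> list[dict]:
    """Extract <g>...</g> phrases from the LMM response and ground each one
    with an external model (e.g. an open-vocabulary detector or segmenter)."""
    results = []
    for phrase in re.findall(r"<g>(.*?)</g>", response):
        box = grounding_model(image, phrase)
        results.append({"phrase": phrase, "box": box})
    return results

# Usage with a stand-in grounding model returning a normalized xyxy box:
fake_detector = lambda img, text: (0.1, 0.2, 0.5, 0.6)
print(ground_response("A <g>dog</g> sits by a <g>red ball</g>.", None, fake_detector))
```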