Project Awesome

InstructBLIP extends the BLIP-2 framework by applying instruction tuning to its Query Transformer (Q-Former), enabling the model to extract instruction-aware visual features and achieve state-of-the-art zero-shot performance across diverse vision-language tasks.
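
To make "instruction-aware visual features" concrete, here is a minimal NumPy sketch of the idea, not the actual InstructBLIP implementation. It assumes a single attention layer with one head and no learned projections, and every name in it (`attention`, `instruction_aware_features`, the array sizes) is hypothetical. The learnable queries first attend jointly with the instruction tokens, then read from the image features, so the same image yields different query outputs under different instructions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Single-head scaled dot-product attention, no learned projections
    # (a simplification made for this sketch).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def instruction_aware_features(queries, instruction_tokens, image_features):
    """Toy one-layer stand-in for an instruction-aware Q-Former."""
    n_queries = queries.shape[0]
    # 1) Self-attention over [queries; instruction tokens] lets the
    #    instruction condition the query embeddings.
    joint = np.concatenate([queries, instruction_tokens], axis=0)
    conditioned = attention(joint, joint, joint)[:n_queries]
    # 2) Cross-attention: the conditioned queries read from the
    #    image-encoder features.
    return attention(conditioned, image_features, image_features)


rng = np.random.default_rng(0)
dim = 16
queries = rng.normal(size=(4, dim))          # stand-in for learnable query embeddings
image_features = rng.normal(size=(10, dim))  # stand-in for frozen image-encoder output
instr_a = rng.normal(size=(3, dim))          # stand-in for one embedded instruction
instr_b = rng.normal(size=(5, dim))          # stand-in for a different instruction

feats_a = instruction_aware_features(queries, instr_a, image_features)
feats_b = instruction_aware_features(queries, instr_b, image_features)
print(feats_a.shape)  # one feature vector per query: (4, 16)
```

In this sketch the number of output vectors depends only on the number of queries, not on the instruction length, while the values in `feats_a` and `feats_b` differ because the instructions differ.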

GitHub: 11.2k stars