
CogVLM

CogVLM enhances a pretrained language model with a dedicated visual expert module. Each layer gains a separate QKV matrix and MLP for image features, which aligns visual and language features deep inside the model rather than only at the input, and supports strong performance on multimodal tasks such as image captioning and visual question answering.
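To make the per-layer routing idea concrete, here is a minimal NumPy sketch of one layer in which image tokens use their own QKV matrix and MLP while text tokens use the language model's. It is an illustration under stated assumptions, not CogVLM's actual code: the class name `VisualExpertLayer`, the single attention head, the ReLU MLP, the random weights, and the omission of layer normalization are all simplifications chosen here.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class VisualExpertLayer:
    """Hypothetical toy layer: separate QKV and MLP weights per modality."""

    def __init__(self, dim, hidden, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            return rng.normal(0.0, 0.02, size=shape)

        self.dim = dim
        # Stand-ins for the pretrained language model's weights.
        self.qkv_text = init(dim, 3 * dim)
        self.mlp_text = (init(dim, hidden), init(hidden, dim))
        # Visual expert weights, applied to image tokens only.
        self.qkv_img = init(dim, 3 * dim)
        self.mlp_img = (init(dim, hidden), init(hidden, dim))

    def __call__(self, x, is_image):
        # x: (seq, dim) hidden states; is_image: (seq,) boolean mask.
        mask = is_image[:, None]

        # Route each token through the QKV projection of its modality.
        qkv = np.where(mask, x @ self.qkv_img, x @ self.qkv_text)
        q, k, v = np.split(qkv, 3, axis=-1)

        # Attention is shared: image and text tokens attend to each other
        # under an ordinary causal mask.
        scores = q @ k.T / np.sqrt(self.dim)
        causal = np.tril(np.ones_like(scores, dtype=bool))
        scores = np.where(causal, scores, -np.inf)
        h = x + softmax(scores) @ v

        def mlp(weights):
            w_in, w_out = weights
            return np.maximum(h @ w_in, 0.0) @ w_out

        # Route each token through the MLP of its modality.
        return h + np.where(mask, mlp(self.mlp_img), mlp(self.mlp_text))


layer = VisualExpertLayer(dim=8, hidden=16)
tokens = np.random.default_rng(1).normal(size=(5, 8))
is_image = np.array([True, True, True, False, False])
out = layer(tokens, is_image)
print(out.shape)
```

Because the routing is a per-token mask, a sequence with no image tokens never touches the visual expert weights, so in this sketch the text-only behavior depends only on the language model's parameters.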

GitHub: 6.7k stars