Project Awesome

MobileVLM

A mobile-optimized vision-language model that combines a CLIP ViT-L/14 visual encoder with the efficient MobileLLaMA language model through a Lightweight Downsample Projector (LDP), enabling effective multimodal processing and alignment within the constraints of mobile devices.
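A minimal sketch of how this pipeline manages its token budget, assuming the figures reported for MobileVLM: a CLIP ViT-L/14 encoder at 336x336 input and an LDP that downsamples the visual token grid by 2x in each spatial dimension. The function names are illustrative, not from the project's code.

```python
# Hypothetical sketch (not the official MobileVLM implementation):
# how the LDP shrinks the visual token count before tokens are fed
# to the MobileLLaMA language model.

def vit_patch_tokens(image_size: int, patch_size: int) -> int:
    """Number of patch tokens a ViT encoder emits (CLS token excluded)."""
    per_side = image_size // patch_size
    return per_side * per_side

def ldp_output_tokens(num_tokens: int, stride: int = 2) -> int:
    """Tokens remaining after the LDP's stride-`stride` spatial downsampling."""
    per_side = int(num_tokens ** 0.5)   # tokens form a square grid
    out_side = per_side // stride
    return out_side * out_side

encoder_tokens = vit_patch_tokens(image_size=336, patch_size=14)   # -> 576
projected_tokens = ldp_output_tokens(encoder_tokens, stride=2)     # -> 144
print(encoder_tokens, projected_tokens)
```

The 4x reduction (576 to 144 tokens) is what makes the projector "downsampling": the language model's prefill cost scales with sequence length, so fewer visual tokens directly cuts latency on mobile hardware.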
