Project Awesome

AlphaCLIP

AlphaCLIP builds upon the CLIP model by adding an alpha channel to the image encoder, which makes the model region-aware. It is trained on millions of RGBA region-text pairs. The alpha channel gives precise control over which parts of an image are emphasized, and this improves performance on tasks that require detailed spatial understanding.
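To make the alpha-channel idea concrete, here is a minimal NumPy sketch of a patch-embedding step that accepts an RGBA input. It is an illustration under stated assumptions, not the project's actual code: it assumes the alpha channel gets its own projection that is added to the RGB projection, and that this projection starts at zero so the encoder initially behaves like an RGB-only one. The names `patchify`, `RGBAPatchEmbed`, `w_rgb`, and `w_alpha` are hypothetical, and random weights stand in for pretrained CLIP weights.

```python
import numpy as np


def patchify(x, patch):
    """Split an (H, W, C) array into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch * patch * C).
    """
    h, w, c = x.shape
    assert h % patch == 0 and w % patch == 0, "image must divide evenly into patches"
    x = x.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return x.reshape(-1, patch * patch * c)


class RGBAPatchEmbed:
    """Hypothetical patch embedding with a separate alpha branch."""

    def __init__(self, patch=4, dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.patch = patch
        # Random stand-in for the pretrained RGB projection weights.
        self.w_rgb = rng.normal(size=(patch * patch * 3, dim))
        # Assumption: the alpha projection starts at zero, so the alpha
        # channel has no effect until these weights are trained.
        self.w_alpha = np.zeros((patch * patch, dim))

    def __call__(self, rgb, alpha):
        """rgb: (H, W, 3) array; alpha: (H, W) region mask with values in [0, 1]."""
        rgb_tokens = patchify(rgb, self.patch) @ self.w_rgb
        alpha_tokens = patchify(alpha[..., None], self.patch) @ self.w_alpha
        return rgb_tokens + alpha_tokens


embed = RGBAPatchEmbed(patch=4, dim=8)
rgb = np.ones((8, 8, 3))
alpha = np.zeros((8, 8))
alpha[:4, :4] = 1.0  # emphasize the top-left patch only

tokens_before = embed(rgb, alpha)

# Simulate trained alpha weights: only the emphasized patch should change.
embed.w_alpha = np.ones_like(embed.w_alpha)
tokens_after = embed(rgb, alpha)
changed_patches = np.flatnonzero(np.abs(tokens_after - tokens_before).sum(axis=1))
```

In this sketch the mask only changes the tokens of patches it covers, which is one simple way to picture how a region-text pair could steer the encoder toward a region while the rest of the image is still visible to it.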
