Project Awesome project awesome

VideoChat-Flash

is a system designed for handling long-form video content in multimodal large language models (MLLMs). It introduces a Hierarchical visual token Compression (HiCo) method to reduce computational load while preserving essential details, and uses a multi-stage learning approach with a new long-video dataset (LongVid) to achieve state-of-the-art performance on both long and short video benchmarks.

Package 509 stars GitHub
Back to VLM Architectures