Long-form video representation learning (Part 2: Video as sparse transformers) | Towards Data Science

By Noble Pilot · March 16, 2026 · 1 min read

cvpr
cvpr 2024
sparse transformer
video text pretraining
subarna.tripathi

We explore novel video representation methods equipped with long-form reasoning capability. This is Part II, focusing on sparse video-text transformers. See Part I on video as graphs; Part III provides a sneak peek into our latest and greatest explorations. The first blog in this series was about learning explicit sparse graph-based video […]