Long-form video representation learning (Part 2: Video as sparse transformers)

Source: Towards Data Science
We explore novel video representation methods equipped with long-form reasoning capability. This is Part II, focusing on sparse video-text transformers. See Part I on video as graphs. Part III provides a sneak peek into our latest and greatest explorations. The first blog in this series was about learning explicit sparse graph-based video […]