Yunlong (Yolo) Tang

CaptionAnything in Video (CAT-V)

CAT-V is a training-free framework that enables fine-grained object-centric video captioning through spatiotemporal visual prompting and chain-of-thought reasoning.

© Copyright 2026 Yunlong (Yolo) Tang. Theme by al-folio.