UniAV combines action localization, sound detection, and audio-visual event localization for better video understanding.
― 7 min read
LongVALE provides a new benchmark for understanding long videos through audio-visual data.
― 7 min read