Computer Science - Multimedia

Multimedia Advancing 3D Cross-Modal Retrieval for Unseen Categories

A new framework enhances 3D object retrieval from diverse data types.

2025-07-08T16:31:54+00:00 ― 5 min read

Computer Vision and Pattern Recognition Detecting Fake News in Short Videos

Examining the creative process behind fake news video production.

2025-07-08T09:33:12+00:00 ― 6 min read

Computer Vision and Pattern Recognition Advancements in Visual Scoring with QPT V2

QPT V2 enhances visual scoring using masked image modeling and high-quality data.

2025-07-08T07:26:48+00:00 ― 5 min read

Computer Vision and Pattern Recognition MMTrail: A Comprehensive Video Dataset for Language Models

MMTrail combines visual and audio descriptions for better video-language models.

2025-07-08T06:53:20+00:00 ― 4 min read

Multimedia Protecting Privacy in Multimodal Communication

New method strengthens privacy for shared images and text.

2025-07-08T03:14:00+00:00 ― 5 min read

Computer Vision and Pattern Recognition New Framework Enhances Audio-Visual Question Answering

A new method improves AVQA performance when audio or visual inputs are missing.

2025-07-07T23:40:42+00:00 ― 5 min read

Computer Vision and Pattern Recognition Generating Synchronized Audio for Silent Videos

A method to create audio that matches first-person viewpoint videos.

2025-07-07T23:36:05+00:00 ― 7 min read

Multimedia New Collection of 3D Models for Research

A diverse collection of 3D models for enhanced research opportunities.

2025-07-07T13:24:30+00:00 ― 6 min read

Sound Evaluating Large Language Models in Music Creation

This study examines how well LLMs understand and generate music.

2025-07-07T10:38:45+00:00 ― 5 min read

Sound ChordSync: Aligning Music Chords with Audio

A new model that seamlessly synchronizes chord annotations with music audio.

2025-07-06T22:30:00+00:00 ― 5 min read

Computer Vision and Pattern Recognition New Method Enhances Point Cloud Compression

A unified model improves point cloud compression for better quality and efficiency.

2025-07-06T19:15:40+00:00 ― 6 min read

Cryptography and Security A New Approach to Image Verification

Innovative method adds hidden messages to ensure image authenticity.

2025-07-06T15:01:30+00:00 ― 5 min read

Sound New Method for Detecting Deepfakes Using Audio and Video

A framework that effectively identifies deepfake content through combined audio and visual analysis.

2025-07-06T08:44:05+00:00 ― 5 min read

Sound Assessing Music Understanding with MuChoMusic Benchmark

A new benchmark to evaluate models analyzing music and language.

2025-07-06T05:29:45+00:00 ― 6 min read

Computer Vision and Pattern Recognition Innovative Model for Diagnosing Depression

A new approach merges audio, video, and text data for effective depression diagnosis.

2025-07-06T04:53:12+00:00 ― 8 min read

Multimedia Advancing Audio-Visual Generalized Zero-Shot Learning

A new framework improves classification in unseen audio-visual tasks.

2025-07-06T04:41:10+00:00 ― 6 min read

Computer Vision and Pattern Recognition Advancements in Human Silhouette Segmentation

A new model enhances silhouette segmentation using RF signals for better motion capture.

2025-07-06T03:34:12+00:00 ― 5 min read

Multimedia Addressing Hate Speech in Videos with MultiHateClip Dataset

New dataset provides insights on hate speech across languages and formats.

2025-07-06T02:31:00+00:00 ― 6 min read

Computer Vision and Pattern Recognition Improving Image Compression for Multimodal Models

New framework enhances image processing in multimodal large language models.

2025-07-06T00:56:12+00:00 ― 4 min read

Multimedia AxiomVision: Transforming Video Analytics for Dynamic Environments

AxiomVision offers a new approach to video analysis, enhancing performance in changing conditions.

2025-07-05T14:40:00+00:00 ― 6 min read

Computer Vision and Pattern Recognition Advancements in Violence Detection Technology

New systems combine audio and video for better violence detection in public spaces.

2025-07-05T14:06:40+00:00 ― 5 min read

Multimedia Affordable VR Headsets for Realistic Video Calls

A new system enhances video calls on budget-friendly VR headsets using voice input.

2025-07-05T12:33:36+00:00 ― 6 min read

Computer Vision and Pattern Recognition Navigating the Academic Manuscript Submission Process

A clear guide to manuscript types and submission challenges.

2025-07-04T20:14:00+00:00 ― 4 min read

Multimedia Advancements in Audio-Visual Semantic Segmentation

A new method improves object recognition in videos through sound and visual cues.

2025-07-04T10:13:36+00:00 ― 5 min read

Machine Learning Navigating the Challenges of Long-Tailed Learning

A look at strategies to tackle long-tailed data in machine learning.

2025-07-03T17:06:36+00:00 ― 6 min read

Multimedia Advancements in Multi-View Outlier Detection

A new method improves outlier detection in multi-view datasets with missing views.

2025-07-03T12:38:00+00:00 ― 6 min read

Computer Vision and Pattern Recognition Addressing Hallucination in Multi-modal Language Models

A study on the challenges and solutions for hallucination in MLLMs.

2025-07-03T06:42:30+00:00 ― 4 min read

Multimedia Addressing Noisy Correspondence in Cross-Modal Retrieval

A framework to tackle data noise in cross-modal retrieval techniques.

2025-07-03T06:34:36+00:00 ― 5 min read

Computer Vision and Pattern Recognition Combatting Identity Fraud with IDNet Dataset

A new dataset supports better tools for detecting identity document fraud.

2025-07-02T17:40:24+00:00 ― 7 min read

Computation and Language Introducing MMPKUBase: A Chinese Knowledge Graph

MMPKUBase provides over 52,000 Chinese subjects with rich images.

2025-07-02T17:32:30+00:00 ― 5 min read

Sound Revolutionizing Music Creation with TEAdapter

TEAdapter enhances music generation from text, providing users greater control and creativity.

2025-07-02T17:17:05+00:00 ― 4 min read

Computer Vision and Pattern Recognition Introducing SynopGround: A New Approach to Video Grounding

A novel dataset and method enhance video grounding for complex narratives.

2025-07-02T17:08:48+00:00 ― 8 min read

Computer Vision and Pattern Recognition Advancing Deepfake Detection with MkfaNet

A new method enhances the detection of facial deepfakes.

2025-07-02T17:00:54+00:00 ― 5 min read

Computer Vision and Pattern Recognition Lighthouse: A Tool for Video Moment Retrieval and Highlight Detection

Lighthouse simplifies video moment retrieval and highlight detection for researchers.

2025-07-02T07:08:24+00:00 ― 5 min read

Sound Advancements in Audio Source Separation with RQ-VAE

New machine learning model enhances audio source separation techniques.

2025-07-02T05:08:20+00:00 ― 5 min read

Sound New Method Improves Speech Clarity in Smart Glasses

A system to enhance speech clarity in noisy environments using smart glasses.

2025-07-02T02:42:35+00:00 ― 5 min read

Computer Vision and Pattern Recognition New Dataset Aims to Enhance Cooking Video Analysis

COM Kitchens provides unedited cooking videos to study food preparation processes.

2025-07-01T20:28:30+00:00 ― 5 min read

Computer Vision and Pattern Recognition ReSyncer: A New Approach to Lip-Syncing

ReSyncer improves video quality and flexibility for lip movements synchronized to audio.

2025-07-01T12:18:42+00:00 ― 5 min read

Computer Vision and Pattern Recognition Neural Tuning: A New Approach for Multitask Learning

Neural tuning aims to improve the multitask capabilities of large models.

2025-07-01T09:09:06+00:00 ― 6 min read

Multimedia Advancements in E-Commerce Product Retrieval

A new method enhances product searches across different media formats.

2025-07-01T08:45:24+00:00 ― 6 min read