Latest Articles for Multimodal

Computer Vision and Pattern Recognition
Introducing Learnable Attention Mask for Multimodal Tasks

A new method for improving attention mechanisms when processing complex multimodal data.

2025-08-02T16:00:36+00:00 ― 7 min read

Machine Learning
Enhancing Human Activity Recognition with Multimodal Data

A new approach improves activity recognition by combining various data types.

2025-08-01T05:30:48+00:00 ― 7 min read

Computer Vision and Pattern Recognition
Setokim: Advancing Multimodal Language Models

Setokim improves the fusion of visual and textual understanding through a new tokenization approach.

2025-08-01T00:06:54+00:00 ― 8 min read

Computation and Language
mOSCAR: A New Dataset for Multimodal AI

mOSCAR provides a multilingual dataset for improved AI understanding of text and images.

2025-07-30T03:13:00+00:00 ― 6 min read

Machine Learning
Integrating Vision and Language in the Brain

This study reveals how the brain combines visual and language information.

2025-07-22T21:29:48+00:00 ― 4 min read

Computer Vision and Pattern Recognition
Evaluating Multimodal Learning in Language Models

This study examines how visual and textual data affect model performance.

2025-07-22T07:03:54+00:00 ― 7 min read

Computation and Language
A New Approach to Model Predictions

An ensemble method improves the accuracy of language and vision models.

2025-07-19T17:10:54+00:00 ― 7 min read

Computer Vision and Pattern Recognition
Referring Audio-Visual Segmentation: A New Approach

Combining audio and visual information enhances object recognition in videos.

2025-07-13T10:17:30+00:00 ― 6 min read

Computation and Language
Improving Fake News Detection with IMFND Framework

A new framework enhances fake news detection using text and images.

2025-07-12T22:34:24+00:00 ― 4 min read

Computer Vision and Pattern Recognition
Advancements in Multimodal Learning Techniques

A novel approach to improve multimodal learning with missing data.

2025-07-08T01:39:12+00:00 ― 5 min read

Machine Learning
Advancing Healthcare with Multimodal Machine Learning

A new framework enhances disease prediction using diverse healthcare data.

2025-07-07T03:16:12+00:00 ― 6 min read

Computation and Language
Evaluating Zero-Shot Capabilities of Multimodal LLMs

An assessment of multimodal LLMs' zero-shot performance across various tasks.

2025-07-05T08:36:36+00:00 ― 5 min read

Digital Libraries
HALvest: A New Dataset for Academic Research

HALvest combines citation networks and texts for enhanced research insights.

2025-07-04T17:51:48+00:00 ― 5 min read

Methodology
Analyzing Circular Data in Bird Migration Studies

Explore how circular data impacts bird migration analysis and conservation efforts.

2025-07-03T21:54:04+00:00 ― 5 min read

Computation and Language
Advancing Training for Multimodal Large Language Models

A new system improves the efficiency of training multimodal large language models.

2025-07-01T16:55:12+00:00 ― 6 min read

Computer Vision and Pattern Recognition
Improving Efficiency in Multimodal Model Training

A new method enhances efficiency and performance of multimodal large language models.

2025-06-30T21:33:54+00:00 ― 5 min read

Machine Learning
Improving Sampling Methods for Complex Distributions

A new technique simplifies sampling from complex probability distributions in data science and finance.

2025-06-26T23:00:44+00:00 ― 6 min read

Machine Learning
Improving Computer Understanding Through Text-Centric Methods

This article discusses how converting data into text enhances computer understanding.

2025-06-25T14:22:30+00:00 ― 6 min read

Computation and Language
In-Context Learning: Navigating Challenges in AI Models

Exploring how large language models learn from examples in various contexts.

2025-06-23T02:12:12+00:00 ― 6 min read

Machine Learning
Inf-MLLM: A New Approach to Multimodal Processing

Inf-MLLM enhances efficiency in handling complex data streams with limited resources.

2025-06-14T00:57:12+00:00 ― 5 min read

Computation and Language
Understanding Intent in Bangla Social Media Posts

A framework to analyze Bangla social media content through text and images.

2025-06-12T03:31:42+00:00 ― 5 min read

Computer Vision and Pattern Recognition
Improving Video Anomaly Detection Techniques

A new method combines video and audio signals for better anomaly detection.

2025-06-10T15:03:24+00:00 ― 7 min read

Computer Vision and Pattern Recognition
The Future of Multimodal Search Engines

Examining how large multimodal models (LMMs) could transform search across text and images.

2025-06-09T12:35:30+00:00 ― 6 min read

Computer Vision and Pattern Recognition
Introducing InfiMM-WebMath-40B: A New Dataset for Multimodal Mathematical Reasoning

A new dataset aims to enhance multimodal reasoning in language models.

2025-06-09T06:24:12+00:00 ― 6 min read

Computation and Language
Assessing Multimodal Language Models with OmniBench

A new tool evaluates large language models' performance across multiple data types.

2025-06-07T16:21:06+00:00 ― 5 min read

Information Retrieval
Advancing Multimodal Recommendation Systems Through Better Feature Extraction

A study on improving recommendation systems by focusing on feature extraction techniques.

2025-06-06T12:02:36+00:00 ― 7 min read

Computer Vision and Pattern Recognition
Mapping Rhino Middens to Enhance Conservation Efforts

A new method tracks rhinos by mapping their dung heaps (middens) to help combat poaching.

2025-06-05T05:21:54+00:00 ― 7 min read

Computer Vision and Pattern Recognition
New Models Transforming Multimodal AI

Recent models enhance AI's ability to generate and understand various media.

2025-06-04T08:49:30+00:00 ― 5 min read

Machine Learning
How Robots Combine Senses for Better Interaction

Robots learn to merge sensory information for improved understanding and response.

2025-05-28T12:36:39+00:00 ― 7 min read

Artificial Intelligence
Combining Numbers and Words for Better Forecasting

Scientists blend time series data with text to improve weather predictions.

2025-05-25T10:19:12+00:00 ― 7 min read

Computation and Language
Investigating Cross-Modal Consistency in AI Models

Examining how AI models handle text and images together.

2025-05-23T05:28:03+00:00 ― 7 min read

Computation and Language
Enhancing Reasoning in Multimodal Models

A new method improves reasoning skills in language models using preference optimization.

2025-05-22T16:25:03+00:00 ― 4 min read

Artificial Intelligence
AdaptAgent: A New Way for Web Agents to Learn

AdaptAgent helps web agents learn tasks using fewer demonstrations.

2025-05-16T13:08:00+00:00 ― 7 min read

Computation and Language
How Sound Helps Machines Understand Jokes

Sound cues improve machines' grasp of humor and wordplay.

2025-04-29T08:07:15+00:00 ― 4 min read

Artificial Intelligence
Transforming Medical Diagnosis with Multimodal Data

Combining various medical data types enhances diagnosis and treatment planning.

2025-04-23T11:11:15+00:00 ― 6 min read

Computation and Language
BabyLM Challenge: Bridging Kids and AI in Language Learning

A competition that aims to make machines learn language more like children do.

2025-04-07T04:21:27+00:00 ― 8 min read

Computer Vision and Pattern Recognition
COEF-VQ: The Future of Video Quality on Social Media

Discover how COEF-VQ ensures high video quality for better user experiences.

2025-03-22T17:36:09+00:00 ― 7 min read

Machine Learning
Revolutionizing Stock Predictions with New Models

Higher Order Transformers enhance stock movement predictions using diverse data sources.

2025-03-16T17:34:20+00:00 ― 9 min read

Computation and Language
RapGuard: A New Safety Shield for AI Models

RapGuard offers context-aware safety for multimodal large language models.

2025-01-25T11:47:51+00:00 ― 7 min read

Computer Vision and Pattern Recognition
A New Era in Visual Question Answering

Advancements in AI enhance visual question answering capabilities.

2025-01-18T05:39:18+00:00 ― 6 min read