Spica: A New Tool for Blind and Low-Vision Users
Spica enhances video access for blind and low-vision users through interactivity.
― 4 min read
Table of Contents
- What is Spica?
- Why Interactive Video Matters
- Features of Spica
- Interactive Exploration
- Audio Description Layering
- Spatial Sound
- High-Contrast Visual Cues
- User Study
- Participant Demographics
- Study Process
- Findings
- Overall User Experience
- Feature Feedback
- Temporal Exploration
- Object Exploration
- Sound Effects
- Challenges and Opportunities
- Future Directions
- Conclusion
- Summary of Contributions
- Original Source
- Reference Links
Blind or low-vision (BLV) users often depend on audio descriptions (AD) to understand video content. However, traditional ADs are limited: they often miss key details, demand sustained attention, and do not meet the diverse needs and preferences of individual users. To improve this experience, we present Spica, a tool that allows BLV users to interactively engage with videos.
What is Spica?
Spica is a system that uses artificial intelligence (AI) to help BLV users explore video content. Unlike conventional static AD, Spica provides interactive features that let users engage with the content in a more personalized way, including navigating through scenes and focusing on specific objects within individual video frames.
Why Interactive Video Matters
AD plays a vital role in making videos accessible to BLV users. However, traditional methods often fail to provide enough detail, and following a fixed narration demands sustained attention. With Spica, users gain more control over their viewing experience and can reduce mental fatigue by exploring content at their own pace.
Features of Spica
Interactive Exploration
Spica allows users to move through video scenes and focus on specific objects; a minimal interaction sketch follows the list below. Users can:
- Navigate through different scenes using keyboard or touch controls.
- Receive detailed descriptions of specific objects, enhancing their understanding of the visual content.
- Listen to spatial sound effects that provide more context and immersion.
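The sketch below shows how keyboard-driven exploration of this kind could be wired up on the frontend. Spica's reference links point to React, but plain TypeScript is used here for brevity; the data shapes, the `FrameExplorer` class, and the key bindings are illustrative assumptions rather than Spica's actual implementation.

```typescript
// Hypothetical sketch: keyboard navigation over key frames and the objects
// detected in each frame. Data shapes and key bindings are assumptions,
// not Spica's actual API.

interface DetectedObject {
  label: string;        // e.g. "red bicycle"
  description: string;  // detailed description read aloud on demand
}

interface KeyFrame {
  timestamp: number;          // seconds into the video
  caption: string;            // frame-level caption (base description layer)
  objects: DetectedObject[];  // objects available for spatial exploration
}

class FrameExplorer {
  private frameIndex = 0;
  private objectIndex = -1; // -1 means no object is focused yet

  constructor(
    private frames: KeyFrame[],
    private speak: (text: string) => void, // screen-reader / TTS hook
  ) {}

  handleKey(key: string): void {
    const frame = this.frames[this.frameIndex];
    switch (key) {
      case "ArrowRight": // jump to the next key frame
        this.frameIndex = Math.min(this.frameIndex + 1, this.frames.length - 1);
        this.objectIndex = -1;
        this.speak(this.frames[this.frameIndex].caption);
        break;
      case "ArrowLeft": // jump to the previous key frame
        this.frameIndex = Math.max(this.frameIndex - 1, 0);
        this.objectIndex = -1;
        this.speak(this.frames[this.frameIndex].caption);
        break;
      case "ArrowDown": // cycle through objects within the current frame
        if (frame.objects.length === 0) break;
        this.objectIndex = (this.objectIndex + 1) % frame.objects.length;
        this.speak(frame.objects[this.objectIndex].description);
        break;
    }
  }
}

// Usage: connect the explorer to real keyboard events and a TTS callback.
const explorer = new FrameExplorer(
  [{ timestamp: 0, caption: "A park on a sunny day.", objects: [] }],
  (text) => console.log("speak:", text),
);
document.addEventListener("keydown", (e) => explorer.handleKey(e.key));
```

Keeping frame and object state in one place makes it easy to reset object focus whenever the user jumps to a new frame, mirroring the scene-then-object exploration flow described above.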
Audio Description Layering
Spica offers layered audio descriptions. The base layer includes the original AD, and users can request additional details about objects. This ensures users get the depth of information they seek without feeling overwhelmed.
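As a rough illustration of the layering idea, the sketch below models a base description plus optional per-object detail that is only surfaced when requested; the type names and the `describe` helper are hypothetical, not part of Spica's API.

```typescript
// Hypothetical sketch of layered descriptions: the original AD is always the
// base layer, and finer-grained detail is only spoken on explicit request.

type DetailLevel = "base" | "object" | "full";

interface DescriptionLayers {
  base: string;                       // original audio description
  objectDetails: Map<string, string>; // per-object descriptions, on demand
}

function describe(
  layers: DescriptionLayers,
  level: DetailLevel,
  objectId?: string,
): string {
  if (level === "base") return layers.base;
  if (level === "object" && objectId !== undefined) {
    // Fall back to the base layer if no detail exists for this object.
    return layers.objectDetails.get(objectId) ?? layers.base;
  }
  // "full": base layer followed by every object detail, for users who want depth.
  return [layers.base, ...layers.objectDetails.values()].join(" ");
}
```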
Spatial Sound
Spica uses sound effects linked to objects in a video. These sounds help users understand where objects are located in relation to each other, making the experience more immersive.
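One common way to realize positional audio in the browser is the standard Web Audio API, as sketched below. Mapping an object's horizontal position in the frame to a stereo pan is an assumption about the general approach; the function name and file path are illustrative.

```typescript
// Hypothetical sketch: play an object's sound effect panned left or right
// according to where the object sits in the frame (Web Audio API).

async function playSpatialEffect(
  ctx: AudioContext,
  effectUrl: string,   // e.g. a clip from a sound-effect library
  normalizedX: number, // object centre: 0 = left edge of frame, 1 = right edge
): Promise<void> {
  const response = await fetch(effectUrl);
  const buffer = await ctx.decodeAudioData(await response.arrayBuffer());

  const source = ctx.createBufferSource();
  source.buffer = buffer;

  // Map horizontal frame position to a left/right pan in [-1, 1].
  const panner = new StereoPannerNode(ctx, { pan: normalizedX * 2 - 1 });

  source.connect(panner).connect(ctx.destination);
  source.start();
}

// Usage: an object detected near the right edge of the frame.
const audioCtx = new AudioContext();
playSpatialEffect(audioCtx, "/effects/bicycle-bell.mp3", 0.85);
```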
High-Contrast Visual Cues
For users with residual vision, Spica overlays a high-contrast color mask on the selected object, making it easier to locate and identify within the video frame.
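A minimal sketch of such an overlay using the Canvas 2D API follows; a bounding box stands in for the real segmentation mask, and the colors, opacity, and function name are assumptions.

```typescript
// Hypothetical sketch: draw a high-contrast overlay on a transparent canvas
// positioned above the video to highlight the currently selected object.

interface BoundingBox { x: number; y: number; width: number; height: number; }

function highlightObject(overlay: HTMLCanvasElement, box: BoundingBox): void {
  const ctx2d = overlay.getContext("2d");
  if (!ctx2d) return;

  ctx2d.clearRect(0, 0, overlay.width, overlay.height);

  // Dim everything except the selected region, then outline it in a
  // high-contrast color so it is easier to locate with residual vision.
  ctx2d.fillStyle = "rgba(0, 0, 0, 0.6)";
  ctx2d.fillRect(0, 0, overlay.width, overlay.height);
  ctx2d.clearRect(box.x, box.y, box.width, box.height);

  ctx2d.strokeStyle = "#ffff00"; // bright yellow border
  ctx2d.lineWidth = 4;
  ctx2d.strokeRect(box.x, box.y, box.width, box.height);
}
```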
User Study
To assess Spica's effectiveness, we conducted a study with 14 BLV participants. The participants watched videos using both traditional AD and Spica, enabling us to compare their experiences.
Participant Demographics
The participants varied in age, gender, and level of vision impairment. Each participant had different experiences with video content, which allowed us to gather a broad range of feedback.
Study Process
Participants watched two different videos under two conditions: one with Spica and one with traditional AD. They were encouraged to interact with the content using the system's features. After each video, participants rated their experience in terms of understanding and immersion.
Findings
Overall User Experience
Participants reported that Spica significantly improved their understanding of the videos. They appreciated the ability to explore content on their own terms, as it allowed them to engage more deeply with the material.
Feature Feedback
Temporal Exploration
Users liked the ability to navigate through scene timelines. They felt this control helped them connect better with the video's story. However, some noted that frequent pauses could disrupt the flow of the narrative.
Object Exploration
Features allowing users to explore objects were well received. Many participants chose to examine individual objects after hearing the original AD or a new sound effect in the video, and they liked how this added depth to their understanding.
Sound Effects
The spatial sound associated with objects was highlighted as beneficial. Participants found it helpful in understanding the position and context of objects in the scene, enhancing their overall experience.
Challenges and Opportunities
While Spica demonstrated many advantages, some challenges were identified. For instance, there were times when the generated descriptions did not match the video content, leading to confusion. Users worked around these errors by comparing different scenes and using contextual clues.
Future Directions
Spica offers significant opportunities for enhancing video accessibility. Future improvements could focus on refining the description generation process, providing customizable audio descriptions, and further exploring spatial audio techniques to create a richer experience.
Conclusion
Spica represents a step forward in making video content more accessible to BLV users. By enabling interactive engagement and offering detailed audio descriptions, it addresses the limitations of traditional methods. The feedback from our user study highlights its potential to enhance understanding and immersion, making videos more enjoyable for BLV viewers.
Summary of Contributions
- Presenting Spica as an interactive tool that enhances video accessibility for BLV users.
- Conducting a user study that demonstrates Spica's effectiveness in improving user experience.
- Offering insights into feature preferences and areas for future research in accessible video consumption tools.
Original Source
Title: SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers
Abstract: Blind or Low-Vision (BLV) users often rely on audio descriptions (AD) to access video content. However, conventional static ADs can leave out detailed information in videos, impose a high mental load, neglect the diverse needs and preferences of BLV users, and lack immersion. To tackle these challenges, we introduce SPICA, an AI-powered system that enables BLV users to interactively explore video content. Informed by prior empirical studies on BLV video consumption, SPICA offers novel interactive mechanisms for supporting temporal navigation of frame captions and spatial exploration of objects within key frames. Leveraging an audio-visual machine learning pipeline, SPICA augments existing ADs by adding interactivity, spatial sound effects, and individual object descriptions without requiring additional human annotation. Through a user study with 14 BLV participants, we evaluated the usability and usefulness of SPICA and explored user behaviors, preferences, and mental models when interacting with augmented ADs.
Authors: Zheng Ning, Brianna L. Wimer, Kaiwen Jiang, Keyi Chen, Jerrick Ban, Yapeng Tian, Yuhang Zhao, Toby Jia-Jun Li
Last Update: 2024-02-26 00:00:00
Language: English
Source URL: https://arxiv.org/abs/2402.07300
Source PDF: https://arxiv.org/pdf/2402.07300
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to arxiv for use of its open access interoperability.
Reference Links
- https://www.robots.ox.ac.uk/~vgg/data/queryd/
- https://youdescribe.org
- https://labelme.csail.mit.edu/Release3.0/
- https://www.microsoft.com/en-us/ai/seeing-ai
- https://platform.openai.com/docs/guides/vision
- https://freesound.org
- https://freesound.org/docs/api/
- https://react.dev
- https://flask.palletsprojects.com
- https://cloud.google.com/text-to-speech