Advancements in Speech Recognition for People with Disabilities
New methods improve communication tools for individuals with speech difficulties.
Macarious Hui, Jinda Zhang, Aanchan Mohan
― 7 min read
People with conditions like cerebral palsy and ALS often have a tough time speaking clearly. This can make it hard for them to get their needs across, especially in healthcare settings where clear communication is key. When doctors and patients can't understand each other, the quality of care suffers. To help close this gap, we are working on a tool that uses technology to help these individuals communicate better.
However, many current speech recognition tools struggle with non-standard speech patterns, mainly because they have seen very little of this kind of speech during training. Systems built for typical speakers, like Whisper and Wav2vec2.0, often fail to pick up words when the speaker has a speech difficulty. This leaves a big gap when trying to support people with speech difficulties using these tools.
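To make that gap concrete, here is a minimal sketch of running two such off-the-shelf systems on a single recording with the Hugging Face transformers library. The model checkpoints and the audio file name are illustrative assumptions, not the exact setup from our experiments.

```python
# A minimal sketch of running off-the-shelf ASR models on one recording.
# The checkpoints and the audio path are illustrative assumptions.
from transformers import pipeline

audio_path = "torgo_sample.wav"  # hypothetical 16 kHz mono recording

# Whisper: encoder-decoder ASR trained largely on typical speech
whisper_asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
print("Whisper: ", whisper_asr(audio_path)["text"])

# Wav2vec2.0: CTC model fine-tuned on typical English speech
wav2vec_asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
print("Wav2vec2:", wav2vec_asr(audio_path)["text"])
```

Running models like these on atypical speech is exactly where transcription quality tends to break down.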
One common way to test how well speech recognition works for people with speech difficulties is a dataset called TORGO. But there's a catch: the dataset has overlapping phrases, meaning the same prompts are read by both the speakers used for training and the speakers held out for testing. When that happens, a system can look better than it really is, because it has effectively seen the test phrases before.
We found a way to deal with this overlap problem, and we’re excited to share our findings!
The Challenge of Speech Difficulties
For many people with conditions like ALS and cerebral palsy, speaking can be a major hurdle. This is due to weakness or paralysis affecting the muscles used for speech. As a result, they might have slurred speech or unusual speech patterns, which can lead to miscommunication.
In healthcare settings, where accurate information is vital, these issues can decrease the quality of care. The good news is that there are tools designed to help, known as augmentative and alternative communication (AAC) tools. These tools are built to assist individuals with speech difficulties to express themselves better.
Modern AAC tools like SpeakEase offer the ability to recognize the user’s speech and convert it into text. This gives everyone a better chance to communicate. But the challenge here is that speech recognition tools often have limitations when it comes to understanding atypical speech.
A lot of speech recognition technology is trained on typical speech, leaving those with speech difficulties in a tough spot.
Tackling Speech Recognition Problems
Speech recognition systems need enough data to learn effectively. Unfortunately, data for atypical speech is scarce. While there are many datasets for typical speech, tools often hit a wall when trying to recognize atypical speech because there are so few training examples. This makes it tough for speech recognition software to work well for people who have speech difficulties.
To build a better tool, one idea is to use a first-pass recognition system that produces an initial guess of what the person is saying, followed by a second, error-correction pass that fixes misrecognized words.
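The sketch below shows the shape of that two-pass idea: a first-pass recognizer produces a hypothesis, and a second pass rewrites it. Both components here are placeholders rather than the actual models used in our work.

```python
# A high-level sketch of two-pass recognition: first-pass ASR, then correction.
# The ASR checkpoint and the correction step are stand-ins, not our real system.
from transformers import pipeline

def first_pass(audio_path: str) -> str:
    """Produce an initial transcript with an off-the-shelf recognizer."""
    asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
    return asr(audio_path)["text"]

def second_pass(hypothesis: str) -> str:
    """Rewrite likely recognition errors.

    In practice this would be a domain-specific error-correction model;
    here a trivial clean-up step only illustrates the interface.
    """
    return hypothesis.strip().lower()

transcript = second_pass(first_pass("clinic_request.wav"))  # hypothetical file
print(transcript)
```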
One part of our process involved checking whether we could build a better dataset that doesn't include overlapping phrases. This gives a more honest picture of how accurately speech recognition works for these individuals.
Evaluating Speech Recognition with TORGO
TORGO is commonly used to test how well speech recognition works for people with speech difficulties. It has recordings from eight speakers who have different levels of speech difficulties, as well as recordings from people with normal speech. The range of data includes single words and entire sentences, which helps to create a more balanced dataset.
However, there’s a significant amount of overlap in the phrases used across different speakers, which can distort the accuracy when testing new systems. If a phrase is already known because it’s been used before, it doesn’t truly test how well the tool can recognize the speech.
In our work, we paid close attention to this overlap issue because it can lead to inflated performance numbers. When reviewing the performance of speech recognition systems, it’s crucial to have a solid understanding of how the tool performs on its own without any advantages from memorized phrases.
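As a rough illustration, prompt overlap can be quantified by counting how many test-speaker phrases also appear verbatim among the training-speaker phrases. The tiny prompt lists below are made-up stand-ins for the real TORGO transcripts.

```python
# A minimal sketch of quantifying prompt overlap between training and test speakers.
# The prompt sets are invented examples, not actual TORGO prompts.
def normalize(prompt: str) -> str:
    return " ".join(prompt.lower().split())

train_prompts = {"the quick brown fox", "say ah", "pa pa pa"}      # training speakers
test_prompts = {"The quick brown fox", "call the nurse please"}    # held-out speaker

shared = {normalize(p) for p in test_prompts} & {normalize(p) for p in train_prompts}
overlap_rate = len(shared) / len(test_prompts)
print(f"{len(shared)} overlapping prompt(s) ({overlap_rate:.0%} of the test set)")
```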
Creating a Better Dataset
To improve the situation, we created a new dataset called NP-TORGO. This dataset was generated by carefully selecting phrases so that there’s no overlap between what the training speakers used and what the testing speakers used. Essentially, we wanted to make sure each speaker was tested with phrases they hadn’t encountered during training.
To achieve this, we used an algorithm that partitions the prompts so that no phrase appears in both the training and test sets. This way, we can better evaluate how the speech recognition system performs on phrases it has never seen.
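The snippet below is a simplified stand-in for that idea, shown only to make it concrete: any phrase spoken by a held-out test speaker is reserved for the test side, and training utterances of those phrases are dropped. It is not the exact algorithm from our work, only an illustration of how the overlap can be broken.

```python
# An illustrative (not the paper's actual) way to remove prompt overlap.
def split_without_overlap(utterances, test_speakers):
    """utterances: iterable of (speaker_id, phrase, audio_path) tuples."""
    # Any phrase spoken by a held-out (test) speaker is reserved for the test set.
    test_phrases = {phrase for spk, phrase, _ in utterances if spk in test_speakers}

    train, test = [], []
    for spk, phrase, path in utterances:
        if spk in test_speakers:
            test.append((spk, phrase, path))
        elif phrase not in test_phrases:
            # Training utterances of a reserved phrase are dropped to break overlap.
            train.append((spk, phrase, path))
    return train, test
```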
After solving the overlap issue, we wanted to see how this improved the performance of different speech recognition systems.
Experimenting with Speech Recognition
In our experiments, we checked out how various versions of the Wav2vec2 architecture performed with the new NP-TORGO dataset. We also looked at how well other off-the-shelf systems, like Whisper, performed when confronted with atypical speech.
During the process, we discovered some key points. One major finding was that when a speech recognition system was tested on the original TORGO dataset, it appeared to perform well. But when we tested it on NP-TORGO, word error rates rose sharply. This suggested that much of the original success came from the overlapping phrases rather than from true recognition capability.
We also evaluated the role language models play in this process. Language models help predict what the next word should be based on what has already been said. In the context of NP-TORGO, we noticed that language models trained on text from outside the dataset seemed to help more once the overlap was removed.
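For readers curious what adding a language model looks like in practice, here is a hedged sketch of combining a Wav2vec2 CTC model with an external n-gram language model via the pyctcdecode package. The KenLM file name is a placeholder, and the exact configuration in our experiments may differ.

```python
# A hedged sketch of n-gram language-model fusion for Wav2vec2 CTC decoding.
# The KenLM path is a placeholder; the checkpoint is a standard public model.
import torch
from pyctcdecode import build_ctcdecoder
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# pyctcdecode expects labels ordered by token id, with " " as the word delimiter.
vocab = processor.tokenizer.get_vocab()
labels = [tok.replace("|", " ") for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]
decoder = build_ctcdecoder(labels, kenlm_model_path="out_of_domain_4gram.arpa")  # placeholder LM

def transcribe(waveform, sampling_rate=16000):
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits[0].numpy()
    # Beam search over the CTC logits, rescored by the external n-gram LM.
    return decoder.decode(logits)
```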
Results of Our Experiments
The results from our experiments shed light on how both the speech recognition and language models work together. We looked closely at the word error rates (WER) and other performance indicators to gauge the effectiveness of different approaches.
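Word error rate counts the substitutions, insertions, and deletions needed to turn the system's output into the reference transcript, divided by the number of reference words. Here is a tiny example using the jiwer package; the sentences are made up.

```python
# WER = (substitutions + insertions + deletions) / reference word count.
import jiwer

reference = "please call the nurse"
hypothesis = "please call the north"

print(f"WER: {jiwer.wer(reference, hypothesis):.2f}")  # 1 substitution / 4 words = 0.25
```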
From our results, it was evident that simply using standard language models wasn’t enough in cases with atypical speech. Instead, we found that a cross-modal error-correction system called Whispering-LLaMA showed some promise.
This system takes audio input and uses that to improve the accuracy of the transcribed text generated by the speech recognition tool. While this was helpful in some ways, it also highlighted that there is still a long way to go before these systems can adequately support those with speech difficulties.
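To give a flavour of generative error correction, the sketch below prompts a general-purpose language model with an ASR system's candidate transcriptions and asks for a corrected sentence. This is a text-only simplification: Whispering-LLaMA additionally fuses Whisper's audio features into the language model, which is omitted here, and the checkpoint name is just a placeholder.

```python
# A simplified, text-only sketch of LLM-based error correction over N-best lists.
# The model checkpoint is a placeholder; the real system also uses audio features.
from transformers import pipeline

corrector = pipeline("text-generation", model="meta-llama/Llama-2-7b-hf")

def correct(nbest):
    prompt = (
        "These are candidate transcriptions of the same utterance:\n"
        + "\n".join(f"- {h}" for h in nbest)
        + "\nThe most likely intended sentence is:"
    )
    out = corrector(prompt, max_new_tokens=30, do_sample=False)[0]["generated_text"]
    return out[len(prompt):].strip()

print(correct(["please caul the nurse", "please call the north", "please call the nurse"]))
```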
Conclusion for a Better Tomorrow
In our quest to improve communication for individuals with speech difficulties, we’ve come a long way, but there’s still much to do. While we’ve made strides in addressing the prompt overlap issue and leveraging error-correction systems, the fact remains that many speech recognition tools are not yet ready to serve those who need them most.
We hope that our findings will spark further research and development in this important area. By improving the tools available for those with speech difficulties, we can help ensure that everyone has access to clear and effective communication, making healthcare more accessible for all.
As we continue to delve into this critical field, we are optimistic that with more attention and resources, we can create a future where communication barriers are a thing of the past. After all, everyone deserves to be heard, no matter how their speech sounds.
Title: Enhancing AAC Software for Dysarthric Speakers in e-Health Settings: An Evaluation Using TORGO
Abstract: Individuals with cerebral palsy (CP) and amyotrophic lateral sclerosis (ALS) frequently face challenges with articulation, leading to dysarthria and resulting in atypical speech patterns. In healthcare settings, communication breakdowns reduce the quality of care. While building an augmentative and alternative communication (AAC) tool to enable fluid communication, we found that state-of-the-art (SOTA) automatic speech recognition (ASR) technology like Whisper and Wav2vec2.0 marginalizes atypical speakers largely due to the lack of training data. Our work looks to leverage SOTA ASR followed by domain-specific error-correction. English dysarthric ASR performance is often evaluated on the TORGO dataset. Prompt-overlap is a well-known issue with this dataset where phrases overlap between training and test speakers. Our work proposes an algorithm to break this prompt-overlap. After reducing prompt-overlap, results with SOTA ASR models produce extremely high word error rates for speakers with mild and severe dysarthria. Furthermore, to improve ASR, our work looks at the impact of n-gram language models and large-language model (LLM) based multi-modal generative error-correction algorithms like Whispering-LLaMA for a second-pass ASR. Our work highlights how much more needs to be done to improve ASR for atypical speakers to enable equitable healthcare access both in-person and in e-health settings.
Authors: Macarious Hui, Jinda Zhang, Aanchan Mohan
Last Update: 2024-11-07
Language: English
Source URL: https://arxiv.org/abs/2411.00980
Source PDF: https://arxiv.org/pdf/2411.00980
Licence: https://creativecommons.org/licenses/by-nc-sa/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to arXiv for use of its open access interoperability.