Examining AI's Role in Mental Health Detection
This study evaluates AI models for identifying mental health risks in Chinese social media posts.
Mental health issues like depression are significant concerns worldwide. In China, about 6.9% of people are estimated to experience depression, which can sometimes lead to suicidal thoughts or actions. Social media platforms such as Weibo have become places where individuals openly share their feelings, including negative emotions and suicidal thoughts. Identifying these signals quickly can therefore enable timely support and intervention.
Artificial intelligence (AI) plays an increasingly important role in recognizing emotions in text. Recent advances in deep learning have produced many tools for analyzing sentiment in written content. However, building effective AI models is complicated and costly, typically requiring large amounts of labeled data, which in mental health often means expensive expert annotation. This highlights the need for more adaptable and practical solutions, especially in healthcare.
Large language models (LLMs) are noteworthy because they learn from vast amounts of text and can generate language that closely mimics human writing. Despite their potential, research into their usefulness for real-world applications, particularly in mental health, remains limited. Most existing studies focus on English, leaving a gap in research on Chinese social media data, especially for fine-grained emotion understanding. This study addresses that gap by comparing traditional supervised learning methods with large language models in identifying cognitive distortions and suicide risks in Chinese social media posts.
The Importance of Mental Health in the Digital Age
The rise of social media has changed how people express their emotions. These platforms generate a large amount of data that reflects users' thoughts and feelings. Understanding this emotional content is essential, especially in detecting negative sentiments that may lead to serious mental health issues. The ability to assess these sentiments quickly and accurately can play a critical role in preventing tragedies.
The Role of Artificial Intelligence
AI and deep learning technologies have shown promise in analyzing emotions from text. Many algorithms have been developed specifically for this purpose. However, challenges still exist, including the need for extensive labeled datasets and the high costs associated with building and maintaining these systems. This has raised the need for more flexible and efficient solutions, particularly in sectors like healthcare where reliability is crucial.
Large Language Models: An Overview
Large language models represent a significant advance in computational linguistics. They can analyze and generate complex text based on extensive training data. Although many studies have showcased their potential, most of them have focused on English datasets, creating a gap in understanding their effectiveness in other languages and contexts.
Research Focus
This study investigates two critical tasks based on content from Chinese social media: identifying suicide risk and recognizing cognitive distortions. The research compares supervised learning methods with large language models, assessing their effectiveness in these specific contexts.
Task 1: Identifying Suicide Risks
The first task involves classifying content to determine whether it indicates low or high suicide risk. This is essential for guiding appropriate interventions and support.
Data Collection
Data was gathered from Weibo, a popular Chinese social media platform. A team of psychologists annotated the collected posts to label them as either low or high risk for suicide. This labeled data provided a foundation for training and testing the models.
Task 2: Recognizing Cognitive Distortions
The second task focuses on identifying cognitive distortions in the content. Cognitive distortions are flawed patterns of thinking that can negatively affect mental health. Because a single post can exhibit several distortions at once, this is a multi-label classification task, with labels covering distortion types such as all-or-nothing thinking and emotional reasoning.
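To make the multi-label setup concrete, here is a minimal sketch of how posts tagged with several distortion types can be encoded as binary label vectors for training. The example posts and label names are illustrative only, not drawn from the SocialCD-3K dataset.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical examples: each post may exhibit zero or more distortion types.
posts = [
    "I failed once, so I will always fail.",             # illustrative only
    "I feel worthless, therefore I must be worthless.",  # illustrative only
]
labels = [
    ["all-or-nothing thinking", "overgeneralization"],
    ["emotional reasoning"],
]

# Encode each post's label set as a fixed-length binary vector.
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)
print(mlb.classes_)  # the label vocabulary
print(y)             # one row per post, one column per distortion type
```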
Comparison of Methods
The study compares two primary approaches: supervised learning and large language models. For supervised learning, two models were used: LSAN and BERT. LSAN, a label-specific attention network, is designed to capture relationships between labels, making it well suited to the multi-label cognitive distortion task. BERT is known for its robust performance across a wide range of language tasks.
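As a rough illustration of the supervised baseline, a BERT classifier for the binary suicide-risk task could be set up as follows with Hugging Face Transformers. The checkpoint name bert-base-chinese, the label order, and the example post are assumptions, not the paper's exact configuration.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Assumed checkpoint; the paper's exact BERT variant may differ.
MODEL = "bert-base-chinese"

tokenizer = BertTokenizer.from_pretrained(MODEL)
model = BertForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Illustrative post (not from the dataset).
inputs = tokenizer("示例微博内容", truncation=True, padding=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()  # 0 = low risk, 1 = high risk (assumed label order)
print(pred)
```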
Alongside the supervised baselines, eight large language models, including GPT-3.5 and GPT-4, were evaluated. Various prompting strategies were applied to measure their performance in identifying suicide risks and cognitive distortions, ranging from basic task requests to more elaborate role and scene definitions.
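The contrast between a basic task request and a role-and-scene prompt can be sketched with the OpenAI Python client as below. The prompt wording is illustrative, since the paper's exact prompts are not reproduced in this summary.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

post = "示例微博内容"  # illustrative post, not from the dataset

# Basic zero-shot task request.
basic = f"Classify the suicide risk of this Weibo post as 'low' or 'high': {post}"

# Role-and-scene style prompt: add a persona and context before the task.
role = (
    "You are an experienced clinical psychologist reviewing Chinese social "
    "media posts for signs of suicide risk. "
    f"Classify the following post as 'low' or 'high' risk: {post}"
)

for prompt in (basic, role):
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    print(resp.choices[0].message.content)
```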
Experimental Design and Evaluation
The research followed a structured approach to testing the different models. Data was split into training and testing sets, and performance was measured using precision, recall, and F1 scores. Precision indicates how many of the positive predictions were correct, while recall assesses how many of the actual positive cases were identified. The F1 score, the harmonic mean of precision and recall, combines the two into a single measure of performance.
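For reference, all three metrics can be computed with scikit-learn, as in this minimal sketch. The toy labels are invented, and the averaging mode needed for the multi-label task would depend on the paper's choice, which this summary does not specify.

```python
from sklearn.metrics import precision_recall_fscore_support

# Illustrative binary labels (1 = high risk, 0 = low risk).
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)
# F1 is the harmonic mean of precision and recall:
# F1 = 2 * (precision * recall) / (precision + recall)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```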
Results for Suicide Risk Classification
The models were evaluated on classifying suicide risk. The LSAN model performed slightly better than BERT. Prompting alone left the large language models well behind the supervised baselines, with a gap of 6.95 percentage points in F1-score, but fine-tuning GPT-3.5 yielded significant improvements, narrowing that gap to 4.31 percentage points.
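Fine-tuning GPT-3.5 through the OpenAI API follows a standard recipe: upload a JSONL file of chat-formatted examples, then launch a fine-tuning job. The sketch below shows that generic workflow, not the paper's exact setup; the system prompt, training example, and file name are assumptions.

```python
import json
from openai import OpenAI

client = OpenAI()

# Each training example is a short chat: system framing, the post, the label.
examples = [
    {"messages": [
        {"role": "system", "content": "Classify suicide risk as 'low' or 'high'."},
        {"role": "user", "content": "示例微博内容"},  # illustrative post
        {"role": "assistant", "content": "high"},
    ]},
]
with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")

# Upload the training file and launch the fine-tuning job.
upload = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=upload.id, model="gpt-3.5-turbo")
print(job.id)
```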
Prompt Design for Large Language Models
Different prompt designs were tested for the large language models. The hybrid strategy, which combined several prompting approaches in one prompt, was particularly effective. However, supplying more examples or training data did not consistently improve performance across models.
Results for Cognitive Distortion Classification
In the cognitive distortion identification task, fine-tuning GPT-3.5 was less effective than in the suicide-risk task: gains were harder to obtain, and in some settings performance even dropped relative to the prompted model, although fine-tuning ultimately narrowed the gap with supervised learning from 31.53 to 3.14 percentage points in F1-score. This underscores the difficulty of adapting language models to complex multi-label tasks.
Comparative Analysis of Models
The research highlighted interesting trends across the models. Generally, larger models outperformed smaller ones. Fine-tuning, however, could upset this ordering: after fine-tuning, GPT-3.5 surpassed GPT-4 in specific instances.
Cross-Task Comparison
The study found that as tasks became more complex, the performance of large language models declined, whereas supervised learning models remained stable across both the binary and multi-label classification tasks. This suggests that while language models can be effective for simpler tasks, they are not yet suitable replacements for supervised learning in more complex scenarios.
Conclusion
This research examined the effectiveness of large language models and supervised learning in recognizing cognitive distortions and suicide risks on Chinese social media. The findings indicated that while large language models show promise, they are not yet comprehensive substitutes for traditional supervised learning algorithms, especially in specialized tasks. Fine-tuning can improve performance on simpler tasks but might not work as well for more complex challenges. There is a clear need for customization based on the specific task and model size.
Future Directions
The study has limitations, including token constraints that affected certain tests. Future work should explore a broader range of tasks and models to gain a deeper understanding of the comparative effectiveness of language models and supervised learning. Additionally, further investigation into fine-tuning methods and prompt design could help optimize model performance across various applications.
Title: Supervised Learning and Large Language Model Benchmarks on Mental Health Datasets: Cognitive Distortions and Suicidal Risks in Chinese Social Media
Abstract: On social media, users often express their personal feelings, which may exhibit cognitive distortions or even suicidal tendencies on certain specific topics. Early recognition of these signs is critical for effective psychological intervention. In this paper, we introduce two novel datasets from Chinese social media: SOS-HL-1K for suicidal risk classification and SocialCD-3K for cognitive distortion detection. The SOS-HL-1K dataset contains 1,249 posts, and the SocialCD-3K dataset is a multi-label classification dataset containing 3,407 posts. We propose a comprehensive evaluation using two supervised learning methods and eight large language models (LLMs) on the proposed datasets. From the prompt engineering perspective, we experimented with two types of prompt strategies: four zero-shot and five few-shot strategies. We also evaluated the performance of the LLMs after fine-tuning on the proposed tasks. The experimental results show that there is still a huge gap between LLMs relying only on prompt engineering and supervised learning. In the suicide classification task, this gap is 6.95 percentage points in F1-score, while in the cognitive distortion task it is even more pronounced, reaching 31.53 percentage points. After fine-tuning, this difference is significantly reduced: the gaps decrease to 4.31 and 3.14 percentage points, respectively. This research highlights the potential of LLMs in psychological contexts, but supervised learning remains necessary for more challenging tasks. All datasets and code are made available.
Authors: Hongzhi Qi, Qing Zhao, Jianqiang Li, Changwei Song, Wei Zhai, Dan Luo, Shuo Liu, Yi Jing Yu, Fan Wang, Huijing Zou, Bing Xiang Yang, Guanghui Fu
Last Update: 2024-06-09
Language: English
Source URL: https://arxiv.org/abs/2309.03564
Source PDF: https://arxiv.org/pdf/2309.03564
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to arXiv for use of its open access interoperability.
Reference Links
- https://github.com/thudm/chatglm2-6b
- https://huggingface.co/spaces/mikeee/chatglm2-6b-4bit
- https://github.com/THUDM/GLM-130B
- https://chatglm.cn/detail
- https://chat.openai.com/
- https://openai.com/blog/gpt-3-5-turbo-fine-tuning-and-api-updates
- https://platform.openai.com/docs/guides/fine-tuning
- https://www.weibo.com/xiaofan116?is_all=1