Innovative Data Synthesis for Sentiment Analysis
A new approach to enhance sentiment analysis in low-resource scenarios.
Hongling Xu, Yice Zhang, Qianlong Wang, Ruifeng Xu
Harbin Institute of Technology, Shenzhen, China
Peng Cheng Laboratory, Shenzhen, China
Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies
Abstract
Large language models (LLMs) offer a way to tackle data scarcity in low-resource settings such as few-shot aspect-based sentiment analysis (ABSA). Earlier LLM-based data augmentation methods, which generate new samples by modifying existing ones, often fail to produce sufficiently diverse data, and in-context learning tends to yield labels that deviate from task requirements. We present DS²-ABSA, a dual-stream framework that synthesizes data from two complementary perspectives: key-point-driven and instance-driven. Together they generate diverse, high-quality ABSA samples in low-resource settings, and a label refinement module improves the synthetic labels. Experiments show that DS²-ABSA significantly outperforms previous few-shot ABSA solutions and other LLM-oriented data generation methods.
Introduction
Aspect-based sentiment analysis (ABSA) identifies sentiment towards specific aspects in user reviews. For example, in the review "the battery life is great, but the screen resolution is disappointing," the analysis yields (battery life, positive) and (screen resolution, negative). Traditional methods rely on large amounts of labeled data, which takes time and effort to collect. This has led to the exploration of methods suitable for low-resource scenarios. Current strategies fall into three categories: data augmentation, in-context learning, and pre-training techniques. Each has its own limitations, such as lack of diversity in augmented data or the requirement for extensive external datasets.
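To make the task format concrete, here is a minimal Python sketch of how the example review and its labels might be represented. The `AspectSentiment` type and the `aspects_in_text` helper are illustrative names chosen here, not part of the paper.

```python
from typing import List, NamedTuple


class AspectSentiment(NamedTuple):
    """One ABSA label: an aspect term and the sentiment expressed toward it."""
    aspect: str
    polarity: str  # assumed label set: "positive", "negative", "neutral"


review = "the battery life is great, but the screen resolution is disappointing"
labels = [
    AspectSentiment("battery life", "positive"),
    AspectSentiment("screen resolution", "negative"),
]


def aspects_in_text(text: str, pairs: List[AspectSentiment]) -> bool:
    """Check that every labeled aspect term literally occurs in the review."""
    return all(pair.aspect in text for pair in pairs)


print(aspects_in_text(review, labels))
```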
Proposed Method: DS²-ABSA
Our dual-stream data synthesis framework, DS²-ABSA, combines two complementary strategies for data generation. The key-point-driven strategy brainstorms potential ABSA attributes and generates new reviews from them, while the instance-driven strategy transforms existing samples. Together, the two streams provide both diversity and relevance in the generated data.
Key-point-driven Data Synthesis
This stream first brainstorms potential ABSA attributes, such as aspect categories and opinion terms. An LLM then generates new reviews conditioned on these attributes, with an emphasis on keeping the generated samples varied.
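As a rough illustration of the key-point-driven idea, the sketch below samples a few attributes and turns them into a generation prompt. The attribute lists, the prompt wording, and the `build_keypoint_prompt` helper are all assumptions made for this example; the paper's actual prompts are given in its appendix, and no LLM is called here.

```python
import random

# Hypothetical key points. In the described method an LLM brainstorms these;
# they are hard-coded here only to keep the sketch self-contained.
ASPECT_CATEGORIES = ["food quality", "service", "ambience", "price"]
OPINION_TERMS = ["delicious", "slow", "cozy", "overpriced", "friendly"]


def build_keypoint_prompt(rng: random.Random, n_points: int = 2) -> str:
    """Sample key points and format them into a review-generation prompt."""
    categories = rng.sample(ASPECT_CATEGORIES, n_points)
    opinions = rng.sample(OPINION_TERMS, n_points)
    lines = [
        f"- aspect category: {category}; opinion term: {opinion}"
        for category, opinion in zip(categories, opinions)
    ]
    return (
        "Write one restaurant review sentence that covers these key points, "
        "then list each (aspect term, sentiment) pair:\n" + "\n".join(lines)
    )


# A seeded generator makes the sampled key points reproducible.
prompt = build_keypoint_prompt(random.Random(0))
print(prompt)
```

Sampling different attribute combinations on each call is one simple way to push the generated reviews toward variety, since the LLM is never asked the same question twice.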
Instance-driven Data Synthesis
This stream transforms existing review samples to create new ones. It uses techniques such as sample combination and selective reconstruction, so that the new samples stay close to the original data while still adding diversity.
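The two transformations can be pictured with a toy, rule-based sketch. This is one plausible reading only: in the described framework an LLM performs these steps through prompts, and the exact definition of selective reconstruction is not given in this summary. The masking scheme and both function names below are assumptions.

```python
import re


def combine_samples(a: dict, b: dict) -> dict:
    """Sample combination (toy version): join two reviews and merge their labels."""
    text = a["text"].rstrip(". ") + ", and " + b["text"]
    return {"text": text, "labels": a["labels"] + b["labels"]}


def mask_context(sample: dict, mask: str = "[MASK]") -> str:
    """Selective reconstruction (assumed form): keep the aspect terms and mask
    the surrounding context, which an LLM could then be asked to rewrite."""
    aspects = [aspect for aspect, _ in sample["labels"]]
    if not aspects:
        return mask
    # Longest terms first so overlapping aspect terms match greedily.
    ordered = sorted(aspects, key=len, reverse=True)
    pattern = "(" + "|".join(re.escape(term) for term in ordered) + ")"
    parts = re.split(pattern, sample["text"])
    return " ".join(p if p in aspects else mask for p in parts if p.strip())


s1 = {"text": "the pasta was excellent", "labels": [("pasta", "positive")]}
s2 = {"text": "the waiter ignored us", "labels": [("waiter", "negative")]}

combined = combine_samples(s1, s2)
template = mask_context(combined)
print(combined["text"])
print(template)
```

Because the aspect terms survive the transformation unchanged, the original labels remain usable for the new sample, which is what keeps this stream close to the seed data.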
Label Refinement
To address inaccuracies in LLM-generated labels, we add a label refinement step. It first normalizes the labels and then applies a noisy self-training algorithm, using a few high-quality samples, to improve the quality of the synthetic labels.
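The normalization half of label refinement could look something like the sketch below. The specific rules (polarity aliases, requiring the aspect term to appear in the text, de-duplication) are assumptions chosen for illustration, not the paper's documented procedure, and the noisy self-training step is not sketched here.

```python
from typing import List, Tuple

# Assumed mapping from loose LLM outputs to a canonical polarity set.
POLARITY_ALIASES = {
    "pos": "positive", "positive": "positive",
    "neg": "negative", "negative": "negative",
    "neu": "neutral", "neutral": "neutral",
}


def normalize_labels(text: str, raw_labels: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Clean LLM-produced (aspect, polarity) pairs for one review."""
    lowered = text.lower()
    cleaned: List[Tuple[str, str]] = []
    for aspect, polarity in raw_labels:
        aspect = aspect.strip().strip("\"'").lower()
        canonical = POLARITY_ALIASES.get(polarity.strip().lower())
        if not aspect or canonical is None:
            continue  # drop empty aspects and unknown polarity strings
        if aspect not in lowered:
            continue  # drop aspect terms that do not occur in the review
        if (aspect, canonical) not in cleaned:
            cleaned.append((aspect, canonical))  # keep order, skip duplicates
    return cleaned


noisy = [
    (" 'battery life' ", "POS"),
    ("screen", "negative"),
    ("battery life", "positive"),
    ("battery life", "great"),
]
refined = normalize_labels("The Battery life is great.", noisy)
print(refined)
```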
Experiments
We validate DS²-ABSA on four ABSA benchmark datasets across two domains: restaurants and laptops. DS²-ABSA consistently outperforms existing few-shot methods, achieving higher F1 scores than other state-of-the-art techniques, which supports the effectiveness of the approach in low-resource settings.
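For readers unfamiliar with the metric, the following sketch computes a micro-averaged F1 over (aspect, polarity) pairs, where a prediction counts as correct only if both elements match a gold pair. This exact-match convention is common in ABSA evaluation, but treating it as the paper's protocol is an assumption; the data below is made up for the example.

```python
from typing import List, Tuple

Pair = Tuple[str, str]


def micro_f1(gold: List[List[Pair]], pred: List[List[Pair]]) -> float:
    """Micro-averaged F1 over exact (aspect, polarity) matches."""
    true_pos = n_gold = n_pred = 0
    for gold_pairs, pred_pairs in zip(gold, pred):
        gold_set, pred_set = set(gold_pairs), set(pred_pairs)
        true_pos += len(gold_set & pred_set)
        n_gold += len(gold_set)
        n_pred += len(pred_set)
    if true_pos == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


gold = [
    [("battery life", "positive"), ("screen resolution", "negative")],
    [("pasta", "positive")],
]
pred = [
    [("battery life", "positive"), ("screen resolution", "positive")],
    [("pasta", "positive")],
]
score = micro_f1(gold, pred)
print(round(score, 4))
```

Here the second pair of the first review has the right aspect but the wrong polarity, so it counts as both a missed gold pair and a wrong prediction.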
Conclusion
DS²-ABSA offers a new approach to few-shot ABSA. By combining dual-stream synthesis with label refinement, it generates high-quality, diverse samples without requiring additional data. Our findings suggest that the framework can support future research and applications. We acknowledge some limitations, such as potential biases in LLMs and a reliance on careful prompt design; addressing these could yield further improvements.
Appendices
- Prompts for Data Generation: Detailed prompts used for generating synthetic data.
- Implementation Details: Further explanations of our methods and baseline models.
- Additional Experiments: Supplemental results to support our findings.
Title: DS$^2$-ABSA: Dual-Stream Data Synthesis with Label Refinement for Few-Shot Aspect-Based Sentiment Analysis
Abstract: Recently developed large language models (LLMs) have presented promising new avenues to address data scarcity in low-resource scenarios. In few-shot aspect-based sentiment analysis (ABSA), previous efforts have explored data augmentation techniques, which prompt LLMs to generate new samples by modifying existing ones. However, these methods fail to produce adequately diverse data, impairing their effectiveness. Besides, some studies apply in-context learning for ABSA by using specific instructions and a few selected examples as prompts. Though promising, LLMs often yield labels that deviate from task requirements. To overcome these limitations, we propose DS$^2$-ABSA, a dual-stream data synthesis framework targeted for few-shot ABSA. It leverages LLMs to synthesize data from two complementary perspectives: \textit{key-point-driven} and \textit{instance-driven}, which effectively generate diverse and high-quality ABSA samples in low-resource settings. Furthermore, a \textit{label refinement} module is integrated to improve the synthetic labels. Extensive experiments demonstrate that DS$^2$-ABSA significantly outperforms previous few-shot ABSA solutions and other LLM-oriented data generation methods.
Authors: Hongling Xu, Yice Zhang, Qianlong Wang, Ruifeng Xu
Last Update: 2024-12-19 00:00:00
Language: English
Source URL: https://arxiv.org/abs/2412.14849
Source PDF: https://arxiv.org/pdf/2412.14849
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to arXiv for use of its open access interoperability.