Simple Science

Cutting edge science explained simply

# Computer Science · Computation and Language

Improving Automated Essay Scoring with Grammar Features

This study enhances AES by focusing on grammar in essay evaluation.

― 6 min read


Grammar Boosts Essay Scoring: study shows grammar features enhance automated essay evaluation.

Automated essay scoring (AES) is a tool that grades essays without requiring a human reader. It has become popular in classrooms and language tests. One of the main challenges in traditional scoring is that it takes a lot of time and effort for people to read and give scores to many essays. This method of grading can also lead to inconsistencies among different raters. AES aims to reduce these problems by providing quicker and more reliable scoring.

There are two main ways to score essays: holistic and analytic scoring. Holistic scoring gives a single score based on the overall quality of the essay, while analytic scoring breaks down the evaluation into multiple parts, such as grammar, vocabulary, content, and organization. In this article, we will focus on how grammatical features can improve AES.

Importance of Grammatical Features

Grammar is essential for good writing. It helps convey ideas clearly and effectively. When people judge writing, they often pay attention to grammatical usage. Studies show that both the variety of grammatical structures and the number of errors play a significant role in how essays are scored. Therefore, using detailed grammatical features can enhance the way AES models evaluate essays.

In many past studies, grammatical features have been used, but they often looked at groups of similar structures instead of individual items. This approach might miss important details about specific grammatical forms. Our study proposes to focus on using individual grammatical items to better represent what writers are doing in their essays.

Methods Used in the Study

In this study, we looked at two main types of grammatical features:

  1. Grammatical items that writers used correctly in their essays.
  2. The number of grammatical errors made by the writers.

We took these features, along with essay content, and trained a model to predict essay scores. By using a special technique called Multi-task Learning, we also worked on predicting grammar scores alongside the overall essay scores. This helped us better capture the influence of grammar on writing quality.
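
As a rough illustration, the two feature types described above can be packed into a single numeric vector per essay. The item inventory, the length normalization, and the function name below are illustrative assumptions, not the paper's exact representation:

```python
import numpy as np

# Hypothetical inventory of grammatical items tracked by the feature extractor.
GRAMMAR_ITEMS = ["past_perfect", "relative_clause", "passive_voice", "conditional_type_2"]

def build_grammar_features(items_used_correctly, num_errors, essay_length):
    """Encode the two feature types:
    (1) a binary indicator per grammatical item used correctly,
    (2) the error count, normalized by essay length so longer essays
        are not penalized simply for containing more words."""
    positive = np.array(
        [1.0 if item in items_used_correctly else 0.0 for item in GRAMMAR_ITEMS]
    )
    negative = np.array([num_errors / max(essay_length, 1)])
    return np.concatenate([positive, negative])

# Example: an essay that uses two items correctly and contains 3 errors in 250 words.
features = build_grammar_features({"passive_voice", "relative_clause"}, num_errors=3, essay_length=250)
print(features)  # [0.    1.    1.    0.    0.012]
```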

What is Multi-Task Learning?

Multi-task learning (MTL) is a method where a single model learns to perform multiple tasks simultaneously. In the context of AES, this means training the model not just to score essays but also to assess the grammatical accuracy of those essays. This approach helps the model to understand better how grammar affects the overall quality of writing.
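
Here is a minimal sketch of that idea, assuming a shared encoder over the essay representation with one output head per task. The layer sizes, the loss weighting, and the use of PyTorch are our own illustrative choices, not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class MultiTaskScorer(nn.Module):
    """Shared encoder with two heads: holistic essay score (main task)
    and grammar score (auxiliary task)."""
    def __init__(self, input_dim, hidden_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.holistic_head = nn.Linear(hidden_dim, 1)
        self.grammar_head = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        h = self.encoder(x)
        return self.holistic_head(h), self.grammar_head(h)

def multitask_loss(pred_holistic, pred_grammar, gold_holistic, gold_grammar, alpha=0.5):
    """Weighted sum of the two regression losses; alpha balances the tasks."""
    mse = nn.functional.mse_loss
    return mse(pred_holistic, gold_holistic) + alpha * mse(pred_grammar, gold_grammar)

# Toy usage: a batch of 8 essays, each represented by a 512-dimensional feature vector.
model = MultiTaskScorer(input_dim=512)
x = torch.randn(8, 512)
holistic, grammar = model(x)
loss = multitask_loss(holistic, grammar, torch.rand(8, 1), torch.rand(8, 1))
loss.backward()
```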

Understanding Item Response Theory (IRT)

Item Response Theory (IRT) is a statistical method that helps measure abilities based on how individuals respond to certain items, like test questions. In our case, we treat each grammatical item as a test question. IRT helps us not only gauge a writer's ability but also understand how difficult various grammatical items are. By using IRT, we can weigh items according to their difficulty level, which allows us to give credit for using more complex structures.
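
For concreteness, here is the standard two-parameter logistic (2PL) item response function. This summary does not spell out which IRT variant the paper uses, so the discrimination parameter should be treated as an assumption:

```python
import math

def item_response_probability(ability, difficulty, discrimination=1.0):
    """2PL IRT model: the probability that a writer with the given ability
    uses a grammatical item of the given difficulty correctly.
    Higher ability or lower difficulty raises the probability."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

# A writer of average ability (0.0) facing an easy item (-1.0) vs. a hard one (+1.5).
print(item_response_probability(0.0, -1.0))  # ~0.73
print(item_response_probability(0.0, 1.5))   # ~0.18
```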

Grammatical Features Used

We used a detailed system to capture grammatical features. Our study included:

  • A list of positive linguistic features (grammatical items a writer uses correctly).
  • A count of negative linguistic features (grammatical errors made).
  • The difficulty of these features, which we used to weight them during scoring.

The use of both types of features allowed us to gather a more complete picture of a writer's abilities and challenges.
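
One plausible way to fold difficulty into the positive features is to scale each correctly used item by its estimated IRT difficulty, so that harder structures contribute more. The shift-and-scale scheme below is an illustrative assumption, not the paper's exact weighting:

```python
import numpy as np

def weight_by_difficulty(positive_features, item_difficulties):
    """Scale each correctly used grammatical item by its estimated IRT
    difficulty. Shifting by the minimum keeps every weight positive."""
    difficulties = np.asarray(item_difficulties, dtype=float)
    weights = difficulties - difficulties.min() + 1.0
    return np.asarray(positive_features, dtype=float) * weights

# Example: the second item is the hardest, so its correct use counts the most.
print(weight_by_difficulty([1, 1, 0], [-0.5, 1.2, 0.3]))  # [1.  2.7 0. ]
```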

Experimental Setup

To test our model, we used two datasets that included many essays written for different prompts. We trained our model on these datasets and evaluated its performance by comparing the scores it predicted against actual scores given by human raters.

We set up our experiments in a way that allowed us to see how well our model performed with different types of grammatical features and scoring strategies. By changing certain parts of the model, such as how many hidden layers it had, we could find the best setup for scoring essays accurately.
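
In AES research, agreement between model and human scores is usually measured with quadratic weighted kappa (QWK); the summary above does not name the metric, so this evaluation sketch is an assumption:

```python
from sklearn.metrics import cohen_kappa_score

# Toy comparison of human-assigned and model-predicted scores on a 1-5 scale.
human_scores = [4, 3, 5, 2, 4, 3]
model_scores = [4, 3, 4, 2, 5, 3]
qwk = cohen_kappa_score(human_scores, model_scores, weights="quadratic")
print(f"QWK = {qwk:.3f}")
```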

Results of the Experiments

The results showed that using grammatical features significantly improved the model's ability to score essays accurately. We found that when we combined the scores from both grammatical items and errors, the overall scoring performance became better. Multi-task learning also proved to be beneficial, as it allowed the model to learn from both the holistic essay scores and the grammatical accuracy scores at the same time.

When we weighed the grammatical features using IRT parameters, the performance increased even more. This indicates the importance of considering the difficulty of grammatical structures when evaluating writing.

Challenges and Limitations

While our methods showed promise, there were some challenges and limitations. For instance, not all grammatical features might be relevant to every essay, and the way we extracted these features could lead to errors. Additionally, the scoring for some essays did not improve as much as expected, highlighting the need for further investigation into why certain prompts led to less effective results.

Future Directions

Looking ahead, there are plenty of opportunities for improvement. One significant area is finding better ways to combine positive and negative linguistic features without simply putting them together. Experimenting with how grammatical features interact might reveal deeper insights into their role in essay scoring.

Another direction is to apply the principles of IRT to both correct uses of grammar and errors to gain a more comprehensive view of a writer’s grammatical abilities. We also see potential in exploring how our methods can be implemented in scoring systems using advanced language models.

In addition, we aim to study how our methods perform across different types of essays and prompts. By doing this, we can understand better which essay features our model responds well to and where it struggles.

Conclusion

Our study highlights the value of incorporating grammatical features into automated essay scoring systems. By focusing on individual grammatical items and errors, we were able to enhance the scoring accuracy significantly. The combination of multi-task learning and item response theory allowed for a more nuanced understanding of how grammar affects writing quality.

As educational tools continue to evolve, using these advanced techniques could lead to more effective and reliable ways to assess writing. In turn, this will help learners receive the feedback they need to improve their writing skills.

Original Source

Title: Automated Essay Scoring Using Grammatical Variety and Errors with Multi-Task Learning and Item Response Theory

Abstract: This study examines the effect of grammatical features in automatic essay scoring (AES). We use two kinds of grammatical features as input to an AES model: (1) grammatical items that writers used correctly in essays, and (2) the number of grammatical errors. Experimental results show that grammatical features improve the performance of AES models that predict the holistic scores of essays. Multi-task learning with the holistic and grammar scores, alongside using grammatical features, resulted in a larger improvement in model performance. We also show that a model using grammar abilities estimated using Item Response Theory (IRT) as the labels for the auxiliary task achieved comparable performance to when we used grammar scores assigned by human raters. In addition, we weight the grammatical features using IRT to consider the difficulty of grammatical items and writers' grammar abilities. We found that weighting grammatical features with the difficulty led to further improvement in performance.

Authors: Kosuke Doi, Katsuhito Sudoh, Satoshi Nakamura

Last Update: 2024-06-13

Language: English

Source URL: https://arxiv.org/abs/2406.08817

Source PDF: https://arxiv.org/pdf/2406.08817

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arxiv for use of its open access interoperability.
