Simple Science

Cutting edge science explained simply

# Computer Science # Artificial Intelligence

Revolutionizing Feedback: A New Grading Approach

Discover how technology transforms student feedback with innovative grading methods.

Pritam Sil, Bhaskaran Raman, Pushpak Bhattacharyya

― 8 min read


A new era of student feedback: AI-driven feedback transforming educational assessments.

In education, giving students feedback is super important. It helps them learn and grow. But what happens when you have a classroom full of learners? How do you give each one the personal touch they need? Enter technology! With the help of intelligent systems, we can now offer personalized feedback to students. This article discusses a new approach to grading short answers given by students, especially when those answers also include images. It's like a teacher with superpowers!

The Need for Personalized Feedback

Imagine a classroom where everyone is working on their assignments. Some students ask questions, while others struggle in silence. Addressing their individual needs can be tricky for one teacher. This is where smart tools come into play. They aim to provide unique feedback based on each student’s answer, whether it’s in writing or with a picture.

The traditional methods in education mostly focus on multiple-choice questions. These can be limiting, as they only allow students to pick answers without encouraging creativity. Instead, open-ended questions let students express their thoughts freely. However, evaluating these answers can be tough! That's where Automatic Short Answer Grading (ASAG) comes in, but with a twist. We’re now adding a new layer: feedback that recognizes images too!

The MMSAF Problem

Now, let’s dive into our main subject: the Multimodal Short Answer Grading with Feedback (MMSAF). This new approach allows teachers (and machines) to grade answers that include both text and images.

What Is MMSAF?

Think of MMSAF as a grading superhero. It takes a question, a reference answer (the "gold standard"), and the student's answer, any of which may include images, and gives a grade along with useful feedback. The goal is to help students understand where they went wrong and how they can improve.

This is particularly useful in subjects like science, where diagrams and images can really enhance understanding. For example, if a student draws a picture of a plant cell and explains its parts, the system grades not just the words, but also the image they provided.
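To picture this more concretely, here is a minimal Python sketch of what an MMSAF input and output could look like. The class and field names are our own illustration, not code from the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MMSAFInput:
    """One grading request: the question, the reference ("gold") answer,
    and the student's answer, each of which may carry an image."""
    question: str
    reference_answer: str
    student_answer: str
    reference_image: Optional[bytes] = None  # e.g. a labelled plant-cell diagram
    student_image: Optional[bytes] = None    # the student's own drawing, if any

@dataclass
class MMSAFOutput:
    """What the grader returns for each request."""
    correctness: str       # "correct", "partially correct", or "incorrect"
    image_relevance: str   # how relevant the student's image is to the answer
    feedback: str          # constructive, personalized feedback text
```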

The MMSAF Dataset

To train our grading superhero, we needed a lot of data. We created a dataset consisting of 2,197 examples taken from high school-level questions in subjects like physics, chemistry, and biology.

How Was the Dataset Created?

We didn’t just pull this data out of thin air. We used textbooks and even some help from AI to generate example answers. Each entry in our dataset includes a question, a correct answer, a student answer, and information on whether their image (if provided) was relevant. This means that our superhero has a rich understanding of what good answers look like!
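As an illustration only, a single dataset entry might look something like the record below. The field names and values here are made up for this example and may not match the released dataset exactly.

```python
# One illustrative record; the actual field names and storage format
# of the MMSAF dataset may differ.
example_record = {
    "subject": "Biology",
    "question": "Label the main parts of a plant cell and state their functions.",
    "reference_answer": "A plant cell has a cell wall, chloroplasts, a large vacuole ...",
    "student_answer": "The cell wall gives the cell shape. Chloroplasts make food ...",
    "student_image": "plant_cell_sketch.png",  # the student's drawing, if provided
    "image_relevance": "relevant",             # is the drawing relevant to the answer?
    "correctness": "partially correct",        # gold label for level of correctness
}
```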

Challenges in Traditional Grading

Grading open-ended questions comes with its own set of challenges. Many existing systems struggle when it comes to providing specific, insightful feedback. They might just say, "You did okay," without giving any real guidance. This can leave students feeling confused.

The MMSAF approach seeks to change all that. Not only does it evaluate the correctness of what students write, but it also considers how relevant their images are. It’s a more comprehensive way to evaluate both creativity and understanding.

The Role of Large Language Models (LLMs)

LLMs are like the brains behind our grading superhero. These models learn from vast amounts of data, allowing them to evaluate and provide feedback on both text and images.

Choosing the Right LLMs

We didn’t just pick any model off the shelf. We selected four different LLMs to test our MMSAF approach: ChatGPT, Gemini, Pixtral, and Molmo. Each of these models has its own strengths, especially when it comes to understanding and reasoning over multimodal data, that is, text and images combined.

How Do LLMs Help?

Think of LLMs as very smart assistants that can read, write, and analyze. They can look at a student’s answer and compare it to a reference answer. They generate levels of correctness, comment on the relevance of images, and provide thoughtful feedback that addresses common errors. This saves time for teachers who might otherwise spend hours grading assignments.
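In practice, this usually boils down to assembling a prompt that hands the model everything it needs. The sketch below is a generic example of how such a prompt could be built; it is not the exact prompt used in the study.

```python
def build_grading_prompt(question: str, reference_answer: str, student_answer: str) -> str:
    """Assemble a single prompt asking the model to grade and give feedback.
    A generic sketch, not the paper's actual prompt."""
    return (
        "You are a helpful teacher grading a short answer.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference_answer}\n"
        f"Student answer: {student_answer}\n"
        "1. Rate the answer as correct, partially correct, or incorrect.\n"
        "2. If the student attached an image, say whether it is relevant.\n"
        "3. Give brief, encouraging feedback that points out any mistakes."
    )

# The assembled prompt (plus any attached image) would then be sent to a
# multimodal LLM such as ChatGPT, Gemini, Pixtral, or Molmo.
```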

Evaluation of the LLMs

After setting up the MMSAF framework and dataset, we needed to see how well these LLMs performed. We randomly sampled 221 student responses and let our LLMs work their magic.

Measuring Success

We looked at how accurately each LLM predicted the level of correctness and the relevance of images. The main goal was to determine which model could provide the best feedback while remaining friendly and approachable, like a teacher with a little digital flair!
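Accuracy here is simply the fraction of labels the model gets right, computed separately for the correctness labels and the image-relevance labels. A tiny sketch with made-up labels:

```python
def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of labels the model predicted correctly; used for both the
    level-of-correctness and image-relevance labels."""
    assert len(predicted) == len(gold)
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)

# Toy usage on three responses:
print(accuracy(["correct", "incorrect", "partially correct"],
               ["correct", "partially correct", "partially correct"]))  # ~0.67
```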

Results of the Evaluation

So, how did our LLM superheroes perform? It turned out that while some excelled in specific areas, others had certain shortcomings.

Correctness Levels

Gemini performed quite well when it came to predicting correctness levels. It reliably classified answers as correct, partially correct, or incorrect without much fuss. ChatGPT also did a good job but tended to label some incorrect answers as partially correct. Pixtral was lenient with its grading, giving some incorrect answers a pass as partially correct. On the other hand, Molmo didn’t fare as well, often marking everything as incorrect.

Image Relevance

When it came to the relevance of images, ChatGPT shone brightly. It was able to evaluate the images accurately in most cases. Meanwhile, Gemini struggled a bit, sometimes marking relevant images as irrelevant, which could leave students scratching their heads.

Feedback Quality

One of the most exciting aspects of our study was the quality of the feedback that each LLM generated. We wanted to ensure that the feedback was not only accurate but also constructive and encouraging.

Expert Evaluation

To get a better sense of how the feedback held up, we enlisted the help of subject matter experts (SMEs). These are real educators who know their subjects inside and out. They evaluated the feedback on several criteria, including grammar, emotional impact, correctness, and more.
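Conceptually, each criterion's score is just an average of the experts' 1-to-5 ratings. Here is a toy sketch with invented numbers; for reference, the study reports an average of 4.27 out of 5 for the correctness of LLM-generated feedback.

```python
from statistics import mean

# Hypothetical SME ratings (1-5) for one model's feedback on a few answers.
ratings = {
    "fluency":          [5, 4, 5],
    "correctness":      [4, 5, 4],
    "emotional_impact": [4, 4, 5],
}
per_criterion_average = {criterion: mean(scores) for criterion, scores in ratings.items()}
print(per_criterion_average)
```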

Who Came Out on Top?

The experts rated ChatGPT as the best in terms of fluency and grammatical correctness, while Pixtral excelled in emotional impact and overall helpfulness. It turns out that Pixtral knew how to structure its feedback in a way that made it easy for students to digest.

The Importance of Feedback in Learning

Feedback is more than just a grade; it’s an opportunity for improvement. It can inspire students to dig deeper, ask questions, and truly engage with the material. In a world where students often feel overwhelmed, personalized feedback can be a game-changer.

Motivating Students

When students receive constructive feedback, it can ignite a spark of curiosity. They might think, “Hey, I never thought about it that way!” Effective feedback encourages students to learn from their mistakes and fosters a desire to keep exploring the subject matter.

Future Directions

While we’ve made great strides with the MMSAF framework and its evaluation methods, there’s still room to grow.

Expanding the Dataset

Currently, our dataset is primarily focused on high school subjects. In the future, we could expand it to include university-level courses and other subjects. This would create a more robust resource for educators and students alike.

Automating Image Annotations

Right now, some of the image-related feedback must be done manually. We could develop tools to automate this process, thus making it scalable and efficient.

Ethical Considerations

We’ve sourced our content from reputable educational resources to ensure that we meet ethical guidelines. It’s crucial to respect the boundaries of copyright and address issues of data privacy, especially when working with AI in education.

Conclusion

In summary, the MMSAF problem offers a fresh approach to assessing students’ short answers that include multimodal content. By leveraging the power of LLMs, we can help students receive valuable feedback that not only grades their work but also enhances their learning experience. With ongoing research and development, we can make educational experiences richer, more engaging, and, most importantly, more supportive for learners everywhere.

Final Thoughts

Education is more than just passing grades; it’s about nurturing curiosity and passion for learning. With tools like MMSAF and smart AI models, we stand on the brink of a new age in educational assessment. So, whether it’s a student’s text or a doodle of a cell, we’re ready to help them succeed, one grade at a time!

And who knows? Maybe one day, our grading superhero will help students learn from their homework mistakes while they laugh along the way. After all, learning should be fun!

Original Source

Title: "Did my figure do justice to the answer?" : Towards Multimodal Short Answer Grading with Feedback (MMSAF)

Abstract: Personalized feedback plays a vital role in a student's learning process. While existing systems are adept at providing feedback over MCQ-based evaluation, this work focuses more on subjective and open-ended questions, which is similar to the problem of Automatic Short Answer Grading (ASAG) with feedback. Additionally, we introduce the Multimodal Short Answer grading with Feedback (MMSAF) problem over the traditional ASAG feedback problem to address the scenario where the student answer and reference answer might contain images. Moreover, we introduce the MMSAF dataset with 2197 data points along with an automated framework for generating such data sets. Our evaluations on existing LLMs over this dataset achieved an overall accuracy of 55% on Level of Correctness labels, 75% on Image Relevance labels and a score of 4.27 out of 5 in correctness level of LLM generated feedback as rated by experts. As per experts, Pixtral achieved a rating of above 4 out of all metrics, indicating that it is more aligned to human judgement, and that it is the best solution for assisting students.

Authors: Pritam Sil, Bhaskaran Raman, Pushpak Bhattacharyya

Last Update: Dec 27, 2024

Language: English

Source URL: https://arxiv.org/abs/2412.19755

Source PDF: https://arxiv.org/pdf/2412.19755

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arxiv for use of its open access interoperability.
