AI's Role in Grading Physics Assignments
This article explores AI's potential in grading physics problems in universities.
Ryan Mok, Faraaz Akhtar, Louis Clare, Christine Li, Jun Ida, Lewis Ross, Mario Campanelli
― 7 min read
Table of Contents
- The Rising Influence of AI
- How to Use AI in Education
- Grading with AI Chatbots
- How AI Grading Works
- Creating Physics Problems and Solutions
- Grading: AI vs. Humans
- How Humans Weigh In
- Trends and Observations
- Rescaling AI Grades
- The Connection Between Grading and Problem Solving
- Conclusion: What’s Next?
- Original Source
- Reference Links
Grading schoolwork is often like trying to find your way out of a maze blindfolded. It takes a lot of time, and many teachers worry that their own opinions might sneak into the mix. Students end up waiting a long time to get their marks back, and the feedback they receive might not really help them improve.
But what if AI could help? This article talks about how using AI, specifically chatbots based on large language models (LLMs), can be a game changer for grading physics problems in universities. Picture having a super-smart assistant who never sleeps and is always ready to help students learn. The question is how well such AI tools can grade physics assignments compared to human teachers.
The Rising Influence of AI
In recent years, AI has swept across nearly every field. The big hype started with a chatbot called ChatGPT, developed by OpenAI, which could hold text-based conversations and seemed to understand human language in a way we hadn’t seen before. Other companies, like Google and Meta, quickly followed suit by creating their own chatbots. These tools can engage in conversations and perform tasks that resemble human abilities.
Newer models, such as GPT-4 and others, have shown they can tackle some tricky human tasks. They can even work with images and documents, not just text, which makes them even more useful. The rise of these multimodal models has opened up many possibilities in education, especially in subjects like physics.
How to Use AI in Education
Before getting into the fun stuff, it’s worth mentioning that AI has been used in education for a while now. For example, there are intelligent tutoring systems that help students learn without needing a teacher present all the time. More recent studies have shown how ChatGPT can help with tasks like solving physics problems. However, we still don’t know enough about how these models can hand out grades.
This article takes a closer look at how well AI chatbots can grade undergraduate physics problems. Good grading practices are super important for students because feedback helps them see where they need to improve their understanding. Traditional grading is time-consuming and requires a lot of human effort. If we could automate this process with AI, it could free up teachers’ time and give students faster and more consistent feedback.
Grading with AI Chatbots
To see if AI can handle grading, it’s important to understand what makes these chatbots work. They use large language models built on vast amounts of internet data. When you ask a question, they shoot back a reply based on patterns they’ve learned. For grading, they need to be taught how to handle physics problems effectively.
A study was conducted to see how well different AI models could not only grade but also provide helpful feedback. The researchers looked at several models, including GPT-4 and others, to see which could best handle some classic physics problems.
How AI Grading Works
In a typical scenario, a student submits their handwritten solution to a physics problem. For the AI to understand what the student wrote, the handwriting must be converted into a digital format. Once it’s digitized, the AI can read it, understand it, and then grade it.
For this study, the researchers scanned handwritten answers into PDFs and then transformed them into a format that the AI could understand. They created a set of physics problems covering classical mechanics, electromagnetic theory, and quantum mechanics from university-level courses. A clear marking scheme was designed to guide both the AI and human graders.
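To make this concrete, here is a minimal sketch of such a pipeline in Python, assuming pdf2image for rendering scanned pages and the OpenAI client for a multimodal model. The paper does not specify its exact tooling, so the libraries, model name, and prompt below are illustrative assumptions, not the study's actual setup.

```python
# Sketch only: convert a scanned handwritten solution (PDF) into images
# that a multimodal model can read, then request a grade. The library
# choices and model name are assumptions, not the study's actual pipeline.
import base64
from io import BytesIO

from pdf2image import convert_from_path  # requires poppler to be installed
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def pdf_to_base64_images(pdf_path: str) -> list[str]:
    """Render each PDF page to a base64-encoded PNG string."""
    pages = convert_from_path(pdf_path, dpi=200)
    encoded = []
    for page in pages:
        buf = BytesIO()
        page.save(buf, format="PNG")
        encoded.append(base64.b64encode(buf.getvalue()).decode())
    return encoded

def grade_solution(pdf_path: str, prompt: str) -> str:
    """Send the digitized solution plus a grading prompt to the model."""
    content = [{"type": "text", "text": prompt}] + [
        {"type": "image_url",
         "image_url": {"url": f"data:image/png;base64,{img}"}}
        for img in pdf_to_base64_images(pdf_path)
    ]
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": content}],
    )
    return response.choices[0].message.content
```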
Creating Physics Problems and Solutions
The researchers came up with a variety of physics problems, incorporating both calculation-based and word-based questions. For example, they had problems about electrostatics and circuits along with questions that needed lengthy explanations. The idea was to mimic what students might see on actual exams or quizzes.
To avoid asking real students to solve the problems (which would have raised consent issues), the researchers generated the answers using the AI itself. Three different solutions were created for each problem, so the AI could grade multiple attempts for better accuracy.
Grading: AI vs. Humans
When it came time to grade the solutions, the AI models were put to the test in two different ways. First, they graded “blindly,” without any marking scheme, and then they graded with a marking scheme to see how much the grading improved.
For blind grading, the AI was asked to assign marks and provide feedback based solely on its understanding of the answers. This naturally led to variations in the grades because the AI’s grading could be a little random. For the marking scheme grading, the AI was given a structured way to evaluate solutions based on specific criteria.
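As an illustration, the two conditions boil down to two prompt templates, one with and one without the scheme. The wording here is hypothetical, not the study's actual prompts.

```python
# Hypothetical prompt templates for the two grading conditions.
BLIND_PROMPT = (
    "You are grading an undergraduate physics solution. "
    "Assign a mark out of {max_marks} and give brief feedback "
    "explaining any errors you find.\n\nProblem:\n{problem}"
)

SCHEME_PROMPT = (
    "You are grading an undergraduate physics solution. "
    "Award marks out of {max_marks} strictly according to the marking "
    "scheme, and justify each mark given or withheld.\n\n"
    "Problem:\n{problem}\n\nMarking scheme:\n{scheme}"
)

def build_prompt(problem: str, max_marks: int, scheme: str | None = None) -> str:
    """Return the blind prompt, or the scheme prompt when a scheme is given."""
    if scheme is None:
        return BLIND_PROMPT.format(problem=problem, max_marks=max_marks)
    return SCHEME_PROMPT.format(problem=problem, max_marks=max_marks, scheme=scheme)
```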
How Humans Weigh In
To compare the AI’s performance to human grading, human graders were brought in to evaluate the same set of physics solutions. They followed the same marking scheme to keep things consistent. Each solution was graded by multiple human markers, and their average scores were calculated to see how closely the AI’s grades tallied with the human grades.
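In code, that comparison is just a per-solution average; the marks below are invented purely to show the calculation, not data from the study.

```python
# Invented example: average several human markers per solution, then
# compare the AI's mark against that consensus.
from statistics import mean

human_marks = {  # solution id -> marks from multiple human graders
    "mechanics_q1": [7, 8, 7],
    "em_q2": [5, 6, 5],
}
ai_marks = {"mechanics_q1": 9, "em_q2": 6}

for sol_id, marks in human_marks.items():
    consensus = mean(marks)
    diff = ai_marks[sol_id] - consensus
    print(f"{sol_id}: human average {consensus:.1f}, "
          f"AI {ai_marks[sol_id]} (difference {diff:+.1f})")
```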
It turned out human grading was a bit stricter than the AI grading, often because the AI would overlook key mistakes or give marks too liberally. This highlighted that while AI can help, relying on it alone might lead to some students getting a pass they didn’t truly earn.
Trends and Observations
When the researchers plotted the results, they noticed some patterns. Models like Claude 3.5 Sonnet graded a lot more leniently than humans, while GPT-4 performed better overall when using the marking scheme.
The feedback provided by the AI varied a lot too. Some models gave generic comments along the lines of "good job," even when the answers contained mistakes. The more advanced models were somewhat better at identifying where students went wrong but still needed improvement in pointing out specific errors.
Rescaling AI Grades
To help AI grades align more closely with human grading, a technique called grade rescaling can be used. By adjusting the AI grades based on how they performed in relation to human grades, a better match can be achieved. However, this doesn’t eliminate the inconsistencies in the AI's grading style.
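The article does not spell out the exact rescaling formula, but one common approach is a linear map that matches the mean and spread of the AI grades to the human grades on a calibration set, as in this sketch:

```python
# Sketch of one possible rescaling: shift and scale AI grades so their
# mean and standard deviation match the human grades. This is an
# illustrative method, not necessarily the one used in the study.
import numpy as np

def rescale(ai: np.ndarray, human: np.ndarray) -> np.ndarray:
    """Linearly map AI grades onto the human grade distribution."""
    if ai.std() == 0:  # all AI grades identical: fall back to the human mean
        return np.full_like(ai, human.mean(), dtype=float)
    return (ai - ai.mean()) / ai.std() * human.std() + human.mean()

# Invented example: AI grades are systematically high and compressed.
ai_grades = np.array([8.0, 9.0, 8.5, 9.5, 8.0])
human_grades = np.array([6.0, 8.0, 7.0, 9.0, 5.0])
print(rescale(ai_grades, human_grades).round(2))
```

In practice, the rescaled grades would also be clipped to the valid mark range.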
The Connection Between Grading and Problem Solving
Interestingly, it was found that the AI’s ability to grade well was often tied to how well it solved the physics problems in the first place. If the AI struggled with solving a problem, it would also have a tough time assigning accurate grades. This connection suggests that if the AI could improve its problem-solving skills, its grading abilities would likely improve as well.
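One way to quantify such a link is a Pearson correlation between each model's problem-solving score and its grading accuracy. The numbers below are invented to show the computation, not results from the study.

```python
# Invented example: correlate problem-solving ability with grading accuracy.
import numpy as np

# One entry per model: fraction of problems solved correctly, and grading
# accuracy measured as agreement with the averaged human marks.
solving_score = np.array([0.55, 0.70, 0.62, 0.80])
grading_accuracy = np.array([0.60, 0.74, 0.65, 0.78])

r = np.corrcoef(solving_score, grading_accuracy)[0, 1]
print(f"Pearson r = {r:.2f}")  # values near 1 indicate a strong link
```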
Conclusion: What’s Next?
In summary, while AI has the potential to assist with grading in physics education, it’s not quite ready to take over completely. The study showed that while AI can grade faster, it is still prone to mathematical errors and hallucinations. When given a marking scheme, however, the accuracy of its grades improves significantly.
As AI continues to evolve, there is hope that these tools can be refined to deliver even more accurate grading and feedback. In the meantime, teachers might want to keep their grading pens handy just in case!
Title: Using AI Large Language Models for Grading in Education: A Hands-On Test for Physics
Abstract: Grading assessments is time-consuming and prone to human bias. Students may experience delays in receiving feedback that may not be tailored to their expectations or needs. Harnessing AI in education can be effective for grading undergraduate physics problems, enhancing the efficiency of undergraduate-level physics learning and teaching, and helping students understand concepts with the help of a constantly available tutor. This report devises a simple empirical procedure to investigate and quantify how well large language model (LLM) based AI chatbots can grade solutions to undergraduate physics problems in Classical Mechanics, Electromagnetic Theory and Quantum Mechanics, comparing humans against AI grading. The following LLMs were tested: Gemini 1.5 Pro, GPT-4, GPT-4o and Claude 3.5 Sonnet. The results show AI grading is prone to mathematical errors and hallucinations, which render it less effective than human grading, but when given a mark scheme, there is substantial improvement in grading quality, which becomes closer to the level of human performance, a promising sign for future AI implementation. Evidence indicates that the grading ability of an LLM is correlated with its problem-solving ability. Through unsupervised clustering, it is shown that Classical Mechanics problems may be graded differently from other topics. The method developed can be applied to investigate AI grading performance in other STEM fields.
Authors: Ryan Mok, Faraaz Akhtar, Louis Clare, Christine Li, Jun Ida, Lewis Ross, Mario Campanelli
Last Update: 2024-11-20 00:00:00
Language: English
Source URL: https://arxiv.org/abs/2411.13685
Source PDF: https://arxiv.org/pdf/2411.13685
Licence: https://creativecommons.org/licenses/by/4.0/