
Topics: Computer Science, Artificial Intelligence, Computation and Language, Computers and Society

Enhancing Learner Assessment in Intelligent Tutoring Systems

Improving assessments through Item Response Theory for better language learning.

Jue Hou, Anisia Katinskaia, Anh-Duc Vu, Roman Yangarber



Using IRT to improve the accuracy and efficiency of language learner assessment.

Assessment of a learner's skills is key in Intelligent Tutoring Systems (ITS). This assessment helps in understanding how well a student is doing and how to support their learning. We focus on Item Response Theory (IRT) in computer-aided language learning, which helps us assess student abilities in two main scenarios: tests and practice exercises.

Testing can cover many skills, providing a clear view of a learner's proficiency. However, exhaustive testing may not always be practical. Therefore, we aim to replace lengthy tests with shorter, more efficient adaptive tests. By using data gathered from exhaustive tests, even under imperfect conditions, we can train an IRT model to guide these adaptive tests. Our work includes simulations and experiments with real learners that show this method can be both efficient and accurate.

Additionally, we investigate whether we can gauge a learner's abilities directly from their practice exercises, without needing formal testing. We transform data collected from practice sessions into a format suitable for IRT modeling by linking exercises to key language concepts, called linguistic constructs, and treating these constructs as test items.
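To make this concrete, here is a minimal, hypothetical sketch of that transformation: exercise attempts are grouped by the constructs they touch, so each construct can be scored like an IRT item. The record layout, construct names, and the simple proportion-correct scoring are illustrative assumptions, not the system's actual data format.

```python
from collections import defaultdict

# Hypothetical exercise-attempt records: each attempt touches one or more
# linguistic constructs and is marked correct (1) or incorrect (0).
attempts = [
    {"learner": "u1", "constructs": ["past_tense", "genitive_case"], "correct": 1},
    {"learner": "u1", "constructs": ["past_tense"], "correct": 0},
    {"learner": "u2", "constructs": ["genitive_case"], "correct": 1},
]

def to_construct_responses(attempts):
    """Aggregate attempts into per-learner, per-construct outcomes so that
    each construct can be treated as an IRT 'item'. Here the outcome is
    simply the proportion of correct attempts on that construct."""
    counts = defaultdict(lambda: [0, 0])  # (learner, construct) -> [correct, total]
    for att in attempts:
        for construct in att["constructs"]:
            counts[(att["learner"], construct)][0] += att["correct"]
            counts[(att["learner"], construct)][1] += 1
    return {key: correct / total for key, (correct, total) in counts.items()}

print(to_construct_responses(attempts))
# {('u1', 'past_tense'): 0.5, ('u1', 'genitive_case'): 1.0, ('u2', 'genitive_case'): 1.0}
```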

We present findings from large-scale studies involving thousands of students. Using teacher assessments of student ability as a reference, we compare estimates from tests to those derived from exercises, and find that IRT models can accurately assess learner abilities using exercise data.

Intelligent Tutoring Systems

ITS aim to personalize learning experiences for students and have proven effective in various subjects. This article focuses on ITS in the realm of computer-aided language learning (CALL), used in real-life learning situations. Our experiments with the CALL system show how it can support student practice outside of regular class.

The smart tutor aids learners during their independent practice. While students participate in lectures and receive learning materials from teachers, the tutor helps them use their time outside the classroom to further improve their skills.

A significant requirement for personalized tutoring in ITS and CALL is accurate assessment of the learner's current proficiency. Assessment serves two main purposes: externally, to inform both learners and teachers about what skills have or have not been mastered, and internally, to determine which exercises should be presented to the learner during practice.

Zone of Proximal Development

The Zone of Proximal Development (ZPD) refers to the skills a learner is ready to tackle with some help. Skills outside of the ZPD are either already mastered, or too difficult for the learner at that moment. If tutors provide exercises focusing too much on mastered skills, learners may become bored. Conversely, if the focus is on skills that are too challenging, learners may become frustrated. Both scenarios can lead to decreased motivation and potential dropout.

Thus, ITS should concentrate on identifying the ZPD to ensure that exercises are well-matched with the learner's current level of proficiency. Accurately assessing the learner's ability is crucial to this process.

Different Contexts for Learning

We consider two main contexts where learners interact with the tutor: test sessions and practice sessions using exercises. In our learning framework, we analyze three types of assessments: (A) exhaustive tests, (B) adaptive tests, and (C) assessments derived from exercises completed by learners during practice.

Traditionally, exhaustive testing involves students answering a long list of questions, which can provide teachers with detailed information about a student's abilities. However, this method has its drawbacks. The testing environment can influence student performance, meaning tests may not accurately reflect the real abilities of learners. Instead, students might end up preparing for the test rather than for genuine skill mastery.

Additionally, testing does not facilitate learning; learners typically do not receive immediate feedback on their answers. In contrast, exercises provide immediate feedback and helpful hints that guide learners toward finding the correct answers independently.

Exhaustive testing also involves redundancy: if a student has already demonstrated strong proficiency, many easier questions yield little useful information. A more efficient approach adjusts the sequence of questions based on the learner's answers and the interconnections among skills.

Our large-scale studies involving thousands of learners collected significant data from tests and exercises. The data is anonymized and shared with the community for further research. For both assessment types, we utilize a vast bank of questions designed by experts in language education.

For assessment through exercises, we work with texts uploaded by users, allowing our system to generate exercises linked to language concepts validated by pedagogical experts.

Challenges in Assessment

Despite their advantages, assessments can be complicated. For tests, even well-designed ones can be problematic. Long tests can lead to frustration and stress. Testing environments, like strict time limits, can hinder performance, resulting in a skewed view of a learner's actual proficiency.

This raises questions about whether effective assessment models can be trained using imperfect data. Similarly, evaluating exercises can be complex when it comes to assigning credit or penalties based on students' answers. Unlike straightforward test questions, exercises often relate to multiple skills, making credit assignment less clear-cut.
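As an illustration of why credit assignment is less clear-cut, here is one simple possibility (an assumption for illustration, not necessarily the approach the authors use): split the credit or penalty equally among all constructs an exercise touches.

```python
def assign_credit(constructs, correct):
    """Split credit (or penalty) equally among all constructs that
    one exercise item touches."""
    share = 1.0 / len(constructs)
    sign = 1.0 if correct else -1.0
    return {construct: sign * share for construct in constructs}

# An answer that involves two constructs gives each construct half the credit.
print(assign_credit(["past_tense", "genitive_case"], correct=True))
# {'past_tense': 0.5, 'genitive_case': 0.5}
```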

Item Response Theory (IRT)

Item Response Theory (IRT) helps evaluate and compare question difficulty and learner proficiency. It is also widely used outside ITS, including in psychological assessment and medical testing. In language learning, IRT is applied to map a learner's abilities to language proficiency scales.

We use a three-parameter logistic (3PL) model, designed for multiple-choice questions with a single correct answer. The model gives the probability that a learner with a particular ability will answer a given question correctly. IRT also defines item and test information functions, which gauge how much information a question, or a whole test, provides about the learner's ability.
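For readers who want the formula, below is a minimal sketch of the standard 3PL model and its item information function in Python. The parameter names a (discrimination), b (difficulty), and c (guessing) follow common IRT notation; the function names are ours, not from the paper.

```python
import numpy as np

def p_correct_3pl(theta, a, b, c):
    """3PL model: probability that a learner with ability `theta` answers
    correctly an item with discrimination `a`, difficulty `b`, and
    guessing parameter `c`."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    """Standard 3PL item information function: how much the item tells us
    about a learner whose ability is near `theta`."""
    p = p_correct_3pl(theta, a, b, c)
    q = 1.0 - p
    return (a ** 2) * (q / p) * ((p - c) / (1.0 - c)) ** 2

# An item of medium difficulty (b = 0) is most informative for learners whose
# ability is near 0, and much less informative for a very strong learner.
print(item_information(0.0, a=1.5, b=0.0, c=0.2))   # ~0.38
print(item_information(3.0, a=1.5, b=0.0, c=0.2))   # ~0.02
```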

The IRT model allows us to adaptively select the most informative questions based on the learner's ability, and we simulate the adaptive test process to examine its effectiveness. This involves iterative question selection and ability estimation to ensure a valid and efficient testing experience.
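Continuing the sketch above (and reusing p_correct_3pl and item_information from it), adaptive selection might look like the following: pick the unanswered item with maximum information at the current ability estimate, then re-estimate ability from all responses so far. The grid-based maximum-likelihood estimate here is a simplification of what a production system would do.

```python
import numpy as np

def select_next_item(theta_hat, item_bank, answered):
    """Maximum-information selection: among unanswered items, pick the one
    that is most informative at the current ability estimate."""
    candidates = [i for i in range(len(item_bank)) if i not in answered]
    return max(candidates,
               key=lambda i: item_information(theta_hat, *item_bank[i]))

def estimate_ability(responses, item_bank, grid=np.linspace(-4, 4, 161)):
    """Grid-based maximum-likelihood ability estimate.
    `responses` maps item index -> 0/1 outcome."""
    log_lik = np.zeros_like(grid)
    for idx, y in responses.items():
        a, b, c = item_bank[idx]
        p = p_correct_3pl(grid, a, b, c)
        log_lik += y * np.log(p) + (1 - y) * np.log(1.0 - p)
    return float(grid[np.argmax(log_lik)])
```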

Conducting Simulations

In our simulations, we aim to assess the effectiveness of IRT trained on learner data for future tests. We use a structured process that selects questions based on the learner's ability and evaluates their responses. This process includes measures such as slips (random mistakes) and exploration (deliberate variation in question difficulty).
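A rough sketch of how the simulated learner in such a study might behave, with a slip probability and an exploration probability as the two adjustments; the specific probabilities are illustrative. It reuses p_correct_3pl and select_next_item from the sketches above.

```python
import random

def simulated_response(theta_true, a, b, c, slip_prob=0.05):
    """Simulate one answer from a learner with true ability `theta_true`.
    With probability `slip_prob` the learner slips: a random mistake even
    when the model says they would have answered correctly."""
    correct = random.random() < p_correct_3pl(theta_true, a, b, c)
    if correct and random.random() < slip_prob:
        correct = False
    return int(correct)

def select_with_exploration(theta_hat, item_bank, answered, explore_prob=0.1):
    """Usually pick the most informative item, but occasionally pick a
    random unanswered one to deliberately vary question difficulty."""
    remaining = [i for i in range(len(item_bank)) if i not in answered]
    if random.random() < explore_prob:
        return random.choice(remaining)
    return select_next_item(theta_hat, item_bank, answered)
```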

We evaluate the impact of these adjustments on the adaptive testing process, as well as explore termination criteria that dictate when to end a test session. We examine multiple metrics, such as the mean number of questions and the accuracy of the ability estimate, to assess the effectiveness of our procedures.

The analysis also examines how early responses affect the overall assessment, and introduces a warm-up phase to minimize the influence of initial slips.
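Putting the pieces together, one simulated adaptive test session could run as follows: stop either when the estimated standard error of the ability estimate falls below a threshold or when a question budget is exhausted, with a short warm-up of medium-difficulty items to soften early slips. The thresholds and the warm-up rule are illustrative assumptions, and the helpers come from the earlier sketches.

```python
def run_adaptive_test(theta_true, item_bank, max_items=30,
                      se_threshold=0.3, warmup=3):
    """One simulated adaptive test session (sketch)."""
    responses, answered = {}, set()
    theta_hat = 0.0
    for step in range(max_items):
        if step < warmup:
            # Warm-up: pick the unanswered item closest to medium difficulty.
            idx = min((i for i in range(len(item_bank)) if i not in answered),
                      key=lambda i: abs(item_bank[i][1]))
        else:
            idx = select_with_exploration(theta_hat, item_bank, answered)
        a, b, c = item_bank[idx]
        responses[idx] = simulated_response(theta_true, a, b, c)
        answered.add(idx)
        theta_hat = estimate_ability(responses, item_bank)
        # Test information is the sum of item information; SE ~ 1/sqrt(info).
        info = sum(item_information(theta_hat, *item_bank[i]) for i in answered)
        if step >= warmup and info > 0 and 1.0 / info ** 0.5 < se_threshold:
            break
    return theta_hat, len(answered)
```

Metrics such as the mean number of questions asked and the error of the final ability estimate can then be averaged over many simulated learners.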

Results and Findings

Our simulations reveal critical insights into the effectiveness of IRT in assessing student abilities. The results indicate that despite initial imperfections in data collection, parameters learned directly from learner interactions generate reliable assessments.

We find that the IRT-based assessments outperform traditional methods in terms of efficiency and accuracy. More questions generally lead to more accurate predictions, and as data accumulates over time, the assessments become increasingly reliable.

Additionally, we establish that the ability to model learner proficiency based on exercise data is at least as good as assessments derived from traditional testing methods.

Conclusions and Future Directions

In summary, accurate assessment of learner proficiency is vital for personalized tutoring in ITS. Our work demonstrates the effectiveness of IRT in both test environments and exercise settings. Our approach allows for more efficient and reliable assessments, ultimately leading to better tailored educational experiences.

We propose that traditional testing can be minimized to key points in the learning process while continuous assessment through exercises can provide sufficient insight into learner progress. This means that as learners engage in exercises, they can achieve valid assessments of their skills without the stress typically associated with testing.

Our findings suggest promising future directions, indicating that adaptive testing and assessments driven by exercise data can create engaging and effective learning environments for language learners. As we continue to collect data, we anticipate refining our models and enhancing their accuracy, paving the way for more intuitive and responsive teaching methods.

Original Source

Title: Implicit assessment of language learning during practice as accurate as explicit testing

Abstract: Assessment of proficiency of the learner is an essential part of Intelligent Tutoring Systems (ITS). We use Item Response Theory (IRT) in computer-aided language learning for assessment of student ability in two contexts: in test sessions, and in exercises during practice sessions. Exhaustive testing across a wide range of skills can provide a detailed picture of proficiency, but may be undesirable for a number of reasons. Therefore, we first aim to replace exhaustive tests with efficient but accurate adaptive tests. We use learner data collected from exhaustive tests under imperfect conditions, to train an IRT model to guide adaptive tests. Simulations and experiments with real learner data confirm that this approach is efficient and accurate. Second, we explore whether we can accurately estimate learner ability directly from the context of practice with exercises, without testing. We transform learner data collected from exercise sessions into a form that can be used for IRT modeling. This is done by linking the exercises to linguistic constructs; the constructs are then treated as "items" within IRT. We present results from large-scale studies with thousands of learners. Using teacher assessments of student ability as "ground truth," we compare the estimates obtained from tests vs. those from exercises. The experiments confirm that the IRT models can produce accurate ability estimation based on exercises.

Authors: Jue Hou, Anisia Katinskaia, Anh-Duc Vu, Roman Yangarber

Last Update: 2024-09-24

Language: English

Source URL: https://arxiv.org/abs/2409.16133

Source PDF: https://arxiv.org/pdf/2409.16133

Licence: https://creativecommons.org/licenses/by-nc-sa/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arxiv for use of its open access interoperability.
