Improving assessments through Item Response Theory for better language learning.
Jue Hou, Anisia Katinskaia, Anh-Duc Vu
― 7 min read
Cutting-edge science explained simply
A new benchmark assesses how well AI models mimic human language.
Xufeng Duan, Bei Xiao, Xuemei Tang
― 5 min read
A new method improves accuracy in answering questions from tables by merging two systems.
Siyue Zhang, Anh Tuan Luu, Chen Zhao
― 7 min read
A new method for generating engaging distractors in educational assessments.
Devrim Cavusoglu, Secil Sen, Ulas Sert
― 5 min read
A new method aims to enhance alt-text for mobile app icons to aid visually impaired users.
Sabrina Haque, Christoph Csallner
― 5 min read
DREAMS simplifies deep learning for EEG data, promoting transparency and ethical practices.
Rabindra Khadka, Pedro G Lind, Anis Yazidi
― 7 min read
A look into assessing the trustworthiness of AI explanations through adversarial sensitivity.
Supriya Manna, Niladri Sett
― 7 min read
Recent models enhance AI's ability to generate and understand various media.
Xinlong Wang, Xiaosong Zhang, Zhengxiong Luo
― 5 min read
ARLBench simplifies hyperparameter tuning for reinforcement learning with efficient benchmarking tools.
Jannis Becktepe, Julian Dierkes, Carolin Benjamins
― 7 min read
A model to assess segmentation quality without ground-truth benchmarks.
Ahjol Senbi, Tianyu Huang, Fei Lyu
― 8 min read
A method to manage conflicting sensor data in autonomous vehicles for improved safety.
Oliver Schumann, Thomas Wodtko, Michael Buchholz
― 5 min read
ESPnet-Codec enhances training and evaluation of neural codecs for audio and speech.
Jiatong Shi, Jinchuan Tian, Yihan Wu
― 7 min read
A three-step method for secure data sharing while protecting privacy.
Tung Sum Thomas Kwok, Chi-hua Wang, Guang Cheng
― 6 min read
New benchmark addresses gaps in assessing LLMs for clinical decision-making.
Fenglin Liu, Z. Li, H. Zhou
― 6 min read
Visualizing functional programs can simplify the debugging process for programmers.
John Whitington, Tom Ridge
― 7 min read
Exploring how Generative AI is influencing interaction design processes.
Marie Muehlhaus, Jürgen Steimle
― 5 min read
This study examines the values expressed in human-written and AI-generated texts.
Scott E. Friedman, Noam Benkler, Drisana Mosaphir
― 3 min read
NetworkCommons is a new tool for studying molecular interactions.
Victor Paton, Denes Türei, Olga Ivanova
― 7 min read
A new framework enhances reasoning in language models with high-quality rationales.
Jaehyeok Lee, Keisuke Sakaguchi, JinYeong Bak
― 7 min read
A study compares how well AI models grasp spatial relationships.
Shang Hong Sim, Clarence Lee, Alvin Tan
― 6 min read
Examining the vulnerabilities and defenses of new AI models.
Yangyang Guo, Fangkai Jiao, Liqiang Nie
― 7 min read
Examining how well models detect toxic comments across different dialects and language varieties.
Fahim Faisal, Md Mushfiqur Rahman, Antonios Anastasopoulos
― 7 min read
MTFusion combines images and text for advanced 3D model creation.
Yu Liu, Ruowei Wang, Jiaqi Li
― 6 min read
A look at holistic admissions and its impact on future doctors.
Andrew D. Bergemann, Stephen R. Smith, Joel A. Daboub
― 6 min read
A new method for creating realistic materials enhances flexibility for artists and designers.
Chenliang Zhou, Zheyuan Hu, Alejandro Sztrajman
― 6 min read
A new approach to tackling biases in image-text models.
Haoyu Zhang, Yangyang Guo, Mohan Kankanhalli
― 7 min read
Assessing language models' effectiveness in coding tasks with new benchmarks.
Nidhish Shah, Zulkuf Genc, Dogu Araci
― 5 min read
Understanding how knowledge graphs can reduce false information in AI responses.
Ernests Lavrinovics, Russa Biswas, Johannes Bjerva
― 6 min read
A fresh approach to evaluating AI decision-making models using attribution maps.
Lars Nieradzik, Henrike Stephani, Janis Keuper
― 7 min read
Examining how humans and AI can work together effectively.
Filip Ilievski, Barbara Hammer, Frank van Harmelen
― 9 min read
An overview of how LLMs enhance evaluation processes while addressing key challenges.
Jiawei Gu, Xuhui Jiang, Zhichao Shi
― 7 min read
This study examines how well LLMs assess creativity in the Alternative Uses Test.
Abdullah Al Rabeyah, Fabrício Góes, Marco Volpe
― 5 min read
STAR automates AI model building for smarter and faster results.
Armin W. Thomas, Rom Parnichkun, Alexander Amini
― 7 min read
ER2Score improves the quality assessment of automated radiology reports.
Yunyi Liu, Yingshu Li, Zhanyu Wang
― 5 min read
Transforming text prompts into realistic videos by incorporating physical laws.
Qiyao Xue, Xiangyu Yin, Boyuan Yang
― 6 min read
Are large language models reliable evaluators? Exploring consistency in their assessments.
Noah Lee, Jiwoo Hong, James Thorne
― 7 min read
ChemTEB helps improve chemical text processing by evaluating specialized models.
Ali Shiraee Kasmaee, Mohammad Khodadad, Mohammad Arshi Saloot
― 8 min read
AgriBench evaluates AI tools to support smarter farming decisions.
Yutong Zhou, Masahiro Ryo
― 8 min read
Learn how SelfPrompt helps assess the strength of language models effectively.
Aihua Pei, Zehua Yang, Shunan Zhu
― 3 min read
Learn how sandbagging affects AI assessments and ways to detect it.
Cameron Tice, Philipp Alexander Kreer, Nathan Helm-Burger
― 6 min read