AI Tools Transform Systematic Reviews in Health Research
Explore how AI impacts systematic reviews and enhances health research efficiency.
Dr. Judith-Lisa Lieberum, Markus Töws, Dr. Maria-Inti Metzendorf, Felix Heilmeyer, Dr. Waldemar Siemens, Dr. Christian Haverkamp, Prof. Dr. Daniel Böhringer, Prof. Dr. Joerg J. Meerpohl, Dr. Angelika Eisele-Metzger
― 7 min read
Table of Contents
- The Rise of AI in Systematic Reviews
- How AI Tools Help with Systematic Reviews
- The Goal of the Scoping Review
- Setting the Guidelines
- Gathering the Evidence
- Selection Process
- Data Extraction
- Key Findings on LLM Applications
- Types of AI Tools Used
- Overall Conclusions
- Evaluating the Challenges
- Future of LLMs in Systematic Reviews
- Conclusion
- Original Source
- Reference Links
Systematic Reviews (SRs) are a way to gather all the existing research on a particular topic. They aim to collect and analyze all available studies to provide a clear overview of what is known about an issue. Think of it as putting together pieces of a puzzle where the final picture is the overall understanding of a specific question in health research. SRs are essential for evidence-based medicine, making sure that healthcare decisions are backed by solid data.
However, conducting a systematic review is no small feat. It can take a lot of time and resources, often requiring a team of researchers to sift through countless studies, which can feel a bit like looking for a needle in a haystack. This is where artificial intelligence (AI) comes into play, promising to make life a bit easier for these researchers.
The Rise of AI in Systematic Reviews
In recent years, various AI tools have emerged to help researchers with systematic reviews. These tools primarily use Machine Learning (ML), which is a branch of AI that helps computers learn from data and make decisions. Traditional ML requires training on specific tasks, but newer models, especially Large Language Models (LLMs), have changed the landscape.
LLMs, such as GPT and Claude, can follow instructions in natural language almost as if they have a mind of their own (okay, not a mind, but you get the point). These models process large amounts of text to generate responses, and this capability has made them quite popular in areas like medicine and health research. However, one must tread lightly, as their complexity can lead to some unexpected outcomes, like misinformation or unsuitable responses.
How AI Tools Help with Systematic Reviews
Several machine learning tools are already being used in health research to assist with systematic reviews. Some tools help with screening studies, while others assist with different steps in the review process. For instance, ASReview is an example of a tool that aids in screening research papers, and DistillerSR helps with various systematic review tasks.
A recent review of AI's impact on systematic reviews highlighted many ML tools that improve efficiency. However, it also noted a lack of LLM applications at that time. Since then, the use of LLMs in systematic reviews has increased substantially, helping researchers formulate review questions, screen studies, and extract data from the literature. But, like any new technology, these approaches are still in the experimental stage and can make mistakes.
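To make the screening idea concrete, here is a minimal sketch, not taken from the reviewed studies, of how a title/abstract screening prompt for an LLM might be assembled from fixed inclusion criteria. The function name and the example criteria are illustrative assumptions.

```python
# Illustrative sketch (not the authors' method): assembling a reproducible
# title/abstract screening prompt that an LLM could answer with INCLUDE/EXCLUDE.

def build_screening_prompt(title: str, abstract: str, criteria: list[str]) -> str:
    """Build a screening prompt from a study record and fixed inclusion criteria."""
    criteria_text = "\n".join(f"- {c}" for c in criteria)
    return (
        "You are screening studies for a systematic review.\n"
        f"Inclusion criteria:\n{criteria_text}\n\n"
        f"Title: {title}\nAbstract: {abstract}\n\n"
        "Answer with exactly one word: INCLUDE or EXCLUDE."
    )

prompt = build_screening_prompt(
    title="LLMs for data extraction in evidence synthesis",
    abstract="We evaluate an LLM for extracting study characteristics.",
    criteria=[
        "Applies machine learning to a systematic review step",
        "Health research context",
        "Published April 2021 or later",
    ],
)
print(prompt)
```

Keeping the criteria in one fixed block, rather than rephrasing them per study, is one simple way to reduce the prompt-sensitivity problems discussed later in this article.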
The Goal of the Scoping Review
The purpose of the recent scoping review was to take a closer look at how ML and LLMs are being used to support systematic reviews, pinpointing the most promising strategies for future development. Researchers followed specific guidelines to ensure the process was thorough and reliable.
Setting the Guidelines
To explore the role of AI in systematic reviews, researchers set certain eligibility criteria. They focused specifically on articles that discussed the application of machine learning in systematic reviews conducted within the health research field. Articles were included if they were published from April 2021 onwards, and only full scientific articles in English or German were considered.
The researchers wanted to ensure they captured all relevant information, so they excluded other types of sources, such as study protocols or literature that did not provide details about the AI tools used. This way, they could focus on gathering meaningful data that would help in understanding how AI is changing the systematic review process.
Gathering the Evidence
The researchers systematically searched multiple databases to find relevant studies, utilizing a variety of sources, including MEDLINE and Google Scholar. They employed a search strategy that targeted known records related to ML and LLM applications in systematic reviews. After screening and organizing the findings, they were able to gather a substantial number of studies for further analysis.
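Database searches like this typically combine synonym blocks with boolean operators. The sketch below shows the general pattern; the terms are illustrative assumptions, not the authors' actual search strategy.

```python
# Hypothetical sketch of composing a boolean database query from concept
# blocks: synonyms are OR-ed within a block, and blocks are AND-ed together.
# The terms shown are illustrative, not the reviewers' actual strategy.

def boolean_query(concept_blocks: list[list[str]]) -> str:
    """OR together synonyms within each block, then AND the blocks."""
    return " AND ".join(
        "(" + " OR ".join(f'"{term}"' for term in block) + ")"
        for block in concept_blocks
    )

query = boolean_query([
    ["machine learning", "large language model", "GPT"],
    ["systematic review", "evidence synthesis"],
])
print(query)
```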
Selection Process
The selection process involved a group of reviewers who independently evaluated the studies to determine their eligibility. They screened the titles and abstracts first, then moved on to the full texts for the remaining articles, discussing any disagreements that arose. This careful process ensured that only the most relevant studies made it into the final selection.
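The paper does not report an agreement statistic, but independent dual screening of this kind is often summarized with Cohen's kappa, which corrects raw agreement for chance. A minimal sketch for two reviewers' include/exclude decisions:

```python
# Minimal Cohen's kappa for two reviewers' screening decisions (an assumed
# illustration; the scoping review itself does not report this statistic).

def cohens_kappa(r1: list[str], r2: list[str]) -> float:
    """Chance-corrected agreement between two raters over the same items."""
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    labels = set(r1) | set(r2)
    # Expected agreement if both raters labelled items independently at random
    # according to their own marginal label frequencies.
    expected = sum((r1.count(l) / n) * (r2.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)

kappa = cohens_kappa(
    ["inc", "exc", "exc", "inc", "exc", "exc"],
    ["inc", "exc", "inc", "inc", "exc", "exc"],
)
print(round(kappa, 3))  # -> 0.667
```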
Data Extraction
When it came to analyzing the data, the researchers made a distinction between LLMs and traditional ML methods for clarity. They developed a customized spreadsheet to track the specific details of LLM applications, including the types of models used, the steps in the systematic review process they supported, and the overall conclusions drawn by the authors of each study.
For traditional ML approaches, a separate method of data extraction was used. The researchers listed known tools and categorized the machine learning methods based on their functionality. By keeping these approaches separate, the team could better understand how each type of AI supported systematic reviews.
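A customized extraction spreadsheet like the one described above boils down to one structured record per study. The field names below are assumptions for illustration, not the authors' actual schema:

```python
# Sketch of the kind of structured record such a customized extraction
# spreadsheet might hold. Field names are illustrative assumptions.
from dataclasses import dataclass, asdict

@dataclass
class LLMExtractionRecord:
    study_id: str
    model: str            # e.g. "GPT-4", "Claude", "LLaMA"
    sr_step: str          # e.g. "literature search", "study selection"
    authors_verdict: str  # "promising" | "neutral" | "non-promising"

rec = LLMExtractionRecord("study-001", "GPT-4", "data extraction", "promising")
row = asdict(rec)  # one dict per study, ready to write as a spreadsheet row
```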
Key Findings on LLM Applications
From the investigation, the researchers identified a total of 196 relevant reports. Of these, 37 focused on how LLMs were used in systematic reviews, while the remaining 159 described more traditional ML techniques. The rapid growth of the LLM subset indicates increasing interest and promising potential in this area.
One interesting finding was that LLMs were applied across many stages of the process, covering 10 of the 13 systematic review steps the researchers defined. The most frequently reported tasks where LLMs provided assistance were literature search (41% of LLM studies), study selection (screening, 38%), and data extraction (30%). These tasks are crucial in ensuring that a systematic review is comprehensive and accurate.
Types of AI Tools Used
Among the studies reviewed, GPT proved to be the most commonly employed LLM, appearing in 33 of the 37 LLM studies (89%). Other models, such as Claude and LLaMA, were also mentioned, but GPT took the lion's share of the research spotlight. The researchers noted the different types of LLMs used at various steps of the systematic review process, shedding light on how each model contributed to the overall task.
Overall Conclusions
The authors of these studies were somewhat optimistic about the role of LLMs in systematic reviews. More than half of the studies (54%) classified LLM applications as promising, while about a quarter (24%) were neutral and roughly a fifth (22%) judged them non-promising. Despite the promising outcomes in study selection and data extraction, uncertainties regarding reproducibility and reliability were common themes.
Evaluating the Challenges
While LLMs show potential, there are notable challenges to overcome. For example, their ability to generate coherent and relevant content is impressive, but they don't always provide references or fact-check their outputs. This may lead to unreliable results, which is a critical issue in scientific literature and health research.
The authors also noted that LLM responses could vary significantly depending on the input provided. A minor tweak in the prompt could yield vastly different outputs, raising concerns about consistency. Moreover, many LLMs have cut-off dates for their training data, which can result in outdated information making its way into research results.
Future of LLMs in Systematic Reviews
So, what's next for LLMs in systematic reviews? While there’s excitement about their integration into the review process, caution is warranted. Researchers believe that human supervision will be crucial to ensure the quality and accuracy of results. Editing and verifying the outputs generated by these models will help maintain high standards in scientific research.
The findings from the scoping review suggest that although the applications of LLMs in systematic reviews are still developing, they hold significant potential to make the research process more efficient. Researchers encourage further studies to enhance transparency and improve the methodologies used, ensuring that as we embrace these AI tools, we do so responsibly.
Conclusion
In conclusion, AI, particularly in the form of LLMs, is ushering in a new wave of support for systematic reviews in health research. With promising results in several stages of the review process, these tools are gradually finding their place in the systematic review toolkit. Nevertheless, with great power comes great responsibility—researchers must ensure that LLMs are used wisely and cautiously to keep the integrity of science intact.
As the field continues to evolve, we can expect to see even more innovations and improvements, making systematic reviews faster and more comprehensive. So, while researchers might feel like they're still searching for that stubborn needle buried deep in the haystack, at least they now have a couple of trusty AI friends to lend a hand.
Original Source
Title: Large language models for conducting systematic reviews: on the rise, but not yet ready for use - a scoping review
Abstract:

Background: Machine learning (ML) promises versatile help in the creation of systematic reviews (SRs). Recently, further developments in the form of large language models (LLMs) and their application in SR conduct attracted attention.

Objective: To provide an overview of ML and specifically LLM applications in SR conduct in health research.

Study design: We systematically searched MEDLINE, Web of Science, IEEE Xplore, ACM Digital Library, Europe PMC (preprints), and Google Scholar, and conducted an additional hand search (last search: 26 February 2024). We included scientific articles in English or German, published from April 2021 onwards, building upon the results of a mapping review with a related research question. Two reviewers independently screened studies for eligibility; after piloting, one reviewer extracted data, checked by another.

Results: Our database search yielded 8054 hits, and we identified 33 articles from our hand search. Of the 196 included reports, 159 described more traditional ML techniques and 37 focused on LLMs. LLM approaches covered 10 of 13 defined SR steps, most frequently literature search (n=15, 41%), study selection (n=14, 38%), and data extraction (n=11, 30%). The most frequently used LLM was GPT (n=33, 89%). Validation studies were predominant (n=21, 57%). In half of the studies, authors evaluated LLM use as promising (n=20, 54%), one quarter as neutral (n=9, 24%), and one fifth as non-promising (n=8, 22%).

Conclusions: Although LLMs show promise in supporting SR creation, fully established or validated applications are often lacking. The rapid increase in research on LLMs for evidence synthesis production highlights their growing relevance.

Highlights:
- Machine learning (ML) offers promising support for systematic review (SR) creation.
- GPT was the most commonly used large language model (LLM) to support SR production.
- LLM application included 10 of 13 defined SR steps, most often literature search.
- Validation studies predominated, but fully established LLM applications are rare.
- LLM research for SR conduct is surging, highlighting the increasing relevance.
Authors: Dr. Judith-Lisa Lieberum, Markus Töws, Dr. Maria-Inti Metzendorf, Felix Heilmeyer, Dr. Waldemar Siemens, Dr. Christian Haverkamp, Prof. Dr. Daniel Böhringer, Prof. Dr. Joerg J. Meerpohl, Dr. Angelika Eisele-Metzger
Last Update: 2024-12-24
Language: English
Source URL: https://www.medrxiv.org/content/10.1101/2024.12.19.24319326
Source PDF: https://www.medrxiv.org/content/10.1101/2024.12.19.24319326.full.pdf
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to medrxiv for use of its open access interoperability.