Retractions, AI, and the Risks of Biomedical Misinformation

Retractions are a serious threat to biomedical research

In the high-stakes world of biomedical research, where published findings can shape clinical practice, policy decisions, and even drug approvals, the presence of retracted literature is not just an academic problem; it’s a public health concern. When flawed, fabricated, or irreproducible studies are left unchecked in the scientific ecosystem, they continue to misinform downstream research, meta-analyses, clinical guidelines, and, ultimately, patient care.

Retractions aren’t rare, either. According to Retraction Watch, retractions have been steadily rising over the last decade. Still, many retracted studies continue to circulate in the literature without any obvious indication that they’ve been pulled. 

AI-powered biomedical searching and retractions

There are dozens, maybe even hundreds, of AI tools that promise to revolutionize biomedical literature searching. These tools claim to make life easier for clinicians and researchers by surfacing “the best” evidence quickly.

Unfortunately, these AI tools likely struggle with reliably flagging retracted articles. None of these tools appear to cross-reference the Retraction Watch Database, even though it’s one of the most comprehensive and up-to-date sources of retraction data.

The result? Users could end up citing, summarizing, or even basing treatment decisions on debunked science, and the AI tools they trusted helped them do it.

Putting three AI search assistant tools to the test

To assess whether current AI-powered tools can reliably detect and communicate retracted biomedical research, we ran a small but telling test using a recently retracted article:

Wu, S. Y., Sharma, S., Wu, K., Tyagi, A., Zhao, D., Deshpande, R. P., & Watabe, K. (2021). Tamoxifen suppresses brain metastasis of estrogen receptor-deficient breast cancer by skewing microglia polarization and enhancing their immune functions. Breast Cancer Research, 23, 1-16.

This article was retracted on May 12, 2025.

We located this article through the Retraction Watch Database, a critical resource for identifying retracted papers. We then tested how three popular AI tools responded when we searched for this article: 1) SciSpace, 2) Consensus, and 3) Elicit.

Baseline: Publisher and PubMed got it right

The article is clearly marked as retracted on both the publisher’s website (BMC, part of Springer Nature) and in PubMed. On BMC’s site, the article is branded with a bold red banner indicating that it has been retracted, and it links directly to the retraction notice.

In PubMed, the article’s retraction status is clearly labeled: there is a large red “Retracted Article” warning at the top of the article record.

With Third Iron’s LibKey Nomad browser extension installed, the retraction warning also appeared directly in the search results list, providing an extra layer of protection.

These platforms demonstrate that it is possible to handle retractions clearly and transparently. But what happens when you try to search with an AI-powered tool?

1) SciSpace: No retraction flag, no awareness

SciSpace has gained traction for its AI-enabled “Papers” database and its Chat AI for article summarization. We searched for the retracted article using the Papers function. The article was retrieved with no indication that it had been retracted.

The PDF version offered by SciSpace appeared to be the original, unretracted version of the paper — there was no watermark or retraction notice. This likely occurred because SciSpace stored an earlier version of the file and does not dynamically update with retraction metadata or new PDFs.

When we asked the SciSpace Chat if the article had been retracted, the reply was: “Sorry, this is not discussed in the paper.” In other words, the AI agent only read the text of the article and had no external awareness of its retraction status.

SciSpace also failed to locate or return the associated Retraction Note (PMID: 40355962), which was published in the same journal.

2) Consensus: Accurate link, but no warning

Consensus is designed to help users quickly identify answers to scientific questions by ranking statements from published articles.

The article was returned in a basic search, and no indication of its retracted status was provided. The PDF link routed to the publisher’s version, which is good practice. Since BMC properly flags retractions, users landing on that page would see the retraction banner and be able to access the Retraction Note. While Consensus did not flag the article as retracted in its own search interface or metadata, it did link out to a source that did.

3) Elicit: Somewhat better!

Elicit offers two formats for reviewing articles: a plain-text view and a PDF view. When we searched for the retracted article via Elicit’s “Find Papers” tool, the results were mixed.

The article summary did not indicate that the paper had been retracted. However, the plain-text view contained the word “RETRACTED” throughout the body text, and Elicit also linked to a newer version of the article PDF that had the retraction stamp clearly watermarked across every page.

Lessons learned: We need accountability and standards

Users skimming article summaries, relying on search results, or using data extraction tables generated by AI tools might still miss the retraction unless they click deeper into the article itself. This is especially concerning in evidence synthesis workflows, where tools like Elicit auto-populate summary tables with study characteristics and conclusions—often without indicating the article has been retracted.

If AI is going to play a meaningful role in evidence retrieval and synthesis, it needs to be held to a higher standard. At a minimum, AI tools used in biomedical contexts must:

  • Flag retracted articles clearly and automatically (see the sketch after this list)
  • Cross-reference multiple retraction sources, including Retraction Watch
  • Date-stamp and cite their information sources transparently
  • Allow users to report errors or omissions easily
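
None of this is technically out of reach. As a rough illustration of the first point, here is a minimal sketch (in Python, not production code) of how a tool could check PubMed’s own retraction flag for a single PMID using the NCBI E-utilities. The PMID shown is only a placeholder, and a real tool would also need an API key, rate limiting, and additional sources such as the Retraction Watch data.

  import urllib.parse
  import urllib.request
  import xml.etree.ElementTree as ET

  EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

  def pubmed_retraction_flags(pmid):
      """Return PubMed's retraction-related signals for a single PMID."""
      params = urllib.parse.urlencode({"db": "pubmed", "id": pmid, "retmode": "xml"})
      with urllib.request.urlopen(f"{EFETCH}?{params}") as resp:
          root = ET.fromstring(resp.read())
      # Retracted records carry the "Retracted Publication" publication type.
      pub_types = [pt.text for pt in root.iter("PublicationType") if pt.text]
      # CommentsCorrections entries with RefType="RetractionIn" point to the retraction notice.
      notices = [
          cc.findtext("RefSource")
          for cc in root.iter("CommentsCorrections")
          if cc.get("RefType") == "RetractionIn"
      ]
      return {"retracted": "Retracted Publication" in pub_types, "retraction_notices": notices}

  # Placeholder PMID; substitute the article you actually want to check.
  print(pubmed_retraction_flags("12345678"))

A search assistant could run a check like this on every record it returns and surface the result next to the citation, much the way PubMed and BMC already do in their own interfaces.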

Until the current AI tool ecosystem improves, here are some tips to protect yourself and your team:

  • Always cross-check critical articles in the Retraction Watch Database or in PubMed (a batch-check sketch follows this list)
  • Use reference managers (like Zotero or EndNote) that integrate with PubMed and allow for manual annotations of retracted status
  • Avoid relying solely on AI summaries or ranking algorithms, especially for high-stakes research
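
On the first tip, one practical shortcut is that the Retraction Watch data is now openly available through Crossref, so you can batch-check the DOIs in a reference list against Crossref’s records of retraction notices. The sketch below is illustrative only: it assumes the Crossref REST API’s works endpoint, its “updates” filter, and the “update-to” field behave as currently documented, so verify against Crossref’s documentation and spot-check the results before relying on it. The DOI shown is a placeholder.

  import json
  import urllib.parse
  import urllib.request

  CROSSREF_WORKS = "https://api.crossref.org/works"

  def crossref_retraction_notices(doi):
      """Return DOIs of Crossref records registered as retractions of the given DOI."""
      # The "updates" filter asks for records that are editorial updates to this DOI.
      params = urllib.parse.urlencode({"filter": f"updates:{doi}", "rows": "20"})
      with urllib.request.urlopen(f"{CROSSREF_WORKS}?{params}") as resp:
          items = json.load(resp)["message"]["items"]
      notices = []
      for item in items:
          for update in item.get("update-to", []):
              if update.get("DOI", "").lower() == doi.lower() and update.get("type") == "retraction":
                  notices.append(item.get("DOI"))
      return notices

  # Placeholder DOI; loop over the DOIs exported from your reference manager.
  for doi in ["10.1000/placeholder-doi"]:
      hits = crossref_retraction_notices(doi)
      print(doi, "retraction notice(s) found:" if hits else "no retraction notice found", hits)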

It’s also worth noting that all the tools we tested (SciSpace, Consensus, and Elicit) are paid products and are effectively marketed as intelligent research assistants. Yet their inconsistent handling of retracted literature highlights the continued need for human vetting and cross-referencing.

In practice, this can take significantly more time than a traditional search if you’re trying to be thorough. Instead of accelerating research, these tools often introduce a false sense of efficiency, making it easy to miss red flags that would be obvious in a well-curated, librarian-led search process. The MSK Library team can help you navigate retraction risks, validate sources, and choose the right tools for your research. Connect with us today.

 

Choosing Between Extraction 1 and Extraction 2 in Covidence

Here’s a tip for getting the most out of data extraction in Covidence.

For background, the MSK Library has an institutional account with Covidence, an online software platform used for systematic and other related reviews. Covidence offers teams a collaborative space to screen, appraise, and extract data from articles, and our institutional account means anyone at MSK can use this platform for their review projects.

Once you’re within the Covidence page for your review, you’ll see there are four stages below Review Summary, with Extraction at the end. When you click on Settings to the right of Review Summary, you’ll have the option of selecting between Extraction 1 and Extraction 2.

Both extraction options offer a customizable data extraction template, so which to choose?

Covidence offers the FAQ “How to decide when to use Extraction 1 vs Extraction 2,” which summarizes the differences:

  • Extraction 1 is designed for intervention reviews with a standardized PICO(T) structure, as it offers a structured format for organized data collection, which makes meta-analysis easier. This structure allows it to automatically fill in data extraction fields with suggestions you can review. Results can be exported to CSV, Excel, and RevMan.
  • Extraction 2 offers an unstructured format for flexible data collection and is fully customizable. It doesn’t offer automated extraction suggestions and only exports to CSV.

Learn more about data extraction and templates for these two options in the Covidence Knowledge Base. If you prefer to be hands-on, Covidence offers a demo review, and you can test both extraction options there before choosing which one is best for your project.

Learn more about reviews, Covidence, and the way MSK librarians can support you within the guide to our Systematic Review Service.

How To Search for a Phrase in PubMed

Searching for phrases in PubMed can be an exercise in frustration. To understand why, we need to look at how PubMed interprets your search, a process that is now described in detail in a new training module from the National Library of Medicine.

When you enter a phrase into PubMed without using quotation marks, the database does several things:

  1. Looks for the phrase as a subject heading (MeSH) term. Subject headings are preselected by the database and assigned to each citation on a topic.
  2. Breaks apart the phrase and looks for each word separately.
  3. Looks for the phrase, if recognized.

How PubMed searches for a phrase. From the National Library of Medicine.

Sometimes, this process brings back the results you need. But it can also lead to search results that do not match your topic.

What happens when you search using quotation marks?

  • Instead of looking for matching subject headings, PubMed checks for the phrase in its phrase index, a list drawn from the literature included in the database.
  • Not every phrase is included in the phrase index. If your phrase is not found, PubMed may ignore your quotation marks and follow the search steps above, bringing in irrelevant results.
  • You can see how many results include your phrase from the advanced search page. Start typing your term, then click “Show index” to the right of the search box.

Use the “Show index” button on the advanced search page to see if your phrase is included in PubMed’s phrase index. From the National Library of Medicine.

Fortunately, there are workarounds if your phrase is not found. You can recommend the addition of phrases to PubMed. You can try searching for the phrase in different databases, most of which are much more user-friendly when phrase searching. You can also try searching for the phrase in PubMed using adjacency.

For example, “Adult Non-Verbal Pain Scale” is not a phrase PubMed recognizes. You can have PubMed look for the phrase with all of the words next to each other, in any order, by telling it to search the title and abstract fields ([tiab]) with the words adjacent to each other (:~0):
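
  "Adult Non-Verbal Pain Scale"[tiab:~0]

Compare this with the quoted-only search, "Adult Non-Verbal Pain Scale", which PubMed may break apart as described above because the phrase is not in its phrase index.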


Using proximity searching ([tiab:~0]) when a phrase is not included in PubMed’s phrase index can lead to more focused results compared to using quotation marks alone.

For more search help, or to request a literature search, contact the Library.