MSK Chatbots Can’t Perform a Literature Search

MSK now offers employees access to Open WebUI, a portal providing several chatbots approved for workplace use. But if you think this tool can be used for searching the literature, think again.

What is Open WebUI and How Do I Access It?

This portal is “a proprietary, user-friendly, and PHI-secure portal where staff can access a wide array of popular large language models (LLMs) as well as tools for experienced developers behind the MSK firewall.”

To access:

  1. Log on to the VPN or be onsite
  2. Visit https://chat.aicopilot.aws.mskcc.org/
  3. Select “Continue with MSK PingID” if prompted
  4. You’ll then get the message “Account Activation Pending” followed by “Contact Admin for WebUI Access.”

No further action is needed, and contacting an admin isn't necessary. You will not receive confirmation once your account has been activated, but once it has, visiting the URL while onsite or on the VPN will take you to the tools.

Open WebUI includes the following chatbots:

  • Amazon Nova Pro – A reasoning model for general analysis and summarization. Knowledge cutoff date: unknown.
  • Claude Sonnet 3.5 – A general-use model by Anthropic. Effective with code generation. Knowledge cutoff date: April 15th, 2024.
  • Claude Sonnet 3.7 – An improved version of Sonnet 3.5 that also targets code generation as a differentiator. Knowledge cutoff date: October 2024.
  • Claude Sonnet 4 – High intelligence and balanced performance. Good for complex coding/debugging, detailed explanations, and documentation review. Detailed prompts recommended. Knowledge cutoff date: January 2024.
  • DeepSeek R1 – A reasoning model for logical inference, math problem-solving, code generation, or text-based clinical reasoning. Cannot process images. Knowledge cutoff date: October 2023.
  • OpenAI o1 – A reasoning model that thinks before it answers, making it suitable for deep analysis, task breakdown, or image-based clinical analysis. Knowledge cutoff date: October 2023.
  • OpenAI GPT-4o – A general-purpose model that balances quality, speed, and cost-effectiveness. Knowledge cutoff date: October 2023.

You can toggle between tools at the top left of the page and click the “set as default” option under a tool’s name after you’ve selected it.

Why Can’t I Use These Tools to Perform a Literature Search?

When you ask Amazon Nova Pro to perform a literature search, it appears to do so:

A screenshot of Amazon Nova Pro appearing to summarize the literature in response to a prompt.

However, a follow-up question reveals that all is not as it seems, and that any citations provided are likely not real:

Amazon Nova Pro answering a prompt asking if it searched databases to come up with its answer. It says it did not.

Other tools are clearer about their limitations from the start:

OpenAI o1 saying it does not have database access and giving advice on how to search.
Claude Sonnet 3.7 saying it does not have database access and recommending speaking to a librarian.

What Should I Do Instead?

There are AI tools that specialize in searching the literature, but even these are typically limited to open access texts. Use these tools cautiously, perhaps in the brainstorming and planning stages of a project.

As an alternative, we welcome you to contact us to request a literature search.

Want to learn more about the use of AI for literature searching? Sign up for our next class on August 19 from 12-1 pm.

From a PubMed Record to NCBI’s Gene Information Portal

PubMed – the National Library of Medicine’s database of biomedical literature that will celebrate its 30th anniversary in January 2026 – is an incredible resource that’s freely available to all.

So much so, in fact, that many other search tools rely heavily on PubMed for their content, including Google Scholar and most of the generative AI research assistants that are popping up at a dizzying speed. This is especially true because the National Center for Biotechnology Information (NCBI) has always been eager to build APIs and other tools that facilitate collaborative relationships with developers of other research tools.

One of the downsides, however, of discovering PubMed’s content solely via other search engines and tools is that users miss out on some of the incredible value-added links to other information that appear within each PubMed record. This is particularly true for searches on topics with a genetic information or bioinformatics aspect.

Take this example (inspired by NLM training exercises):

You are interested in exploring how the CYP2R1 gene might impact vitamin D deficiency risk.

A basic search in PubMed might look something like this:

After clicking on an article’s title to open the full abstract view, users can scroll below the abstract text to see the MeSH terms and other Related Information links.

The Related Information links include a link to NCBI’s Gene information portal which:

“integrates information from a wide range of species. A record may include nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes, and links to genome-, phenotype-, and locus-specific resources worldwide.”

The Gene record can also include GeneRIFs (“Gene References into Function”).
See https://www.ncbi.nlm.nih.gov/gene/about-generif for a more detailed description.

“GeneRIF provides a simple mechanism to allow scientists to add to the functional annotation of genes described in Gene.”

As per https://www.ncbi.nlm.nih.gov/books/NBK3841/#EntrezGene.Bibliography:

“A GeneRIF is a concise phrase describing a function or functions of a gene, with the PubMed citation supporting that assertion.”

Filtering for the references that address a specific gene’s function can be a useful time-saver when literature searching.

For those who find the Gene records a bit overwhelming and prefer to stay within the familiar PubMed environment, search results can be limited to items that have been added as GeneRIFs by including “pubmed gene rif” [Filter] in the PubMed search.

For example, the full PubMed search string would be:

“CYP2R1 gene” AND “vitamin D” AND “pubmed gene rif” [Filter]
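For anyone who prefers to run such searches programmatically, NCBI’s E-utilities service exposes PubMed searching through its ESearch endpoint, which accepts the same query syntax, including the GeneRIF filter. The sketch below simply builds the request URL for the search string above; the function name and parameters are illustrative, not part of any official client.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (documented in the E-utilities help pages)
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_generif_search_url(gene: str, topic: str) -> str:
    """Build an ESearch URL that restricts PubMed results to GeneRIF-linked records."""
    term = f'"{gene} gene" AND "{topic}" AND "pubmed gene rif" [Filter]'
    params = {"db": "pubmed", "term": term, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"

url = build_generif_search_url("CYP2R1", "vitamin D")
print(url)
```

Fetching that URL (for example, with urllib or requests) returns a JSON list of matching PubMed IDs, which can then be retrieved in full with the companion EFetch endpoint.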

If you have any questions or want additional guidance on designing specialized literature searches, feel free to Ask Us at the MSK Library.

Choosing Between Extraction 1 and Extraction 2 in Covidence

Here’s a tip for getting the most out of data extraction in Covidence.

For background, the MSK Library has an institutional account to Covidence, an online software platform used for systematic and other related reviews. Covidence offers teams a collaborative space to screen, appraise, and extract data from articles, and our institutional account means anyone at MSK can use this platform for their review projects.

Once you’re within the Covidence page for your review, you’ll see there are four stages below Review Summary, with Extraction at the end. When you click on Settings to the right of Review Summary, you’ll have the option of selecting between Extraction 1 and Extraction 2.

Both extraction options offer a customizable data extraction template, so which to choose?

Covidence offers the FAQ: How to decide when to use Extraction 1 vs Extraction 2.

  • Extraction 1 is designed for intervention reviews with a standardized PICO(T) structure, as it offers a structured format for organized data collection, which makes meta-analysis easier. This structure allows it to automatically fill in data extraction fields with suggestions you can review. Results can be exported to CSV, Excel, and RevMan.
  • Extraction 2 offers an unstructured format for flexible data collection and is fully customizable. It doesn’t offer automated extraction suggestions and only exports to CSV.

Learn more about data extraction and templates for these two options in the Covidence Knowledge Base. If you prefer to be hands-on, Covidence offers a demo review, and you can test both extraction options there before choosing which one is best for your project.

Learn more about reviews, Covidence, and the way MSK librarians can support you within the guide to our Systematic Review Service.