MSK Chatbots Can’t Perform a Literature Search

MSK now offers employees access to Open WebUI, a source for several chatbots available for workplace use. But if you think this tool can be used for searching the literature, think again.

What is Open WebUI and How Do I Access It?

This portal is “a proprietary, user-friendly, and PHI-secure portal where staff can access a wide array of popular large language models (LLMs) as well as tools for experienced developers behind the MSK firewall.”

To access:

  1. Log on to the VPN or be onsite
  2. Visit https://chat.aicopilot.aws.mskcc.org/
  3. Select “Continue with MSK PingID” if prompted
  4. You’ll then get the message “Account Activation Pending” followed by “Contact Admin for WebUI Access.”

No further action is needed and contacting admin isn’t necessary. You will not get confirmation once your account has been activated. But once it has, visiting the URL while onsite or on the VPN will take you to the tools.

Open WebUI includes the following chatbots:

Chatbot | Description
Amazon Nova Pro | A reasoning model for general analysis and summarization. Knowledge cutoff date: Unknown
Claude Sonnet 3.5 | A general-use model by Anthropic. Effective with code generation. Knowledge cutoff date: April 15th, 2024
Claude Sonnet 3.7 | Improved version of Sonnet 3.5, and also targets code generation as a differentiator. Knowledge cutoff date: October 2024
Claude Sonnet 4 | High intelligence and balanced performance. Good for complex coding/debugging, detailed explanations, and documentation review. Detailed prompts recommended. Knowledge cutoff date: January 2024
DeepSeek R1 | A reasoning model for logical inference, math problem-solving, code generation, or text-based clinical reasoning. Cannot process images. Knowledge cutoff date: October 2023
OpenAI o1 | A reasoning model that thinks before it answers, making it suitable for deep analysis, task breakdown, or image-based clinical analysis. Knowledge cutoff date: October 2023
OpenAI GPT-4o | A general-purpose model that balances quality, speed, and cost-effectiveness. Knowledge cutoff date: October 2023

You can switch between tools at the top left of the page. To make a tool your default, select it and then click the “set as default” option under its name.

Why Can’t I Use These Tools to Perform a Literature Search?

When you ask Amazon Nova Pro to perform a literature search, it appears to do so:

A screenshot of Amazon Nova Pro appearing to summarize the literature in response to a prompt.

However, a follow-up question reveals that all is not as it seems, and that any citations provided are likely not real:

Amazon Nova Pro answering a prompt asking if it searched databases to come up with its answer. It says it did not.

Other tools are clearer about their limitations from the start:

OpenAI o1 saying it does not have database access and giving advice on how to search.
Claude Sonnet 3.7 saying it does not have database access and recommending speaking to a librarian.

What Should I Do Instead?

There are AI tools that specialize in searching the literature, but even these are typically limited to open-access texts. Use these tools cautiously, perhaps in the brainstorming and planning stages of a project.

As an alternative, we welcome you to contact us to request a literature search.

Want to learn more about the use of AI for literature searching? Sign up for our next class on August 19 from 12-1 pm.

How To Search for a Phrase in PubMed

Searching for phrases in PubMed can be an exercise in frustration. To understand why, we need to look at how PubMed interprets your search, a process that is now described in detail in a new training module from the National Library of Medicine.

When you enter a phrase into PubMed without using quotation marks, the database does several things:

  1. Looks for the phrase as a subject heading, also called a MeSH term. Subject headings are drawn from a preselected list maintained for the database and are assigned to each citation on a topic.
  2. Breaks apart the phrase and looks for each word separately.
  3. Looks for the phrase, if recognized.

A graphic describing how PubMed searches for a phrase.

How PubMed searches for a phrase. From the National Library of Medicine.

Sometimes, this process brings back the results you need. But it can also lead to search results that do not match your topic.
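To see why an unquoted phrase can pull in off-topic results, it may help to write the three steps above out as a single search string. The sketch below is only an approximation for illustration: the function name and its parameters are hypothetical, and the real translation PubMed builds is typically longer and depends on its own lookup tables. The example assumes that "heart attack" maps to the MeSH heading Myocardial Infarction.

```python
def sketch_unquoted_translation(phrase, mesh_heading=None, phrase_recognized=False):
    """Approximate how an unquoted phrase might be expanded (illustration only).

    mesh_heading: the subject heading the phrase maps to, if any (supplied by
                  the caller; this sketch does not look anything up).
    phrase_recognized: whether PubMed recognizes the phrase as a phrase.
    """
    words = phrase.lower().split()
    parts = []

    # Step 1: the phrase as a subject heading (MeSH term), when one matches.
    if mesh_heading:
        parts.append(f'"{mesh_heading.lower()}"[MeSH Terms]')

    # Step 2: each word searched separately; the words can appear anywhere
    # in the record, which is where irrelevant results tend to come from.
    parts.append("(" + " AND ".join(f'"{w}"[All Fields]' for w in words) + ")")

    # Step 3: the phrase itself, if recognized.
    if phrase_recognized:
        parts.append('"' + " ".join(words) + '"[All Fields]')

    return " OR ".join(parts)


if __name__ == "__main__":
    print(sketch_unquoted_translation(
        "heart attack",
        mesh_heading="Myocardial Infarction",
        phrase_recognized=True,
    ))
```

Because the pieces are joined with OR, a citation only has to satisfy one of them. A record that mentions both words in unrelated sentences still matches the second piece.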

What happens when you search using quotation marks?

  • Instead of looking for matching subject headings, PubMed checks for the phrase in its phrase index, a list drawn from the literature included in the database.
  • Not every phrase is included in the phrase index. If your phrase is not found, PubMed may ignore your quotation marks and follow the search steps above, bringing in irrelevant results.
  • You can see how many results include your phrase from the advanced search page. Start typing your term, then click “Show index” to the right of the search box.

A screenshot showing how to check if a phrase is included in PubMed's phrase index.

Use the “Show index” button on the advanced search page to see if your phrase is included in PubMed’s phrase index. From the National Library of Medicine.

Fortunately, there are workarounds if your phrase is not found. You can recommend the addition of phrases to PubMed. You can try searching for the phrase in other databases, most of which handle phrase searching in a much more user-friendly way. You can also try searching for the phrase in PubMed using adjacency.

For example, “Adult Non-Verbal Pain Scale” is not a phrase PubMed recognizes. To have PubMed find all of the words next to each other, in any order, put the phrase in quotation marks and tell PubMed to look in the title and abstract fields (tiab) with no other words between the terms (:~0), as in "Adult Non-Verbal Pain Scale"[tiab:~0]:

The PubMed advanced search page showing the difference in results using quotation marks and proximity searching for a phrase.

Using proximity searching ([tiab:~0]) when a phrase is not included in PubMed’s phrase index can lead to more focused results compared to using quotation marks alone.
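If you build searches often, the proximity syntax can be assembled with a few lines of code. This is a minimal sketch, not an official tool: the function names are hypothetical, the list of fields assumes that proximity searching is supported in the title (ti), title/abstract (tiab), and affiliation (ad) fields, and the link assumes that PubMed's web search accepts the search string in a "term" URL parameter. The code only builds text and does not contact PubMed.

```python
from urllib.parse import urlencode

# Field tags assumed to support proximity searching.
PROXIMITY_FIELDS = {"ti", "tiab", "ad"}


def proximity_query(phrase, field="tiab", distance=0):
    """Build a search string of the form "terms"[field:~N].

    distance is the maximum number of other words allowed between the
    terms; 0 asks for the words next to each other, in any order.
    """
    if field not in PROXIMITY_FIELDS:
        raise ValueError(f"unsupported field for proximity searching: {field}")
    if distance < 0:
        raise ValueError("distance must be 0 or greater")

    # Drop any quotation marks the user typed and collapse extra spaces.
    cleaned = " ".join(phrase.replace('"', " ").split())
    if not cleaned:
        raise ValueError("phrase is empty")

    return f'"{cleaned}"[{field}:~{distance}]'


def pubmed_url(query):
    """Return a PubMed link for the query (assumes the 'term' parameter)."""
    return "https://pubmed.ncbi.nlm.nih.gov/?" + urlencode({"term": query})


if __name__ == "__main__":
    query = proximity_query("Adult Non-Verbal Pain Scale")
    print(query)
    print(pubmed_url(query))
```

Raising the distance, for example proximity_query("pain scale", distance=2), loosens the search by allowing up to two other words between the terms.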

For more search help, or to request a literature search, contact the Library.

Join us for “Adventures in Text Mining: Applications, Ethics, and Cancer Care”

Promotional banner for Adventures in Text Mining event.

Join us for our webinar “Adventures in Text Mining: Applications, Ethics, and Cancer Care” on October 16 from 12:00 PM-1:00 PM Eastern Time.

What is Text Mining?
Text mining helps researchers sift through mountains of documents, clinical notes, and research papers to find important patterns and information quickly. Dr. Manika Lamba (Assistant Professor, School of Library and Information Studies, University of Oklahoma) will introduce the topic through the lens of her work in digital libraries and information organization.

Applications in Cancer Care
Dr. Anyi Li (Associate Attending Physicist and Chief of Computer Service, Department of Medical Physics, Memorial Sloan Kettering) will explain how applying text mining technologies to clinical notes at MSK has automated radiation therapy processes, saving clinician time and allowing for risk event analysis and mitigation. He will address the ethical aspects of text mining in healthcare, including patient privacy and responsible data use.

Applications in the Published Literature
Text mining can allow researchers to analyze the vast volume of scientific literature. Dr. Zhiyong Lu (Senior Investigator, NIH/NLM, Deputy Director for Literature Search, NCBI) will showcase his work mining the literature in PubMed, which led to tools including the Best Match algorithm and LitCovid. 

Register now. All registrants will receive a link to the event recording, whether or not they can attend synchronously.

About the speakers:

Dr. Manika Lamba is an Assistant Professor at the School of Library and Information Studies, University of Oklahoma. Previously, she served as a Postdoctoral Research Associate at the HathiTrust Research Center, University of Illinois. Her research broadly falls under computational social science and science of science. She primarily focuses on using computational methods, such as text mining and machine learning, to provide better solutions for information retrieval and organization of digital libraries.

Dr. Anyi Li, Associate Attending Physicist and Chief of Computer Service at the Department of Medical Physics at MSK, leads a talented team comprising mathematicians, physicists, engineers, and data scientists. Together, they collaborate with the Division of Clinical Physics and the Department of Radiation Oncology to harness artificial intelligence, operational research algorithms, and big data. Their objective is to optimize radiation therapy plans, enhance the efficiency of the radiation treatment process from start to finish, develop a data platform for clinical decision support, and improve patient safety by managing accumulated radiation doses. They utilize the latest language models to analyze clinical event timelines and construct workflow knowledge graphs, which improve the radiation therapy workflow and provide valuable insights to the clinical team. With a background as a theoretical nuclear physicist and research scientist tackling NP-hard (nondeterministic polynomial time) problems, Dr. Li transitioned into big data engineering and AI, bringing experience from positions at Yahoo and IBM Watson Health.

Dr. Zhiyong Lu is a tenured Senior Investigator at the NIH/NLM IPR, leading research in biomedical text and image processing, information retrieval, and AI/machine learning. In his role as Deputy Director for Literature Search at NCBI, Dr. Lu oversees the overall R&D efforts to improve literature search and information access in resources like PubMed and LitCovid, which are used by millions worldwide each day. Additionally, Dr. Lu is Adjunct Professor of Computer Science at the University of Illinois Urbana-Champaign (UIUC). With over 400 peer-reviewed publications, Dr. Lu is a highly cited author, and a Fellow of the American College of Medical Informatics (ACMI) and the International Academy of Health Sciences Informatics (IAHSI).