From a PubMed Record to NCBI’s Gene Information Portal

PubMed – the National Library of Medicine’s database of biomedical literature that will celebrate its 30th anniversary in January 2026 – is an incredible resource that’s freely available to all.

So much so, in fact, that many other search tools, including Google Scholar and most of the generative AI research assistants that are popping up at dizzying speed, rely heavily on PubMed for their content. This is possible in large part because the National Center for Biotechnology Information (NCBI) has always been eager to build APIs and other tools that support collaboration with the developers of other research tools.

One of the downsides, however, of discovering PubMed’s content solely via other search engines and tools is that users miss out on some of the incredible value-added links to other information that appear within each PubMed record. This is particularly true for searches on topics with a genetic information or bioinformatics aspect.

Take this example (inspired by NLM training exercises):

You are interested in exploring how the CYP2R1 gene might impact vitamin D deficiency risk.

A basic search in PubMed might look something like this:

After clicking on the title to open the full abstract view, users can scroll below the abstract text to see the MeSH terms and other Related Information.

The Related Information links include a link to NCBI’s Gene information portal which:

“integrates information from a wide range of species. A record may include nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes, and links to genome-, phenotype-, and locus-specific resources worldwide.”
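For readers who would rather follow this link from a script than from the PubMed page, the same PubMed-to-Gene connection can be requested through NCBI's E-utilities. The following is a minimal sketch, not an official recipe: it only builds the request URL for the ELink service, the function name gene_links_url is made up for illustration, and the PMID shown is a placeholder rather than a citation from this post.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities ELink service.
ELINK_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def gene_links_url(pmid: str) -> str:
    """Build an ELink request URL asking for Gene records linked to a PubMed ID.

    Hypothetical helper for illustration; it does not send the request.
    """
    params = {
        "dbfrom": "pubmed",  # start from a PubMed record
        "db": "gene",        # ask for linked records in the Gene database
        "id": pmid,          # the PubMed ID (PMID) of the citation
        "retmode": "json",   # request JSON instead of the default XML
    }
    return f"{ELINK_URL}?{urlencode(params)}"


# "12345678" is a placeholder PMID, not a real reference from this post.
print(gene_links_url("12345678"))
```

Opening the printed URL in a browser (or fetching it with any HTTP client) should return the linked Gene identifiers for that citation, assuming the PMID has Gene links at all.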

The Gene record can also include GeneRIFs (“Gene References into Function”).
See https://www.ncbi.nlm.nih.gov/gene/about-generif for a more detailed description.

“GeneRIF provides a simple mechanism to allow scientists to add to the functional annotation of genes described in Gene.”

As per https://www.ncbi.nlm.nih.gov/books/NBK3841/#EntrezGene.Bibliography:

“A GeneRIF is a concise phrase describing a function or functions of a gene, with the PubMed citation supporting that assertion.”

Narrowing results to the references that address a specific gene’s function can be a useful time-saver when literature searching.

For those who find the Gene records a bit overwhelming and prefer to stay within the familiar PubMed environment, PubMed search results can be limited to items that have been added as GeneRIFs by adding “pubmed gene rif” [Filter] to the search.

For example, adding it to the PubMed search string gives:

“CYP2R1 gene” AND “vitamin D” AND “pubmed gene rif” [Filter]
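The same search string can also be assembled in code and sent to PubMed through the E-utilities ESearch service. This is a rough sketch under a few assumptions: the helper names build_generif_query and esearch_url are invented for this example, straight quotation marks are used in place of the curly ones shown above, and the code only constructs the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The filter described in the post, written with straight quotes.
GENERIF_FILTER = '"pubmed gene rif"[Filter]'


def build_generif_query(*phrases: str) -> str:
    """Quote each phrase, then AND the phrases together with the GeneRIF filter."""
    parts = [f'"{phrase}"' for phrase in phrases]
    parts.append(GENERIF_FILTER)
    return " AND ".join(parts)


def esearch_url(query: str) -> str:
    """Build (but do not send) an ESearch request URL for a PubMed query."""
    params = {"db": "pubmed", "term": query, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


query = build_generif_query("CYP2R1 gene", "vitamin D")
print(query)
print(esearch_url(query))
```

Keeping the query builder separate from the URL builder makes it easy to paste the plain query into the PubMed search box to compare results with the scripted version.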

If you have any questions or want additional guidance on designing specialized literature searches, feel free to Ask Us at the MSK Library.

How To Search for a Phrase in PubMed

Searching for phrases in PubMed can be an exercise in frustration. To understand why, we need to look at how PubMed interprets your search, a process that is now described in detail in a new training module from the National Library of Medicine.

When you enter a phrase into PubMed without using quotation marks, the database does several things:

  1. Looks for the phrase as a subject heading, or MeSH, term. Subject headings have been preselected by the database and are assigned to each citation on a topic.
  2. Breaks apart the phrase and looks for each word separately.
  3. Looks for the phrase, if recognized.

How PubMed searches for a phrase. From the National Library of Medicine.

Sometimes, this process brings back the results you need. But it can also lead to search results that do not match your topic.
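To make the three steps above more concrete, here is a deliberately simplified model of how an unquoted phrase might be expanded into a larger search. It is an illustration only, not PubMed's actual algorithm: the real process also maps synonyms to preferred subject headings and consults other translation tables, and the function name expand_unquoted_phrase is hypothetical.

```python
def expand_unquoted_phrase(phrase: str) -> str:
    """Toy model of the three lookups described above, combined with OR.

    This is a simplification for illustration, not PubMed's real translation.
    """
    words = phrase.split()
    normalized = " ".join(words)

    # Step 1: look for the phrase as a subject heading (MeSH term).
    as_mesh = f'"{normalized}"[MeSH Terms]'
    # Step 2: break the phrase apart and look for each word separately.
    each_word = " AND ".join(f"{word}[All Fields]" for word in words)
    # Step 3: look for the phrase as a whole.
    as_phrase = f'"{normalized}"[All Fields]'

    return f"{as_mesh} OR ({each_word}) OR {as_phrase}"


print(expand_unquoted_phrase("vitamin D deficiency"))
```

The middle part of the expansion is where off-topic results tend to creep in, because the individual words only need to appear somewhere in the record, not next to each other.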

What happens when you search using quotation marks?

  • Instead of looking for matching subject headings, PubMed checks for the phrase in its phrase index, a list drawn from the literature included in the database.
  • Not every phrase is included in the phrase index. If your phrase is not found, PubMed may ignore your quotation marks and follow the search steps above, bringing in irrelevant results.
  • You can see how many results include your phrase from the advanced search page. Start typing your term, then click “Show index” to the right of the search box.

Use the “Show index” button on the advanced search page to see if your phrase is included in PubMed’s phrase index. From the National Library of Medicine.

Fortunately, there are workarounds if your phrase is not found. You can recommend the addition of phrases to PubMed. You can try searching for the phrase in other databases, most of which are much more user-friendly for phrase searching. You can also try searching for the phrase in PubMed using adjacency.

For example, “Adult Non-Verbal Pain Scale” is not a phrase PubMed recognizes. To have PubMed look for these words next to each other, in any order, tell it to search the title and abstract fields (tiab) with no other words allowed between the terms (:~0):

“Adult Non-Verbal Pain Scale”[tiab:~0]

Using proximity searching ([tiab:~0]) when a phrase is not included in PubMed’s phrase index can lead to more focused results compared to using quotation marks alone.
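If you build searches like this often, the proximity syntax is easy to generate with a small helper. The sketch below is one possible way to do it: the function name proximity_query and its default values are assumptions made for this example, and it simply formats the search string for pasting into PubMed.

```python
def proximity_query(phrase: str, field: str = "tiab", distance: int = 0) -> str:
    """Format a PubMed proximity search: "terms"[field:~N].

    distance is the maximum number of other words allowed between the terms;
    0 means the words must be adjacent (in any order).
    """
    if distance < 0:
        raise ValueError("distance must be zero or greater")
    return f'"{phrase}"[{field}:~{distance}]'


print(proximity_query("Adult Non-Verbal Pain Scale"))
# Allowing up to two words between the terms loosens the match.
print(proximity_query("Adult Non-Verbal Pain Scale", distance=2))
```

Raising the distance from 0 is a quick way to test whether a slightly looser search picks up relevant variations of the phrase.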

For more search help, or to request a literature search, contact the Library.

Join us for “Adventures in Text Mining: Applications, Ethics, and Cancer Care”

Join us for our webinar “Adventures in Text Mining: Applications, Ethics, and Cancer Care” on October 16 from 12:00 PM-1:00 PM Eastern Time.

What is Text Mining?
Text mining helps researchers sift through mountains of documents, clinical notes, and research papers to find important patterns and information quickly. Dr. Manika Lamba (Assistant Professor, School of Library and Information Studies, University of Oklahoma) will introduce the topic through the lens of her work in digital libraries and information organization.

Applications in Cancer Care
Dr. Anyi Li (Associate Attending Physicist and Chief of Computer Service, Department of Medical Physics, Memorial Sloan Kettering) will explain how applying text mining technologies to clinical notes at MSK has automated radiation therapy processes, saving clinician time and allowing for risk event analysis and mitigation. He will address the ethical aspects of text mining in healthcare, including patient privacy and responsible data use.

Applications in the Published Literature
Text mining can allow researchers to analyze the vast volume of scientific literature. Dr. Zhiyong Lu (Senior Investigator, NIH/NLM, Deputy Director for Literature Search, NCBI) will showcase his work mining the literature in PubMed, which led to tools including the Best Match algorithm and LitCovid. 

Register now. All registrants will receive a link to the event recording, whether or not they can attend synchronously.

About the speakers:

Dr. Manika Lamba is an Assistant Professor at the School of Library and Information Studies, University of Oklahoma. Previously, she served as a Postdoctoral Research Associate at the HathiTrust Research Center, University of Illinois. Her research broadly falls under computational social science and science of science. She primarily focuses on using computational methods, such as text mining and machine learning, to provide better solutions for information retrieval and organization of digital libraries.

Dr. Anyi Li, Associate Attending Physicist and Chief of Computer Service at the Department of Medical Physics at MSK, leads a talented team comprising mathematicians, physicists, engineers, and data scientists. Together, they collaborate with the Division of Clinical Physics and the Department of Radiation Oncology to harness artificial intelligence, operational research algorithms, and big data. Their objective is to optimize radiation therapy plans, enhance the efficiency of the radiation treatment process from start to finish, develop a data platform for clinical decision support, and improve patient safety by managing accumulated radiation doses. They utilize the latest language models to analyze clinical event timelines and construct workflow knowledge graphs, which improve the radiation therapy workflow and provide valuable insights to the clinical team. With a background as a theoretical nuclear physicist and research scientist tackling NP-hard (nondeterministic polynomial time) problems, Dr. Li transitioned into big data engineering and AI, bringing experience from positions at Yahoo and IBM Watson Health.

Dr. Zhiyong Lu is a tenured Senior Investigator at the NIH/NLM IPR, leading research in biomedical text and image processing, information retrieval, and AI/machine learning. In his role as Deputy Director for Literature Search at NCBI, Dr. Lu oversees the overall R&D efforts to improve literature search and information access in resources like PubMed and LitCovid, which are used by millions worldwide each day. Additionally, Dr. Lu is Adjunct Professor of Computer Science at the University of Illinois Urbana-Champaign (UIUC). With over 400 peer-reviewed publications, Dr. Lu is a highly cited author, and a Fellow of the American College of Medical Informatics (ACMI) and the International Academy of Health Sciences Informatics (IAHSI).