Systematic Bulk Downloading of Articles from PubMed Central (PMC)

In this era of artificial intelligence (AI) and machine learning (ML), there is increased interest in accessing large numbers of full-text articles to train deep learning models and/or evaluate their performance. PubMed Central (PMC), the full-text article repository of the U.S. National Library of Medicine (NLM), is a popular choice with AI/ML researchers, who are often looking for a free, openly accessible source of the scholarly biomedical literature. For a recent example of research carried out using the PMC Open Access Subset, see PMID: 37094464.

Although the NLM is generally accommodating of researchers using and even building upon the tools and resources that it develops and supports, it expects researchers to work within its rules and restrictions. Anyone interested in “automated retrieval of articles in machine-readable formats in PubMed Central (PMC)” is encouraged to explore the “several large datasets of journal articles and other scientific publications made available for retrieval under license terms that generally allow for more liberal redistribution and reuse than a traditional copyrighted work (e.g., Creative Commons licenses)”. However, there are “Restrictions on the Systematic Downloading of Articles”; see https://www.ncbi.nlm.nih.gov/pmc/tools/textmining/

When researchers try to bulk download a large amount of content on their own through the regular PMC web interface, PMC’s systems notice the increased activity and block the IP range(s) responsible. Such downloading violates the terms of the PMC Copyright Notice, which states that “Systematic downloading of batches of articles from the main PMC web site, in any way, is prohibited because of copyright restrictions.”

From: https://www.ncbi.nlm.nih.gov/pmc/about/copyright/:

PMC makes certain subsets of articles (i.e., the PMC Article Datasets) accessible through auxiliary services that may be used for automated retrieval and downloading. These are:

These services are the only services that may be used for this purpose. Do not use any other automated processes for downloading articles, even if you are only retrieving articles from the PMC Article Datasets (including the PMC Open Access Subset).
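As a minimal sketch of what a sanctioned, automated request might look like, the snippet below builds a single-record request URL for an OAI-PMH service, assuming that the OAI-PMH service NLM documents for PMC is among the auxiliary services referred to above. The base URL, the identifier pattern, and the "pmc" metadata prefix are assumptions based on NLM's historical documentation and should be confirmed against the current PMC pages before use. The function name is hypothetical, and the snippet only constructs the URL; it does not send any request.

```python
from urllib.parse import urlencode

# Assumption: base URL of the PMC OAI-PMH service as historically documented
# by NLM. Confirm the current endpoint before relying on it.
PMC_OAI_BASE = "https://www.ncbi.nlm.nih.gov/pmc/oai/oai.cgi"


def build_oai_get_record_url(pmcid, metadata_prefix="pmc", base_url=PMC_OAI_BASE):
    """Build an OAI-PMH GetRecord URL for one article (hypothetical helper).

    `pmcid` may be given as 152494 or "PMC152494".
    """
    digits = str(pmcid).strip()
    if digits.upper().startswith("PMC"):
        digits = digits[3:]
    if not digits.isdigit():
        raise ValueError(f"not a valid PMCID: {pmcid!r}")

    params = {
        "verb": "GetRecord",  # standard OAI-PMH verb for fetching one record
        # Assumption: identifier pattern used by the PMC OAI-PMH service.
        "identifier": f"oai:pubmedcentral.nih.gov:{digits}",
        # Assumption: "pmc" requests full text where the license allows it.
        "metadataPrefix": metadata_prefix,
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_oai_get_record_url("PMC152494"))
```

Building the URL separately from sending it makes it easy to review exactly what will be requested, and to add pauses between requests, before any automated retrieval begins.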

Questions? Be sure to Ask Us at the MSK Library!

Proximity Search Functionality Added to PubMed

With over 35 million records indexed in PubMed, efficiently finding exactly the information you need can be challenging. To help with this, NLM recently added a new capability to the PubMed search interface called “proximity searching”. In a nutshell, proximity searching lets the user look for records containing two different search terms of interest while specifying how far apart the two terms can be from one another in the title and/or abstract of the citation record.

This relational specificity lets the searcher conduct a broader search (returning more results) than phrase searching would. At the same time, a proximity search returns a narrower (smaller) set of results than a search that matches the two terms anywhere in the text, regardless of the distance between them.
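To make this concrete, PubMed writes a proximity search as the quoted terms followed by a field tag and a maximum distance N, in the form "terms"[field:~N]. The short Python sketch below assembles such a query string and a link to the PubMed web interface. The function names are hypothetical, and the "?term=" link pattern is an assumption about the public PubMed site rather than a documented API.

```python
from urllib.parse import quote_plus


def proximity_query(terms, distance, field="Title/Abstract"):
    """Return a PubMed proximity search string (hypothetical helper).

    terms    -- the words that must appear near each other, in any order
    distance -- maximum number of words allowed between the terms
    field    -- search field; Title/Abstract is used as the default here
    """
    if len(terms) < 2:
        raise ValueError("a proximity search needs at least two terms")
    if distance < 0:
        raise ValueError("distance must be zero or greater")
    phrase = " ".join(terms)
    return f'"{phrase}"[{field}:~{distance}]'


def pubmed_search_url(query):
    """Build a link to the PubMed web interface for a query.

    Assumption: the site accepts the search string in a `term` parameter.
    """
    return "https://pubmed.ncbi.nlm.nih.gov/?term=" + quote_plus(query)


if __name__ == "__main__":
    query = proximity_query(["hip", "pain"], 4)
    print(query)  # "hip pain"[Title/Abstract:~4]
    print(pubmed_search_url(query))
```

With a distance of 4, this example would match records where "hip" and "pain" appear within four words of each other, which is broader than the exact phrase "hip pain" but narrower than hip AND pain.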

This ability to increase the precision of search results is what makes proximity searching a useful capability to have in the PubMed search interface toolbox. For detailed instructions and screenshots illustrating how PubMed’s proximity searching works, be sure to check out the following links:

Questions?

Be sure to Ask Us or attend an upcoming MSK Library PubMed training session.

Race and Ethnicity-related 2022 MeSH changes

Over the last year, many stakeholders involved in scholarly publishing have been revisiting the terminology used for reporting race and ethnicity in biomedical literature, for example:

Flanagin A, Frey T, Christiansen SL; AMA Manual of Style Committee. Updated Guidance on the Reporting of Race and Ethnicity in Medical and Science Journals. JAMA. 2021 Aug 17;326(7):621-627. 

Flanagin A, Frey T, Christiansen SL, Bauchner H. The Reporting of Race and Ethnicity in Medical and Science Journals: Comments Invited. JAMA. 2021 Mar 16;325(11):1049-1052. 

In 2022, the National Library of Medicine, producer of PubMed/MEDLINE, also made changes to the Medical Subject Headings (MeSH) related to race and ethnicity, replacing multiple headings with more up-to-date terminology that better matches the latest United States Census terminology. Among the 24 changes to MeSH headings this year were:

African Continental Ancestry Group >>>> Blacks
American Natives >>>> American Indians or Alaska Natives
Asian Continental Ancestry Group >>>> Asians
Continental Population Groups >>>> Racial Groups
Ethnic Groups >>>> Ethnicity
European Continental Ancestry Group >>>> Whites
Hispanic Americans >>>> Hispanic or Latino
Oceanic Ancestry Group >>>> Native Hawaiian or Other Pacific Islander

Below is a more detailed view of how the MeSH Tree Structures were affected by the changes. To compare, here is the Population Groups Tree from MeSH 2021:

2022 MeSH replacements:

“Ethnicity”[Mesh]

A group of people with a common cultural heritage that sets them apart from others in a variety of social relationships.

“Racial Groups”[Mesh]

Groups of individuals with similar physical appearances often reinforced by cultural, social and/or linguistic similarities.

2022 MeSH additions:
(to MeSH trees other than “Population Groups”)

“Health Disparity, Minority and Vulnerable Populations”[Mesh] 

Groups of persons whose special characteristics make them a minority, vulnerable, and frequently subjected to conditions with limited levels of access to health care and other opportunities. (Most of the 2021 “Ethnic Groups” MeSH tree terms were moved here.) 

“Ethnic and Racial Minorities”[Mesh]

Socially constructed groups of people who differ in race, color or national, religious, or cultural origin from the dominant group and is often the majority population of the country in which they live. Ethnic minority groups generally share a common sense of identity and common characteristics such as language, religion, tribe, nationality, race, or a combination thereof.

The MeSH vocabulary is reviewed annually and revised on an “as needed” basis to best represent the latest subject matter appearing in the biomedical literature. It is not perfect and is always a work in progress that grows and changes organically. Everyone is welcome to write to the NLM help desk to request a change or addition to the MeSH vocabulary.

Questions? Ask Us at the MSK Library.