ClinicalTrials.gov – Discovery Tool and Research Data Source

As ClinicalTrials.gov celebrates its 25th anniversary, reaches the milestone of half a million registered studies, and completes its modernization, it is a good time to appreciate this invaluable research tool, which has been available since 2000. In 2008, NLM launched the ClinicalTrials.gov results database, which as of December 2024 includes more than 70,000 registered studies with posted results.

Openly available to all, with “about 90 thousand visitors per day and 2 million unique visitors every month”, ClinicalTrials.gov is a registry where anyone can identify both ongoing and completed registered trials with locations in “50 States and in 229 countries and territories”.

Functionality added over the last few years has made this database increasingly attractive as a data source for answering research questions. These additions include Complex Search Queries for searching the database and new options for downloading and using search results and records from ClinicalTrials.gov.

From: https://clinicaltrials.gov/find-studies 

In addition to search functionality that allows for very precise searching, it is now possible to download search results from ClinicalTrials.gov in the RIS file format, which can be imported into citation management tools like EndNote and into Covidence (used for managing systematic review projects).

It is important to note that the available data fields differ by download format. The RIS download has a fixed set of fields that is not customizable, the CSV download lets a user select fields from a menu of options, and the JSON format can include every available data field for each study being downloaded. The ClinicalTrials.gov API option allows researchers and developers to access the ClinicalTrials.gov database in a large-scale, automated way.

From: https://clinicaltrials.gov/find-studies/how-to-use-search-results
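
For readers curious about what automated access might look like, here is a minimal Python sketch. It assumes version 2 of the ClinicalTrials.gov API, with a studies endpoint at https://clinicaltrials.gov/api/v2/studies, the query.cond, pageSize, and pageToken parameters, and a JSON response that nests each study's NCT ID and brief title under protocolSection.identificationModule. The function names are hypothetical, so please check the current API documentation before relying on any of these details.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL of the ClinicalTrials.gov API (version 2).
API_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_search_url(condition, page_size=10, page_token=None):
    """Build the URL for one page of JSON results for a condition search."""
    params = {"query.cond": condition, "pageSize": page_size, "format": "json"}
    if page_token:
        # Assumption: a token from the previous page requests the next page.
        params["pageToken"] = page_token
    return f"{API_URL}?{urlencode(params)}"


def summarize_studies(payload):
    """Return (NCT ID, brief title) pairs from a decoded JSON response."""
    rows = []
    for study in payload.get("studies", []):
        ident = study.get("protocolSection", {}).get("identificationModule", {})
        rows.append((ident.get("nctId"), ident.get("briefTitle")))
    return rows


def fetch_page(condition, page_size=10, page_token=None):
    """Download and decode one page of results (makes a network request)."""
    url = build_search_url(condition, page_size, page_token)
    with urlopen(url) as response:
        return json.load(response)
```

Calling summarize_studies(fetch_page("leukemia")) would then return one (NCT ID, title) pair for each study on the first page of results.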

Examples of research projects that have leveraged ClinicalTrials.gov data:

  1. Alhajahjeh A, Rotter LK, Stempel JM, Grimshaw AA, Bewersdorf JP, Blaha O, Kewan T, Podoltsev NA, Shallis RM, Mendez L, Stahl M, Zeidan AM. Global Disparities in the Characteristics and Outcomes of Leukemia Clinical Trials: A Cross-Sectional Study of the ClinicalTrials.gov Database. JCO Glob Oncol. 2024 Nov;10:e2400316. doi: 10.1200/GO-24-00316. Epub 2024 Dec 2. PMID: 39621951.

  2. Chen D, Parsa R, Chauhan K, Lukovic J, Han K, Taggar A, Raman S. Review of brachytherapy clinical trials: a cross-sectional analysis of ClinicalTrials.gov. Radiat Oncol. 2024 Feb 13;19(1):22. doi: 10.1186/s13014-024-02415-8. PMID: 38351013; PMCID: PMC10863227.

  3. Falade AS, Adeoye O, Van Loon K, Buckle GC. Clinical Trials in Gastroesophageal Cancers: An Analysis of the Global Landscape of Interventional Trials From ClinicalTrials.gov. JCO Glob Oncol. 2024 Aug;10:e2400169. doi: 10.1200/GO.24.00169. PMID: 39173083.

  4. Pearce FJ, Cruz Rivera S, Liu X, Manna E, Denniston AK, Calvert MJ. The role of patient-reported outcome measures in trials of artificial intelligence health technologies: a systematic evaluation of ClinicalTrials.gov records (1997-2022). Lancet Digit Health. 2023 Mar;5(3):e160-e167. doi: 10.1016/S2589-7500(22)00249-7. PMID: 36828608.

  5. Yang A, Baxi S, Korenstein D. ClinicalTrials.gov for Facilitating Rapid Understanding of Potential Harms of New Drugs: The Case of Checkpoint Inhibitors. J Oncol Pract. 2018 Feb;14(2):72-76. doi: 10.1200/JOP.2017.025114. Epub 2018 Jan 3. PMID: 29298113; PMCID: PMC5812307.

Questions? Ask Us at the MSK Library!

How To Search for a Phrase in PubMed

Searching for phrases in PubMed can be an exercise in frustration. To understand why, we need to look at how PubMed interprets your search, a process that is now described in detail in a new training module from the National Library of Medicine.

When you enter a phrase into PubMed without using quotation marks, the database does several things:

  1. Looks for the phrase as a subject heading, or MeSH, term. Subject headings have been preselected by the database and are assigned to each citation on a topic.
  2. Breaks apart the phrase and looks for each word separately.
  3. Looks for the phrase, if recognized.

How PubMed searches for a phrase. From the National Library of Medicine.

Sometimes, this process brings back the results you need. But it can also lead to search results that do not match your topic.
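
One way to see how PubMed interpreted a search is to ask for its translation of the query. The Python sketch below assumes the NCBI E-utilities ESearch service, and that its JSON response reports the result count and the translated query in the count and querytranslation fields of esearchresult. The function names are hypothetical and the sketch only builds the request and reads a decoded response, so treat it as a starting point rather than a definitive recipe.

```python
from urllib.parse import urlencode

# Assumption: the NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term):
    """Build an ESearch URL that asks PubMed for JSON and no record IDs."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": 0}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def read_translation(payload):
    """Return (result count, translated query) from a decoded JSON response."""
    result = payload.get("esearchresult", {})
    # ESearch reports the count as a string, so convert it to an integer.
    return int(result.get("count", 0)), result.get("querytranslation", "")
```

Comparing the translation for a phrase typed with and without quotation marks shows which of the steps above PubMed actually applied.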

What happens when you search using quotation marks?

  • Instead of looking for matching subject headings, PubMed checks for the phrase in its phrase index, a list drawn from the literature included in the database.
  • Not every phrase is included in the phrase index. If your phrase is not found, PubMed may ignore your quotation marks and follow the search steps above, bringing in irrelevant results.
  • You can see how many results include your phrase from the advanced search page. Start typing your term, then click “Show index” to the right of the search box.

Use the “Show index” button on the advanced search page to see if your phrase is included in PubMed’s phrase index. From the National Library of Medicine.

Fortunately, there are several workarounds if your phrase is not found. You can recommend the addition of phrases to PubMed. You can try searching for the phrase in different databases, most of which are much more user-friendly when phrase searching. You can also try searching for the phrase in PubMed using adjacency.

For example, “Adult Non-Verbal Pain Scale” is not a phrase PubMed recognizes. Have PubMed look for the words next to each other, in any order, by searching the title and abstract fields (tiab) with a proximity distance of zero (:~0):

"Adult Non-Verbal Pain Scale"[tiab:~0]

Using proximity searching ([tiab:~0]) when a phrase is not included in PubMed’s phrase index can lead to more focused results compared to using quotation marks alone.
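
If you build search strings in a script, a small helper can assemble the proximity syntax for you. The Python sketch below is hypothetical (the function name and checks are ours), and it assumes that PubMed accepts proximity searches only in the title (ti), title/abstract (tiab), and affiliation (ad) fields.

```python
# Assumption: the field tags in which PubMed supports proximity searching.
ALLOWED_FIELDS = {"ti", "tiab", "ad"}


def proximity_query(phrase, field="tiab", distance=0):
    """Wrap a multi-word phrase in PubMed proximity syntax, e.g. [tiab:~0]."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"unsupported field tag: {field}")
    if distance < 0:
        raise ValueError("distance must be zero or greater")
    # Drop straight and curly quotation marks, then collapse extra spaces.
    for mark in '"“”':
        phrase = phrase.replace(mark, " ")
    words = phrase.split()
    if len(words) < 2:
        raise ValueError("a proximity search needs at least two words")
    return f'"{" ".join(words)}"[{field}:~{distance}]'


print(proximity_query("Adult Non-Verbal Pain Scale"))
```

Raising the distance (for example, distance=2) allows other words to appear between the search terms, which broadens the search.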

For more search help, or to request a literature search, contact the Library.

Google Dataset Search, a dataset-discovery tool 

With data sharing increasingly encouraged in academic research, and with datasets increasingly being added to data repositories and published on the Web, it makes sense that a Web search company like Google would dedicate resources to developing a Web discovery tool optimized for finding datasets.

How does it work?

Google Dataset Search uses Google’s web crawl technology to find datasets that have been made available on the Web, identifying them based on their metadata (standardized descriptions added to the datasets by their owners/publishers). “Google’s Dataset Search extracts dataset metadata—expressed using schema.org and similar vocabularies—from Web pages in order to make datasets discoverable.”
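
To give a sense of what this kind of metadata can look like, the Python sketch below builds a small schema.org Dataset description as JSON-LD, one common way of embedding schema.org markup in a Web page. The helper name, the choice of properties, and every value in the example (including the example.org address) are hypothetical and purely illustrative.

```python
import json


def dataset_jsonld(name, description, url, license_url=None, keywords=()):
    """Return a schema.org Dataset description serialized as JSON-LD text."""
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    # Optional properties are included only when a value is supplied.
    if license_url:
        record["license"] = license_url
    if keywords:
        record["keywords"] = list(keywords)
    return json.dumps(record, indent=2)


# Hypothetical dataset, for illustration only.
print(dataset_jsonld(
    name="Example imaging dataset",
    description="A made-up dataset used to illustrate schema.org markup.",
    url="https://example.org/datasets/imaging",
    keywords=["MRI", "example"],
))
```

A dataset owner would typically place text like this inside a script tag of type application/ld+json on the dataset's landing page, where a Web crawler can read it.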

For an in-depth overview of how Google Dataset Search has been developed – please see:

Sostek, Katrina, Daniel M. Russell, Nitesh Goyal, Tarfah Alrashed, Stella Dugall, and Natasha Noy. “Discovering datasets on the web scale: Challenges and recommendations for Google Dataset Search.” Harvard Data Science Review Special Issue 4 (2024).

How can you search it?

To get started with Google Dataset Search, go to https://datasetsearch.research.google.com/

If you are looking for something specific, you can refine your search results by limiting your search to a particular website domain (for example, site:nih.gov) or by adding terms to your search. You can also filter your results by when the dataset was last updated, by format, by usage rights, by topic/discipline, and by whether the dataset is freely available. Furthermore, you can save your search results, link out to the external source website where you can download the datasets, and easily cite a dataset by copying the citation information that is generated when you click the citation button (i.e., the quotation mark button).
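
If you keep a record of your searches, the domain limit can be added to a search string programmatically. The short Python sketch below is a hypothetical helper (the function name and example terms are ours) that only assembles the text you would type or paste into the search box.

```python
def refine_query(terms, site=None):
    """Combine search terms with an optional site: domain limit."""
    parts = [terms.strip()]
    if site:
        # The site: prefix limits results to one website domain.
        parts.append(f"site:{site.strip()}")
    return " ".join(part for part in parts if part)


print(refine_query("glioma MRI", site="nih.gov"))
```

The filters for update date, format, usage rights, and topic are applied on the results page itself, so they are not part of this search string.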

To learn more – see:

Dataset Search Quick Start Guide –
https://newsinitiative.withgoogle.com/resources/trainings/dataset-search-quickstart-guide/

User Support Center – https://datasetsearch.research.google.com/help

Dataset Developer Page –
https://developers.google.com/search/docs/appearance/structured-data/dataset

How is it being used?

It appears that biomedical researchers have already started using Google Dataset Search in their scholarly projects. Some examples focusing on finding image datasets include:

  1. Abbad Andaloussi M, Maser R, Hertel F, Lamoline F, Husch AD. Exploring adult glioma through MRI: A review of publicly available datasets to guide efficient image analysis. Neurooncol Adv. 2025;7(1):vdae197. Epub 20250128. doi: 10.1093/noajnl/vdae197. PubMed PMID: 39877749; PMCID: PMC11773385.

  2. Rozhyna A, Somfai GM, Atzori M, DeBuc DC, Saad A, Zoellin J, Müller H. Exploring Publicly Accessible Optical Coherence Tomography Datasets: A Comprehensive Overview. Diagnostics (Basel). 2024;14(15). Epub 20240801. doi: 10.3390/diagnostics14151668. PubMed PMID: 39125544; PMCID: PMC11312046.

  3. Wen D, Khan SM, Ji Xu A, Ibrahim H, Smith L, Caballero J, Zepeda L, de Blas Perez C, Denniston AK, Liu X, Matin RN. Characteristics of publicly available skin cancer image datasets: a systematic review. Lancet Digit Health. 2022;4(1):e64-e74. Epub 20211109. doi: 10.1016/s2589-7500(21)00252-1. PubMed PMID: 34772649.

  4. Khan SM, Liu X, Nath S, Korot E, Faes L, Wagner SK, Keane PA, Sebire NJ, Burton MJ, Denniston AK. A global review of publicly available datasets for ophthalmological imaging: barriers to access, usability, and generalisability. Lancet Digit Health. 2021;3(1):e51-e66. Epub 20201001. doi: 10.1016/s2589-7500(20)30240-5. PubMed PMID: 33735069.

Questions? Ask Us at the MSK Library!