Scientific Writing Resources

As generative AI tools have become increasingly available to academic researchers, so too have reports of GPT-fabricated scientific papers creeping into the public scholarly record. One example is this 2024 report from the Harvard Kennedy School:

GPT-fabricated scientific papers on Google Scholar: Key features, spread, and implications for preempting evidence manipulation | HKS Misinformation Review

Developing strong scientific writing skills has always been an important component of graduate training in the basic sciences. However, not all scientific authors have the same degree of exposure to writing classes and authorship opportunities. As the burden of recognizing fake papers falls more and more on the readers of scientific works, there couldn’t be a better way to protect yourself against fraudulent articles than by becoming an expert at scientific writing yourself.

Here are some resources to explore if you wish to develop your scientific writing skills:

1) E-books from the MSK Library’s collection and full-text book chapters available online

2) Duke Graduate School Scientific Writing Resource
https://sites.duke.edu/scientificwriting/
“The Scientific Writing Resource is online course material that teaches how to write effectively. The material is not about correctness (grammar, punctuation, etc.), but about communicating what you intend to the reader. It can be used either in a science class or by individuals. It is intended for science students at the graduate level.”

“This guide to scientific writing was originally created in 2010-2011 by Nathan Sheffield for the Duke University Graduate School and funded by a Duke University Graduate School Teaching mini-grant. This current site is maintained by the Duke Graduate School. If you have questions about this site, please contact gradschool@duke.edu.”

The MSK Library also provides access to writing support tools, including:

1) Citation Management tools – https://libguides.mskcc.org/citationmanagement

Find out about a variety of citation management software tools that can save you time when you are formatting your manuscript’s references and bibliography.


2) Trinka AI – https://libguides.mskcc.org/trinka

“Trinka is an AI-powered writing assistant designed for academic and technical writing. Trinka corrects advanced grammar errors and contextual spelling mistakes by providing writing suggestions in real-time. It helps academicians write in a formal, concise, and engaging manner. In addition to correcting grammatical errors, Trinka allows you to paraphrase the text and improve consistency, enabling you to enhance the quality of your writing based on your requirements.”

3) iThenticate – https://libguides.mskcc.org/ithenticate

“iThenticate is a tool for researchers and writers to check their original works for potential plagiarism. This resource will check against 93% of Top Cited Journal content and 70+ billion current and archived web pages.” 

Questions? Ask Us at the MSK Library!

NCI’s Cancer Data Science Course

With International Love Data Week 2025 just around the corner, you might be wondering how data science could be leveraged in your own cancer research projects. Luckily, the National Cancer Institute’s Center for Biomedical Informatics & Information Technology (CBIIT) has been developing some wonderful training resources designed to help clinical oncologists and cancer researchers build their basic cancer data science skills – see https://datascience.cancer.gov/training.

Whether you have the time available to dedicate to working through a multi-chapter video course or prefer the flexibility of jumping to particular topics of interest via the online training guides, there is something useful for all types of learners with different knowledge levels.

https://datascience.cancer.gov/training/learn-data-science

https://datascience.cancer.gov/training#howcan

https://datascience.cancer.gov/training/improve-data-science-skills

NCI’s basic skills video course is a great place for beginners to start. You can work through each chapter at your own pace, watching the videos, testing your knowledge, and exploring links to extensive lists of related materials. No registration required – just jump in and start learning – gaining data science skills as you go!

https://datascience.cancer.gov/training/improve-data-science-skills/video-course/chapter/data-science-myths

Questions? Ask Us at the MSK Library!

NIH Common Data Element (CDE) Repository

Re-using a research survey or measurement instrument, especially a validated one, while respecting copyright and giving proper attribution, is a common practice that everyone (especially research funders) can agree makes research more efficient and cost-effective. For example, anyone familiar with the REDCap electronic data capture tool is likely aware of the REDCap Shared Library, which “is a repository for REDCap data collection instruments and forms that can be downloaded and used by researchers at REDCap partner institutions”. Even NIH survey materials, like those from the NIH’s All of Us Research Program, are available for download from there.



Even when a data collection instrument cannot, in its entirety, satisfy the unique needs of an original research project, it is still useful to collect its commonly used individual data elements (variables) in a consistent, standardized way that other researchers are also adopting in their own projects. Doing so makes the data collected for diverse studies more interoperable, i.e., it increases the potential for this information to be shared or combined in future research projects.
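To make the interoperability idea concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: the element names (`smoking_status`, `age_years`), the permissible values, and the helper functions are assumptions, not actual CDEs or any real tool's API. It shows that when two studies record a variable using the same agreed element and value set, their records can be pooled directly, while a record coded in a study-specific way is excluded.

```python
# Hypothetical shared data elements: name -> set of permissible values.
# None means "any value accepted" (e.g., a free numeric field).
# These names and value sets are illustrative assumptions, not real CDEs.
COMMON_ELEMENTS = {
    "smoking_status": {"never", "former", "current"},
    "age_years": None,
}


def conforms(record):
    """Return True if a record uses only shared elements and permitted values."""
    for name, value in record.items():
        if name not in COMMON_ELEMENTS:
            return False
        allowed = COMMON_ELEMENTS[name]
        if allowed is not None and value not in allowed:
            return False
    return True


def pool(*studies):
    """Combine records from several studies, keeping only conforming ones."""
    return [rec for study in studies for rec in study if conforms(rec)]


study_a = [{"smoking_status": "never", "age_years": 54}]
study_b = [
    {"smoking_status": "former", "age_years": 61},
    # Study-specific coding that does not match the shared value set:
    {"smoking_status": "quit in 2010", "age_years": 47},
]

pooled = pool(study_a, study_b)
print(len(pooled))  # prints 2
```

In practice, the non-conforming record would need to be recoded by hand before it could be combined with the others, which is the extra effort that collecting data with common elements from the start is meant to avoid.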

And “the use of particular standards to enable interoperability of datasets” is an important component of the 2023 NIH Data Management and Sharing Policy, which aligns with the FAIR data principles – see: 

“NIH has issued the Data Management and Sharing (DMS) policy (effective January 25, 2023) to promote the sharing of scientific data. Sharing scientific data accelerates biomedical research discovery, in part, by enabling validation of research results, providing accessibility to high-value datasets, and promoting data reuse for future research studies.”

This brings us to the NIH Common Data Element (CDE) Repository, which is “hosted and maintained by the National Library of Medicine (NLM)”. To encourage the use of Common Data Elements (CDEs) and make it easier for researchers to identify CDEs that might be useful for their research project, NLM has created this searchable repository/catalog that users can freely access online.

Users can search for individual CDEs or multiple CDEs that are curated into Forms. The search can also be limited to NIH-Endorsed CDEs, which are CDEs that have “been reviewed and approved by an expert panel, and meet established criteria”. Furthermore, “NIH-recognized bodies (institutes, research initiatives, etc.) may submit CDEs to the NIH CDE Governance Committee for consideration for endorsement” via the Repository’s homepage – see: 

NIH Common Data Element (CDE) Repository – https://cde.nlm.nih.gov/home

From the NIH CDE Repository User Guide: https://cde.nlm.nih.gov/guides

“The NIH CDE Repository uses the Unified Medical Language System (UMLS) Terminology Service (UTS) Sign on Service which lets you set up an account and sign in using your NIH credentials, your account with a research organization, or a personal account such as Google, Microsoft, or Login.gov.

A user account is not required to browse the NIH CDE Repository, but when you are signed in, you will have expanded access to features. User account holders can create Boards and save CDEs and Forms to them, remember your preferences on all your devices, and if approved, become a curator, and view/manage your organization’s content. Users with NIH credentials can choose to see CDEs of any registration status – including previewing draft CDEs that have not yet been published.”

NLM also offers these CDE training options where you can learn more:

Questions? Ask Us at the MSK Library!