Citing Code via GitHub

As we were taught in school, whenever someone quotes, paraphrases, summarizes, or otherwise references another scholar’s research, they must properly attribute that research with a citation in their work. This same rule applies to code!

Citing code is not only required as part of the publication process; its value also includes:

  • contributing to ethical and transparent science,
  • recognizing the contributions of programmers to a research project,
  • tracking reuse of code over time, and
  • reinforcing the value of non-traditional bibliographic research outputs (like code, datasets, and software).

Code can be challenging to cite because the traditional bibliographic elements are not always readily apparent. Often the only citation information for a code repository must be gleaned from its README.md file or from the original publication that references the code, if such a publication exists.

If you maintain your code in GitHub, you have a few options for encouraging proper citation by identifying the contributors and citation elements yourself.

DOI for Code. In 2014, GitHub partnered with Zenodo, the open-access research repository operated by CERN, to mint Digital Object Identifiers (DOIs) for archived repos. A DOI is a persistent identifier registered in an internationally recognized system, which gives your code (or data) an unambiguous, permanent link. A DOI is a great first step toward ensuring that the correct version of your code is clearly identified and properly attributed.

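To take advantage of this, create a free Zenodo account, connect it to your GitHub repository, and be prepared to archive a specific version of your code: Zenodo archives the repository and assigns it a DOI each time you publish a new GitHub release. Read more about how to set up the webhook between your repos and Zenodo!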

Citation Support for Code. In August 2021, GitHub announced enhanced citation support, using the ruby-cff RubyGem to read .cff (Citation File Format) files. Adding a CITATION.cff file to a GitHub repository lets the owner specify attribution elements and automatically adds a ‘Cite this repository’ button to the repo that provides the citation in APA and BibTeX formats.

Some of the elements a repo owner can include are:

  • code author names,
  • author ORCID iDs,
  • preferred software name,
  • DOI, and
  • release date and version information.

ORCID iDs and DOIs are particularly valuable as disambiguation elements because they ensure that credit goes to the correct people and the correct work. Read more about how to set up citation support in GitHub, or see the schema of elements available in a .cff file.
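To make the elements listed above concrete, here is a minimal sketch of what a CITATION.cff file can contain. Because the file itself is plain YAML, the sketch is a small Python helper that writes one using Citation File Format 1.2.0 key names (cff-version, message, title, authors, family-names, given-names, orcid, version, doi, date-released). The function name and every value passed in (software name, author, ORCID iD, DOI, version, date) are hypothetical placeholders, not real records.

```python
# Minimal sketch: generate the text of a CITATION.cff file
# (Citation File Format 1.2.0 key names).
# make_citation_cff is a hypothetical helper; no YAML escaping is done,
# so values containing double quotes would need extra handling.

def make_citation_cff(title, authors, version, doi, date_released):
    """Return CITATION.cff text.

    authors: list of dicts with 'family', 'given', and optional 'orcid'.
    date_released: a 'YYYY-MM-DD' string.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        "authors:",
    ]
    for a in authors:
        lines.append(f'  - family-names: "{a["family"]}"')
        lines.append(f'    given-names: "{a["given"]}"')
        if a.get("orcid"):
            # CFF expects the ORCID iD written as a full URL
            lines.append(f'    orcid: "https://orcid.org/{a["orcid"]}"')
    lines.append(f'version: "{version}"')
    lines.append(f'doi: "{doi}"')
    # Left unquoted so YAML reads it as a date
    lines.append(f"date-released: {date_released}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All values below are hypothetical placeholders
    print(make_citation_cff(
        title="example-analysis-toolkit",
        authors=[{"family": "Doe", "given": "Jane",
                  "orcid": "0000-0000-0000-0000"}],
        version="1.0.0",
        doi="10.5281/zenodo.0000000",
        date_released="2021-08-19",
    ))
```

Saving the printed text as CITATION.cff in the root of the default branch is what GitHub looks for; the DOI line would normally hold the DOI that Zenodo minted for the release.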

If you need help understanding how to set this up or want to discuss how you can get and/or give proper citation to code, data, or software, please reach out to Anthony Dellureficio, Associate Librarian, Research Data Management.

PubMed Single Citation Matcher

Need to find a particular citation and don’t have the complete information about it?

Use a guided search with PubMed’s Single Citation Matcher (the link is on the PubMed main page, in the Find section below the search box).

Single Citation Matcher guides you in entering information into preset search boxes dedicated to specific searchable fields in a PubMed record, e.g., Journal, Title, Author.
If you don’t have all the information about the article, enter only what you have at your disposal. The more information you enter, the fewer results you will get, because your search will be more precise. Conversely, if you enter very little information you will get more results, but you may still reach the reference in question faster than with a general PubMed search.
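The narrowing effect described above comes from combining every filled-in field with AND. The Python sketch below illustrates that idea by turning whatever details you know into a fielded PubMed query using real PubMed field tags ([ta] journal, [dp] publication date, [vi] volume, [ip] issue, [pg] pagination, [au] author, [ti] title). It is only an approximation: the function and field names are hypothetical, and the exact query Single Citation Matcher sends may differ.

```python
# Sketch: build a fielded PubMed query from only the details you know.
# Each extra field adds another AND term, so results get narrower.
# build_citation_query and the field names are hypothetical; this only
# approximates what Single Citation Matcher does.

FIELD_TAGS = {
    "journal": "ta",     # journal title abbreviation
    "date": "dp",        # publication date
    "volume": "vi",
    "issue": "ip",
    "first_page": "pg",  # pagination
    "author": "au",
    "title": "ti",       # words in the title
}


def _tagged(value, tag):
    # Quote multi-word values so the tag applies to the whole phrase
    text = f'"{value}"' if " " in value else value
    return f"{text}[{tag}]"


def build_citation_query(**known):
    unknown = set(known) - set(FIELD_TAGS)
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    terms = []
    for field, tag in FIELD_TAGS.items():
        value = known.get(field)
        if not value:
            continue  # skip anything you don't know
        if field == "title":
            # Each title word must appear somewhere in the title
            terms.extend(_tagged(word, tag) for word in value.split())
        else:
            terms.append(_tagged(value, tag))
    return " AND ".join(terms)


print(build_citation_query(journal="N Engl J Med", date="2020", author="Smith J"))
```

With only journal, year, and author filled in, the query has three AND terms; adding a volume or title words would add more terms and return fewer records.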

You can use this tool for other purposes as well. For example, you can fill in only the Journal field to browse that journal’s content, which works best if your search results are sorted by Most Recent. To use the Journal field in Single Citation Matcher, start typing the name of the journal and then select it from the suggestions, following the prompts.

Enjoy the convenience of this tool!

Embase: A Refresher

Embase, linked from the Library homepage under Top Databases, is a proprietary database produced in the Netherlands by the publisher Elsevier. It indexes journals in medicine, dentistry, veterinary science, life sciences, public health, nursing, and related fields. While its coverage overlaps significantly with PubMed (it indexes all of MEDLINE), it also indexes a large number of international journals not found in PubMed. Embase also indexes additional material such as conference abstracts, clinical trials, and more.

Like PubMed with its MeSH terms, Embase can map search terms to subject headings. Embase’s subject headings are called Emtree terms, and they are also organized in a hierarchical structure. One major difference between PubMed (MeSH) and Embase (Emtree) is that in PubMed narrower terms are automatically included, whereas in Embase an Emtree term must be “exploded” to include all the narrower terms found beneath it.
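The difference between an exploded and an unexploded heading amounts to a tree traversal. The toy Python sketch below uses a made-up, simplified hierarchy (it is not the real Emtree or MeSH tree) to show that exploding a term collects the term plus every narrower term beneath it, while searching the heading alone matches only that one term.

```python
# Toy illustration of "exploding" a subject heading.
# The hierarchy below is hypothetical and simplified; it is NOT the
# real Emtree (or MeSH) tree.
NARROWER = {
    "neoplasm": ["breast tumor", "lung tumor"],
    "breast tumor": ["breast cancer"],
    "lung tumor": ["lung cancer"],
}


def unexploded(term):
    """Searching the heading alone: only the term itself is included."""
    return [term]


def explode(term):
    """Return the term plus every narrower term beneath it (depth-first)."""
    found = [term]
    for child in NARROWER.get(term, []):
        found.extend(explode(child))
    return found


print(unexploded("neoplasm"))
print(explode("neoplasm"))
```

In PubMed the behavior of explode is the default for MeSH terms; in Embase you have to request it for each Emtree term.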

Embase also offers some functions not found in PubMed, such as proximity searching. In addition to AND, OR, and NOT, the NEAR and NEXT operators make searches more specific by requiring terms to appear within a set number of words of each other (NEXT in the order typed, NEAR in either order).

Embase is typically one of the databases searched when conducting a systematic review or meta-analysis in biomedicine. Embase is available on its native Elsevier platform (Embase.com) or on the Ovid platform, and because the platform affects how searches are constructed, it is recommended to specify which platform was used when conducting a systematic review. MSKCC Library offers Embase on the Elsevier platform.

Note: Starting July 1, 2021, Embase requires you to sign in to your Embase account to export citations to EndNote and other citation management tools. Creating an Embase/Elsevier account is free, and the same login works for other Elsevier products (Embase, Scopus, etc.).