Citing Code via GitHub

As we were taught in school, whenever someone quotes, paraphrases, summarizes, or otherwise references another scholar’s research, they must properly attribute that research with a citation in their work. This same rule applies to code!

Citing code is not only required as part of the publication process; its value also includes:

  • contributing to ethical and transparent science,
  • recognizing the contributions of programmers to a research project,
  • tracking reuse of code over time, and
  • reinforcing the value of non-traditional bibliographic research outputs (like code, datasets, and software).

Code can be challenging to cite because the traditional bibliographic elements are not always readily apparent. Often the only citation information in a code repo has to be garnered from a README.md file or from the original publication that references that code, if such a publication exists.

If you are maintaining your code in GitHub, you have a few options to encourage proper citation by self-identifying contributors and citation elements.

DOI for Code. In 2016, GitHub partnered with Zenodo, the CERN-operated open-source data repository, to mint Digital Object Identifiers (DOIs) for archived repos. A DOI is a persistent identifier, registered in an internationally recognized database, that gives your code (or data) an unambiguous, permanent redirect. DOIs are a great first step in ensuring that the correct version of code is clearly identified with proper attribution.
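To make the "permanent redirect" idea concrete, here is a minimal Python sketch that turns a DOI, however it was written down, into its resolver link on doi.org. The function name doi_to_url and the sample DOI 10.5281/zenodo.1234567 are placeholders invented for illustration (10.5281 is the prefix Zenodo uses); they do not refer to a real record.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"

# Common ways a DOI is written in READMEs and reference lists.
_KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return the doi.org resolver URL for a DOI (hypothetical helper)."""
    cleaned = doi.strip()
    lowered = cleaned.lower()
    for prefix in _KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
            break
    # Every DOI starts with "10." and has a "/" between prefix and suffix.
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the "/" separator.
    return DOI_RESOLVER + quote(cleaned, safe="/")


if __name__ == "__main__":
    # Placeholder DOI, for illustration only.
    print(doi_to_url("doi:10.5281/zenodo.1234567"))
```

Whichever form appears in a README or reference list, the same link comes out, which is the disambiguation benefit described above.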

To take advantage of this, create a free account with Zenodo and be prepared to archive a specific version of your code. Read more information on how to generate the webhooks between your repos and Zenodo! 

Citation Support for Code. In August 2021, GitHub announced enhanced support for citation, adding the ruby-cff RubyGem to its codebase to handle .cff citation files. Adding a CITATION.cff file to a GitHub repository lets the owner identify attribution elements and automatically generates a simple ‘Cite this repository’ button in the repo with APA and BibTeX citation formatting.

Some of the elements a repo owner can include are:

  • code author names,
  • author ORCID iDs,
  • preferred software name,
  • DOI, and
  • other info related to date and version.

In particular, ORCID iDs and DOIs have value as disambiguation elements that ensure credit goes to the right person and the right work. Read more information on how to set up citation support in GitHub, or see the schema elements for .cff files.
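As a rough illustration of the elements listed above, the following Python sketch renders the text of a small CITATION.cff file. The key names (cff-version, message, title, authors, family-names, given-names, orcid, version, date-released, doi) come from version 1.2.0 of the Citation File Format. The helper render_citation_cff, the author, the ORCID iD, the DOI, and the dates are all made-up placeholders, so treat this as a starting point under those assumptions rather than a complete or authoritative template.

```python
import json


def render_citation_cff(title, authors, version, date_released, doi=None):
    """Build CITATION.cff text from a few fields (hypothetical helper).

    `authors` is a list of dicts with "family" and "given" keys and an
    optional "orcid" key holding the bare iD (no URL prefix).
    """
    # JSON string literals are also valid YAML double-quoted scalars,
    # so json.dumps is a simple way to quote values safely.
    q = json.dumps
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f"title: {q(title)}",
        "authors:",
    ]
    for author in authors:
        lines.append(f"  - family-names: {q(author['family'])}")
        lines.append(f"    given-names: {q(author['given'])}")
        if author.get("orcid"):
            # CFF expects the ORCID iD written as a full URL.
            lines.append(f"    orcid: {q('https://orcid.org/' + author['orcid'])}")
    lines.append(f"version: {q(version)}")
    lines.append(f"date-released: {q(date_released)}")
    if doi:
        lines.append(f"doi: {q(doi)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    print(render_citation_cff(
        title="My Research Software",
        authors=[{"family": "Lisa", "given": "Mona",
                  "orcid": "0000-0000-0000-0000"}],
        version="1.0.0",
        date_released="2021-08-11",
        doi="10.5281/zenodo.1234567",
    ))
```

Saving the printed text as CITATION.cff in the root of the repository's default branch is what allows GitHub to show the ‘Cite this repository’ button.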

If you need help understanding how to set this up or want to discuss how you can get and/or give proper citation to code, data, or software, please reach out to Anthony Dellureficio, Associate Librarian, Research Data Management.

MSK Data Catalog: We’ve Reached a Milestone!

The Library’s Research Data Management team is happy to announce that thanks to the efforts of our cataloging crew, we’ve reached a milestone of 200+ datasets in the MSK Data Catalog!

The MSK Data Catalog employs enhanced metadata to help increase discoverability of MSK research data, connect researchers working on similar topics, and describe how one can access publicly available datasets. Some of the features include:

  • Application of taxonomies, such as OncoTree and MeSH (Medical Subject Headings),
  • Identification of analytical tools and software used to create or manipulate data,
  • Filters by subject/repository/author/etc.,
  • Persistent links to datasets and, wherever possible, DOIs (Digital Object Identifiers),
  • Connections to Synapse for data authors and associated publications,
  • Technical info, such as size and format of datasets,
  • Instructions on how to access datasets.

Many of the records we’ve recently added describe MSK datasets in the cBioPortal, Gene Expression Omnibus, dbGaP, and the Protein Data Bank. If you’d like to know what the Library can do to help you increase the discoverability of your research data, please reach out to us!

Explore the New Research Data Management Webpage

Curious about what services the Library offers to help you find, manage, and share your research data? We now have a webpage dedicated to providing information about the Library’s Research Data Management services and resources. Learn about our Data Catalog, get help understanding publisher data sharing requirements, or schedule a consultation to discuss best practices. We’ll be updating this page regularly as we bring more services online. 

We’re here to help you throughout the life of your research from planning to publication, and beyond.