RDM and COVID-19 (Part 4): Data Repositories and Publishers

This is the final post in a four-part series on Research Data Management and COVID-19. Click here for part 1, part 2, and part 3.

Generalist data repositories and publishers have created some impressive collaborative, open access resources to support COVID-19 research. Even some traditionally for-pay companies have been contributing to the effort. Many of these repositories allow researchers to contribute their own data and also provide platforms that enhance the discoverability of datasets, code, and ongoing projects. In late April, NIH hosted a webinar in which representatives from several generalist repositories spoke about their efforts to support sharing, discovering, and citing COVID-19 data and code. The recorded webinar is available to the public: NIH/NLM Sharing, Discovering, and Citing COVID-19 Data and Code in Generalist Repositories Webinar. Here are a few of the COVID-19 resources advanced by generalist repositories:

  • Figshare: As a discipline-agnostic data repository, Figshare provides a free platform (after creating a login) for researchers to store their datasets regardless of the data format. In response to COVID-19, they have created a subset of open datasets based on tags applied by data submitters and curated by Figshare staff. In addition to traditional numerical and quantity-based datasets, they encourage submission of code, presentation slides, conference posters, and other formats of information which might not typically be viewed as ‘data’, particularly since many conferences and meetings were canceled in light of the global pandemic.
    Figshare - Digital Science
  • Mendeley: This data repository, provided by the publisher Elsevier, has also created a COVID-19 hub. They index numerous repositories, curating an aggregate of datasets that includes those deposited in their own repository. Access to the Elsevier Coronavirus Research Hub is free with registration for individual researchers wishing to use Mendeley to organize their own research. COVID-19 datasets indexed in Mendeley are also freely accessible to the public without an account. You can search or view them here.
    Mendeley Elsevier
  • Zenodo: As part of the CERN Against COVID-19 project, Zenodo has allocated additional dedicated server space for COVID-19 research and beefed up its overall storage infrastructure in anticipation of researchers' expanded computing needs. Their main ‘Coronavirus Disease Research Community’ page contains a simple but powerful search interface to discover information about COVID-19 projects and datasets. The database is powered by Elasticsearch and populated in part by the OpenAIRE harvesting gateway, which brings in records for datasets from across the European research community. Zenodo has also added a librarian to their staff specifically dedicated to curating COVID-19 datasets.
    Zenodo
  • GitHub: The open code repository and versioning tool, GitHub, has aggregated public COVID-19 projects to aid in discoverability and reduce collaboration barriers. Based on tags added by contributors to their own repos, GitHub runs a weekly ingest that adds projects to their covid-19-repo-data repository. The README page of this repository contains additional documentation about how to view, search, and use code and data in this collection.
    GitHub
  • Dryad: Although Dryad, another public platform for storing and sharing research data, does not have a dedicated COVID-19 portal, they have engaged in specific initiatives to support open science during this pandemic. Dryad supports building standards for COVID-19 data in an effort to prepare the data for future use. They are also offering guidance for researchers on the appropriate handling of Protected Health Information (PHI), since many researchers have shifted to working with human subjects for the first time.
    Dryad Digital Repository | DataONE
  • Vivli: For clinical trials, Vivli has created a COVID-19 portal to encourage submission and sharing of participant-level clinical trial data. A challenging aspect of this has been maintaining a balance between openness, speed, and privacy. Vivli is waiving fees for submission and sharing of COVID-19 datasets.
    Vivli - Center for Global Clinical Research Data

We hope the resources highlighted in this series on Research Data Management and COVID-19 will keep any interested researchers busy exploring and potentially contributing to the global effort to understand, treat, and eradicate COVID-19!

Just remember that if you decide to take part in this tremendous collaborative effort, all data included in a publication must be cited. If you’re unsure of how to do that or would like guidance in finding datasets, using data tools, or sharing your data, please reach out to Anthony Dellureficio, Associate Librarian for Data Management Services.