Publishing Data

Data Repositories

In general, a data repository is a place where data are stored. Hakai researchers and affiliates are encouraged to publish data to two categories of data repository, described below.

Hakai Institute Data Catalogue and Data Repository

Metadata (data about data) are stored in the Hakai Data Catalogue as metadata records, which data providers create using the Hakai Metadata Intake Form. A metadata record contains important, standardized information about a dataset that makes it broadly discoverable and accessible. The Hakai Data Catalogue is not itself a database or data repository; it is an index of where datasets or data packages are stored. In the Resources section of the metadata intake form you must therefore provide a link to where your data package is stored. Figure 1 details the workflow for creating metadata records for the Hakai Data Catalogue and publishing data. It is recommended that Hakai data packages be hosted in the Hakai Institute GitHub Repository according to the data package content recommendations below. See Hosting Data on GitHub for more information.

Figure 1. Hakai Data Publishing Decision Tree for creating a metadata record in the Hakai Data Catalogue and publishing data to domain-specific, open-access data repositories.

Hakai data package content recommendations:


Domain-specific Repositories/Databases/Knowledge Bases

In addition to storing data packages in the Hakai GitHub Data Repository, each Hakai-affiliated project should determine whether all or a subset of its data can, or should, be mobilized to a global repository. Publishing data to a domain-specific repository may require special formatting or structuring so that the data are interoperable with other datasets on the platform. The effort to transform your data can be worthwhile: it increases the reach of your science, should result in more citations and recognition for your scientific work, and is an important facet of the work Hakai does in the public interest.

Publishing Data to CIOOS 

Datasets produced by Hakai that include Essential Ocean Variables (EOVs) should be published to the Canadian Integrated Ocean Observing System (CIOOS). The workflow for publishing a metadata record to the CIOOS Data Catalogue starts with the same process as publishing to the Hakai Data Catalogue: complete the Hakai metadata intake form and publish a data package on the GitHub repository. Often, only a subset of the full data package can be mobilized to a domain-specific repository. It is recommended to create one data package for the complete, processed data and store it in the Hakai GitHub repository. That data package should include a separate folder containing the standardized data file, and this data file should then be published to the identified repository.

The metadata intake form will identify which metadata records should be published to the CIOOS Data Catalogue based on the presence of EOVs. Publishing to CIOOS involves the additional step of transforming the data and including links to the standardized data in the metadata record. CIOOS requires data to be hosted on specific platforms, which in turn require the data to be transformed and standardized so that they are interoperable. CIOOS recommends the Ocean Biodiversity Information System (OBIS) for biological species occurrences and ERDDAP for physical and biogeochemical oceanographic data. These repositories require data to be standardized to a specific format and may require additional metadata and/or the creation of an additional metadata record in that repository. If that is the case, ensure that the metadata record in the Hakai Data Catalogue includes a link to the domain-specific metadata record as well as a link to the Hakai GitHub Repository where the complete dataset is hosted.
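
The routing described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not how the metadata intake form is implemented: the MetadataRecord class, its fields, the category labels, and the publish_targets function are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical category labels; the intake form's actual vocabulary may differ.
BIOLOGICAL_OCCURRENCE = "biological_occurrence"
PHYSICAL_BIOGEOCHEMICAL = "physical_biogeochemical"


@dataclass
class MetadataRecord:
    """Hypothetical, minimal stand-in for a metadata record."""
    title: str
    eovs: List[str] = field(default_factory=list)  # Essential Ocean Variables
    data_category: Optional[str] = None


def publish_targets(record: MetadataRecord) -> List[str]:
    """Return where a record and its data would be published.

    Assumed rules, taken from the text above:
    - every record goes to the Hakai Data Catalogue;
    - records that list EOVs also go to the CIOOS Data Catalogue;
    - for CIOOS, biological occurrences are hosted on OBIS and
      physical/biogeochemical data on ERDDAP.
    """
    targets = ["Hakai Data Catalogue"]
    if not record.eovs:
        return targets
    targets.append("CIOOS Data Catalogue")
    if record.data_category == BIOLOGICAL_OCCURRENCE:
        targets.append("OBIS")
    elif record.data_category == PHYSICAL_BIOGEOCHEMICAL:
        targets.append("ERDDAP")
    return targets


if __name__ == "__main__":
    example = MetadataRecord(
        title="Example oceanographic time series",
        eovs=["Sea surface temperature"],
        data_category=PHYSICAL_BIOGEOCHEMICAL,
    )
    print(publish_targets(example))
```

A record with no EOVs stays in the Hakai Data Catalogue only. A record with EOVs but an unrecognized category is still flagged for CIOOS, leaving the choice of hosting platform to be made with guidance.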

Workflows for publishing data to domain-specific repositories have been outlined in other sections (see e.g. 05 - OBIS and GBIF Best-practices for publishing biological occurrence data to OBIS). To begin the process of transforming your dataset for a domain-specific repository, contact for guidance.

Restricted data

Restricted data may include, for example, data collected (partially) in collaboration with First Nations partners or on their ancestral lands, or sensitive data such as information on endangered species. Data providers may therefore choose not to make these data publicly accessible. When a dataset contains restricted or sensitive data that should not be publicly accessible, this must be disclosed in a data management plan (DMP). Alternatively, this information can be captured in a research agreement or a memorandum of understanding (MoU) between Hakai Institute (affiliates) and the parties involved. It is recommended that metadata records in the Hakai Data Catalogue link to a compressed data package containing only the publicly accessible data, and that the limitations of the dataset be described in the relevant field of the Metadata Intake Form (e.g. endangered species occurrence data is omitted from the dataset).
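
As a rough illustration of preparing a compressed data package that contains only the publicly accessible data, the Python sketch below filters out restricted rows and writes the remainder to a zip archive. The function names, the member file name, and the idea of flagging restricted rows with a caller-supplied rule are assumptions made for this example; how restricted records are actually identified should follow the DMP, research agreement, or MoU.

```python
import csv
import io
import zipfile


def split_public_rows(rows, is_restricted):
    """Return (public_rows, number_omitted).

    `is_restricted` is a caller-supplied function that decides, per row,
    whether the row must be withheld (hypothetical rule for illustration).
    """
    public = [row for row in rows if not is_restricted(row)]
    return public, len(rows) - len(public)


def write_public_package(rows, fieldnames, zip_target,
                         member_name="data/public_data.csv"):
    """Write rows as CSV into a compressed zip archive.

    `zip_target` may be a file path or a binary file-like object.
    `member_name` is a hypothetical file name inside the archive.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    with zipfile.ZipFile(zip_target, "w",
                         compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, buffer.getvalue())


if __name__ == "__main__":
    # Made-up rows, used only to show the mechanics.
    rows = [
        {"record_id": "1", "sensitive": "no"},
        {"record_id": "2", "sensitive": "yes"},
    ]
    public, omitted = split_public_rows(
        rows, lambda row: row["sensitive"] == "yes"
    )
    write_public_package(public, ["record_id", "sensitive"],
                         "public_data_package.zip")
    print(f"Omitted {omitted} restricted row(s) from the public package.")
```

The count of omitted rows is returned so that it can be reported in the limitations field of the metadata record.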

For sensitive data that First Nations have collected in collaboration with Hakai, the Local Contexts initiative may be a suitable avenue for supporting the rights of data sovereignty and self-determination in how datasets are licensed or restricted.