Hakai data package content recommendations:
Thoroughly describe the field, lab, and data processing protocols used to produce your data. This can be done in a Readme.txt, a .pdf, or a previous publication that defines your methods.
Create a Data Dictionary (.txt or .csv file) that describes each variable in every table of your data package. For each variable, include its name, units, and a description.
Assign a version to your data package using a major.minor version number, e.g. v2.1. The version of the data package should match the version included in the metadata record in the Hakai Data Catalogue.
Create a changelog in a .txt file to keep track of what changes have occurred since the last version. Follow this guide to keep a changelog.
Include all your data tables as plain text files (.csv, .txt, .tsv).
Include any scripts that were used to clean or filter the raw data or to calculate values in the final data package, as well as example scripts for joining data tables.
If applicable, the data package should contain an Archive folder to house older/previous versions of the data package or time-series data.
Include referenced literature, equipment manuals, and anything else relevant to your methods.
If your data package has numerous tables that fit together in a relational database structure, include a diagram such as an Entity Relationship Diagram to describe hierarchical relationships of tables.
Include a folder containing the standardized data that is published to an open-access, domain-specific repository.
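As an illustration of the Data Dictionary recommendation above, the following minimal sketch builds a skeleton data dictionary from the header row of a plain-text table. The function names, the column layout (table, variable_name, units, description), and the sample table are assumptions made for this example and are not a Hakai standard; the units and descriptions still have to be filled in by hand.

```python
import csv
import io

# Column layout of the skeleton data dictionary (an assumption for this sketch).
FIELDS = ["table", "variable_name", "units", "description"]


def data_dictionary_rows(table_name, csv_text):
    """Return one empty data-dictionary row per column of a CSV table."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first row holds the variable names
    return [
        {"table": table_name, "variable_name": name.strip(),
         "units": "", "description": ""}
        for name in header
    ]


def data_dictionary_csv(rows):
    """Serialize data-dictionary rows as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical table used only to demonstrate the functions.
sample_table = "site_id,date,temperature\nS1,2020-01-01,8.2\n"
rows = data_dictionary_rows("ctd.csv", sample_table)
print(data_dictionary_csv(rows))
```

In practice the table text would be read from each file in the data package, and the resulting rows from all tables would be written to a single data dictionary file.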
Domain-specific Repositories/Databases/Knowledge Bases
In addition to storing data packages in the Hakai GitHub Data Repository, each Hakai-affiliated project should determine whether its data, or a subset of them, can or should be mobilized to a global repository. Publishing data to a domain-specific repository may require special formatting or structuring to ensure that the data are interoperable with other datasets on the platform. The effort to transform your data can be worthwhile: it increases the reach of your science, should result in more citations and recognition for your scientific work, and is an important facet of the work Hakai does in the public interest.
Publishing Data to CIOOS
Datasets produced by Hakai that include Essential Ocean Variables (EOVs) should be published to the Canadian Integrated Ocean Observing System (CIOOS). The workflow for publishing a metadata record to the CIOOS Data Catalogue starts with the same process as publishing data to the Hakai Data Catalogue: complete the Hakai metadata intake form and publish a data package on the GitHub repository. Often, only a subset of the full data package can be mobilized to a domain-specific repository. It is recommended to create a data package for the complete, processed data and to store it in the Hakai GitHub repository. The data package should include a separate folder that contains the standardized data file, which should then be published to the identified repository.
The metadata intake form will identify which metadata records should be published to the CIOOS Data Catalogue based on the presence of EOVs. However, there is the additional step of transforming the data and including links to the standardized data in the metadata record. CIOOS requires data to be hosted on specific platforms, which in turn require the data to be transformed and standardized so that they are interoperable. CIOOS recommends the Ocean Biodiversity Information System (OBIS) for biological species occurrences and ERDDAP for physical and biogeochemical oceanographic data. These repositories require data to be standardized to a specific format and may require additional metadata and/or the creation of an additional metadata record in that repository. If that is the case, ensure that the metadata record in the Hakai Data Catalogue includes a link to the domain-specific metadata record as well as a link to the Hakai GitHub Repository where the complete dataset is hosted.
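To give a sense of what such a transformation can involve for species occurrence data, the sketch below maps one record with project-specific column names onto a handful of Darwin Core terms, the standard used by OBIS. The input column names (sample_id, species, date, lat, lon, count), the identifier prefix, and the choice of terms are assumptions made for this example; it is not the Hakai or OBIS workflow, and a real submission needs further terms and checks.

```python
def to_darwin_core(record, dataset_prefix):
    """Map one record with hypothetical column names onto Darwin Core terms."""
    return {
        # Unique identifier built from an assumed dataset prefix and sample id.
        "occurrenceID": f"{dataset_prefix}:{record['sample_id']}",
        "scientificName": record["species"],
        "eventDate": record["date"],  # assumed to be ISO 8601 already
        "decimalLatitude": float(record["lat"]),
        "decimalLongitude": float(record["lon"]),
        # Assumes the record comes from a direct field observation.
        "basisOfRecord": "HumanObservation",
        # A count of zero is recorded as an absence.
        "occurrenceStatus": "present" if int(record["count"]) > 0 else "absent",
    }


# Hypothetical input record used only to demonstrate the mapping.
raw = {
    "sample_id": "A001",
    "species": "Pycnopodia helianthoides",
    "date": "2020-07-15",
    "lat": "51.65",
    "lon": "-128.13",
    "count": "3",
}
occurrence = to_darwin_core(raw, "example-survey")
print(occurrence)
```

Keeping the mapping in a script inside the data package makes the standardized file reproducible from the complete dataset whenever a new version is released.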
Workflows for publishing data to domain-specific repositories have been outlined in other sections (see e.g. 05 - OBIS and GBIF Best-practices for publishing biological occurrence data to OBIS). To begin the process of transforming your dataset for a domain-specific repository, contact email@example.com for guidance.
Restricted data might include, for example, data that are (partially) collected in collaboration with First Nation partners or on their ancestral lands, or sensitive data such as information on endangered species. Data providers might therefore choose not to make these data publicly accessible. When a dataset contains restricted or sensitive data that should not be publicly accessible, this needs to be disclosed in a data management plan (DMP). Alternatively, this information can be captured in a research agreement or a memorandum of understanding (MoU) between the Hakai Institute (or its affiliates) and the parties involved. It is recommended that metadata records in the Hakai Data Catalogue link to a compressed data package that contains only the publicly accessible data, and that the limitations of the dataset be stated in the relevant field of the Metadata Intake Form (e.g. endangered species occurrence data are omitted from the dataset).
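The public-only data package described above can be produced with a short script. The following is a minimal sketch that assumes each table carries a hypothetical "restricted" column flagging the rows to withhold; the column name, the accepted flag values, and the sample table are assumptions for illustration only.

```python
import csv
import io


def public_subset(csv_text, restricted_column="restricted"):
    """Return (public CSV text, number of omitted rows).

    Rows flagged in `restricted_column` are dropped, and the flag column
    itself is left out of the public table.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    public_fields = [f for f in reader.fieldnames if f != restricted_column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=public_fields, lineterminator="\n")
    writer.writeheader()
    omitted = 0
    for row in reader:
        if row[restricted_column].strip().lower() in {"yes", "true", "1"}:
            omitted += 1  # withheld from the public package
            continue
        writer.writerow({field: row[field] for field in public_fields})
    return out.getvalue(), omitted


# Hypothetical table used only to demonstrate the function.
full_table = (
    "species,lat,lon,restricted\n"
    "A,51.6,-128.1,no\n"
    "B,51.7,-128.2,yes\n"
)
public_csv, n_omitted = public_subset(full_table)
print(public_csv)
print(f"{n_omitted} restricted row(s) omitted")
```

The number of omitted rows can help when describing the dataset's limitations in the Metadata Intake Form.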
In the case of sensitive data that First Nations have collected in collaboration with Hakai, the Local Contexts initiative may be a suitable avenue for supporting the right to data sovereignty and self-determination in how datasets may be licensed or restricted.