Publishing

Data Repositories

When publishing research outputs, including data, reports, model outputs, posters, conference presentations, software and protocols, Hakai researchers or affiliates can follow these three steps to adhere to Hakai's Open Science Policy:

  1. Create a Record in the Hakai Institute Metadata Catalogue. Research outputs produced or aggregated by Hakai employees, affiliates, postdocs or students funded by the Tula Foundation should be catalogued in the Hakai Metadata Catalogue using the Hakai Metadata Entry Tool.

  2. Store Data on a Secure, Backed-up and Collaborative Platform. Research outputs can be hosted on Hakai servers or external data platforms used by Hakai researchers for collaborative data sharing and/or long-term archiving. Selected platforms should fit the needs of the research project or Hakai Planning Group.

  3. Mobilize Data to Domain-specific Repositories. Wherever possible, in addition to being hosted for project collaboration or archiving and recorded in the Hakai Metadata Catalogue, data should be published to open-access, domain-specific data repositories that standardize data for interoperability. These repositories have well-developed community standards for data and metadata; examples include CIOOS, ERDDAP, OBIS, GenBank, SOCAT and HydroShare, among many others. Discover a relevant domain-specific repository for your data at https://www.re3data.org/.

Hakai Institute Metadata Catalogue and Data Repository

The metadata catalogue contains standardized information about research outputs that ensures they are broadly discoverable and accessible. The Hakai Metadata Catalogue is not itself a database or data repository but rather an index of where data packages and other research outputs are stored. Therefore, in the Resources section of the Metadata Entry Tool, you must provide a link to where your research output is stored.

Tabular text-based data packages should contain a minimum set of files (outlined below) to provide sufficient information to interpret and reuse data.

Tabular data should be hosted in the Hakai Institute GitHub Repository, rather than Google Drive. See Hosting Data and the section on data package contents below for more information.

Hakai data package content recommendations

  1. Describe the field, lab, and data processing protocols used to produce your data if not described elsewhere. This could be done in a Readme.txt, a .pdf, or a previous publication defining your methods.
  2. If your data don't conform to a specific standard that is linked to and described elsewhere, create a Data Dictionary (.txt or .csv file). This should describe each variable in every table of your data package, including the variable name, units and description.
  3. Add a version to your data package folder name using a major and minor version number, e.g. 'v2.1'. This version should match the version included in the metadata record in the Hakai Metadata Catalogue.
  4. If you are updating a data package, create a changelog.txt file to keep track of what changes have occurred since the last version. Follow this guide to keep a changelog.
  5. Provide all your tabular data as plain text files (.csv, .txt, .tsv).
  6. Include any scripts used to clean or filter the raw data or to calculate values in the final data package, as well as any example scripts for joining tables.
  7. Keep previous versions of the data accessible and available for download so that results remain reproducible.
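Several of these recommendations can be checked mechanically before a package is published. The sketch below is a hypothetical helper, not an existing Hakai tool. It assumes the folder name ends in a version such as 'v2.1', that the Readme and data dictionary file names start with "readme" and "data_dictionary", and it treats a few common spreadsheet extensions as formats to convert to plain text. It returns warnings rather than errors, because a methods description or data standard may legitimately live elsewhere.

```python
import re
from pathlib import Path

# Assumed convention: folder name ends in a major.minor version, e.g. "kelp_survey_v2.1".
VERSION_RE = re.compile(r"v(\d+\.\d+)$")
# Illustrative, not exhaustive: non-text tabular formats to convert to .csv/.txt/.tsv.
NON_TEXT_TABLES = {".xlsx", ".xls", ".ods"}


def check_package(folder_name, file_names, catalogue_version, is_update=False):
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings = []
    match = VERSION_RE.search(folder_name)
    if match is None:
        warnings.append("folder name has no major.minor version such as 'v2.1'")
    elif match.group(1) != catalogue_version:
        warnings.append(
            f"folder version {match.group(1)} does not match "
            f"catalogue version {catalogue_version}"
        )
    lower = [name.lower() for name in file_names]
    if not any(n.startswith("readme") for n in lower):
        warnings.append("no Readme describing methods (fine if methods are published elsewhere)")
    if not any(n.startswith("data_dictionary") for n in lower):
        warnings.append("no data dictionary (fine if the data follow a documented standard)")
    if is_update and "changelog.txt" not in lower:
        warnings.append("updated package has no changelog.txt")
    for name in file_names:
        if Path(name).suffix.lower() in NON_TEXT_TABLES:
            warnings.append(f"{name}: provide tabular data as .csv, .txt or .tsv")
    return warnings


def check_folder(folder: Path, catalogue_version: str, is_update: bool = False):
    """Convenience wrapper that reads the top-level file names from a folder on disk."""
    names = [p.name for p in folder.iterdir() if p.is_file()]
    return check_package(folder.name, names, catalogue_version, is_update)


if __name__ == "__main__":
    print(check_package("kelp_survey_v2.1",
                        ["Readme.txt", "data_dictionary.csv", "sites.csv"], "2.1"))
    # prints: []
```

Keeping the core check as a pure function of names makes it easy to run the same rules on a local folder (via check_folder) or on a file listing from a repository.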

Optional

  • Include referenced literature, equipment manuals and anything else relevant to your methods.
  • If your data package has numerous tables that fit together in a relational database structure, include a diagram such as an Entity Relationship Diagram to describe hierarchical relationships of tables.
  • Include a folder containing the standardized data that is published to an open-access, domain-specific repository.

Domain-specific Repositories/Databases/Knowledge Bases

In addition to storing data packages in the Hakai GitHub Data Repository, each Hakai-affiliated project should determine whether its data, or a subset of them, can (or should) be mobilized to a globally integrated, domain-specific repository. Publishing data to a domain-specific repository may require special formatting or structuring to ensure that the data are interoperable with other datasets on the platform. The effort to transform your data can be worthwhile: it increases the reach of your science, can result in more citations and recognition for your scientific work, and is an important facet of the work Hakai does in the public interest.

Publishing Data to CIOOS

Datasets produced by Hakai that include Essential Ocean Variables (EOVs) should be published to the Canadian Integrated Ocean Observing System (CIOOS). The workflow for publishing a metadata record to the CIOOS Data Catalogue starts with creating a record in the Hakai Metadata Catalogue and hosting the research outputs. Often, only a subset of the full data package can be mobilized to a domain-specific repository. Create a data package for the complete, processed dataset and store it in a Hakai institutional repository. The data package should include a separate folder containing the standardized data file(s), which should then be published to the identified repository (e.g. ERDDAP or OBIS).

The Hakai Metadata Entry Tool identifies, based on the presence of EOVs, which metadata records should also be published to the CIOOS Data Catalogue. However, there is an additional step: transforming the data and including links to the standardized data in the metadata record. CIOOS recommends hosting data on specific platforms, which requires transforming and standardizing the data so that they are interoperable: the Ocean Biodiversity Information System (OBIS) for biological species occurrences, and ERDDAP for physical and biogeochemical oceanographic data. These repositories require data in a standardized format and may require additional metadata and/or a separate metadata record in that repository. If that is the case, the Hakai Data Mobilization Team can help transform your data and ensure that the metadata record in the Hakai Metadata Catalogue links both to the domain-specific repository's metadata record and to the Hakai institutional repository where the complete dataset is hosted.
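As an illustration of what "transforming" can mean for species occurrences, the sketch below maps rows from a hypothetical raw survey table (columns record_id, date, lat, lon, species, count) onto a flat table of Darwin Core occurrence terms, the vocabulary OBIS uses. It is a simplified sketch under those assumptions, not the Data Mobilization Team's workflow: OBIS may also expect, for example, WoRMS-matched scientificNameID values or event-based tables, which are not covered here. The prefix "hakai-kelp" is likewise a made-up example.

```python
import csv
import io
from datetime import date


def to_darwin_core(rows, dataset_prefix):
    """Map raw survey rows to a flat Darwin Core occurrence table.

    The raw column names (record_id, date, lat, lon, species, count) are
    hypothetical; adjust the mapping to your own data dictionary.
    """
    records = []
    for row in rows:
        lat, lon = float(row["lat"]), float(row["lon"])
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"record {row['record_id']}: coordinates out of range")
        count = int(row["count"])
        records.append({
            # Built from an existing unique key rather than row position, so the
            # occurrenceID stays stable across data package versions.
            "occurrenceID": f"{dataset_prefix}:{row['record_id']}",
            # Fails loudly unless the date is ISO 8601 (YYYY-MM-DD).
            "eventDate": date.fromisoformat(row["date"]).isoformat(),
            "decimalLatitude": lat,
            "decimalLongitude": lon,
            "scientificName": row["species"].strip(),
            "individualCount": count,
            "occurrenceStatus": "present" if count > 0 else "absent",
            "basisOfRecord": "HumanObservation",
        })
    return records


def write_csv(records, fh):
    """Write standardized records as CSV, e.g. into the package's standardized-data folder."""
    if not records:
        raise ValueError("no records to write")
    writer = csv.DictWriter(fh, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)


if __name__ == "__main__":
    raw = io.StringIO(
        "record_id,site,date,lat,lon,species,count\n"
        "17,SITE1,2023-07-14,51.65,-127.95,Nereocystis luetkeana,12\n"
    )
    records = to_darwin_core(csv.DictReader(raw), "hakai-kelp")
    print(records[0]["occurrenceID"])  # prints: hakai-kelp:17
```

The complete raw table stays in the Hakai institutional repository; only the output of a mapping like this would go into the separate standardized-data folder.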

To begin the process of transforming your dataset for a domain-specific repository, contact data@hakai.org for guidance.

Restricted data

When collaborating with Indigenous Partners, the Hakai Institute supports and applies the CARE and OCAP Principles to facilitate the right of data sovereignty and self-determination in how data and research outputs may be licensed or restricted. Restricted data might include, for example, data collected (partially) in collaboration with Indigenous Partners or on their ancestral lands, or sensitive data such as information on endangered species. Data providers may therefore choose not to make such data publicly accessible.

When a dataset contains restricted or sensitive data that should not be publicly accessible, this needs to be disclosed in a data management plan (DMP). Alternatively, this information can be captured in a research agreement or a memorandum of understanding (MoU) between the Hakai Institute (or its affiliates) and the parties involved. Metadata records in the Hakai Metadata Catalogue should link to a data package containing only the publicly accessible data, and should describe the resulting limitations on the data and their interpretation in the relevant field of the Metadata Entry Tool (e.g. "endangered species occurrence data is omitted from the dataset").
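Producing the public data package usually means removing restricted records before anything is published. The sketch below shows one way to do that, assuming a hypothetical tabular layout with site and species columns. The restriction rule is passed in as a function because the actual criteria come from the DMP, research agreement or MoU; the species and site values in the example are placeholders, not real restrictions. Withheld rows are simply never written to the public file.

```python
import csv
import io
from typing import Callable, Iterable

Row = dict[str, str]


def split_public(rows: Iterable[Row],
                 is_restricted: Callable[[Row], bool]) -> tuple[list[Row], list[Row]]:
    """Split rows into (public, withheld) using a project-specific rule."""
    public: list[Row] = []
    withheld: list[Row] = []
    for row in rows:
        (withheld if is_restricted(row) else public).append(row)
    return public, withheld


def write_rows(rows: list[Row], fieldnames: list[str], fh) -> None:
    """Write only the given rows (the public subset) as CSV."""
    writer = csv.DictWriter(fh, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Placeholder rule; the real criteria come from the governing agreement.
    restricted_species = {"Species B"}
    restricted_sites = {"S3"}
    raw = io.StringIO("site,species,count\nS1,Species A,3\nS2,Species B,1\nS3,Species A,2\n")
    public, withheld = split_public(
        csv.DictReader(raw),
        lambda r: r["species"] in restricted_species or r["site"] in restricted_sites,
    )
    out = io.StringIO()
    write_rows(public, ["site", "species", "count"], out)
    print(out.getvalue())
```

Keeping the rule separate from the filtering makes it easy to apply both species-based and location-based restrictions, and to document in the metadata record what kind of data was omitted without naming the withheld records themselves.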