Versioning

It is generally good practice to assign a version to your datasets or data products.

We recommend using either:

Metadata Form

When metadata records are submitted for publication to the Hakai Catalogue, they should include a version number. Versioning is specific to the data product, not the metadata: minor changes to the metadata do not require a version update.

The recommended citation in the Hakai Catalogue can be updated and should always reflect the recommended citation for the latest version.

Major vs. Minor version increments?

As a suggested practice, to handle version increments we define the version number components as MAJOR.MINOR.PATCH.

Major Increments

Something that affects the whole dataset or product, such as:

  • Reprocessing of data
  • Updates to sampling methodology

Note

We recommend generating a new metadata record and DOI.

Minor Increments

Minor updates include:

  • Minor error updates
  • Documentation updates
  • Time series updates (suggested for datasets that are not real-time)
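
The increment rules above can be sketched as a small helper. This is a minimal illustration, not an official Hakai tool: the function names are hypothetical, and resetting the lower components to zero on a bump is an assumption borrowed from common semantic-versioning practice rather than something this guide specifies.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into three integers."""
    text = version.strip()
    if text.startswith("v"):
        # Tolerate tags written as 'v1.2.3' (an assumption, not a requirement).
        text = text[1:]
    parts = text.split(".")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        raise ValueError(f"Expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = (int(part) for part in parts)
    return major, minor, patch


def bump_version(version: str, level: str) -> str:
    """Return the next version string for a 'major', 'minor' or 'patch' change."""
    major, minor, patch = parse_version(version)
    if level == "major":
        # e.g. reprocessed data or a change in sampling methodology
        return f"{major + 1}.0.0"
    if level == "minor":
        # e.g. error corrections, documentation or time series updates
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown increment level: {level!r}")
```

For example, `bump_version("1.2.3", "minor")` gives `"1.3.0"`, while a reprocessing of the same dataset would use `bump_version("1.2.3", "major")`.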

Warning

When versions are incremented, it is critical that previous versions of the dataset remain available. GitHub releases make it easy to keep each version available.

Non-public datasets

If your data record links users to data that is only available upon request, it is still good practice to include a version element. To avoid confusion, ensure that the version number in your metadata record matches the most up-to-date version number found in your data product.

Versioning should also be included when no incremental changes are expected.

CHANGELOG.txt

For changes in major and minor versions, it is important to log what changes were made, for example in a CHANGELOG.txt file. This changelog file should be stored within your data package.
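
As an illustration of what such a log might contain, the sketch below formats one entry per version and places the newest entry at the top. The entry layout (a "Version X (date)" header followed by bullet lines) and the function names are assumptions made for this example; this guide does not prescribe a specific changelog format.

```python
from datetime import date
from typing import List


def format_changelog_entry(version: str, released: date, changes: List[str]) -> str:
    """Build one plain-text changelog entry (hypothetical layout)."""
    header = f"Version {version} ({released.isoformat()})"
    lines = [header, "-" * len(header)]
    lines += [f"- {change}" for change in changes]
    return "\n".join(lines) + "\n"


def prepend_entry(existing_log: str, entry: str) -> str:
    """Return the log text with the new entry placed before older entries."""
    if not existing_log.strip():
        return entry
    return entry + "\n" + existing_log


if __name__ == "__main__":
    entry = format_changelog_entry(
        "1.1.0",
        date(2024, 1, 15),  # illustrative release date
        ["Corrected station coordinates", "Updated README"],
    )
    # In practice the result would be written back to CHANGELOG.txt
    # inside the data package.
    print(prepend_entry("", entry))
```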

When the data product is hosted on GitHub (see Hosting Data), the version associated with the data record in the Hakai Catalogue should match the version of the GitHub Release. For data products not stored on the Hakai GitHub repository, ensure that your data product has an associated changelog that indicates the most recent version, in addition to a folder containing archived versions.

How to make a Release

GitHub makes it simple to generate a new release of your dataset. Once you're happy with the latest version of your repository's main branch, create a new release by clicking the Create a new release link in the release section of the repository main page, or go to the URL:

https://github.com/HakaiInstitute/YOUR_REPOSITORY_NAME/releases/new

Fill in the form, add an appropriate new version tag, and click Publish release.

This will package your latest repository version into a zipped package and add it to all the releases made available on your repository releases page:

https://github.com/HakaiInstitute/YOUR_REPOSITORY_NAME/releases
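
For those who prefer scripting over the web form, a release can also be created through the GitHub REST API "create a release" endpoint. The sketch below only builds the request and does not send it. The function name is hypothetical, the token is a placeholder for a personal access token that you would have to supply, and using the tag as the release title is an assumption.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_release_request(
    owner: str, repo: str, tag: str, token: str, notes: str = ""
) -> urllib.request.Request:
    """Prepare (but do not send) a POST request that creates a GitHub release."""
    url = f"{GITHUB_API}/repos/{owner}/{repo}/releases"
    payload = {
        "tag_name": tag,  # the new version tag, e.g. "1.1.0"
        "name": tag,      # release title; reusing the tag is an assumption
        "body": notes,    # release notes, e.g. the matching changelog entry
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_release_request(
        "HakaiInstitute", "YOUR_REPOSITORY_NAME", "1.1.0", "YOUR_TOKEN"
    )
    print(request.get_method(), request.full_url)
    # To actually publish, pass the request to urllib.request.urlopen(request).
```

Building the request separately from sending it makes it easy to inspect the tag and notes before anything is published.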

Large Datasets

For larger, continuous data records published to ERDDAP, such as the CTD Research Grade Data, we want to retain old versions of the datasets for reproducibility. The suggested practice is to export the data to an open standard format; NetCDF following the CF 1.6 conventions is the most recommended. Please contact data@hakai.org for more details.

Once the data package is complete, we recommend submitting the dataset to the general repository ZENODO [1], where a DOI can be generated if needed. This dataset submission can then be referenced by the main associated dataset record on the Hakai Catalogue.


  1. ZENODO limits the size of a dataset to 50 GB.