Dataset Versioning

When metadata records are submitted for publication to the Hakai Data Catalogue, they should include a version number with a major and a minor component, e.g. v2.1, where 2 is the major version and 1 is the minor version. Versioning is specific to the data product, not the metadata: minor changes to the metadata do not require a version update. When updating the version number, please contact the Data Mobilization Team so the information can be updated in DataCite as well. Please ensure data are properly quality controlled prior to publishing; otherwise you may have to release new versions frequently and update the metadata record each time.
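
As an illustration of the major/minor format described above, the following is a minimal Python sketch that parses a version label such as v2.1. The names Version and parse_version, and the choice to accept the label with or without the leading "v", are assumptions made for this example and are not part of any Hakai tool.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    """A dataset version split into its major and minor parts."""
    major: int
    minor: int


# Assumed pattern: optional "v" prefix, then "<major>.<minor>" (e.g. v2.1 or 2.1).
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)$")


def parse_version(text: str) -> Version:
    """Parse a label like 'v2.1' into Version(major=2, minor=1)."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"expected a version like 'v2.1', got {text!r}")
    return Version(int(match.group(1)), int(match.group(2)))
```

For example, parse_version("v2.1") gives a major version of 2 and a minor version of 1, while a label without a minor part, such as "2", is rejected.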

For every data product, data providers need to define in a data management plan what constitutes a major vs. a minor version increment. As a suggested practice, a change that affects the whole dataset or product, such as reprocessing or a change in the author list, constitutes a major version, while annual additions of data (e.g. for time series datasets) or minor error corrections constitute minor versions. When versions are incremented, it is critical that previous versions of the dataset remain available.
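
The suggested practice above can be sketched as a small Python function. The change labels are hypothetical names chosen for this example, and resetting the minor version to 0 on a major increment is a common convention assumed here, not a rule stated in this guideline; each data management plan should define its own rules.

```python
from typing import Tuple

# Hypothetical labels for the kinds of change named in the suggested practice.
MAJOR_CHANGES = {"reprocessing", "author_list_change"}
MINOR_CHANGES = {"annual_data_addition", "minor_error_correction"}


def next_version(major: int, minor: int, change: str) -> Tuple[int, int]:
    """Return the (major, minor) version that follows the given change."""
    if change in MAJOR_CHANGES:
        # Assumption: the minor version restarts at 0 after a major increment.
        return (major + 1, 0)
    if change in MINOR_CHANGES:
        return (major, minor + 1)
    raise ValueError(f"unknown change type: {change!r}")
```

Under these assumptions, adding a new year of data to v2.1 yields v2.2, while reprocessing the whole dataset yields v3.0.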

If your data record links users to data that are only available upon request, it is still good practice to include a version element. To avoid confusion, ensure that the version number in your metadata record matches the most up-to-date version number associated with your data product. Versioning should also be included for derived, one-off or synthesis/aggregated datasets (i.e. when no incremental changes are expected).

Whenever the major or minor version changes, it is important to keep a log of what was changed, for example in a Changelog.txt file where other options are not available. This Changelog file should be stored with your data product. When citing a dataset, make sure you know which version of the dataset you have, and include the version number and, optionally, the access date in the citation. Example citation: “Pontier, O., & Hessing-Lewis, M. (2019). Rocky subtidal fish and invertebrate swath data from BC Central Coast (Version 1.2) [Data set]. Hakai Institute.”
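
The example citation can be assembled from its parts, which makes it harder to forget the version number. The following is a minimal Python sketch; the function name format_citation and its parameters are hypothetical, and the layout simply mirrors the example citation above. The optional access date is left out because this guideline does not specify how it should be written.

```python
def format_citation(authors: str, year: int, title: str,
                    version: str, publisher: str) -> str:
    """Build a dataset citation that always carries the version number.

    `authors` is expected to be already formatted,
    e.g. "Pontier, O., & Hessing-Lewis, M."
    """
    return f"{authors} ({year}). {title} (Version {version}) [Data set]. {publisher}."


citation = format_citation(
    authors="Pontier, O., & Hessing-Lewis, M.",
    year=2019,
    title="Rocky subtidal fish and invertebrate swath data from BC Central Coast",
    version="1.2",
    publisher="Hakai Institute",
)
```

Here citation reproduces the text of the example citation given above.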

When the data product is hosted on GitHub (see 08 - Hosting Data on GitHub), the version associated with the data record in the Hakai Data Catalogue should match the version of the GitHub Release. See section 08 for details on how to ‘release’ a version on GitHub and how to provide the appropriate URL for a seamless download in the Hakai Data Catalogue. For data products not stored on the Hakai GitHub repository, ensure that your data product has an associated Changelog that indicates the most recent version, as well as a folder containing archived versions.
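
A quick consistency check between the version in the metadata record and the GitHub Release can be sketched in Python as below. The function names are hypothetical, both values are assumed to be available as plain strings (nothing is fetched from GitHub or the catalogue), and treating "v1.2" and "1.2" as the same version is an assumption of this example.

```python
def normalize_version(label: str) -> str:
    """Strip whitespace and a leading 'v' or 'V' so 'v1.2' and '1.2' compare equal."""
    label = label.strip()
    return label[1:] if label[:1] in ("v", "V") else label


def versions_match(record_version: str, release_tag: str) -> bool:
    """True when the metadata record version and the release tag agree."""
    return normalize_version(record_version) == normalize_version(release_tag)
```

For instance, a record version of "1.2" matches a release tagged "v1.2", but not one tagged "v1.3".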

For larger, continuous data records that are published to ERDDAP, such as the CTD Research Grade Data, we want to ensure that old versions of the datasets are retained for reproducibility purposes. The suggested practice is to export .csv files annually into a data package and host it in a GitHub repository. Each release of the data package would reflect an annual increment of the data (e.g. one release for the 2015 - 2021 data, another release for the 2015 - 2022 data, etc.). These releases would all be included in the same GitHub repository and added as downloadable resources (.zip files) to a single metadata record. The DOI associated with the data record would remain the same, but the version element of the data record in the Hakai Data Catalogue would reflect the latest version, with older versions archived and downloadable as unique releases via the Download and Resources section of the metadata record. Each of these older versions should contain a readme file with the recommended citation specific to that version. The recommended citation in the Hakai Data Catalogue can be updated and should always reflect the latest version.

Currently, no fields exist in the Metadata Entry tool to link data records or DOIs to existing records in the Hakai Data Catalogue or other external catalogues. Until this feature is implemented, please list any related data products that you want to reference in the abstract, along with the relation type (e.g. “subset of”, “is newer version of”, etc.). A list of the most commonly used relations can be found here.