GitHub

Hosting Data

The steps to create a GitHub repository are outlined below.

If you don't already have a GitHub account, sign up here (it is free!)
Email Ray Brunsting your GitHub username to be added to the Hakai organizational GitHub repository
Navigate to the Hakai GitHub
Under the 'Repositories' tab, select 'New repository' in the top right corner
For a template, select 'HakaiInstitute/hakai-dataset-repository-template' (see Figure 1)
Give your repository a name and decide whether you want it to be publicly accessible or private.

Figure 1. GitHub template selection: selecting the hakai-dataset-repository-template

This template populates your repository with a useful organizational structure and files that are strongly recommended to be included in your data package, such as a data dictionary, readme file, reference citation, methods section and resources. These should be updated prior to release.
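For those who prefer to script the steps above, the following is a minimal sketch of creating a repository from the template through the GitHub REST API endpoint for generating a repository from a template. It only builds the request; nothing is sent unless you uncomment the last lines. The repository name `example-dataset`, the `YOUR_TOKEN` placeholder and the choice of `HakaiInstitute` as owner are assumptions for illustration, and you would need a personal access token with permission to create repositories in the organization.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"
# Template named in the steps above.
TEMPLATE = "HakaiInstitute/hakai-dataset-repository-template"


def build_template_request(name, token, owner="HakaiInstitute", private=True):
    """Build (but do not send) a request that creates a repository from the template.

    `owner` and `private` defaults are assumptions; adjust them to your case.
    """
    payload = {"owner": owner, "name": name, "private": private}
    return urllib.request.Request(
        url=f"{API_ROOT}/repos/{TEMPLATE}/generate",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    # "example-dataset" and "YOUR_TOKEN" are hypothetical placeholders.
    request = build_template_request("example-dataset", token="YOUR_TOKEN")
    print(request.get_method(), request.full_url)
    # To actually create the repository, send the request:
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response)["html_url"])
```

The web interface described above does the same thing and is usually simpler for a one-off repository; a script is mainly useful when creating several dataset repositories at once.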

How to add data files using the GitHub site

You may clone a repository to your local machine to make edits, then commit and push your changes. If you are unfamiliar with that workflow, you can simply use the GitHub web interface to upload, delete and edit files (Figure 2). To delete a file, click on the filename (left-hand side of Figure 2) to open a page that previews the file. On that page, click the button with three dots (...) and choose to delete the file.

Figure 2. An example GitHub data repository. A GitHub repository always includes the 'Add files' button, shown at the top middle of this image, which allows you to upload files and to edit plain text files such as your data dictionary, readme file or changelog.
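As an alternative to the web interface, the local workflow mentioned above (clone, edit, commit, push) can be sketched as follows. This assumes git is installed and that you are already authenticated with GitHub. The repository URL, directory, file path and commit message are hypothetical placeholders. The helpers only build the commands; `run_commands` executes them when you choose to.

```python
import subprocess


def clone_command(repo_url, local_dir):
    """Return the git command that copies the repository to your machine."""
    return ["git", "clone", repo_url, local_dir]


def publish_commands(local_dir, file_path, message):
    """Return the git commands that stage, commit and push one file.

    `file_path` is relative to `local_dir`; the file must already have been
    copied into the cloned repository.
    """
    return [
        ["git", "-C", local_dir, "add", file_path],
        ["git", "-C", local_dir, "commit", "-m", message],
        ["git", "-C", local_dir, "push"],
    ]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical repository and file names, for illustration only.
    url = "https://github.com/HakaiInstitute/example-dataset.git"
    print(" ".join(clone_command(url, "example-dataset")))
    for command in publish_commands(
        "example-dataset", "data/samples.csv", "Add sample data"
    ):
        print(" ".join(command))
```

Printing the commands first, rather than running them, makes it easy to check what will happen before anything is pushed.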

Archiving, Versioning and Digital Object Identifiers

Some academic journals require data to be archived for long-term storage in a data system specifically meant for that purpose, and the Hakai Institute GitHub repository may not be considered to meet that requirement. Fortunately, Zenodo (https://zenodo.org/) offers a service that automatically archives versions of your repository and assigns a digital object identifier (DOI) each time you create a GitHub release, thereby satisfying requirements for long-term archiving.

To set up an automated integration to archive your data and assign a digital object identifier, follow the instructions at https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content

When you have finalized your data package and updated all the relevant documentation, you can release this version of your repository. A 'release' essentially takes a snapshot of your repository and makes it easier for data users to navigate and access different versions of your data package. Releases can be found on the right-hand side of your repository page. Select 'create a new release'. On the next page you can give your release a title and a description. Under 'Choose a tag', you will by default have to create a new tag when publishing your version. The release tag must match the version element in your metadata record.
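The requirement that the release tag match the metadata version can be checked before publishing. The sketch below builds a request for the GitHub REST API endpoint that creates a release, and refuses to do so when the tag and the metadata version differ. Treating "match" as exact string equality is an assumption, as are the repository name, version string and token placeholder; the request is built but not sent.

```python
import json
import urllib.request


def build_release_request(owner, repo, tag, metadata_version, title, description, token):
    """Build (but do not send) a request that creates a GitHub release.

    Raises ValueError when the tag differs from the metadata version.
    Exact string equality is an assumption about what "match" means.
    """
    if tag != metadata_version:
        raise ValueError(
            f"Release tag {tag!r} does not match metadata version {metadata_version!r}"
        )
    payload = {"tag_name": tag, "name": title, "body": description}
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/releases",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    # Hypothetical repository, version and token, for illustration only.
    request = build_release_request(
        owner="HakaiInstitute",
        repo="example-dataset",
        tag="1.0.0",
        metadata_version="1.0.0",
        title="Version 1.0.0",
        description="First public release of the data package.",
        token="YOUR_TOKEN",
    )
    print(request.get_method(), request.full_url)
```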

Once you have released your new version, you can find and access the latest version release and older releases under the 'Releases' tab (see e.g. JSP Releases). If you've set up the Zenodo integration, you will see a new version and DOI for your dataset through your Zenodo profile (https://zenodo.org/me/uploads).

Whenever you make a new release, use the Metadata Entry Tool to update the metadata record so that its version number matches the GitHub release tag.

When filling out a metadata record and including links to data in the Download and Resources section of the metadata form, please provide a link to the 'Releases' page (e.g. Hakai Juvenile Salmon Program Releases) so that users who follow the link can navigate to their desired version and click the 'Source code (zip)' link to download its contents. Alternatively, if the Zenodo integration was used, provide the DOI that represents all versions (see Zenodo versioning documentation for more details).
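The links described above follow predictable patterns, so they can be assembled from the repository name and release tag. The helpers below are a small sketch. The repository name and tag are hypothetical, and the archive URL pattern reflects how GitHub currently serves the 'Source code (zip)' link, which is worth verifying against your own release page.

```python
from urllib.parse import quote


def releases_page_url(owner, repo):
    """URL of the 'Releases' page to put in the metadata record."""
    return f"https://github.com/{owner}/{repo}/releases"


def source_zip_url(owner, repo, tag):
    """URL behind the 'Source code (zip)' link for one release tag."""
    return f"https://github.com/{owner}/{repo}/archive/refs/tags/{quote(tag, safe='')}.zip"


if __name__ == "__main__":
    # Hypothetical repository and tag, for illustration only.
    print(releases_page_url("HakaiInstitute", "example-dataset"))
    print(source_zip_url("HakaiInstitute", "example-dataset", "1.0.0"))
```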