Data Services

OBIS

What is OBIS?

The Ocean Biodiversity Information System (OBIS) is an open-access data platform for a wide range of marine biodiversity data. OBIS was adopted as a project under the UNESCO IODE program in 2009.

Darwin Core Archive (DwC-A)

To publish data to OBIS and integrate it with other data collections, marine biodiversity data needs to be standardized to the Darwin Core (DwC) body of standards, which provides stable terminology [1] and controlled vocabularies [2] used for sharing data. Together, the DwC data standard and the Ecological Metadata Language (EML) metadata standard used in OBIS make up the Darwin Core Archive (DwC-A) file format [3].

OBIS Data Format

While OBIS initially focused on biological taxonomic occurrence data, the schema was later extended to the OBIS-ENV-DATA format [4]. This format allows data providers to include biological and physical measurements or attributes with their occurrence data. The Darwin Core schema consists of a core data table with one or more extension tables, logically arranged in a star schema and linked to each other through unique identifiers in a nested structure. Each of these tables includes required fields, and some fields, such as geographic coordinates and date-time, must be formatted to international standards (see Darwin Core Mapping).

See the OBIS Manual for additional information and examples related to data tables.
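As a rough illustration of the core-plus-extension layout described above, the sketch below joins a small occurrence table to a measurement table through a shared identifier. The column names are Darwin Core terms, but the layout is simplified and every value is made up for illustration; real OBIS-ENV-DATA archives may use a different core table and additional identifiers.

```python
import pandas as pd

# Core table: one row per occurrence (all values are invented for illustration).
occurrence = pd.DataFrame(
    {
        "occurrenceID": ["occ-1", "occ-2"],
        "scientificName": ["Pisaster ochraceus", "Mytilus californianus"],
        "eventDate": ["2021-06-01T10:00:00Z", "2021-06-01T10:30:00Z"],
        "decimalLatitude": [51.65, 51.66],
        "decimalLongitude": [-128.13, -128.14],
    }
)

# Extension table: zero or more measurements per occurrence.
measurements = pd.DataFrame(
    {
        "occurrenceID": ["occ-1", "occ-1", "occ-2"],
        "measurementType": ["length", "wet weight", "length"],
        "measurementValue": [12.5, 310.0, 8.2],
        "measurementUnit": ["cm", "g", "cm"],
    }
)

# Follow the shared identifier from each extension row back to its core row.
# validate="many_to_one" raises an error if occurrenceID is not unique in the core.
joined = measurements.merge(
    occurrence, on="occurrenceID", how="left", validate="many_to_one"
)

print(joined[["occurrenceID", "scientificName", "measurementType", "measurementValue"]])
```

Keeping measurements in their own table means an occurrence can carry any number of attributes without adding columns to the core table.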

Data accessibility

There are several ways to download or use data from OBIS:

ERDDAP

What is ERDDAP?

ERDDAP is a data server that provides a simple, consistent way to download subsets of scientific datasets in common file formats. Originally created by NOAA's Environmental Research Division (ERD), ERDDAP is now used worldwide, including by CIOOS, IOOS, and other communities, to share data in a standardized way.

Capabilities

  • Data Access: ERDDAP provides a variety of data access methods including via a web browser, OPeNDAP, SOS, WMS, WCS, HTTP, and more.
  • Data Formats: ERDDAP can convert data to various formats such as .csv, .json, .nc, .xls, .mat, .dods, and others (more info here).
  • Data Subsetting: ERDDAP allows users to request a subset of a dataset and returns that subset in the requested file format for download.
  • ERDDAP API: All the information, data and figures made available via ERDDAP are also available via an API. See the tabular datasets API documentation here, and the gridded datasets API documentation here.
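To make the format conversion above more concrete, here is a minimal sketch of handling a tabledap response requested as .json. It assumes the response wraps the data in a "table" object with "columnNames", "columnUnits", and "rows" keys. The payload below is invented for illustration, and "temperature" is a hypothetical variable name.

```python
import json

# Made-up payload in the assumed shape of a tabledap .json response.
sample_response = """
{
  "table": {
    "columnNames": ["time", "temperature"],
    "columnTypes": ["String", "float"],
    "columnUnits": ["UTC", "degree_C"],
    "rows": [
      ["2021-06-01T10:00:00Z", 11.2],
      ["2021-06-01T11:00:00Z", 11.4]
    ]
  }
}
"""


def table_to_records(payload):
    """Pair each row with the column names and build a column-to-units lookup."""
    table = payload["table"]
    names = table["columnNames"]
    records = [dict(zip(names, row)) for row in table["rows"]]
    units = dict(zip(names, table.get("columnUnits", [])))
    return records, units


records, units = table_to_records(json.loads(sample_response))
for record in records:
    print(record["time"], record["temperature"], units["temperature"])
```

In practice the payload would come from a request to a URL ending in .json rather than from a string literal.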

Hakai's ERDDAPs

The Hakai Institute uses ERDDAP to provide access to its oceanographic data. Hakai maintains two ERDDAP instances:

ERDDAP Tabledap Datasets API

You can programmatically retrieve data from an ERDDAP dataset. First define the query you'd like to run, in one of the following ways:

  • (easy) Use the Data Access Form or Subset form. They make the URL for you.
  • (easy) Use the Make A Graph form. It makes the URL for you.
  • (not hard) Generate the URL by hand or with a computer program or script.

For a more complete explanation see ERDDAP documentation here.
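The "by hand or with a script" option can be sketched as a small helper that assembles a tabledap URL from a dataset ID, a file type, a list of variables, and a list of constraints. This is only a sketch: the function name build_tabledap_url, the dataset ID "exampleDataset", and the variable "temperature" are hypothetical, and the ERDDAP documentation remains the reference for the full query syntax.

```python
from urllib.parse import quote


def build_tabledap_url(server, dataset_id, file_type, variables, constraints):
    """Assemble a tabledap request URL.

    variables:   column names to return, e.g. ["time", "temperature"]
    constraints: (variable, operator, value) tuples,
                 e.g. ("time", ">=", "2021-01-01T00:00:00Z").
                 String values must already be wrapped in double quotes
                 by the caller, e.g. '"ABC"'.
    """
    query = ",".join(variables)
    for name, operator, value in constraints:
        # Percent-encode characters such as '>' and '<' in the operator and
        # any special characters in the value; '=' is left readable.
        query += "&" + name + quote(operator, safe="=") + quote(str(value), safe="")
    return f"{server.rstrip('/')}/tabledap/{dataset_id}.{file_type}?{query}"


url = build_tabledap_url(
    server="https://catalogue.hakai.org/erddap",
    dataset_id="exampleDataset",  # hypothetical dataset ID
    file_type="csv",
    variables=["time", "temperature"],  # hypothetical variable names
    constraints=[("time", ">=", "2021-01-01T00:00:00Z")],
)
print(url)
```

Changing file_type is enough to request the same subset in another format, which is why building the URL in code is convenient for repeated queries.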

As an example, here is how to retrieve the result of an ERDDAP dataset query programmatically in the csv format:

Python:

import pandas as pd

# Define the URL of the ERDDAP dataset query.
url = "https://catalogue.hakai.org/erddap/tabledap/{dataset}.csv?..."

# Load the data into a pandas DataFrame. Skip the second row, which gives each variable's units.
df = pd.read_csv(url, skiprows=[1])

# Convert the time variable to pandas datetime objects.
df['time'] = pd.to_datetime(df['time'])

# Show the first few rows.
df.head()

R:

# Define the URL of the ERDDAP dataset query.
url <- "https://catalogue.hakai.org/erddap/tabledap/{dataset}.csv?..."

# Load the data into a data frame. The row after the header holds each
# variable's units, so it is read here as the first data row.
data <- read.csv(url)

# Print the data frame.
print(data)
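
MATLAB:

% Define the URL of the ERDDAP dataset query.
url = "https://catalogue.hakai.org/erddap/tabledap/{dataset}.csv?...";

% Allow up to 120 seconds for the server to respond.
options = weboptions;
options.Timeout = 120;

% Download the query result.
data = webread(url,options);

% Convert time to datetime object.
% time_UTC_ is assumed to be the name MATLAB assigns to the time column;
% check data.Properties.VariableNames if it differs.
data.time_UTC_ = datetime(data.time_UTC_,"Format","yyyy-MM-dd'T'HH:mm:ssZ",timezone="UTC");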
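The units row that the examples above skip can also be kept as a lookup. The sketch below runs on an in-memory string shaped like ERDDAP's csv output (a header row followed by a units row) so that it works without a network connection. The sample values and the "temperature" column are invented for illustration.

```python
import io

import pandas as pd

# Made-up sample in the shape of an ERDDAP .csv response:
# row 0 = column names, row 1 = units, remaining rows = data.
sample_csv = """time,latitude,longitude,temperature
UTC,degrees_north,degrees_east,degree_C
2021-06-01T10:00:00Z,51.65,-128.13,11.2
2021-06-01T11:00:00Z,51.65,-128.13,11.4
"""

# Read the header plus one row: that single row is the units row,
# so it pairs each column name with its unit.
header = pd.read_csv(io.StringIO(sample_csv), nrows=1)
units = header.iloc[0].to_dict()

# Read the data itself, skipping the units row (file row index 1)
# so numeric columns are parsed as numbers rather than text.
df = pd.read_csv(io.StringIO(sample_csv), skiprows=[1])
df["time"] = pd.to_datetime(df["time"])

print(units)
print(df.dtypes)
```

With a real query, the same two reads would take the ERDDAP URL instead of the StringIO object, at the cost of a second request.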

BOLD

NCBI

NCEI


  1. Darwin Core Quick Reference Guide 

  2. NERC Vocabulary Server 

  3. Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, et al. (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLOS ONE 7(1): e29715. https://doi.org/10.1371/journal.pone.0029715

  4. De Pooter D, et al. (2017) Toward a new data standard for combined marine biological and environmental datasets - expanding OBIS beyond species occurrences. Biodiversity Data Journal 5: e10989. https://doi.org/10.3897/BDJ.5.e10989

  5. Contact data@hakai.org for access if needed.