Linking to External Resources (HERD)
The HERD (HDMF External Resources Data Structure) class lets you map
terms used in your data to entities defined in external, web-accessible resources such as
ontologies. For example, you may store a species name "Mus musculus" on a
Subject and want to link it to the corresponding NCBI Taxonomy term so that
the value is standardized and easy to query.
From a user’s perspective, a HERD can be treated as a single table that associates a key (a term
used on an object, i.e. a dataset or attribute in the file) with an entity (a term in an
external resource, identified by a compact URI and a full URI). Internally, HERD stores this in six
interlinked tables (keys, files, entities, entity_keys, objects, and
object_keys) and provides convenience methods so you rarely need to interact with those tables
directly.
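To make the relationship between the flattened view and the underlying tables concrete, here is a minimal pure-Python sketch. It does not use HERD itself: the lists of dicts are simplified stand-ins for the six tables, the column names and the identifiers "file-0" and "subject-0" are illustrative assumptions, and the join logic is only one plausible way to combine them.

```python
# Simplified stand-ins for the six tables; a row's position acts as its index.
# Column names and identifiers are illustrative assumptions, not the HERD schema.
files = [{"file_object_id": "file-0"}]
objects = [{"files_idx": 0, "object_id": "subject-0", "object_type": "Subject"}]
keys = [{"key": "Mus musculus"}]
entities = [
    {
        "entity_id": "NCBITAXON:10090",
        "entity_uri": "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10090",
    }
]
# Link tables: which key is used on which object, and which entity a key maps to.
object_keys = [{"objects_idx": 0, "keys_idx": 0}]
entity_keys = [{"entities_idx": 0, "keys_idx": 0}]


def flatten():
    """Join the tables into one row per (object, key, entity) association."""
    rows = []
    for ok in object_keys:
        obj = objects[ok["objects_idx"]]
        for ek in entity_keys:
            if ek["keys_idx"] != ok["keys_idx"]:
                continue  # this entity is linked to a different key
            entity = entities[ek["entities_idx"]]
            rows.append(
                {
                    "file_object_id": files[obj["files_idx"]]["file_object_id"],
                    "object_id": obj["object_id"],
                    "object_type": obj["object_type"],
                    "key": keys[ok["keys_idx"]]["key"],
                    "entity_id": entity["entity_id"],
                    "entity_uri": entity["entity_uri"],
                }
            )
    return rows


flat = flatten()
print(flat)
```

Keeping keys and entities in separate tables means the same key or entity can be reused across many objects without duplicating its text.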
This tutorial shows how to create a HERD, annotate objects in an NWB file, store the HERD in the
file, and inspect the annotations after reading the file back. For the full HERD API (including
add_ref_termset for validating terms against a TermSet, get_key,
and compound-data references), see the
HDMF HERD tutorial.
Create an NWB file
Start with an NWBFile that has a Subject. The
subject’s species is the value we will annotate with an external resource.
Create a HERD and attach it to the file
Create a HERD and assign it to the external_resources field of the
NWBFile.
Add references with add_ref
Use add_ref to add a row that links a key on an object to an
external entity. Here we link the subject’s species to the NCBI Taxonomy entry for Mus musculus.
Because the subject is already part of the file, the file argument is resolved automatically
from the parent hierarchy and can be omitted.
nwbfile.external_resources.add_ref(
    container=nwbfile.subject,
    key=nwbfile.subject.species,
    entity_id="NCBITAXON:10090",
    entity_uri="https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10090",
)
References can also point to an attribute of an object, such as a column of a table. Here we record
the brain region of a set of electrodes in the electrodes table and link the region to the
corresponding structure in the
Allen Mouse Brain Common Coordinate Framework (CCFv3). When the
target is a column, pass the table as the container and the column name as the attribute;
HERD resolves the reference to the column object itself, which is the closest object that has a
data type.
# Create a device and an electrode group to hold the electrodes.
device = nwbfile.create_device(name="probe")
electrode_group = nwbfile.create_electrode_group(
    name="shank0",
    description="a shank of the recording probe",
    location="VISp",
    device=device,
)
# Add four electrodes, all located in VISp.
for _ in range(4):
    nwbfile.add_electrode(location="VISp", group=electrode_group)
# Link the "VISp" value in the location column to the CCFv3 structure.
nwbfile.external_resources.add_ref(
    container=nwbfile.electrodes,
    attribute="location",
    key="VISp",
    entity_id="385",
    entity_uri="https://api.brain-map.org/api/v2/data/Structure/385.json",
)
Inspect the HERD
to_dataframe flattens the interlinked tables into a single
DataFrame, with one row per (object, key, entity) association.
You can also view the individual tables. Each is a
DynamicTable and has its own to_dataframe method.
get_object_type returns all annotations for objects of a
given type, for example every annotated Subject.
nwbfile.external_resources.get_object_type(object_type="Subject")
Write and read the NWB file
Writing the file stores the HERD inside it. Reading the file back makes the HERD available again
through the external_resources field.
filename = "external_resources_tutorial.nwb"
with NWBHDF5IO(filename, mode="w") as io:
    io.write(nwbfile)
read_io = NWBHDF5IO(filename, mode="r")
read_nwbfile = read_io.read()
read_herd = read_nwbfile.external_resources
Access the loaded data
In a Jupyter notebook, the default display of a read HERD shows collapsible sections that can
appear empty. To see the annotations, use the same accessors as above:
to_dataframe for the flattened view, or the individual
tables for a focused view.
View the individual tables:
get_object_entities returns the entities annotated on a
single object as a DataFrame. On a HERD read back from a file, this accessor
currently requires the fix for hdmf #1496, so it is shown here commented out:
# read_herd.get_object_entities(container=read_nwbfile.subject)
Close the file once you are done reading from it.
Alternative: store a HERD outside an NWB file
A HERD can also be saved independently of an NWB file as a zip archive of the underlying tables
using to_zip, and read back with
from_zip. This is useful when external resources span
multiple files; see Annotating Multiple Streamed NWB Files with a Single HERD for a worked
example. For the full HERD API, see the
HDMF HERD tutorial.