Basics of the N5 API for Java developers. This tutorial shows how to read and write array data and metadata using N5.
hdf5
n5
zarr
imglib2
tutorial
Authors

John Bogovic

Caleb Hulbert

Published

February 27, 2024

Modified

March 7, 2024

This tutorial for Java developers covers the most basic functionality of the N5 API for storing large, chunked n-dimensional arrays and arbitrary metadata. The N5 API and documentation typically refer to arrays as “datasets”, terminology inherited from HDF5 that we will use in this tutorial. We will learn about readers and writers, groups, datasets, and metadata.

Readers and writers

N5Readers and N5Writers form the basis of the N5 API and allow you to read and write data, respectively. We generally recommend using an N5Factory to create readers and writers:

Code
// N5Factory can make N5Readers and N5Writers
var factory = new N5Factory();

// trying to open a reader for a container that does not yet exist will throw an exception
// var n5Reader = factory.openReader("my-container.n5");

// creating a writer creates a container at the given location
// if it does not already exist
var n5Writer = factory.openWriter("my-container.n5");

// now we can make a reader
var n5Reader = factory.openReader("my-container.n5");

// test if the container exists
n5Reader.exists(""); // true

// "" and "/" both refer to the root of the container
n5Reader.exists("/"); // true

The N5 API gives you access to a number of different storage formats: HDF5, Zarr, and N5’s own format. N5Factory’s convenience methods try to infer the storage format from the extension of the path you give it:

Code
factory.openWriter("my-container.h5").getClass();   // HDF5 Format N5Writer
factory.openWriter("my-container.n5").getClass();   // N5 Format   N5Writer
factory.openWriter("my-container.zarr").getClass(); // Zarr Format N5Writer

In fact, it is possible to read with N5Writers since every N5Writer is also an N5Reader, so from now on we’ll just be using the n5Writer.

Try it!

We use the N5 storage format for the rest of the tutorial, but everything works just as well with an HDF5 file or a Zarr container.
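The extension-based inference described above can be pictured with a small, library-free sketch. This is not N5Factory's actual implementation: the class `FormatGuess`, the enum `StorageFormat`, and the method `guessFormat` are hypothetical names made up for illustration, and only the three extensions used in this tutorial are handled.

```java
import java.util.Locale;
import java.util.Optional;

/** Illustrative only: NOT part of the N5 API. */
class FormatGuess {

    /** Hypothetical labels for the three storage formats named in this tutorial. */
    enum StorageFormat { HDF5, N5, ZARR }

    /** Guess a format from the path's extension; empty if the extension is not recognized. */
    static Optional<StorageFormat> guessFormat(String path) {
        // ignore a single trailing slash, e.g. "my-container.zarr/"
        String trimmed = path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
        String lower = trimmed.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".h5")) return Optional.of(StorageFormat.HDF5);
        if (lower.endsWith(".n5")) return Optional.of(StorageFormat.N5);
        if (lower.endsWith(".zarr")) return Optional.of(StorageFormat.ZARR);
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(guessFormat("my-container.h5"));   // Optional[HDF5]
        System.out.println(guessFormat("my-container.n5"));   // Optional[N5]
        System.out.println(guessFormat("my-container.zarr")); // Optional[ZARR]
        System.out.println(guessFormat("my-container"));      // Optional.empty
    }
}
```

A path without a recognizable extension yields an empty result here; how the real factory treats such paths is not covered by this sketch.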

Groups

N5 containers form hierarchies of groups - think “nested folders on your file system.” It’s easy to create groups and test if they exist:

Code
n5Writer.createGroup("foo");
n5Writer.createGroup("foo/bar");
n5Writer.createGroup("lorum/ipsum/dolor/sit/amet");

n5Writer.exists("lorum/ipsum");      // true
n5Writer.exists("not/a/real/group"); // false

The list method lists groups that are children of the given group:

Code
n5Writer.list("");     // [lorum, foo]
n5Writer.list("foo");  // [bar]

and deepList recursively lists every descendant of the given group:

Code
Arrays.toString(n5Writer.deepList(""));
[lorum, lorum/ipsum, lorum/ipsum/dolor, lorum/ipsum/dolor/sit, lorum/ipsum/dolor/sit/amet, foo, foo/bar]

Notice that these methods only give information about what groups are present and do not provide information about metadata or datasets.
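To make the "nested folders" picture concrete, here is a sketch that uses only java.nio and no N5 code at all. It assumes a local file system, where an N5-format container stores each group as a directory; the class `FolderDeepList` and its methods are hypothetical helpers. Unlike the output above, the result is sorted so that it is deterministic.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative only: plain directories standing in for N5 groups. */
class FolderDeepList {

    /** Recursively list every directory below root as sorted, '/'-separated relative paths. */
    static List<String> deepList(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isDirectory)
                .filter(p -> !p.equals(root)) // the root itself is not a descendant
                .map(p -> root.relativize(p).toString().replace('\\', '/'))
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Build a small hierarchy in a temporary directory and list it. */
    static List<String> demo() {
        try {
            Path root = Files.createTempDirectory("groups-demo");
            Files.createDirectories(root.resolve("foo").resolve("bar"));
            Files.createDirectories(root.resolve("lorum").resolve("ipsum"));
            return deepList(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [foo, foo/bar, lorum, lorum/ipsum]
    }
}
```

Like list and deepList, this walk only reports which folders exist; it says nothing about the metadata or array data stored inside them.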

Note

Some storage systems (for example, AWS S3) have separate permissions for reading and listing, so it may be possible to read data without being able to list it.

Datasets

N5 stores datasets (n-dimensional arrays) in particular groups in the hierarchy.

Warning

Datasets must be terminal (leaf) nodes in the container hierarchy, i.e. a dataset cannot contain another group or dataset.

We recommend using code from n5-ij or n5-imglib2 to write datasets. The examples in this post will use the latter.

The N5Utils class in n5-imglib2 has many useful methods, but in this post, we’ll cover simple methods for reading and writing. First, N5Utils.save writes a dataset and required metadata to the container at a group that you specify. The group will be created if it does not already exist. The parameters will be discussed in more detail below.

Code
// the parameters
var img = demoImage(64,64); // the image to write- size 64 x 64
var groupPath = "data"; 
var blockSize = new int[]{32,32};
var compression = new GzipCompression();

// save the image
N5Utils.save(img, n5Writer, groupPath, blockSize, compression);

You can write in parallel by providing an ExecutorService to this variant of N5Utils.save:

Code
var exec = Executors.newFixedThreadPool(4); // with 4 parallel threads
N5Utils.save(img, n5Writer, groupPath, blockSize, compression, exec);
exec.shutdown(); // release the worker threads once writing is done

Reading the dataset from the container is also easy with N5Utils.open:

Code
var loadedImg = N5Utils.open(n5Writer, groupPath);
Util.getTypeFromInterval(loadedImg).getClass();      // FloatType
Arrays.toString(loadedImg.dimensionsAsLongArray());  // [64, 64]

Overwriting data is possible

This save method does NOT perform any checks before writing and will overwrite data that already exists at the specified location. If data may already be present at a given location in a container, check first (for example with exists) and take appropriate action to avoid data loss or corruption.

This example shows that data can be overwritten:

Code
// overwrite our previous data
var img = ArrayImgs.unsignedBytes(2,2);
N5Utils.save(img, n5Writer, groupPath, blockSize, compression);

// load the new data, the old data are no longer accessible
var loadedImg = N5Utils.open(n5Writer, groupPath);
Arrays.toString(loadedImg.dimensionsAsLongArray());  // [2, 2]
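One way to guard against accidental overwrites is a check-then-save pattern. The sketch below is deliberately independent of N5 so that it runs on its own: `SafeSave` and `saveIfAbsent` are hypothetical names, and the `exists` and `save` callbacks are stand-ins backed by an in-memory set.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Illustrative only: a generic "save unless something is already there" guard. */
class SafeSave {

    /**
     * Run save only if nothing exists at path yet.
     * Returns true if the save ran, false if it was skipped.
     */
    static boolean saveIfAbsent(String path, Predicate<String> exists, Consumer<String> save) {
        if (exists.test(path)) {
            return false; // something is already there: leave it untouched
        }
        save.accept(path);
        return true;
    }

    public static void main(String[] args) {
        // an in-memory stand-in for a container, so the sketch runs without N5
        Set<String> container = new HashSet<>();
        System.out.println(saveIfAbsent("data", container::contains, container::add)); // true
        System.out.println(saveIfAbsent("data", container::contains, container::add)); // false
    }
}
```

With a real container, the callbacks would presumably be `n5Writer::exists` and a lambda that calls `N5Utils.save(img, n5Writer, path, blockSize, compression)`; depending on the library version, that lambda may need to handle checked exceptions. The check and the save are two separate steps, so this guard does not protect against another process writing to the same location in between.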

Parameter details

groupPath

is the location inside the container that will store the dataset. You can store a dataset at the root of a container by specifying "" or "/" as the groupPath. In this case, the container will only be able to store one dataset (see the warning above).

blockSize

is a very important parameter. HDF5, N5, and Zarr all break up the datasets they store into equally sized blocks or “chunks”. The block size parameter specifies the size of these blocks.

For the example above, we stored an image of size 64 x 64 using blocks sized 32 x 32. As a result, N5 uses four blocks to store the entire image:

Code
printBlocks("my-container.n5/data");
my-container.n5/data/1/1 is 1762 bytes
my-container.n5/data/1/0 is 2012 bytes
my-container.n5/data/0/1 is 1763 bytes
my-container.n5/data/0/0 is 2020 bytes
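The file names in this listing are positions in the block grid. As a rough sketch, assuming the N5 file-system layout shown above (HDF5 and Zarr organize their chunks differently), the block that holds a given pixel can be found by integer division; `BlockPath`, `gridPosition`, and `blockPath` are hypothetical names, not N5 API.

```java
import java.util.Arrays;

/** Illustrative only: which block file holds a given pixel? */
class BlockPath {

    /** Grid position of the block containing the pixel: integer division per dimension. */
    static long[] gridPosition(long[] pixel, int[] blockSize) {
        long[] grid = new long[pixel.length];
        for (int d = 0; d < pixel.length; d++) {
            grid[d] = pixel[d] / blockSize[d];
        }
        return grid;
    }

    /** Relative path of a block, first dimension first, e.g. "data/1/0". */
    static String blockPath(String dataset, long[] grid) {
        StringBuilder sb = new StringBuilder(dataset);
        for (long g : grid) {
            sb.append('/').append(g);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] blockSize = {32, 32};
        long[] grid = gridPosition(new long[]{40, 10}, blockSize);
        System.out.println(Arrays.toString(grid));   // [1, 0]
        System.out.println(blockPath("data", grid)); // data/1/0
    }
}
```

Reading pixel (40, 10) therefore only requires the single block `data/1/0`, which is the main benefit of chunked storage.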

Quiz: How many blocks would there be if the block size was 64 x 8?

There would be eight blocks.

A single block of width 64 covers the first dimension, but it takes eight blocks of height 8 to cover the second dimension (\(8 \times 8 = 64\)). This is also demonstrated by the code below:

Code
// remove the old data
n5Writer.remove(groupPath);

// rewrite with a different block size
var blockSize = new int[]{64,8};
N5Utils.save(img, n5Writer, groupPath, blockSize, compression);

// how many blocks are there?
printBlocks("my-container.n5/data");
my-container.n5/data/0/1 is 837 bytes
my-container.n5/data/0/7 is 847 bytes
my-container.n5/data/0/3 is 839 bytes
my-container.n5/data/0/6 is 844 bytes
my-container.n5/data/0/0 is 968 bytes
my-container.n5/data/0/4 is 846 bytes
my-container.n5/data/0/2 is 840 bytes
my-container.n5/data/0/5 is 847 bytes

Try it!

N5 lets you store your image in a single file if you want - just provide a block size that is equal to or larger than the image size.
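The block counts in this section follow from a ceiling division per dimension. The sketch below is plain arithmetic, not N5 code; `BlockCount`, `blocksAlong`, and `totalBlocks` are hypothetical names.

```java
/** Illustrative only: how many blocks does a dataset need? */
class BlockCount {

    /** Blocks needed to cover one dimension: ceiling of dimension / blockSize. */
    static long blocksAlong(long dimension, int blockSize) {
        return (dimension + blockSize - 1) / blockSize;
    }

    /** Total number of blocks: the product of the per-dimension counts. */
    static long totalBlocks(long[] dimensions, int[] blockSize) {
        long total = 1;
        for (int d = 0; d < dimensions.length; d++) {
            total *= blocksAlong(dimensions[d], blockSize[d]);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] dims = {64, 64};
        System.out.println(totalBlocks(dims, new int[]{32, 32}));   // 4
        System.out.println(totalBlocks(dims, new int[]{64, 8}));    // 8
        System.out.println(totalBlocks(dims, new int[]{64, 64}));   // 1
        System.out.println(totalBlocks(dims, new int[]{128, 128})); // 1
        System.out.println(totalBlocks(new long[]{100, 100}, new int[]{32, 32})); // 16
    }
}
```

The last case shows what happens when the block size does not divide the image size evenly: the count is rounded up, and the final block along each dimension is only partly covered by the image.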

compression

Each block is compressed independently, using the specified compression. Use RawCompression to store blocks without compression.

Code
// rewrite without compression
var groupPath = "dataNoCompression"; 
var blockSize = new int[]{32,32};
var compression = new RawCompression();
N5Utils.save(img, n5Writer, groupPath, blockSize, compression);

// what size are the blocks?
printBlocks("my-container.n5/dataNoCompression");
my-container.n5/dataNoCompression/1/1 is 4108 bytes
my-container.n5/dataNoCompression/1/0 is 4108 bytes
my-container.n5/dataNoCompression/0/1 is 4108 bytes
my-container.n5/dataNoCompression/0/0 is 4108 bytes

Notice that the gzip-compressed blocks were roughly 1700 to 2000 bytes each, whereas each uncompressed block is 4108 bytes.
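The uncompressed size can be reproduced by hand. The sketch below assumes the default-mode block header described in the N5 specification (2 bytes for the mode, 2 bytes for the number of dimensions, and 4 bytes per dimension) and 4-byte float pixels, which matches the FloatType reported earlier; `RawBlockSize` and its methods are hypothetical names.

```java
/** Illustrative only: expected size of an uncompressed N5-format block. */
class RawBlockSize {

    /** Assumed default-mode header: 2 bytes mode + 2 bytes dimension count + 4 bytes per dimension. */
    static long headerBytes(int numDimensions) {
        return 2 + 2 + 4L * numDimensions;
    }

    /** Header plus raw payload for a full block of the given size. */
    static long rawBlockBytes(int[] blockSize, int bytesPerElement) {
        long elements = 1;
        for (int s : blockSize) {
            elements *= s;
        }
        return headerBytes(blockSize.length) + elements * bytesPerElement;
    }

    public static void main(String[] args) {
        // 32 * 32 floats = 4096 bytes of payload, plus a 12-byte header
        System.out.println(rawBlockBytes(new int[]{32, 32}, Float.BYTES)); // 4108
    }
}
```

Under these assumptions, the result agrees with the 4108 bytes reported for each uncompressed block.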

The available compression options at the time of this writing are:

Metadata

N5 can also store rich structured metadata in addition to array data. This tutorial will discuss basic, low-level metadata operations. Advanced operations and metadata standards may be described in a future tutorial.

Basics

N5Writers have a setAttribute method for writing metadata to the storage backend. It takes three arguments:

<T> void setAttribute(String groupPath, String attributePath, T attribute)
  • groupPath : the group in which to store this metadata
  • attributePath : the name of this attribute
  • attribute : the metadata attribute to be stored. Can be an arbitrary type (denoted T).

Note

There are differences between an attribute “name” and an attribute “path”, but attribute “paths” are an advanced topic and will be covered elsewhere.

Similarly, N5Readers have a getAttribute method:

<T> T getAttribute(String groupPath, String attributePath, Class<T> clazz)

The last argument (Class<T>) lets you specify the type that getAttribute should return. An N5Exception will be thrown if the requested type cannot be created from the requested attribute. If an attribute does not exist, null will be returned (see the last example of this section). Consider these examples:

Code
// create a group inside the container (think: "folder")
var groupName = "put-data-in-me";
n5Writer.createGroup(groupName);

// attributes have names and values
// make an attribute called "date" with a String value
var attributeName = "date";
n5Writer.setAttribute(groupName, attributeName, "2024-Jan-01");

// Ask the N5 API to make a double array from the date attribute
// it will try and fail, so an exception will be thrown
try {
    var nothing = n5Writer.getAttribute(groupName, attributeName, double[].class);
} catch( N5Exception e ) {
    System.out.println("Error: could not get attribute as double[]");
}

// get the value of the "date" attribute as a String
String date = n5Writer.getAttribute(groupName, attributeName, String.class);
date
Error: could not get attribute as double[]
2024-Jan-01

Sometimes it is possible to interpret an attribute as multiple different types:

Code
n5Writer.setAttribute(groupName, "a", 42);
var num = n5Writer.getAttribute(groupName, "a", double.class); // 42.0
var str = n5Writer.getAttribute(groupName, "a", String.class); // "42"

Rich metadata

It is possible to save attributes of arbitrary types, enabling you to structure your metadata into classes that are easy to save and load directly. For example, if we define a metadata class FunWithMetadata:

Code
class FunWithMetadata {
    String name;
    int number;
    double[] data;
    
    public FunWithMetadata(String name, int number, double[] data) {
        this.name = name;
        this.number = number;
        this.data = data;
    }
    public String toString(){
        return String.format( "FunWithMetadata{%s(%d): %s}", 
            name, number, Arrays.toString(data));
    }
};

then make an instance and save it:

Code
var metadata = new FunWithMetadata("Dorothy", 2, new double[]{2.72, 3.14});
n5Writer.setAttribute(groupName, "metadata", metadata);

// get attribute as an instance of FunWithMetadata
n5Writer.getAttribute(groupName, "metadata",  FunWithMetadata.class);
FunWithMetadata{Dorothy(2): [2.72, 3.14]}

To retrieve all the metadata in a group as JSON:

Code
// get attribute as an instance of JsonElement
n5Writer.getAttribute(groupName, "/", JsonElement.class);
{"date":"2024-Jan-01","a":42,"metadata":{"name":"Dorothy","number":2,"data":[2.72,3.14]}}

Removing metadata

You can remove attributes by their name as well. To return the element that was removed, just provide the class for that element (this mirrors the remove method for Lists in Java).

Code
// set attributes
n5Writer.setAttribute(groupName, "sender", "Alice");
n5Writer.setAttribute(groupName, "receiver", "Bob");

// notice that they're set
n5Writer.getAttribute(groupName, "sender", String.class);   // Alice
n5Writer.getAttribute(groupName, "receiver", String.class); // Bob

// remove "sender"
n5Writer.removeAttribute(groupName, "sender");

// remove "receiver" and store result in a variable
var receiver = n5Writer.removeAttribute(groupName, "receiver", String.class); // Bob

n5Writer.getAttribute(groupName, "sender", String.class);   // null
n5Writer.getAttribute(groupName, "receiver", String.class); // null
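For readers less familiar with the collection methods that this behavior is compared to, here is a standard-library sketch with no N5 code. `RemoveAnalogy` and its methods are hypothetical names; the map example is an additional analogy that also shows `null` for a key that is no longer present.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: remove methods in java.util that hand back what they removed. */
class RemoveAnalogy {

    /** List.remove(int) returns the element that was removed. */
    static String removeFromList() {
        List<String> people = new ArrayList<>(List.of("Alice", "Bob"));
        return people.remove(1);
    }

    /** Map.remove(key) returns the removed value, or null if the key is absent. */
    static String removeTwice(String key) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("receiver", "Bob");
        String first = attributes.remove(key);  // the stored value
        String second = attributes.remove(key); // null: already removed
        return first + "," + second;
    }

    public static void main(String[] args) {
        System.out.println(removeFromList());        // Bob
        System.out.println(removeTwice("receiver")); // Bob,null
    }
}
```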

Working with Dataset Metadata

Metadata that describes datasets can be read and written in the same way as any other metadata. However, there are dedicated DatasetAttributes methods for working with dataset metadata safely: N5Reader.getDatasetAttributes and N5Writer.setDatasetAttributes ensure that the metadata is always a valid representation of a dataset. DatasetAttributes should, however, only be set when the dataset is first saved, which ensures that the required metadata stays tightly coupled with the data. In practice, set dataset metadata through the N5Writer.createDataset methods (or indirectly through the N5Utils.save methods mentioned above).

Code
var arrayMetadata = n5Writer.getDatasetAttributes("data");
arrayMetadata.getDimensions();
arrayMetadata.getBlockSize();
arrayMetadata.getDataType();
arrayMetadata.getCompression();

Warning

The attributes that N5 uses to read datasets, listed below, can also be set with setAttribute, and modifying them could corrupt your data. Do not set these attributes manually unless you are certain you know what you are doing!

  • dimensions
  • blockSize
  • dataType
  • compression

The attributes that describe datasets are also accessible using getAttribute. Try running:

n5Writer.getAttribute("data", "dimensions", long[].class);

though using getDatasetAttributes().getDimensions() is generally recommended.

What to try next