The graph data structure of Mastodon.

The mastodon-graph Java package can be used to implement directed graphs with small memory footprint. Vertices and edges are not stored as individual objects. Instead vertex and edge data is laid out in a primitive byte[] array and accessed via proxy objects. This page describes some internals of the trackmate-graph package.

Memory layout.

Each vertex and each edge maps to a contiguous portion of a byte[] array. The size of a portion (elements) is fixed for a particular AbstractVertex or AbstractEdge subclass. Each element starts with a fixed part that represents the graph structure and then additional payload used by the subclass to describe some vertex attributes, etc.

There is one byte[] array that stores all vertices, and one byte[] array that stores all edges. References between these arrays are in the form of element indices. Whether these are indices refer to elements in the vertex or in the edge memory array is clear from the context.

Vertex layout.

The following diagram illustrates the layout of vertices in a byte[] array:

In the left-most column, element indices are shown (in blue), followed by byte indices (in grey). In the example, each element of the AbstractVertex subclass Spot requires 57 bytes to store. The data for the i th Spot starts at byte i * 57. The fixed AbstractVertex part comprises element indices FIRST_IN_EDGE and FIRST_OUT_EDGE, occupying 4 bytes each. The remaining 49 bytes are Spot attributes.

FIRST_IN_EDGE is the element index (in the edge memory array) of the first incoming edge, i.e., an edge pointing to this vertex. The remaining incoming edges of the same vertex are stored as a linked list in the edge memory as described below. If this vertex does not have any incoming edges FIRST_IN_EDGE is -1.

Similarly, FIRST_OUT_EDGE is the element index of the first outgoing edge, i.e., an edge starting from this vertex. The remaining outgoing edges of the same vertex are stored as a linked list in the edge memory as described below. If this vertex does not have any outgoing edges FIRST_OUT_EDGE is -1.

Edge layout.

The following diagram illustrates the layout of edges in a byte[] array:

In the left-most column, element indices are shown (in red), followed by byte indices (in grey). In the example, each element of the AbstractEdge subclass Edge requires 24 bytes to store. The data for the i th Edge starts at byte i * 24. The fixed AbstractEdge part comprises element indices SOURCE, TARGET, NEXT_SOURCE_EDGE, and NEXT_TARGET_EDGE, occupying 4 bytes each. The remaining 8 bytes are Edge attributes.

  • SOURCE is the element index (in the vertex memory array) of the vertex from which this edge starts.

  • TARGET is the element index (in the vertex memory array) of the vertex to which this edge points.

  • NEXT_SOURCE_EDGE is the element index (in the edge memory array) of the next outgoing edge of the source vertex, , the next edge that has the same SOURCE. If there is no such edge then NEXT_SOURCE_EDGE is -1.

  • NEXT_TARGET_EDGE is the element index (in the edge memory array) of the next incoming edge of the target vertex, , the next edge that has the same TARGET. If there is no such edge then NEXT_TARGET_EDGE is -1.

Example

Consider the following example graph comprising vertices A, B, C, D, E and edges a, b, c, d.

This is laid out in memory as follows

The links between vertex B and it’s adjacent edges a, b, c have been highlighted. Let’s look at that in more detail: Vertex B is stored at element index 1 in the vertex memory array. B has one incoming edge a. The edge a is stored at element index 0 in the edge memory array. Therefore the FIRST_IN_EDGE field of B is 0. Apart from a, the vertex B has no further incoming edges. Therefore, the NEXT_TARGET_EDGE field of a is -1, i.e., the list of edges entering B terminates here.

B has two outgoing edges b, c. The edge b is stored at element index 1 in the edge memory array. Therefore the FIRST_OUT_EDGE field of B is 1. The next outgoing edge of B is c which is stored at element index 2. Therefore, the NEXT_SOURCE_EDGE field of b is 2. After c, the vertex B has no further outgoing edges. Therefore, the NEXT_SOURCE_EDGE field of c is -1, i.e., the list of edges leaving B terminates here.

Below is the same memory layout again, this time highlighting the references from edges a, b, c back to the vertex memory array.

For example, edge c is leaving vertex B (index 1) and entering vertex D (index 3). Therefore the SOURCE field of c is 1, and the TARGET field of c is 3.

Free-list of unallocated elements.

The vertex and edge memory arrays can only ever grow. When elements are released, they are simply marked as free for re-use. Assume that in the above example vertex D is deleted, as well as it’s adjacent edges c, d, leaving this:

After removing c, d, D the memory layout looks like this:

The element 3 in the vertex memory array as well as elements 2 and 3 in the edge memory array have been marked as free. This is done by putting the magic number “-2” into the first 4 bytes of the element. In vertices and edges the first 4 bytes are always occupied by a (positive) index or a −1 index list terminator. Therefore, occupied and free blocks can not be confused. The next 4 bytes of a freed element are the index of the next freed element in the (same) memory array, or -1 if there is no next freed element. Each memory array remembers the index free of the first freed element.

Newly freed elements are enqueued at free, that is, at the head of the free-list. So in the above example, edge element 2 was freed first, followed by edge element 3.

The next edge element will be allocated at head of the free-list and the free index move to the next element. For example, assume that a new edge is created from B to E:

If free is -1, then no more freed elements are available and the underlying memory array has to grow to fit newly added elements.