Memory Formats

In oneDNN, a memory format describes how a multidimensional tensor is stored in a one-dimensional linear memory address space. oneDNN specifies two kinds of memory formats: plain, which corresponds to traditional multidimensional arrays, and optimized, which is completely opaque.

Plain Memory Formats

Plain memory formats describe how multidimensional tensors are laid out in memory using an array of \(\operatorname{dimensions}\) and an array of \(\operatorname{strides}\), both of which have length equal to the rank of the tensor. In oneDNN the order of dimensions is fixed, and different dimensions can have a canonical interpretation depending on the primitive. For example, for CNN primitives the order for activation tensors is \(\{N, C, ..., D, H, W\}\), where \(N\) stands for minibatch, \(C\) stands for channels, and \(D\), \(H\), and \(W\) stand for the image spatial dimensions: depth, height, and width, respectively. Spatial dimensions may be omitted in the order from outermost to innermost; for example, it is not possible to omit \(H\) when \(D\) is present, and it is never possible to omit \(W\). The canonical interpretation is documented for each primitive. Because the order of dimensions is fixed, the \(\operatorname{strides}\) array plays an important role in defining the order in which the different dimensions are laid out in memory. Moreover, the \(\operatorname{strides}\) need to agree with \(\operatorname{dimensions}\).

More precisely, let \(T\) be a tensor of rank \(n\) and let \(\sigma\) be the permutation of the \(\operatorname{strides}\) array that sorts it in non-increasing order, i.e. \(\operatorname{strides}[i] \geq \operatorname{strides}[j]\) if \(\sigma(i) < \sigma(j)\) for all \(0 \leq i, j < n\). Then the following must hold:

\[\operatorname{strides}[i] \geq \operatorname{strides}[j] * \operatorname{dimensions}[j] \text{ if } \sigma(i) < \sigma(j) \text{ for all } 0 \leq i, j < n.\]
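The condition above can be checked mechanically. The following is a minimal standalone sketch that does not use the oneDNN API; the helper name strides_agree_with_dims and the dims_t alias are hypothetical and introduced only for illustration. It sorts the dimensions by decreasing stride (breaking ties by decreasing size, an assumption made here so that dimensions of size 1 are handled) and then verifies the inequality for each pair of neighbors in that order, which is sufficient when all dimension sizes are at least 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

using dims_t = std::vector<std::int64_t>; // hypothetical alias, not the oneDNN type

// Returns true if every stride is at least as large as the extent
// (stride * dimension) of the next dimension in sorted order.
bool strides_agree_with_dims(const dims_t &dims, const dims_t &strides) {
    if (dims.size() != strides.size()) return false;
    const std::size_t n = dims.size();

    // order[k] is the index of the k-th outermost dimension.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t {0});
    std::sort(order.begin(), order.end(), [&](std::size_t i, std::size_t j) {
        if (strides[i] != strides[j]) return strides[i] > strides[j];
        return dims[i] > dims[j]; // tie-break: larger dimension first
    });

    // Checking neighbors is enough: the inequality chains transitively.
    for (std::size_t k = 0; k + 1 < n; ++k) {
        const std::size_t outer = order[k], inner = order[k + 1];
        if (strides[outer] < strides[inner] * dims[inner]) return false;
    }
    return true;
}
```

For a 3 x 4 matrix, strides {4, 1}, {1, 3}, and the padded {8, 1} pass this check, while {2, 1} fails because consecutive rows would overlap.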

For an element with coordinates \((i_0, \ldots, i_{n-1})\) such that \(0 \leq i_j < \operatorname{dimensions}[j]\) for \(0 \leq j < n\), its offset in memory is computed as:

\[\operatorname{offset}(i_0, \ldots, i_{n-1}) = \operatorname{offset_0} + \sum_{j=0}^{n-1} i_j * \operatorname{strides}[j].\]

Here \(\operatorname{offset_0}\) is the offset from the parent memory and is non-zero only for submemory memory descriptors created using dnnl::memory::desc::submemory_desc(). Submemory memory descriptors inherit strides from the parent memory descriptor. Their main purpose is to express in-place concat operations.
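The offset formula can be written out directly. This is a small standalone sketch rather than oneDNN code; element_offset and dims_t are hypothetical names used only for illustration, and the offset0 parameter plays the role of \(\operatorname{offset_0}\).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using dims_t = std::vector<std::int64_t>; // hypothetical alias, not the oneDNN type

// Computes offset0 + sum over j of coords[j] * strides[j].
// Assumes coords and strides have the same length.
std::int64_t element_offset(const dims_t &coords, const dims_t &strides,
        std::int64_t offset0 = 0) {
    std::int64_t offset = offset0;
    for (std::size_t j = 0; j < coords.size(); ++j)
        offset += coords[j] * strides[j];
    return offset;
}
```

With strides {4, 1} the element at coordinates (1, 2) lands at offset 6, and with strides {1, 3} the same element lands at offset 7.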

As an example, consider an \(M \times N\) matrix \(A\) (\(M\) rows times \(N\) columns). Regardless of whether \(A\) is stored transposed or not, \(\operatorname{dimensions}_A = \{M, N\}\). However, \(\operatorname{strides}_A = \{LDA, 1\}\) if it is not transposed and \(\operatorname{strides}_A = \{1, LDA\}\) if it is, where \(LDA\) is such that \(LDA \geq N\) if \(A\) is not transposed, and \(LDA \geq M\) if it is. This also shows that \(A\) does not have to be stored densely in memory.

Note

The example above shows that oneDNN assumes data to be stored in row-major order.

Code example:

int M = 3, N = 4; // Example sizes, chosen here only for illustration
dnnl::memory::dims dims {M, N}; // Dimensions always stay the same

// Non-transposed matrix
dnnl::memory::dims strides_non_transposed {N, 1};
dnnl::memory::desc A_non_transposed {dims, dnnl::memory::data_type::f32,
        strides_non_transposed};

// Transposed matrix
dnnl::memory::dims strides_transposed {1, M};
dnnl::memory::desc A_transposed {dims, dnnl::memory::data_type::f32,
        strides_transposed};

Format Tags

In addition to strides, oneDNN provides named format tags via the dnnl::memory::format_tag enum type. The enumerators of this type can be used instead of strides for dense plain layouts.

The format tag names for \(N\)-dimensional memory formats use the first \(N\) letters of the English alphabet, which can be arbitrarily permuted. This permutation is used to compute strides for tensors with up to 6 dimensions. The resulting strides specify dense storage; in other words, using the nomenclature from the previous section, the following equality holds:

\[\operatorname{strides}[i] = \operatorname{strides}[j] * \operatorname{dimensions}[j] \text{ if } \sigma(i) + 1 = \sigma(j) \text{ for all } 0 \leq i, j < n.\]
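To make the relation between a tag name and its dense strides concrete, here is a standalone sketch that derives strides from a domain-agnostic tag written as a string. It is not part of the oneDNN API: dense_strides_from_tag and dims_t are hypothetical names, and the function assumes the tag is a valid permutation of the first letters of the alphabet, read from outermost to innermost.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using dims_t = std::vector<std::int64_t>; // hypothetical alias, not the oneDNN type

// Walks the tag from the innermost letter to the outermost one and
// accumulates the running product of dimension sizes as the stride.
dims_t dense_strides_from_tag(const dims_t &dims, const std::string &tag) {
    dims_t strides(dims.size(), 0);
    std::int64_t running = 1;
    for (std::size_t k = tag.size(); k-- > 0;) {
        const std::size_t dim = static_cast<std::size_t>(tag[k] - 'a');
        strides[dim] = running; // innermost letter gets stride 1
        running *= dims[dim];
    }
    return strides;
}
```

For dimensions {2, 3, 4, 5}, the tag "abcd" yields strides {60, 20, 5, 1} and the tag "acdb" yields {60, 1, 15, 3}; for dimensions {3, 4}, the tag "ba" yields {1, 3}.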

In the matrix example, we could have used format_tag::ab for the non-transposed matrix above, and format_tag::ba for the transposed:

int M = 3, N = 4; // Example sizes, chosen here only for illustration
dnnl::memory::dims dims {M, N}; // Dimensions always stay the same

// Non-transposed matrix
dnnl::memory::desc A_non_transposed {dims, dnnl::memory::data_type::f32,
        dnnl::memory::format_tag::ab};

// Transposed matrix
dnnl::memory::desc A_transposed {dims, dnnl::memory::data_type::f32,
        dnnl::memory::format_tag::ba};

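In addition to abstract format tag names, oneDNN also provides convenience aliases. Examples for CNNs and RNNs, such as nchw, nhwc, and tnc, are listed with the dnnl::memory::format_tag enumerators in the API section.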

Optimized Format ‘any’

Another kind of format that oneDNN supports is an opaque optimized memory format that cannot be created directly from \(\operatorname{strides}\) and \(\operatorname{dimensions}\) arrays. A memory descriptor for an optimized memory format can only be created by passing format_tag::any when creating certain operation descriptors, using them to create the corresponding primitive descriptors, and then querying those for memory descriptors. Data in a plain memory format should then be reordered into the optimized memory format before computations. Since reorders are expensive, the optimized memory format needs to be propagated through the computation graph.

Optimized formats can employ padding, blocking, and other data transformations to keep data in a layout that is optimal for a certain architecture. This means that, in general, operations like dnnl::memory::desc::permute_axes() or dnnl::memory::desc::submemory_desc() may fail. It is also in general incorrect to use the product of dimension sizes to calculate the amount of memory required to store data: dnnl::memory::desc::get_size() must be used instead.
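The following standalone sketch illustrates why the product of dimension sizes can underestimate the required memory. It assumes, purely for illustration, a layout that stores channels in fixed-size blocks and pads the channel count up to a whole number of blocks; the actual layouts chosen by oneDNN are opaque and may differ. The function name blocked_element_count is hypothetical.

```cpp
#include <cstdint>

// Number of elements occupied by an N x C x H x W tensor when the channel
// dimension is rounded up to a multiple of `block` (an assumed layout).
std::int64_t blocked_element_count(std::int64_t n, std::int64_t c,
        std::int64_t h, std::int64_t w, std::int64_t block) {
    const std::int64_t padded_c = (c + block - 1) / block * block;
    return n * padded_c * h * w;
}
```

Under this assumption, a 1 x 3 x 224 x 224 tensor with a channel block of 8 occupies 401408 elements, whereas the product of its dimensions is only 150528. In real code the size should come from dnnl::memory::desc::get_size(), as stated above.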

Memory Format Propagation

Memory format propagation is one of the central notions that needs to be well-understood to use oneDNN correctly.

Convolution and inner product primitives choose the memory format when you create them with the placeholder memory format format_tag::any for input or output. The memory format chosen is based on different circumstances such as hardware and convolution parameters. Using the placeholder memory format is the recommended practice for convolutions, since they are the most compute-intensive operations in most topologies where they are present.

Other primitives, such as Elementwise, LRN, batch normalization, and others, should on forward propagation use the same memory format as the preceding layer, thus propagating the memory format through multiple oneDNN primitives. This avoids unnecessary reorders, which may be expensive and should be avoided unless a compute-intensive primitive requires a different format. For performance reasons, backward computations of such primitives require a memory format consistent with the corresponding forward computations. Hence, when initializing these primitives for backward computations, you should use the dnnl::memory::format_tag::any memory format tag as well.

Below is a short summary of when to use and when not to use the memory format format_tag::any during operation descriptor initialization:

| Primitive Kinds | Forward Propagation | Backward Propagation | No Propagation |
|---|---|---|---|
| Compute intensive: (De-)convolution, Inner product, RNN | Use format_tag::any | Use format_tag::any | N/A |
| Memory-bandwidth limited: Pooling, Layer and Batch Normalization, Local Response Normalization, Elementwise, Shuffle, Softmax | Use memory format from preceding layer for source tensors, and format_tag::any for destination tensors | Use format_tag::any for gradient tensors, and actual memory formats for data tensors | N/A |
| Memory-bandwidth limited: Reorder, Concat, Sum, Binary | N/A | N/A | Use memory format from preceding layer for source tensors, and format_tag::any for destination tensors |

Additional format synchronization is required between forward and backward propagation when running training workloads. This is achieved via the hint_pd arguments of primitive descriptor constructors for primitives that implement backward propagation.

API

enum dnnl::memory::format_tag

Memory format tag specification.

Memory format tags can be further divided into two categories:

  • Domain-agnostic names, i.e. names that do not depend on the tensor usage in the specific primitive. These names use letters from a to f to denote logical dimensions and form the order in which the dimensions are laid out in memory. For example, dnnl::memory::format_tag::ab is used to denote a 2D tensor where the second logical dimension (denoted as b) is the innermost, i.e. has stride = 1, and the first logical dimension (a) is laid out in memory with stride equal to the size of the second dimension. On the other hand, dnnl::memory::format_tag::ba is the transposed version of the same tensor: the outermost dimension (a) becomes the innermost one.

  • Domain-specific names, i.e. names that make sense only in the context of a certain domain, such as CNN. These names are aliases to the corresponding domain-agnostic tags and are used mostly for convenience. For example, dnnl::memory::format_tag::nc is used to denote a 2D CNN activations tensor memory format, where the channels dimension is the innermost one and the batch dimension is the outermost one. Moreover, dnnl::memory::format_tag::nc is an alias for dnnl::memory::format_tag::ab, because for CNN primitives the logical dimensions of activations tensors come in the order: batch, channels, spatial. In other words, batch corresponds to the first logical dimension (a), and channels correspond to the second one (b).

The following domain-specific notation applies to memory format tags:

  • 'n' denotes the mini-batch dimension

  • 'c' denotes a channels dimension

  • When there are multiple channel dimensions (for example, in convolution weights tensor), 'i' and 'o' denote dimensions of input and output channels

  • 'g' denotes a groups dimension for convolution weights

  • 'd', 'h', and 'w' denote spatial depth, height, and width respectively
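The correspondence between domain-specific and domain-agnostic names can be reproduced with a few lines of code. The sketch below is standalone and does not use the oneDNN API; to_domain_agnostic is a hypothetical helper. It assumes the caller supplies the logical dimension order of the tensor (for example "nchw" for 4D activations or "oihw" for 4D weights) and replaces each letter of the tag by the letter for its position in that order.

```cpp
#include <string>

// Translates a domain-specific tag into its domain-agnostic form, given the
// logical order of dimensions. Returns an empty string if the tag contains a
// letter that is not one of the logical dimensions.
std::string to_domain_agnostic(
        const std::string &logical_order, const std::string &tag) {
    std::string result;
    for (char letter : tag) {
        const std::string::size_type pos = logical_order.find(letter);
        if (pos == std::string::npos) return std::string();
        result.push_back(static_cast<char>('a' + pos));
    }
    return result;
}
```

For instance, with the logical order "nchw" the tag "nhwc" maps to "acdb", and with the logical order "goihw" the tag "hwigo" maps to "decab", matching the aliases listed among the enumerators.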

Values:

enumerator undef

Undefined memory format tag.

enumerator any

Placeholder memory format tag. Used to instruct the primitive to select a format automatically.

enumerator a

plain 1D tensor

enumerator ab

plain 2D tensor

enumerator ba

permuted 2D tensor

enumerator abc

plain 3D tensor

enumerator acb

permuted 3D tensor

enumerator bac

permuted 3D tensor

enumerator bca

permuted 3D tensor

enumerator cba

permuted 3D tensor

enumerator abcd

plain 4D tensor

enumerator abdc

permuted 4D tensor

enumerator acdb

permuted 4D tensor

enumerator bacd

permuted 4D tensor

enumerator bcda

permuted 4D tensor

enumerator cdba

permuted 4D tensor

enumerator dcab

permuted 4D tensor

enumerator abcde

plain 5D tensor

enumerator abdec

permuted 5D tensor

enumerator acbde

permuted 5D tensor

enumerator acdeb

permuted 5D tensor

enumerator bcdea

permuted 5D tensor

enumerator cdeba

permuted 5D tensor

enumerator decab

permuted 5D tensor

enumerator abcdef

plain 6D tensor

enumerator acbdef

permuted 6D tensor

enumerator defcab

permuted 6D tensor

enumerator x = a

1D tensor; an alias for dnnl::memory::format_tag::a

enumerator nc = ab

2D CNN activations tensor; an alias for dnnl::memory::format_tag::ab

enumerator cn = ba

2D CNN activations tensor; an alias for dnnl::memory::format_tag::ba

enumerator tn = ab

2D RNN statistics tensor; an alias for dnnl::memory::format_tag::ab

enumerator nt = ba

2D RNN statistics tensor; an alias for dnnl::memory::format_tag::ba

enumerator ncw = abc

3D CNN activations tensor; an alias for dnnl::memory::format_tag::abc

enumerator nwc = acb

3D CNN activations tensor; an alias for dnnl::memory::format_tag::acb

enumerator nchw = abcd

4D CNN activations tensor; an alias for dnnl::memory::format_tag::abcd

enumerator nhwc = acdb

4D CNN activations tensor; an alias for dnnl::memory::format_tag::acdb

enumerator chwn = bcda

4D CNN activations tensor; an alias for dnnl::memory::format_tag::bcda

enumerator ncdhw = abcde

5D CNN activations tensor; an alias for dnnl::memory::format_tag::abcde

enumerator ndhwc = acdeb

5D CNN activations tensor; an alias for dnnl::memory::format_tag::acdeb

enumerator oi = ab

2D CNN weights tensor; an alias for dnnl::memory::format_tag::ab

enumerator io = ba

2D CNN weights tensor; an alias for dnnl::memory::format_tag::ba

enumerator oiw = abc

3D CNN weights tensor; an alias for dnnl::memory::format_tag::abc

enumerator owi = acb

3D CNN weights tensor; an alias for dnnl::memory::format_tag::acb

enumerator wio = cba

3D CNN weights tensor; an alias for dnnl::memory::format_tag::cba

enumerator iwo = bca

3D CNN weights tensor; an alias for dnnl::memory::format_tag::bca

enumerator oihw = abcd

4D CNN weights tensor; an alias for dnnl::memory::format_tag::abcd

enumerator hwio = cdba

4D CNN weights tensor; an alias for dnnl::memory::format_tag::cdba

enumerator ohwi = acdb

4D CNN weights tensor; an alias for dnnl::memory::format_tag::acdb

enumerator ihwo = bcda

4D CNN weights tensor; an alias for dnnl::memory::format_tag::bcda

enumerator iohw = bacd

4D CNN weights tensor; an alias for dnnl::memory::format_tag::bacd

enumerator oidhw = abcde

5D CNN weights tensor; an alias for dnnl::memory::format_tag::abcde

enumerator dhwio = cdeba

5D CNN weights tensor; an alias for dnnl::memory::format_tag::cdeba

enumerator odhwi = acdeb

5D CNN weights tensor; an alias for dnnl::memory::format_tag::acdeb

enumerator idhwo = bcdea

5D CNN weights tensor; an alias for dnnl::memory::format_tag::bcdea

enumerator goiw = abcd

4D CNN weights tensor with groups; an alias for dnnl::memory::format_tag::abcd

enumerator wigo = dcab

4D CNN weights tensor with groups; an alias for dnnl::memory::format_tag::dcab

enumerator goihw = abcde

5D CNN weights tensor with groups; an alias for dnnl::memory::format_tag::abcde

enumerator hwigo = decab

5D CNN weights tensor with groups; an alias for dnnl::memory::format_tag::decab

enumerator giohw = acbde

5D CNN weights tensor with groups; an alias for dnnl::memory::format_tag::acbde

enumerator goidhw = abcdef

6D CNN weights tensor with groups; an alias for dnnl::memory::format_tag::abcdef

enumerator giodhw = acbdef

6D CNN weights tensor with groups; an alias for dnnl::memory::format_tag::acbdef

enumerator dhwigo = defcab

6D CNN weights tensor with groups; an alias for dnnl::memory::format_tag::defcab

enumerator tnc = abc

3D RNN data tensor in the format (seq_length, batch, input channels).

enumerator ntc = bac

3D RNN data tensor in the format (batch, seq_length, input channels).

enumerator ldnc = abcd

4D RNN states tensor in the format (num_layers, num_directions, batch, state channels).

enumerator ldigo = abcde

5D RNN weights tensor in the format (num_layers, num_directions, input_channels, num_gates, output_channels).

  • For LSTM cells, the gates order is input, forget, candidate and output gate.

  • For GRU cells, the gates order is update, reset and output gate.

enumerator ldgoi = abdec

5D RNN weights tensor in the format (num_layers, num_directions, num_gates, output_channels, input_channels).

  • For LSTM cells, the gates order is input, forget, candidate and output gate.

  • For GRU cells, the gates order is update, reset and output gate.

enumerator ldio = abcd

4D LSTM projection tensor in the format (num_layers, num_directions, num_channels_in_hidden_state, num_channels_in_recurrent_projection).

enumerator ldoi = abdc

4D LSTM projection tensor in the format (num_layers, num_directions, num_channels_in_recurrent_projection, num_channels_in_hidden_state).

enumerator ldgo = abcd

4D RNN bias tensor in the format (num_layers, num_directions, num_gates, output_channels).

  • For LSTM cells, the gates order is input, forget, candidate and output gate.

  • For GRU cells, the gates order is update, reset and output gate.