Configuration of Data Layouts

The DFT interface provides the configuration parameters config_param::FWD_STRIDES (resp. config_param::BWD_STRIDES) to define the data layout locating entries of relevant data sequences in the forward (resp. backward) domain. In case of batched transforms, i.e., if \(M > 1\) is configured by setting config_param::NUMBER_OF_TRANSFORMS accordingly, config_param::FWD_DISTANCE (resp. config_param::BWD_DISTANCE) completes the description of the data layout by specifying the distances between successive data sequences in the forward (resp. backward) domain.

Using the notations from the introduction and the superscript \(\text{fwd}\) (resp. \(\text{bwd}\)) for data sequences belonging to the forward (resp. backward) domain, for any \(m\) and multi-index \(\left(k_1, k_2, \ldots, k_d\right)\) within valid range, the corresponding entry \(\left(\cdot\right)^{m}_{k_{1}, k_{2}, \dots, k_d }\) - or the real or imaginary part thereof - of the relevant data sequence is located at index

\[s^{\text{xwd}}_0 + k_1\ s^{\text{xwd}}_1 + k_2\ s^{\text{xwd}}_2 + \dots + k_d\ s^{\text{xwd}}_d + m\ l^{\text{xwd}} \tag{1}\]

of the corresponding data container (sycl::buffer object or device-accessible USM allocation) provided to the compute function, the base data type of which is (possibly implicitly re-interpreted) as documented in the table below. In the index expression (1), \(\text{x} = \text{f}\) (resp. \(\text{x} = \text{b}\)) for entries of forward-domain (resp. backward-domain) data sequences and

  • \(s^{\text{xwd}}_j\), \(\forall j \in \lbrace 0, \ldots, d\rbrace\), represent the offset (\(j = 0\)) and generalized strides (\(j > 0\)) defining the locations of entries within each \(d\)-dimensional data sequence in the forward (resp. backward) domain if \(\text{x} = \text{f}\) (resp. if \(\text{x} = \text{b}\)), counted in number of elements of the relevant implicitly-assumed elementary data type;

  • \(l^{\text{xwd}}\) represents the distance between successive \(d\)-dimensional data sequences in the forward (resp. backward) domain if \(\text{x} = \text{f}\) (resp. if \(\text{x} = \text{b}\)), counted in number of elements of the relevant implicitly-assumed elementary data type.
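The index expression (1) can be evaluated with a few lines of code. The sketch below is only an illustration: `layout_index` is a hypothetical helper, not a oneMKL function, and it assumes the strides are passed as \(\lbrace s_0, \ldots, s_d\rbrace\) and the multi-index as \(\lbrace k_1, \ldots, k_d\rbrace\).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of oneMKL): evaluates index expression (1).
//   strides  = {s_0, s_1, ..., s_d}
//   k        = {k_1, ..., k_d}
//   distance = l, m = index of the data sequence within the batch
inline std::int64_t layout_index(const std::vector<std::int64_t>& strides,
                                 const std::vector<std::int64_t>& k,
                                 std::int64_t distance, std::int64_t m) {
    assert(strides.size() == k.size() + 1);
    std::int64_t index = strides[0] + m * distance;  // offset + batch displacement
    for (std::size_t j = 0; j < k.size(); ++j)
        index += k[j] * strides[j + 1];              // k_{j+1} * s_{j+1}
    return index;
}
```

For instance, with strides \(\lbrace 0, 5, 1\rbrace\), distance \(20\), \(m = 1\) and \((k_1, k_2) = (2, 3)\), the helper returns \(0 + 2 \cdot 5 + 3 \cdot 1 + 1 \cdot 20 = 33\), counted in elements of the implicitly-assumed elementary data type.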

Note

All data sequences (or respective real and imaginary parts thereof if separately stored) must belong to the same block allocation, as a consequence of the generalized index (1).

Implicitly-assumed elementary data type

When reading or writing an element at index (1) of any user-provided data container used at compute time, a descriptor object may re-interpret the base data type of that data container into an implicitly-assumed elementary data type. That implicitly-assumed data type depends on the object type, i.e., on the specialization values used for the template parameters when instantiating the descriptor class, and, in case of complex descriptors, on the configuration value set for its configuration parameter config_param::COMPLEX_STORAGE. The table below lists the implicitly-assumed data type in either domain (last 2 columns) based on the object type and its configuration value for config_param::COMPLEX_STORAGE (first 2 columns).

| Object type | Configuration value for configuration parameter config_param::COMPLEX_STORAGE | Implicitly-assumed elementary data type in forward domain | Implicitly-assumed elementary data type in backward domain |
|---|---|---|---|
| descriptor<precision::SINGLE, domain::COMPLEX> | config_value::COMPLEX_COMPLEX | std::complex<float> | std::complex<float> |
| descriptor<precision::DOUBLE, domain::COMPLEX> | config_value::COMPLEX_COMPLEX | std::complex<double> | std::complex<double> |
| descriptor<precision::SINGLE, domain::COMPLEX> | config_value::REAL_REAL | float | float |
| descriptor<precision::DOUBLE, domain::COMPLEX> | config_value::REAL_REAL | double | double |
| descriptor<precision::SINGLE, domain::REAL> | irrelevant | float | std::complex<float> |
| descriptor<precision::DOUBLE, domain::REAL> | irrelevant | double | std::complex<double> |
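The table can also be read as a compile-time mapping. The following sketch encodes it with standard type traits; the enumerations `Precision`, `Domain`, `Storage` and `Side` and the alias `elementary_t` are hypothetical stand-ins introduced here for illustration, not the oneMKL types.

```cpp
#include <complex>
#include <type_traits>

// Hypothetical stand-ins mirroring the table; NOT the oneMKL enumerations.
enum class Precision { Single, Double };
enum class Domain { Complex, Real };
enum class Storage { ComplexComplex, RealReal };
enum class Side { Forward, Backward };

template <Precision P>
using real_t = std::conditional_t<P == Precision::Single, float, double>;

// Implicitly-assumed elementary type for one table row (P, D, S) and one
// domain column (X).
template <Precision P, Domain D, Storage S, Side X>
using elementary_t = std::conditional_t<
    D == Domain::Complex,
    // Complex descriptors: the type depends on the storage setting only.
    std::conditional_t<S == Storage::ComplexComplex,
                       std::complex<real_t<P>>, real_t<P>>,
    // Real descriptors: storage is irrelevant, the type depends on the domain.
    std::conditional_t<X == Side::Forward,
                       real_t<P>, std::complex<real_t<P>>>>;
```

Under these assumptions, `elementary_t<Precision::Single, Domain::Real, Storage::RealReal, Side::Backward>` is `std::complex<float>`, matching the fifth row of the table.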

Configuring data layouts for batched transforms

The value \(l^{\text{xwd}}\) in (1) above is communicated as an std::int64_t configuration value, set for the configuration parameter config_param::FWD_DISTANCE if \(\text{x} = \text{f}\) (resp. config_param::BWD_DISTANCE if \(\text{x} = \text{b}\)). This value is irrelevant for unbatched transforms, i.e., for descriptors set to handle a number of transforms \(M\) equal to \(1\) (default behavior).

In case of batched transforms, the number \(M > 1\) of desired DFTs must be set explicitly as an std::int64_t configuration value for the configuration parameter config_param::NUMBER_OF_TRANSFORMS. In that case, the configuration parameters config_param::FWD_DISTANCE and config_param::BWD_DISTANCE must also be set explicitly since their default configuration values of \(0\) would break the consistency requirements for any \(M > 1\).
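One simple way to pick a valid distance for batched transforms is to store the data sequences back to back. The sketch below computes the number of elements spanned by a single data sequence; using a distance at least that large keeps successive sequences from overlapping when all strides are positive. This is a sufficient choice, not a necessary one (interleaved layouts with smaller distances can also be consistent). `sequence_span` is a hypothetical helper, and `extents` is assumed to hold the number of entries along each dimension in the domain under consideration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of oneMKL): number of elements spanned by one
// d-dimensional data sequence, i.e. 1 + the largest value reached by
// k_1*s_1 + ... + k_d*s_d over the valid range, assuming positive strides.
inline std::int64_t sequence_span(const std::vector<std::int64_t>& extents,   // entries per dimension
                                  const std::vector<std::int64_t>& strides) { // {s_0, ..., s_d}
    assert(strides.size() == extents.size() + 1);
    std::int64_t span = 1;
    for (std::size_t j = 0; j < extents.size(); ++j) {
        assert(extents[j] > 0 && strides[j + 1] > 0);
        span += (extents[j] - 1) * strides[j + 1];
    }
    return span;
}
```

For a \(4 \times 5\) complex sequence with strides \(\lbrace 0, 5, 1\rbrace\), the span is \(1 + 3 \cdot 5 + 4 \cdot 1 = 20\), so a distance of \(20\) (or more) would be a consistent value for config_param::FWD_DISTANCE in that layout.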

Configuring strides in forward and backward domains

The values \(s^{\text{xwd}}_0, s^{\text{xwd}}_1, \dots, s^{\text{xwd}}_d\) in (1) above are communicated as elements, in that order, of a \((d+1)\)-long std::vector<std::int64_t> configuration value, set for the configuration parameter config_param::FWD_STRIDES if \(\text{x} = \text{f}\) (resp. config_param::BWD_STRIDES if \(\text{x} = \text{b}\)). The element \(s^{\text{xwd}}_0\) represents an absolute offset (or “displacement”) in the data sets while the subsequent elements \(s^{\text{xwd}}_j\ (j > 0)\) are generalized strides to be considered along dimensions \(j \in \lbrace 1, \ldots, d\rbrace\).

The default values set for the forward and backward strides correspond to the data layout configurations for unbatched, in-place transforms using unit stride along the last dimension with no offset (and minimal padding in forward domain in case of real descriptors, aligning with the requirements for in-place transforms). In other words, the default values are \(s^{\text{fwd}}_0 = s^{\text{bwd}}_0 = 0\), \(s^{\text{fwd}}_d = s^{\text{bwd}}_d = 1\) and, for \(d\)-dimensional DFTs with \(d > 1\),

  • \(s^{\text{fwd}}_{d-1} = s^{\text{bwd}}_{d-1} = n_{d}\) for complex descriptors;

  • \(s^{\text{bwd}}_{d-1} = \lfloor \frac{n_{d}}{2} \rfloor + 1\), and \(s^{\text{fwd}}_{d-1} = 2 s^{\text{bwd}}_{d-1}\) for real descriptors;

  • if \(d > 2\), \(s^{\text{xwd}}_k = n_{k+1} s^{\text{xwd}}_{k+1}\) for \(k \in \lbrace 1, \ldots, d - 2\rbrace\) (for \(\text{x} = \text{f}\) and \(\text{x} = \text{b}\)).
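The default rules above translate directly into a short routine. The following is a sketch under the stated rules only; `default_strides` and the `Side` enumeration are hypothetical names used for illustration and are not part of oneMKL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Side { Forward, Backward };  // hypothetical, for illustration only

// Hypothetical helper (not part of oneMKL): default {s_0, ..., s_d} for
// lengths n = {n_1, ..., n_d}, following the rules listed above.
inline std::vector<std::int64_t> default_strides(const std::vector<std::int64_t>& n,
                                                 bool real_descriptor, Side side) {
    const std::size_t d = n.size();
    assert(d >= 1);
    std::vector<std::int64_t> s(d + 1, 0);  // s[0] = 0: no offset
    s[d] = 1;                               // unit stride along the last dimension
    if (d > 1) {
        if (!real_descriptor) {
            s[d - 1] = n[d - 1];                              // n_d
        } else {
            const std::int64_t bwd = n[d - 1] / 2 + 1;        // floor(n_d / 2) + 1
            s[d - 1] = (side == Side::Backward) ? bwd : 2 * bwd;
        }
        for (std::size_t k = d - 2; k >= 1; --k)              // s_k = n_{k+1} * s_{k+1}
            s[k] = n[k] * s[k + 1];
    }
    return s;
}
```

For lengths \((4, 5, 6)\), this yields \(\lbrace 0, 30, 6, 1\rbrace\) for a complex descriptor in either domain, and \(\lbrace 0, 40, 8, 1\rbrace\) (forward) and \(\lbrace 0, 20, 4, 1\rbrace\) (backward) for a real descriptor.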

General consistency requirements

In general, the distances and strides must be set so that every index value (1) corresponds to a unique entry of the data sequences under consideration. In other words, there must not be one index value as expressed in (1) that corresponds to two different \((d+1)\)-tuples \((m, k_{1}, k_{2}, \dots, k_d)\) that are both within the elementary range of indices considered by oneMKL.
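For small problem sizes, the uniqueness requirement can be checked by brute force, which may be handy when debugging a custom layout. The sketch below enumerates every tuple \((m, k_1, \ldots, k_d)\) and records the index values (1) it encounters. `layout_is_unique` is a hypothetical helper, not a oneMKL function, and `extents` is assumed to hold the number of entries along each dimension in the domain being checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical brute-force check (not part of oneMKL): returns true if no two
// distinct tuples (m, k_1, ..., k_d) within range map to the same index (1).
inline bool layout_is_unique(const std::vector<std::int64_t>& extents,  // entries per dimension
                             const std::vector<std::int64_t>& strides,  // {s_0, ..., s_d}
                             std::int64_t distance, std::int64_t batch) {
    assert(strides.size() == extents.size() + 1);
    for (std::int64_t e : extents) assert(e > 0);
    std::set<std::int64_t> seen;
    std::vector<std::int64_t> k(extents.size(), 0);
    for (std::int64_t m = 0; m < batch; ++m) {
        while (true) {
            std::int64_t index = strides[0] + m * distance;
            for (std::size_t j = 0; j < k.size(); ++j) index += k[j] * strides[j + 1];
            if (!seen.insert(index).second) return false;  // index already taken
            // Advance the multi-index like an odometer, last dimension fastest.
            std::size_t j = k.size();
            while (j > 0 && ++k[j - 1] == extents[j - 1]) { k[j - 1] = 0; --j; }
            if (j == 0) break;  // wrapped around: every k visited for this m
        }
    }
    return true;
}
```

With extents \(\lbrace 4\rbrace\), strides \(\lbrace 0, 1\rbrace\) and \(M = 2\), a distance of \(0\) fails the check (both sequences land on indices \(0\) to \(3\)), while a distance of \(4\) passes. An interleaved layout with strides \(\lbrace 0, 2\rbrace\) and distance \(1\) passes as well.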

Additionally, for in-place transforms (configuration value config_value::INPLACE associated with configuration parameter config_param::PLACEMENT), the smallest stride value must be associated with the same dimension in forward and backward domains and the data layouts must abide by the following “consistency requirement”: the memory address(es) of the leading entry(ies) along the last dimension must be identical in forward and backward domains. Specifically, considering any \((d+1)\)-tuple \((m, k_{1}, k_{2}, \dots, k_{d-1}, 0)\) within valid range, the memory address of the element of corresponding index value (1) in forward domain (considering the implicitly assumed type in forward domain) must be identical to the memory address of the element of corresponding index value (1) in backward domain (considering the implicitly assumed type in backward domain). Equivalently,

  • for complex descriptors, the offset, stride(s) (and distances, if relevant) must be equal in forward and backward domain;

  • for real descriptors, offsets and strides must satisfy \(s^{\text{fwd}}_{j} = 2 s^{\text{bwd}}_{j}\ \forall j \in \lbrace 0, \ldots, d - 1\rbrace\) (note that \(0 \leq j < d\)) and distances, if relevant, must satisfy \(l^{\text{fwd}} = 2 l^{\text{bwd}}\). Note that this leads to some data padding being required in forward domain if unit strides are used along the last dimension in forward and backward domains.
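The two bulleted relations for in-place transforms can be verified mechanically before committing a descriptor. The sketch below covers only those two relations (it does not check which dimension carries the smallest stride, nor the uniqueness requirement). `inplace_relations_hold` is a hypothetical helper, not part of oneMKL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical check (not part of oneMKL) of the in-place relations above.
//   fwd, bwd = {s_0, ..., s_d} in forward / backward domain
//   batch    = number of transforms M
inline bool inplace_relations_hold(const std::vector<std::int64_t>& fwd,
                                   const std::vector<std::int64_t>& bwd,
                                   std::int64_t fwd_distance, std::int64_t bwd_distance,
                                   std::int64_t batch, bool real_descriptor) {
    assert(fwd.size() == bwd.size() && fwd.size() >= 2);
    const std::size_t d = fwd.size() - 1;
    // Complex: values are equal. Real: forward values are twice the backward ones.
    const std::int64_t factor = real_descriptor ? 2 : 1;
    // Real descriptors: relation required for j in {0, ..., d-1} only;
    // complex descriptors: for every j in {0, ..., d}.
    const std::size_t last = real_descriptor ? d - 1 : d;
    for (std::size_t j = 0; j <= last; ++j)
        if (fwd[j] != factor * bwd[j]) return false;
    // Distances only matter for batched transforms (M > 1).
    return batch <= 1 || fwd_distance == factor * bwd_distance;
}
```

For a real \(4 \times 6\) transform, forward strides \(\lbrace 0, 8, 1\rbrace\) and backward strides \(\lbrace 0, 4, 1\rbrace\) satisfy the relations, whereas the unpadded forward strides \(\lbrace 0, 6, 1\rbrace\) do not, which illustrates the padding mentioned above.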

Configuring strides for input and output data [deprecated, not recommended]

Instead of specifying strides by domain, one may choose to specify the strides for input and output data sequences. Let \(s^{\text{x}}_{j}, \ j \in \lbrace 0, 1, \ldots, d\rbrace\) be the stride values for input (resp. output) data sequences if \(\text{x} = \text{i}\) (resp. \(\text{x} = \text{o}\)). Such \(s^{\text{x}}_0, s^{\text{x}}_1, \dots, s^{\text{x}}_d\) values may be communicated as elements, in that order, of a \((d+1)\)-long std::vector<std::int64_t> configuration value, set for the (deprecated) configuration parameter config_param::INPUT_STRIDES if \(\text{x} = \text{i}\) (resp. config_param::OUTPUT_STRIDES if \(\text{x} = \text{o}\)).

The values of \(s^{\text{i}}_{j}\) and \(s^{\text{o}}_{j}\) are to be used and considered by oneMKL if and only if \(s^{\text{fwd}}_{j} = s^{\text{bwd}}_{j} = 0, \forall j \in \lbrace 0, 1, \ldots, d\rbrace\). (This will happen automatically if config_param::INPUT_STRIDES and config_param::OUTPUT_STRIDES are set and config_param::FWD_STRIDES and config_param::BWD_STRIDES are not. See note below.) In such a case, descriptor objects must consider the data layouts corresponding to the two compute directions separately. As detailed above, relevant data sequence entries are accessed as elements of data containers (sycl::buffer objects or device-accessible USM allocations) provided to the compute function, the base data type of which is (possibly implicitly re-interpreted) as documented in the table of implicitly-assumed elementary data types above. If using input and output strides, for any \(m\) and multi-index \(\left(k_1, k_2, \ldots, k_d\right)\) within valid range, the index to be used when accessing a data sequence entry - or part thereof - in forward domain is

\[s^{\text{x}}_0 + k_1\ s^{\text{x}}_1 + k_2\ s^{\text{x}}_2 + \dots + k_d\ s^{\text{x}}_d + m\ l^{\text{fwd}}\]

where \(\text{x} = \text{i}\) (resp. \(\text{x} = \text{o}\)) for forward (resp. backward) DFT(s). Similarly, the index to be used when accessing a data sequence entry - or part thereof - in backward domain is

\[s^{\text{x}}_0 + k_1\ s^{\text{x}}_1 + k_2\ s^{\text{x}}_2 + \dots + k_d\ s^{\text{x}}_d + m\ l^{\text{bwd}}\]

where \(\text{x} = \text{o}\) (resp. \(\text{x} = \text{i}\)) for forward (resp. backward) DFT(s).

As a consequence, configuring descriptor objects using these deprecated configuration parameters makes their configuration direction-dependent when different stride values are used in forward and backward domains. Since the intended compute direction is unknown to the descriptor object when committing it, every direction that results in a consistent data layout in forward and backward domains must be supported by successfully committed descriptor objects.
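The direction dependence can be made explicit with a small mapping from input/output strides to per-domain strides. This is a sketch for illustration only; `domain_strides`, `Direction` and `Strides` are hypothetical names, not oneMKL identifiers.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

enum class Direction { Forward, Backward };  // hypothetical, for illustration only
using Strides = std::vector<std::int64_t>;   // {s_0, ..., s_d}

// Hypothetical helper (not part of oneMKL): returns
// {strides applied in forward domain, strides applied in backward domain}
// for a given compute direction, following the two index expressions above.
inline std::pair<Strides, Strides> domain_strides(const Strides& input,
                                                  const Strides& output,
                                                  Direction dir) {
    assert(input.size() == output.size());
    // Forward DFT: input lies in the forward domain, output in the backward domain.
    // Backward DFT: the roles are swapped.
    return dir == Direction::Forward ? std::make_pair(input, output)
                                     : std::make_pair(output, input);
}
```

With different input and output strides, the same pair of configuration values therefore describes two different forward-domain layouts depending on the direction, which is why at most one direction may end up consistent.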

Note

For descriptor objects with strides configured via these deprecated configuration parameters, the consistency requirements may be satisfied for only one of the two compute directions, i.e., for only one of the forward or backward DFT(s). Such a configuration should not cause an exception to be thrown by the descriptor’s commit member function but the behavior of oneMKL is undefined if using that object for the compute direction that does not align with the consistency requirements.

Note

Setting either of config_param::INPUT_STRIDES or config_param::OUTPUT_STRIDES triggers any default or previously-set values for config_param::FWD_STRIDES and config_param::BWD_STRIDES to reset to std::vector<std::int64_t>(d+1, 0) values, and vice versa. This default behavior prevents mixing either of config_param::INPUT_STRIDES or config_param::OUTPUT_STRIDES with either of config_param::FWD_STRIDES or config_param::BWD_STRIDES, a usage that is not to be supported. If such a configuration is attempted, an exception is to be thrown at commit time due to invalid configuration, as the stride values that were implicitly reset necessarily violate the consistency requirements for any non-trivial DFT.
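The reset rule can be pictured with a toy model. The structure below is NOT the oneMKL descriptor class; `StrideConfig` and its members are hypothetical, and the assumption that the deprecated strides start out zeroed is made here for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using Strides = std::vector<std::int64_t>;  // {s_0, ..., s_d}

// Toy model (NOT the oneMKL descriptor class) of the mutual reset behavior.
struct StrideConfig {
    Strides fwd, bwd, input, output;

    // Starts from the given FWD/BWD defaults; INPUT/OUTPUT start zeroed (assumption).
    StrideConfig(const Strides& fwd_default, const Strides& bwd_default)
        : fwd(fwd_default), bwd(bwd_default),
          input(fwd_default.size(), 0), output(fwd_default.size(), 0) {
        assert(fwd_default.size() == bwd_default.size());
    }

    // Setting a parameter of one family zeroes both parameters of the other family.
    void set_fwd(const Strides& s)    { fwd = s;    zero(input); zero(output); }
    void set_bwd(const Strides& s)    { bwd = s;    zero(input); zero(output); }
    void set_input(const Strides& s)  { input = s;  zero(fwd);   zero(bwd); }
    void set_output(const Strides& s) { output = s; zero(fwd);   zero(bwd); }

    // True when the deprecated INPUT/OUTPUT strides would be the ones considered.
    bool uses_io_strides() const { return all_zero(fwd) && all_zero(bwd); }

private:
    static void zero(Strides& s) { std::fill(s.begin(), s.end(), 0); }
    static bool all_zero(const Strides& s) {
        return std::all_of(s.begin(), s.end(), [](std::int64_t v) { return v == 0; });
    }
};
```

In this model, calling `set_input` and then `set_fwd` leaves the backward strides and both deprecated strides zeroed, which is the kind of mixed configuration expected to be rejected at commit time.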

If specifying the data layout strides using these deprecated configuration parameters and if the strides differ in forward and backward domain, the descriptor must be re-configured and re-committed for computing the DFT in the reverse direction as shown below.

// ...
// Forward DFT: input data lies in the forward domain, output data in the backward domain.
desc.set_value(config_param::INPUT_STRIDES,  fwd_domain_strides);
desc.set_value(config_param::OUTPUT_STRIDES, bwd_domain_strides);
desc.commit(queue);
compute_forward(desc, ...);
// ...
// Backward DFT: the roles are swapped, so the descriptor is re-configured and re-committed.
desc.set_value(config_param::INPUT_STRIDES,  bwd_domain_strides);
desc.set_value(config_param::OUTPUT_STRIDES, fwd_domain_strides);
desc.commit(queue);
compute_backward(desc, ...);

The config_param::INPUT_STRIDES and config_param::OUTPUT_STRIDES parameters are deprecated. A warning message “{IN,OUT}PUT_STRIDES are deprecated: please use {F,B}WD_STRIDES, instead.” is to be reported to applications using these configuration parameters.
