Attributes

Attributes are parameters that extend a primitive’s behavior.

Attributes can also contain post-ops, which are computations executed after the primitive.

API

enum dnnl::scratchpad_mode

Scratchpad mode.

Values:

enumerator library

The library manages the scratchpad allocation. There may be multiple implementation-specific policies that can be configured via mechanisms that fall outside of the scope of this specification.

enumerator user

The user manages the scratchpad allocation by querying and providing the scratchpad memory to primitives. This mode is thread-safe as long as the scratchpad buffers are not used concurrently by two primitive executions.

enum dnnl::prop_kind

Propagation kind.

Values:

enumerator undef

Undefined propagation kind.

enumerator forward_training

Forward data propagation (training mode). In this mode, primitives perform computations necessary for subsequent backward propagation.

enumerator forward_inference

Forward data propagation (inference mode). In this mode, primitives perform only computations that are necessary for inference and omit computations that are necessary only for backward propagation.

enumerator forward_scoring

Forward data propagation, alias for dnnl::prop_kind::forward_inference.

enumerator forward

Forward data propagation, alias for dnnl::prop_kind::forward_training.

enumerator backward

Backward propagation (with respect to all parameters).

enumerator backward_data

Backward data propagation.

enumerator backward_weights

Backward weights propagation.

enumerator backward_bias

Backward bias propagation.

enum dnnl::algorithm

Kinds of algorithms.

Values:

enumerator undef

Undefined algorithm.

enumerator convolution_auto

Convolution algorithm that is automatically chosen to be either direct or Winograd.

enumerator convolution_direct

Direct convolution.

enumerator convolution_winograd

Winograd convolution.

enumerator deconvolution_direct

Direct deconvolution.

enumerator deconvolution_winograd

Winograd deconvolution.

enumerator eltwise_relu

Elementwise: rectified linear unit (ReLU)

enumerator eltwise_tanh

Elementwise: hyperbolic tangent non-linearity (tanh)

enumerator eltwise_elu

Elementwise: exponential linear unit (ELU)

enumerator eltwise_square

Elementwise: square.

enumerator eltwise_abs

Elementwise: abs.

enumerator eltwise_sqrt

Elementwise: square root.

enumerator eltwise_swish

Elementwise: swish (\(x \cdot sigmoid(a \cdot x)\)).

enumerator eltwise_linear

Elementwise: linear.

enumerator eltwise_bounded_relu

Elementwise: bounded_relu.

enumerator eltwise_soft_relu

Elementwise: soft_relu.

enumerator eltwise_logistic

Elementwise: logistic.

enumerator eltwise_exp

Elementwise: exponent.

enumerator eltwise_gelu

Elementwise: gelu, alias for dnnl::algorithm::eltwise_gelu_tanh.

enumerator eltwise_gelu_tanh

Elementwise: tanh-based gelu.

enumerator eltwise_gelu_erf

Elementwise: erf-based gelu.

enumerator eltwise_log

Elementwise: natural logarithm.

enumerator eltwise_clip

Elementwise: clip.

enumerator eltwise_pow

Elementwise: pow.

enumerator eltwise_relu_use_dst_for_bwd

Elementwise: rectified linear unit (ReLU) (dst for backward)

enumerator eltwise_tanh_use_dst_for_bwd

Elementwise: hyperbolic tangent non-linearity (tanh) (dst for backward)

enumerator eltwise_elu_use_dst_for_bwd

Elementwise: exponential linear unit (ELU) (dst for backward)

enumerator eltwise_sqrt_use_dst_for_bwd

Elementwise: square root (dst for backward)

enumerator eltwise_logistic_use_dst_for_bwd

Elementwise: logistic (dst for backward)

enumerator eltwise_exp_use_dst_for_bwd

Elementwise: exponent (dst for backward)

enumerator lrn_across_channels

Local response normalization (LRN) across multiple channels.

enumerator lrn_within_channel

LRN within a single channel.

enumerator pooling_max

Max pooling.

enumerator pooling_avg

Average pooling exclude padding, alias for dnnl::algorithm::pooling_avg_exclude_padding.

enumerator pooling_avg_include_padding

Average pooling include padding.

enumerator pooling_avg_exclude_padding

Average pooling exclude padding.
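The difference between the two averaging modes shows up only for windows that overlap the padded border. The following is a minimal plain C++ sketch that does not call the library: it assumes a 1D input with zero padding, and the helper name avg_pool_window is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: average of one 1D pooling window that starts at
// `start` (negative means the window overlaps the left padding) and spans
// `kernel` positions. Positions outside src are treated as zero padding.
double avg_pool_window(const std::vector<double> &src, int start, int kernel,
                       bool include_padding) {
    assert(kernel > 0);
    double sum = 0.0;
    int valid = 0; // number of window positions that fall inside src
    for (int k = 0; k < kernel; ++k) {
        const int pos = start + k;
        if (pos >= 0 && pos < static_cast<int>(src.size())) {
            sum += src[static_cast<std::size_t>(pos)];
            ++valid;
        }
    }
    // include padding: divide by the full kernel size.
    // exclude padding: divide by the number of non-padded elements only.
    const int divisor = include_padding ? kernel : valid;
    return divisor > 0 ? sum / divisor : 0.0;
}
```

For the input {3, 6, 9} and a window of size 3 starting one position before the first element, the include-padding average is 9 / 3 and the exclude-padding average is 9 / 2.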

enumerator vanilla_rnn

RNN cell.

enumerator vanilla_lstm

LSTM cell.

enumerator vanilla_gru

GRU cell.

enumerator lbr_gru

GRU cell with linear before reset. Differs from the original GRU in how the new memory gate is calculated: \(c_t = \tanh(W_c*x_t + b_{c_x} + r_t*(U_c*h_{t-1}+b_{c_h}))\). LBR GRU expects 4 bias tensors on input: \([b_{u}, b_{r}, b_{c_x}, b_{c_h}]\).
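As an illustration of the formula above, here is a scalar sketch for a single hidden unit, written in plain C++ without the library. With scalars, the matrix products and the elementwise product all reduce to ordinary multiplication. The function name lbr_gru_candidate is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: new memory gate c_t of an LBR GRU for one hidden unit.
// Parameter names follow the symbols in the formula above.
double lbr_gru_candidate(double x_t, double h_prev, double r_t,
                         double W_c, double U_c, double b_cx, double b_ch) {
    // Assumption: the reset gate r_t is a sigmoid output and lies in [0, 1].
    assert(r_t >= 0.0 && r_t <= 1.0);
    // The reset gate scales (U_c * h_prev + b_ch) as a whole, that is, the
    // linear transform of h_prev is applied before the reset.
    return std::tanh(W_c * x_t + b_cx + r_t * (U_c * h_prev + b_ch));
}
```

Note that b_ch sits inside the term scaled by r_t while b_cx does not, which is why the two biases are kept as separate tensors.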

enumerator binary_add

Binary add.

enumerator binary_mul

Binary mul.

enumerator binary_max

Binary max.

enumerator binary_min

Binary min.

enumerator resampling_nearest

Nearest Neighbor resampling method.

enumerator resampling_linear

Linear (Bilinear, Trilinear) resampling method.

struct dnnl::post_ops

Post-ops.

Post-ops are computations executed after the main primitive computations and are attached to the primitive via primitive attributes.

Public Functions

post_ops()

Constructs an empty sequence of post-ops.

int len() const

Returns the number of post-ops entries.

primitive::kind kind(int index) const

Returns the primitive kind of the post-op at the given index.

Return

Primitive kind of the post-op at the specified index.

Parameters
  • index: Index of the post-op to return the kind for.

void append_sum(float scale = 1.)

Appends an accumulation (sum) post-op. Before the result is accumulated, the previous value is multiplied by the scaling factor scale.

The kind of this post-op is dnnl::primitive::kind::sum.

This feature may improve performance for cases like residual learning blocks, where the result of convolution is accumulated to the previously computed activations. The parameter scale may be used for the integer-based computations when the result and previous activations have different logical scaling factors.

In the simplest case, when the accumulation is the only post-op, the computation is dst[:] := scale * dst[:] + op(...) instead of dst[:] := op(...).

Note

This post-op executes in-place and does not change the destination layout.

Parameters
  • scale: Scaling factor.
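The effect of a sum post-op on the destination buffer can be emulated in a few lines. This is a plain C++ sketch of the arithmetic only and does not call the library. The helper name apply_sum_post_op and the op_result buffer, which stands for the output of the main primitive computation, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: emulates dst[:] := scale * dst[:] + op(...),
// where op_result holds the values produced by op(...).
void apply_sum_post_op(std::vector<float> &dst,
                       const std::vector<float> &op_result,
                       float scale = 1.0f) {
    assert(dst.size() == op_result.size());
    for (std::size_t i = 0; i < dst.size(); ++i)
        dst[i] = scale * dst[i] + op_result[i]; // in-place accumulation
}
```

With dst = {1, 2, 3}, op_result = {10, 20, 30}, and scale = 0.5, the destination becomes {10.5, 21, 31.5}.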

void get_params_sum(int index, float &scale) const

Returns the parameters of an accumulation (sum) post-op.

Parameters
  • index: Index of the sum post-op.

  • scale: Scaling factor of the sum post-op.

void append_eltwise(float scale, algorithm algorithm, float alpha, float beta)

Appends an elementwise post-op.

The kind of this post-op is dnnl::primitive::kind::eltwise.

In the simplest case, when the elementwise operation is the only post-op, the computation is dst[:] := scale * eltwise_op(op(...)) instead of dst[:] := op(...), where eltwise_op is configured with the given parameters.

Parameters
  • scale: Scaling factor.

  • algorithm: Elementwise algorithm.

  • alpha: Alpha parameter for the elementwise algorithm.

  • beta: Beta parameter for the elementwise algorithm.
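The arithmetic of an elementwise post-op can be sketched as follows. This is plain C++ that does not call the library. The helper names are hypothetical, and treating alpha as the negative slope of ReLU is an assumption of this sketch, not something stated in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: emulates dst[:] := scale * eltwise_op(op(...)),
// where op_result holds the values produced by op(...).
std::vector<float> apply_eltwise_post_op(
        const std::vector<float> &op_result, float scale,
        const std::function<float(float)> &eltwise_op) {
    assert(eltwise_op); // a callable must be provided
    std::vector<float> dst(op_result.size());
    for (std::size_t i = 0; i < op_result.size(); ++i)
        dst[i] = scale * eltwise_op(op_result[i]);
    return dst;
}

// Assumption: ReLU where alpha is the slope applied to negative inputs.
float leaky_relu(float x, float alpha) {
    return x > 0.0f ? x : alpha * x;
}
```

For example, applying leaky_relu with alpha = 0.5 and scale = 2 to {-2, 0, 3} yields {-2, 0, 6}.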

void get_params_eltwise(int index, float &scale, algorithm &algorithm, float &alpha, float &beta) const

Returns the parameters of an elementwise post-op.

Parameters
  • index: Index of the post-op.

  • scale: Output scaling factor.

  • algorithm: Output elementwise algorithm kind.

  • alpha: Output alpha parameter for the elementwise algorithm.

  • beta: Output beta parameter for the elementwise algorithm.

struct dnnl::primitive_attr

Primitive attributes.

Public Functions

primitive_attr()

Constructs default (empty) primitive attributes.

scratchpad_mode get_scratchpad_mode() const

Returns the scratchpad mode.

void set_scratchpad_mode(scratchpad_mode mode)

Sets scratchpad mode.

Parameters
  • mode: Specified scratchpad mode.

void get_output_scales(int &mask, std::vector<float> &scales) const

Returns output scaling factors correspondence mask and values.

Parameters
  • mask: Scaling factors correspondence mask that defines the correspondence between the output tensor dimensions and the scales vector. The set i-th bit indicates that a dedicated output scaling factor is used for each index along that dimension. The mask value of 0 implies a common output scaling factor for the whole output tensor.

  • scales: Vector of output scaling factors.

void set_output_scales(int mask, const std::vector<float> &scales)

Sets output scaling factors correspondence mask and values.

Example usage:

int mb = 32, oc = 32,
    oh = 14, ow = 14; // convolution output params
// unique output scales per output channel (oc values in total)
std::vector<float> scales = { ... };
int oc_dim = 1; // mb_dim = 0, channel_dim = 1, height_dim = 2, ...

// construct a convolution descriptor
dnnl::convolution_forward::desc conv_d(/* arguments */);

dnnl::primitive_attr attr;
// bit oc_dim of the mask is set: one scale per output channel
attr.set_output_scales(1 << oc_dim, scales);

dnnl::convolution_forward::primitive_desc conv_pd(conv_d, attr, engine);

Note

The order of dimensions does not depend on how elements are laid out in memory. For example:

  • for a 2D CNN activations tensor the order is always (n, c)

  • for a 4D CNN activations tensor the order is always (n, c, h, w)

  • for a 5D CNN weights tensor the order is always (g, oc, ic, kh, kw)

Parameters
  • mask: Defines the correspondence between the output tensor dimensions and the scales vector. The set i-th bit indicates that a dedicated scaling factor is used for each index along that dimension. Set the mask to 0 to use a common output scaling factor for the whole output tensor.

  • scales: Constant vector of output scaling factors. If the scaling factors are known at the time of this call, the following equality must hold: \(scales.size() = \prod\limits_{d \in mask} output.dims[d].\) Violations can only be detected when the attributes are used to create a primitive descriptor. If the scaling factors are not known at the time of the call, this vector must contain a single DNNL_RUNTIME_F32_VAL value and the output scaling factors must be passed at execution time as an argument with index DNNL_ARG_ATTR_OUTPUT_SCALES.
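The size requirement on scales can be checked with a small helper before the attributes are used. This is a plain C++ sketch that does not call the library. The function name expected_scales_size is hypothetical, and dims is assumed to hold the logical dimensions of the output tensor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: product of dims[d] over every dimension d whose bit
// is set in mask. A mask of 0 yields 1 (one common scaling factor).
std::size_t expected_scales_size(int mask,
                                 const std::vector<std::int64_t> &dims) {
    assert(mask >= 0 && dims.size() < 31);
    std::size_t count = 1;
    for (std::size_t d = 0; d < dims.size(); ++d) {
        if ((mask >> d) & 1) // bit d set: dedicated scale per index along d
            count *= static_cast<std::size_t>(dims[d]);
    }
    return count;
}
```

For a (8, 32, 14, 14) output, a mask of 1 << 1 requires 32 scales, and a mask of 0 requires exactly one.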

void get_scales(int arg, int &mask, std::vector<float> &scales) const

Returns scaling factors correspondence mask and values for a given memory argument.

Parameters
  • arg: Parameter argument index as passed to the primitive::execute() call.

  • mask: Scaling factors correspondence mask that defines the correspondence between the tensor dimensions and the scales vector. The set i-th bit indicates that a dedicated scaling factor is used for each index along that dimension. A mask value of 0 implies a common scaling factor for the whole tensor.

  • scales: Output vector of scaling factors.

void set_scales(int arg, int mask, const std::vector<float> &scales)

Sets scaling factors for primitive operations for a given memory argument.

See

dnnl::primitive_attr::set_output_scales

Parameters
  • arg: Parameter argument index as passed to the primitive::execute() call.

  • mask: Scaling factors correspondence mask that defines the correspondence between the tensor dimensions and the scales vector. The set i-th bit indicates that a dedicated scaling factor is used for each index along that dimension. Set the mask to 0 to use a common scaling factor for the whole tensor.

  • scales: Constant vector of scaling factors. The following equality must hold: \(scales.size() = \prod\limits_{d \in mask} argument.dims[d].\)
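To see how a mask ties tensor elements to entries of scales, the sketch below maps a logical coordinate to an index in the scales vector. It is plain C++ and does not call the library. The function name scale_index is hypothetical, and the row-major ordering of scales over the masked dimensions is an assumption of this sketch, since this section does not specify the ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: index into the scales vector for the element at
// logical position pos. Assumes scales are stored in row-major order over
// the dimensions selected by mask; unselected dimensions are ignored.
std::size_t scale_index(int mask, const std::vector<std::int64_t> &dims,
                        const std::vector<std::int64_t> &pos) {
    assert(mask >= 0 && dims.size() == pos.size() && dims.size() < 31);
    std::size_t index = 0;
    for (std::size_t d = 0; d < dims.size(); ++d) {
        if ((mask >> d) & 1) {
            index = index * static_cast<std::size_t>(dims[d])
                    + static_cast<std::size_t>(pos[d]);
        }
    }
    return index;
}
```

With dims (8, 32, 14, 14) and position (3, 5, 7, 9), a mask of 1 << 1 selects entry 5, while a mask of 0 always selects entry 0.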

void get_zero_points(int arg, int &mask, std::vector<int32_t> &zero_points) const

Returns zero points correspondence mask and values.

Parameters
  • arg: Parameter argument index as passed to the primitive::execute() call.

  • mask: Zero points correspondence mask that defines the correspondence between the tensor dimensions and the zero_points vector. The set i-th bit indicates that a dedicated zero point is used for each index along that dimension. A mask value of 0 implies a common zero point for the whole tensor.

  • zero_points: Output vector of zero points.

void set_zero_points(int arg, int mask, const std::vector<int32_t> &zero_points)

Sets zero points for primitive operations for a given memory argument.

See

dnnl::primitive_attr::set_output_scales

Parameters
  • arg: Parameter argument index as passed to the primitive::execute() call.

  • mask: Zero points correspondence mask that defines the correspondence between the tensor dimensions and the zero_points vector. The set i-th bit indicates that a dedicated zero point is used for each index along that dimension. Set the mask to 0 to use a common zero point for the whole tensor.

  • zero_points: Constant vector of zero points. If the zero points are known at the time of this call, the following equality must hold: \(zero\_points.size() = \prod\limits_{d \in mask} argument.dims[d].\) If the zero points are not known at the time of the call, this vector must contain a single DNNL_RUNTIME_S32_VAL value and the zero points must be passed at execution time as an argument with index DNNL_ARG_ATTR_ZERO_POINTS.

const post_ops get_post_ops() const

Returns post-ops previously set via set_post_ops().

Return

Post-ops.

void set_post_ops(const post_ops ops)

Sets post-ops.

Note

There is no way to check whether the post-ops would be supported by the target primitive. Any error will be reported by the respective primitive descriptor constructor.

Parameters
  • ops: Post-ops object to copy post-ops from.

void set_rnn_data_qparams(float scale, float shift)

Sets quantization scale and shift parameters for RNN data tensors.

For performance reasons, the low-precision configuration of the RNN primitives expects input activations to have the unsigned 8-bit integer data type. The scale and shift parameters are used to quantize floating-point data to unsigned integer and must be passed to the RNN primitive using attributes.

The quantization formula is scale * (data + shift).

Example usage:

// RNN parameters
int l = 2, t = 2, mb = 32, sic = 32, slc = 32, dic = 32, dlc = 32;
// Activations quantization parameters
float scale = 2.0f, shift = 0.5f;

primitive_attr attr;

// Set scale and shift for int8 quantization of activation
attr.set_rnn_data_qparams(scale, shift);

// Create the RNN operation descriptor and a primitive descriptor that uses attr
vanilla_rnn_forward::desc rnn_d(/* arguments */);
vanilla_rnn_forward::primitive_desc rnn_pd(rnn_d, attr, engine);

Note

Quantization scale and shift are common for src_layer, src_iter, dst_iter, and dst_layer.

Parameters
  • scale: The value to scale the data by.

  • shift: The value to shift the data by.
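The quantization formula above can be sketched for a single value as follows. This is plain C++ that does not call the library. The function name quantize_rnn_data is hypothetical, and rounding to nearest followed by saturation to the range [0, 255] is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: quantizes one floating-point activation to u8 using
// scale * (data + shift). Rounding and saturation are assumptions here.
std::uint8_t quantize_rnn_data(float data, float scale, float shift) {
    assert(std::isfinite(data));
    const float q = std::nearbyint(scale * (data + shift));
    // Saturate to the representable range of an unsigned 8-bit integer.
    return static_cast<std::uint8_t>(std::min(255.0f, std::max(0.0f, q)));
}
```

With scale = 2 and shift = 0.5, as in the example usage, the value 1.0 maps to 3, and values that fall outside the u8 range after scaling are clamped.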

void set_rnn_weights_qparams(int mask, const std::vector<float> &scales)

Sets quantization scaling factors for RNN weights tensors. The low-precision configuration of the RNN primitives expects input weights to use the signed 8-bit integer data type. The scaling factors are used to quantize floating-point data to signed integer and must be passed to RNN primitives using attributes.

Note

The dimension order is always native and does not depend on the actual layout used. For example, five-dimensional weights always have (l, d, i, g, o) logical dimension ordering.

Note

Quantization scales are common for weights_layer and weights_iteration.

Parameters
  • mask: Scaling factors correspondence mask that defines the correspondence between the weights tensor dimensions and the scales vector. The set i-th bit indicates that a dedicated scaling factor is used for each index along that dimension. Set the mask to 0 to use a common scaling factor for the whole weights tensor.

  • scales: Constant vector of output scaling factors. The following equality must hold: \(scales.size() = \prod\limits_{d \in mask} weights.dims[d].\) Violations can only be detected when the attributes are used to create a primitive descriptor.