Convolution

Versioned name: Convolution-1

Category: Convolution

Short description: Reference

Detailed description: Reference

In this description, $$r$$ denotes the spatial rank. We describe the convolution for each sample in a batch of $$N$$ inputs; the results are combined into an output batch of size $$N$$.

The convolution is implemented as if each sample input first has $$p_b$$ zeros inserted before and $$p_e$$ zeros inserted after the data on each spatial axis, giving a padded input size of $$p_b+p_e+X_I$$, where $$X_I$$ is the input size along that axis.

The kernel is stretched by a factor of $$d$$ on each of its spatial dimensions. For a kernel of size $$X_K$$, the last index of the stretched kernel is then $$d(X_K-1)$$, so its size is $$d(X_K-1)+1$$.
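The stretching step can be sketched for a single spatial axis as follows. This is a minimal illustration, not the reference implementation: it assumes that "stretching" means inserting $$d-1$$ zeros between adjacent kernel taps (the usual meaning of dilation), and the helper name `dilate_kernel` is hypothetical.

```python
def dilate_kernel(kernel, d):
    """Stretch a 1-D kernel by placing tap i at index d * i (assumed meaning of dilation)."""
    if d < 1:
        raise ValueError("dilation must be a positive integer")
    if not kernel:
        return []
    # The last tap lands at index d * (X_K - 1), so the size is d * (X_K - 1) + 1.
    dilated = [0] * (d * (len(kernel) - 1) + 1)
    for i, tap in enumerate(kernel):
        dilated[d * i] = tap
    return dilated


print(dilate_kernel([1, 2, 3], 2))  # [1, 0, 2, 0, 3]
```

With $$d=1$$ the kernel is returned unchanged, which matches the size formula $$d(X_K-1)+1=X_K$$.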

The padded input and the dilated kernel are then split into $$g$$ equal-sized input and kernel segments; padded input segment $$i$$ and dilated kernel segment $$i$$ are convolved. The convolution is only performed where there is complete spatial overlap between the shifted kernel and the padded input, so there will be $$p_b+p_e+X_I-d(X_K-1)$$ outputs along each spatial axis. The output segments are then regrouped along the output channel axis. Finally, only the results at multiples of the stride $$s$$ along each spatial axis are kept, so the output will have size:

$\left\lfloor \frac{p_b+p_e+X_I-d(X_K-1)-1}{s} \right\rfloor +1$
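The size formula can be checked against a small reference computation. The sketch below handles one channel and one spatial axis only, and it makes two assumptions that the text does not state: the kernel is applied without flipping (cross-correlation, as is common in deep-learning frameworks), and the kept results are those at indices $$0, s, 2s, \ldots$$. The names `output_size` and `conv1d_reference` are hypothetical.

```python
def output_size(x_i, x_k, p_b, p_e, d, s):
    """Spatial output size along one axis, following the floor formula."""
    full = p_b + p_e + x_i - d * (x_k - 1)  # positions with complete overlap
    if full < 1:
        raise ValueError("dilated kernel does not fit in the padded input")
    return (full - 1) // s + 1


def conv1d_reference(x, k, p_b=0, p_e=0, d=1, s=1):
    """Single-channel 1-D sketch: pad, slide the dilated kernel, keep every s-th result."""
    padded = [0] * p_b + list(x) + [0] * p_e
    span = d * (len(k) - 1) + 1  # size of the dilated kernel
    full = [
        # Reading padded[start + d * j] is equivalent to using a zero-stretched kernel.
        sum(k[j] * padded[start + d * j] for j in range(len(k)))
        for start in range(len(padded) - span + 1)
    ]
    return full[::s]


y = conv1d_reference([1, 2, 3, 4, 5], [1, 1, 1], p_b=1, p_e=1, d=1, s=2)
print(y)                               # [3, 9, 9]
print(output_size(5, 3, 1, 1, 1, 2))   # 3
```

Here the padded input has 7 elements, the kernel fits at 5 positions, and a stride of 2 keeps 3 of them, in agreement with the formula.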

Attributes

• strides

• Description: strides is the factor by which the full convolution result is down-sampled along each spatial axis to produce the output.

• Range of values: Positive integers.

• Type: int[r]

• Variable: $$s$$

• Default value: None

• Required: yes

• pads_begin

• Description: pads_begin is the number of zeros to add at the beginning of each spatial axis.

• Range of values: Non-negative integers.

• Type: int[r]

• Variable: $$p_b$$

• Default value: None

• Required: yes

• Note: this attribute is ignored when the auto_pad attribute is specified.

• pads_end

• Description: pads_end is the number of zeros to add at the end of each spatial axis.

• Range of values: Non-negative integers.

• Type: int[r]

• Variable: $$p_e$$

• Default value: None

• Required: yes

• Note: this attribute is ignored when the auto_pad attribute is specified.

• dilations

• Description: dilations denotes the amount to stretch the kernel before convolving.

• Range of values: positive integers.

• Type: int[r]

• Variable: $$d$$

• Default value: None

• Required: yes

• auto_pad

• Description: auto_pad specifies how the padding is calculated. Possible values:

• None (not specified): use explicit padding values.

• same_upper (same_lower): the input is padded so that the output size matches the input size. If the total padding is odd, the extra zero is added at the end (at the beginning).

• valid: no padding ($$p_b=p_e=0$$).

• Type: string

• Default value: None

• Required: no

With same_upper and same_lower the padding is chosen to make the pre-stride output spatial shape the same as the input shape. When possible, $$p_b=p_e$$. If the total padding needed is odd, same_upper makes $$p_e=p_b+1$$ and same_lower makes $$p_b=p_e+1$$. In either case,

$p_b+p_e=d(X_K-1).$
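The padding rule follows from the pre-stride output size: requiring $$p_b+p_e+X_I-d(X_K-1)=X_I$$ gives a total padding of $$d(X_K-1)$$, which is then divided between the two ends. A minimal sketch for one spatial axis, with a hypothetical helper name `auto_pad`:

```python
def auto_pad(x_k, d, mode):
    """Return (p_b, p_e) for one spatial axis under the given auto_pad mode."""
    if mode == "valid":
        return 0, 0
    total = d * (x_k - 1)  # total padding that keeps the pre-stride size equal to X_I
    small, large = total // 2, total - total // 2
    if mode == "same_upper":
        return small, large  # extra zero, if any, goes at the end
    if mode == "same_lower":
        return large, small  # extra zero, if any, goes at the beginning
    raise ValueError(f"unsupported auto_pad mode: {mode}")


print(auto_pad(3, 1, "same_upper"))  # (1, 1)
print(auto_pad(4, 1, "same_upper"))  # (1, 2)
print(auto_pad(4, 1, "same_lower"))  # (2, 1)
```

Note that the result depends only on the kernel size and dilation, not on the input size or the stride.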
• groups

• Description: groups denotes the number of groups input and output channels are divided into.

• Range of values: integer value greater than 0

• Type: int

• Variable: $$g$$

• Default value: 1

• Required: no

• data_format

• Description: data_format denotes the data format of the input and output data.

• Range of values: NXC or NCX (X means HW for 2D convolution, DHW for 3D convolution)

• Type: string

• Default value: NXC

• Required: no

• filter_format

• Description: filter_format denotes the data format of the filter.

• Range of values: XIO or OIX (X means HW for 2D convolution, DHW for 3D convolution)

• Type: string

• Default value: XIO

• Required: no

Inputs:

• 1: input - the input tensor. The format is specified by data_format. Required.

• 2: filter - convolution kernel tensor. The format is specified by filter_format. The size of the kernel is derived from the shape of this input and not specified by any attribute. Required.

• 3: bias - a 1-D tensor that is added along the channel dimension. Broadcasting is supported. Optional.

Outputs:

• 1: output - the output tensor. The format is specified by data_format.
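Putting the attributes together, the output shape can be inferred from the input and filter shapes. The sketch below is illustrative only and rests on assumptions about the layout strings that the text does not spell out: in NXC the channel axis is last and in NCX it is second, while in XIO the filter axes are (spatial..., input channels, output channels) and in OIX they are (output channels, input channels, spatial...). The function name `conv_output_shape` is hypothetical, and groups and bias are not modelled.

```python
def conv_output_shape(input_shape, filter_shape, strides, pads_begin, pads_end,
                      dilations, data_format="NXC", filter_format="XIO"):
    """Infer the output shape under the assumed layout conventions."""
    if data_format == "NXC":
        n, spatial_in = input_shape[0], input_shape[1:-1]
    elif data_format == "NCX":
        n, spatial_in = input_shape[0], input_shape[2:]
    else:
        raise ValueError(f"unsupported data_format: {data_format}")

    if filter_format == "XIO":
        spatial_k, c_out = filter_shape[:-2], filter_shape[-1]
    elif filter_format == "OIX":
        c_out, spatial_k = filter_shape[0], filter_shape[2:]
    else:
        raise ValueError(f"unsupported filter_format: {filter_format}")

    # Apply the per-axis size formula to every spatial axis.
    spatial_out = tuple(
        (p_b + p_e + x_i - d * (x_k - 1) - 1) // s + 1
        for x_i, x_k, s, p_b, p_e, d in zip(
            spatial_in, spatial_k, strides, pads_begin, pads_end, dilations)
    )
    if data_format == "NXC":
        return (n, *spatial_out, c_out)
    return (n, c_out, *spatial_out)


shape = conv_output_shape((1, 224, 224, 3), (3, 3, 3, 64),
                          strides=(2, 2), pads_begin=(1, 1), pads_end=(1, 1),
                          dilations=(1, 1))
print(shape)  # (1, 112, 112, 64)
```

The same layer expressed with data_format NCX and filter_format OIX would take an input of shape (1, 3, 224, 224) and a filter of shape (64, 3, 3, 3) under these assumptions.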