.. SPDX-FileCopyrightText: 2020-2021 Intel Corporation
..
.. SPDX-License-Identifier: CC-BY-4.0

------------------------
BatchNormForwardTraining
------------------------

**Versioned name**: *BatchNormForwardTraining-1*

**Category**: *Normalization*

**Short description**: *BatchNormForwardTraining* performs batch
normalization on the forward pass in training mode.

**Attributes**:

* *epsilon*

  * **Description**: *epsilon* is the number added to the variance to
    avoid division by zero when normalizing a value. For example,
    *epsilon* equal to 0.001 means that 0.001 is added to the variance.
  * **Range of values**: arbitrary positive f32 value
  * **Type**: f32
  * **Required**: *yes*

* *momentum*

  * **Description**: *momentum* is used to compute ``running_mean`` and
    ``running_var``. If it is not provided, a cumulative moving average
    (i.e., a simple average) is computed instead.
  * **Range of values**: arbitrary positive f32 value
  * **Type**: f32
  * **Default value**: None
  * **Required**: *no*

* *data_format*

  * **Description**: *data_format* denotes the data format of the input
    and output data.
  * **Range of values**: *NXC* or *NCX* (X stands for the spatial
    dimensions: HW for 2D inputs, DHW for 3D inputs)
  * **Type**: string
  * **Default value**: *NXC*
  * **Required**: *no*

**Inputs**

* **1**: ``input`` - input tensor with the data to be normalized. The
  format is specified by *data_format*. **Required.**

  * **Type**: T1

* **2**: ``mean`` - value for mean normalization. A 1D tensor with the
  same size as the input's channel axis. **Required.**

  * **Type**: T2

* **3**: ``variance`` - value for variance normalization. A 1D tensor
  with the same size as the input's channel axis. **Required.**

  * **Type**: T2

* **4**: ``gamma`` - gamma scaling for the normalized value. A 1D tensor
  with the same size as the input's channel axis. **Optional.**

  * **Type**: T2

* **5**: ``beta`` - beta added to the scaled normalized value. A 1D
  tensor with the same size as the input's channel axis. **Optional.**

  * **Type**: T2

**Outputs**

* **1**: ``output`` - the result of normalization. A tensor of the same
  shape and format as the first input tensor.

  * **Type**: T1

* **2**: ``running mean`` - the computed running mean.

  * **Type**: T2

* **3**: ``running variance`` - the computed running variance.

  * **Type**: T2

* **4**: ``batch mean`` - the computed batch mean.

  * **Type**: T2

* **5**: ``batch variance`` - the computed batch variance.

  * **Type**: T2

**Types**

* *T1*: f32, f16, bf16.
* *T2*: f32, bf16.
* Constraints: *T2* can be bf16 only when *T1* is bf16.

**Mathematical Formulation**

*BatchNormForwardTraining* normalizes the output of each hidden layer over
the mini-batch.

* **Input**: Values of :math:`x` over a mini-batch:

  .. math::

     \mathcal{B} = \{ x_{1 \ldots m} \}

* **Parameters to learn**: :math:`\gamma, \beta`
* **Output**:

  .. math::

     \{ y_{i} = \mathrm{BN}_{\gamma, \beta} ( x_{i} ) \}

* **Mini-batch mean**:

  .. math::

     \mu_{\mathcal{B}} \leftarrow \frac{1}{m}\sum_{i=1}^{m}x_{i}

* **Mini-batch variance**:

  .. math::

     \sigma_{\mathcal{B}}^{2} \leftarrow \frac{1}{m}\sum_{i=1}^{m} ( x_{i} - \mu_{\mathcal{B}} )^{2}

* **Normalize**:

  .. math::

     \hat{x_{i}} \leftarrow \frac{x_{i} - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon}}

* **Scale and shift**:

  .. math::

     y_{i} \leftarrow \gamma\hat{x_{i}} + \beta = \mathrm{BN}_{\gamma, \beta} ( x_{i} )
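
For illustration, the steps above can be summarized in a minimal NumPy
sketch. This is not the library implementation: the function name, the
running-statistics update convention (whether *momentum* weights the old
or the new statistics), and the simplified cumulative-average fallback
are assumptions to verify against a concrete implementation.

.. code:: python

   import numpy as np

   def batch_norm_forward_training(x, mean, variance, gamma=None, beta=None,
                                   epsilon=1e-5, momentum=None,
                                   data_format="NXC"):
       """Reference sketch of BatchNormForwardTraining.

       ``mean``/``variance`` are the incoming running statistics; returns
       (output, running_mean, running_var, batch_mean, batch_var).
       """
       # Channel axis: last for NXC, second for NCX.
       channel_axis = (x.ndim - 1) if data_format == "NXC" else 1
       # Reduce over every axis except the channel axis.
       reduce_axes = tuple(a for a in range(x.ndim) if a != channel_axis)

       # Mini-batch statistics (the "batch mean"/"batch variance" outputs).
       batch_mean = x.mean(axis=reduce_axes)
       batch_var = x.var(axis=reduce_axes)

       # Broadcast the 1D per-channel tensors against the input layout.
       shape = [1] * x.ndim
       shape[channel_axis] = -1
       x_hat = (x - batch_mean.reshape(shape)) / np.sqrt(
           batch_var.reshape(shape) + epsilon)

       # Optional scale (gamma) and shift (beta).
       out = x_hat
       if gamma is not None:
           out = out * gamma.reshape(shape)
       if beta is not None:
           out = out + beta.reshape(shape)

       if momentum is not None:
           # Exponential moving average; the placement of momentum vs.
           # (1 - momentum) is an assumed convention, not from the spec.
           running_mean = momentum * mean + (1.0 - momentum) * batch_mean
           running_var = momentum * variance + (1.0 - momentum) * batch_var
       else:
           # A true cumulative moving average needs an iteration counter,
           # which this sketch omits; pass-through is a simplification.
           running_mean, running_var = batch_mean, batch_var

       return out, running_mean, running_var, batch_mean, batch_var

Note how *data_format* only affects which axis is treated as the channel
axis; all other axes are reduced over, which is why the per-channel
inputs (``mean``, ``variance``, ``gamma``, ``beta``) are 1D tensors sized
to the channel axis.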