Quantize#

Versioned name: Quantize-1

Category: lower_precision

Short description: Quantize converts a f32 tensor to a quantized(u8/s8) tensor. It supports both per tensor and per channel asymmetric linear quantization. Output data type is specified in output tensor data_type. Nearest round is used in this OP. For per-tensor quantization:

\[q_{x}=round(x/scale+zp)\]

For per-channel quantization, take channel axis = 2 as example:

\[q_{x_{...,i,...,...}}=round(x_{...,i,...,...}/scale_i+zp_i),i\in{[0, channelNum-1]}\]

Attributes

  • qtype

    • Description: specifies which quantization type is used.

    • Range of values: “per_tensor” or “per_channel”

    • Type: string

    • Default value: “per_tensor”

    • Required: no

  • axis

    • Description: specifies dimension on which apply per-channel quantization. Only valid when qtype is “per_channel”.

    • Range of values: integers in [-r, r-1] where r = rank(input)

    • Type: s64

    • Default value: 1

    • Required: no

  • scales

    • Description: apply in quantization formula.

    • Range of values: arbitrary valid f32 value

    • Type: f32[]

    • Required: yes

  • zps

    • Description: offset value that maps to float zero.

    • Range of values: arbitrary valid s64 value

    • Type: s64[]

    • Required: yes

Inputs:

  • 1: input - f32 tensor to be quantized. Required.

    • Type: T1

Outputs:

  • 1: output - quantized tensor.

    • Type: T2

Types:

  • T1: f32.

  • T2: s8, u8.