quantize:
Converts a float tensor to a quantized tensor with the given scale and zero point.
Quantization(x, scale, zero_point) = round(x/scale + zero_point)
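A minimal sketch of the quantization formula in plain Python, operating on a list of floats rather than a real tensor. The function name `quantize`, the `qmin`/`qmax` parameters, and the clamping step are assumptions added for illustration (the formula itself does not mention clamping; an 8-bit unsigned range is assumed here).

```python
def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Quantize floats with round(x / scale + zero_point).

    qmin/qmax are assumed bounds for an unsigned 8-bit target;
    the clamp is an illustrative addition, not part of the formula.
    """
    quantized = []
    for x in values:
        q = round(x / scale + zero_point)  # affine mapping from the formula
        q = max(qmin, min(qmax, q))        # assumed: saturate to the integer range
        quantized.append(q)
    return quantized


print(quantize([0.0, 1.0, 2.6], scale=0.5, zero_point=10))  # [10, 12, 15]
```

Values far outside the representable range saturate at `qmin` or `qmax` instead of wrapping around.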
dequantize:
Maps the quantized tensor back to a float tensor using the same scale and zero point.
Dequantization(q, scale, zero_point) = (q - zero_point) * scale
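The inverse mapping can be sketched in the same way, assuming the affine inverse of the quantization formula, (q - zero_point) * scale. The function name `dequantize` and the list-based interface are hypothetical, chosen only for illustration.

```python
def dequantize(quantized, scale, zero_point):
    """Map quantized integers back to floats.

    Assumes the affine inverse (q - zero_point) * scale; the result
    only approximates the original values because rounding is lossy.
    """
    return [(q - zero_point) * scale for q in quantized]


print(dequantize([10, 12, 15], scale=0.5, zero_point=10))  # [0.0, 1.0, 2.5]
```

With a scale of 0.5 and a zero point of 10, an input of 2.6 would quantize to 15 and dequantize to 2.5, so the round trip loses the part of the value smaller than the scale.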
https://www.h-schmidt.net/FloatConverter/IEEE754.html