
Quantization

quantize: code

Converts a float tensor to a quantized tensor with the given scale and zero point.
Quantization(x, scale, zero_point) = round(x / scale + zero_point)
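
As a minimal sketch of the formula above, the following pure-Python function applies round(x / scale + zero_point) to each element of a list. The function name `quantize`, the `qmin`/`qmax` parameters, and the clamping step to an unsigned 8-bit range are assumptions added for illustration; the formula itself does not specify a target integer range.

```python
def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Quantize floats with round(x / scale + zero_point).

    Clamping to [qmin, qmax] is an assumption (unsigned 8-bit by default);
    it is not part of the formula in the text.
    """
    quantized = []
    for x in values:
        # Python's round() rounds halves to the nearest even integer.
        q = round(x / scale + zero_point)
        # Keep the result inside the representable integer range.
        quantized.append(max(qmin, min(qmax, q)))
    return quantized


print(quantize([0.0, 1.0, -2.0, 0.3], scale=0.5, zero_point=4))
```

The zero point shifts the integer grid so that the float value 0.0 maps exactly to `zero_point`, which is why 0.0 becomes 4 in this example.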

dequantize: code

Maps the quantized tensor back to the original floating-point scale and zero point (pivot):
Dequantization(q, scale, zero_point) = (q - zero_point) * scale
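
A small sketch of dequantization, assuming it inverts the quantization rule round(x / scale + zero_point) by computing (q - zero_point) * scale. The function name `dequantize` is hypothetical and used only for illustration.

```python
def dequantize(quantized, scale, zero_point):
    """Map quantized integers back to floats: (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in quantized]


print(dequantize([4, 6, 0, 5], scale=0.5, zero_point=4))
```

The recovered values are only approximations of the original floats: anything lost to rounding (or to range clamping, if used) during quantization cannot be restored. For instance, an input of 0.3 quantized with scale 0.5 and zero point 4 becomes 5, which dequantizes to 0.5.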

Reference

https://www.h-schmidt.net/FloatConverter/IEEE754.html


Last updated 3 years ago
