Quantization
World Engine supports various quantization methods for efficient model compression.
world_engine.quantize
- world_engine.quantize.fp4_linear(a_bf16, b_fp4_T, a_global_sf, b_sf_T, alpha)
  - Return type: Tensor
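To make the role of the scale factors and `alpha` concrete, here is a hypothetical pure-Python reference of what a block-scaled FP4 linear roughly computes: `out ≈ alpha * (A @ dequant(B)^T)`, with the quantized weight expanded back to reals via its scale factors. This is a sketch, not the CUDA kernel; the function names, the one-scale-per-row simplification, and the exact scaling semantics are assumptions for illustration.

```python
# Hypothetical reference for a block-scaled FP4 linear (NOT the library's
# kernel). Weights are stored as FP4-grid codes plus scale factors; the
# matmul dequantizes them and applies a final scalar alpha.

def dequant_rows(codes, row_scales):
    """Expand FP4-grid codes [N][K] using one scale per row.

    Simplified from NVFP4's finer-grained per-block scales for clarity.
    """
    return [[c * s for c in row] for row, s in zip(codes, row_scales)]


def fp4_linear_ref(a, b_codes, b_scales, alpha=1.0):
    """Compute alpha * (a @ dequant(b)^T) on plain nested lists."""
    b = dequant_rows(b_codes, b_scales)
    k = len(a[0])
    return [
        [alpha * sum(a_row[t] * b_row[t] for t in range(k)) for b_row in b]
        for a_row in a
    ]
```

With `a = [[1.0, 2.0]]`, codes `[[1.0, 0.5]]`, and row scale `2.0`, the weight dequantizes to `[[2.0, 1.0]]` and the output is `[[4.0]]`.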
- class world_engine.quantize.FP4Linear(lin)
  - Bases: Module
  FP4 linear layer using FlashInfer’s NVFP4 quantization.
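As background for the NVFP4 scheme named above, the sketch below illustrates the core idea of block-scaled FP4 quantization: each block of values shares one scale factor, and each value is rounded to the nearest point on the FP4 (E2M1) grid {0, 0.5, 1, 1.5, 2, 3, 4, 6} with a sign bit. This is an educational approximation, not the library's or FlashInfer's implementation; the block size, the float (rather than packed 4-bit) code storage, and the amax-based scale choice are assumptions.

```python
# Educational sketch of block-scaled FP4 (E2M1) quantization, loosely
# following the structure of NVFP4 (blocks of values sharing a scale).
# Codes are kept as signed grid values for readability instead of being
# packed into 4-bit integers.

E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # magnitudes only


def quantize_block_fp4(values, block_size=16):
    """Quantize a flat list of floats to (codes, per-block scales)."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block) or 1.0  # avoid div-by-zero
        scale = amax / 6.0  # map the block's max magnitude onto E2M1's max
        scales.append(scale)
        for v in block:
            mag = abs(v) / scale
            q = min(E2M1_GRID, key=lambda g: abs(g - mag))  # round to grid
            codes.append(-q if v < 0 else q)
    return codes, scales


def dequantize_block_fp4(codes, scales, block_size=16):
    """Invert the quantization: multiply each code by its block's scale."""
    return [q * scales[i // block_size] for i, q in enumerate(codes)]
```

Values that already lie on the scaled grid round-trip exactly; everything else lands on the nearest representable point, which is where the compression error comes from.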