Quantization

World Engine supports several quantization schemes for efficient model compression: NVFP4 (via FlashInfer), FP8, and FP8 W8A8 linear layers, plus a helper for applying a chosen scheme to an existing model.

world_engine.quantize

world_engine.quantize.fp4_linear(a_bf16, b_fp4_T, a_global_sf, b_sf_T, alpha)

    Return type: Tensor
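The packing and scaling conventions below are assumptions inferred from the parameter names and from the NVFP4 format FlashInfer implements (FP4 values packed two per byte, FP8 E4M3 scale factors per 16-element block, plus a global scale); they are not taken from the World Engine source. A minimal shape-level sketch:

    import torch
    from world_engine.quantize import fp4_linear

    M, K, N = 128, 4096, 4096

    # BF16 activation matrix [M, K]
    a_bf16 = torch.randn(M, K, dtype=torch.bfloat16, device="cuda")
    # Transposed weight, FP4 values packed two per uint8 byte: [N, K // 2] (assumed layout)
    b_fp4_T = torch.randint(0, 256, (N, K // 2), dtype=torch.uint8, device="cuda")
    # Per-16-element-block FP8 scale factors for the weight (assumed dtype and granularity)
    b_sf_T = torch.ones(N, K // 16, device="cuda").to(torch.float8_e4m3fn)
    # Global activation scale and combined output scale (assumed to be scalar tensors)
    a_global_sf = torch.tensor(1.0, device="cuda")
    alpha = torch.tensor(1.0, device="cuda")

    out = fp4_linear(a_bf16, b_fp4_T, a_global_sf, b_sf_T, alpha)  # -> Tensor, per the docs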

class world_engine.quantize.FP4Linear(lin)

    Bases: Module

    FP4 Linear layer using FlashInfer’s NVFP4 quantization.

    __init__(lin)

    forward(x)

        Forward pass using FP4 quantization and FlashInfer GEMM.

        Return type: Tensor
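A minimal usage sketch: FP4Linear(lin) wraps an existing linear layer, per the constructor above. The CUDA placement and BF16 dtype are assumptions based on the fp4_linear signature, not confirmed by the source.

    import torch
    import torch.nn as nn
    from world_engine.quantize import FP4Linear

    # Assumed setup: a BF16 linear layer on CUDA.
    lin = nn.Linear(4096, 4096, bias=False).cuda().to(torch.bfloat16)
    fp4 = FP4Linear(lin)  # wraps lin; presumably quantizes its weight to NVFP4

    x = torch.randn(8, 4096, dtype=torch.bfloat16, device="cuda")
    y = fp4(x)  # forward pass via FlashInfer GEMM, returns a Tensor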

class world_engine.quantize.FP8W8A8Linear(lin)

    Bases: Module

    __init__(lin)

    forward(x)

        Return type: Tensor
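No docstring is given for this class; by convention, "W8A8" means both weights and activations are quantized to 8 bits (here FP8). A minimal sketch, assuming the same wrapping pattern and BF16/CUDA setup as FP4Linear above:

    import torch
    import torch.nn as nn
    from world_engine.quantize import FP8W8A8Linear

    # Assumed: the constructor wraps an existing linear layer, as with FP4Linear.
    layer = FP8W8A8Linear(nn.Linear(2048, 2048).cuda().to(torch.bfloat16))

    x = torch.randn(8, 2048, dtype=torch.bfloat16, device="cuda")
    y = layer(x)  # returns a Tensor, per the docs above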

class world_engine.quantize.FP8Linear(lin)

    Bases: Module

    __init__(lin)

    forward(x)

        Forward pass using FP8 matmul.

        Parameters: x (Tensor) – input tensor of shape […, in_features]; inputs with more than 2 dimensions are flattened to 2D for the matmul

        Returns: output tensor of shape […, out_features] in BF16 format, restored to the original leading dimensions if the input had more than 2 dimensions

        Return type: Tensor
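A minimal sketch of the documented flattening behavior: a >2D input is flattened to 2D for the FP8 matmul, and the output is returned in BF16 with the original leading dimensions restored. The CUDA/BF16 setup of the wrapped layer is an assumption.

    import torch
    import torch.nn as nn
    from world_engine.quantize import FP8Linear

    fp8 = FP8Linear(nn.Linear(1024, 1024).cuda().to(torch.bfloat16))

    # 3D input: flattened to 2D for the FP8 matmul, then unflattened on return.
    x = torch.randn(4, 16, 1024, dtype=torch.bfloat16, device="cuda")
    y = fp8(x)
    assert y.shape == (4, 16, 1024) and y.dtype == torch.bfloat16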

world_engine.quantize.quantize_model(model, quant)
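The accepted values of quant are not documented above; the "fp8" string below is a hypothetical placeholder, and whether the function returns a new model or modifies model in place is likewise an assumption. A minimal sketch:

    import torch.nn as nn
    from world_engine.quantize import quantize_model

    model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512)).cuda()

    # "fp8" is a guess at a scheme selector; replace with whatever `quant` expects.
    # Assumes the quantized model is returned; it may instead modify `model` in place.
    model = quantize_model(model, quant="fp8")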