Quantization

World Engine supports several quantization schemes for efficient model compression: NVFP4 (via FlashInfer), FP8, and FP8 W8A8 linear layers, plus a helper for applying a chosen scheme to an existing model.

world_engine.quantize

world_engine.quantize.fp4_linear(a_bf16, b_fp4_T, a_global_sf, b_sf_T, alpha)

    Return type: Tensor
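The packing and scaling conventions below are assumptions inferred from the parameter names and from the NVFP4 format FlashInfer implements (FP4 values packed two per byte, FP8 E4M3 scale factors per 16-element block, plus a global scale); they are not taken from the World Engine source. A minimal shape-level sketch:

    import torch
    from world_engine.quantize import fp4_linear

    M, K, N = 128, 4096, 4096

    # BF16 activation matrix [M, K]
    a_bf16 = torch.randn(M, K, dtype=torch.bfloat16, device="cuda")
    # Transposed weight, FP4 values packed two per uint8 byte: [N, K // 2] (assumed layout)
    b_fp4_T = torch.randint(0, 256, (N, K // 2), dtype=torch.uint8, device="cuda")
    # Per-16-element-block FP8 scale factors for the weight (assumed dtype and granularity)
    b_sf_T = torch.ones(N, K // 16, device="cuda").to(torch.float8_e4m3fn)
    # Global activation scale and combined output scale (assumed to be scalar tensors)
    a_global_sf = torch.tensor(1.0, device="cuda")
    alpha = torch.tensor(1.0, device="cuda")

    out = fp4_linear(a_bf16, b_fp4_T, a_global_sf, b_sf_T, alpha)  # -> Tensor, per the docs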

class world_engine.quantize.FP4Linear(lin)

    Bases: Module

    FP4 Linear layer using FlashInfer’s NVFP4 quantization.

    __init__(lin)

    forward(x)

        Forward pass using FP4 quantization and FlashInfer GEMM.

        Return type: Tensor
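A minimal usage sketch: FP4Linear(lin) wraps an existing linear layer, per the constructor above. The CUDA placement and BF16 dtype are assumptions based on the fp4_linear signature, not confirmed by the source.

    import torch
    import torch.nn as nn
    from world_engine.quantize import FP4Linear

    # Assumed setup: a BF16 linear layer on CUDA.
    lin = nn.Linear(4096, 4096, bias=False).cuda().to(torch.bfloat16)
    fp4 = FP4Linear(lin)  # wraps lin; presumably quantizes its weight to NVFP4

    x = torch.randn(8, 4096, dtype=torch.bfloat16, device="cuda")
    y = fp4(x)  # forward pass via FlashInfer GEMM, returns a Tensor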

class world_engine.quantize.FP8W8A8Linear(lin)

    Bases: Module

    __init__(lin)

    forward(x)

        Return type: Tensor
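No docstring is given for this class; by convention, "W8A8" means both weights and activations are quantized to 8 bits (here FP8). A minimal sketch, assuming the same wrapping pattern and BF16/CUDA setup as FP4Linear above:

    import torch
    import torch.nn as nn
    from world_engine.quantize import FP8W8A8Linear

    # Assumed: the constructor wraps an existing linear layer, as with FP4Linear.
    layer = FP8W8A8Linear(nn.Linear(2048, 2048).cuda().to(torch.bfloat16))

    x = torch.randn(8, 2048, dtype=torch.bfloat16, device="cuda")
    y = layer(x)  # returns a Tensor, per the docs above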

class world_engine.quantize.FP8Linear(lin)

    Bases: Module

    __init__(lin)

    forward(x)

        Forward pass using FP8 matmul.

        Parameters: x (Tensor) – input tensor of shape […, in_features]; inputs with more than 2 dimensions are flattened to 2D for the matmul

        Returns: output tensor of shape […, out_features] in BF16 format, restored to the original leading dimensions if the input had more than 2 dimensions

        Return type: Tensor
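A minimal sketch of the documented flattening behavior: a >2D input is flattened to 2D for the FP8 matmul, and the output is returned in BF16 with the original leading dimensions restored. The CUDA/BF16 setup of the wrapped layer is an assumption.

    import torch
    import torch.nn as nn
    from world_engine.quantize import FP8Linear

    fp8 = FP8Linear(nn.Linear(1024, 1024).cuda().to(torch.bfloat16))

    # 3D input: flattened to 2D for the FP8 matmul, then unflattened on return.
    x = torch.randn(4, 16, 1024, dtype=torch.bfloat16, device="cuda")
    y = fp8(x)
    assert y.shape == (4, 16, 1024) and y.dtype == torch.bfloat16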

world_engine.quantize.quantize_model(model, quant)
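The accepted values of quant are not documented above; the "fp8" string below is a hypothetical placeholder, and whether the function returns a new model or modifies model in place is likewise an assumption. A minimal sketch:

    import torch.nn as nn
    from world_engine.quantize import quantize_model

    model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512)).cuda()

    # "fp8" is a guess at a scheme selector; replace with whatever `quant` expects.
    # Assumes the quantized model is returned; it may instead modify `model` in place.
    model = quantize_model(model, quant="fp8")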