aimet_onnx.experimental.adascale¶
Top level APIs
- aimet_onnx.experimental.adascale.adascale_optimizer.apply_adascale(sim, inputs, adascale_model_config, num_iterations=1500)¶
- Parameters:
  - sim (QuantizationSimModel) – Quantization sim model
  - inputs (Collection[Dict[str, np.ndarray]]) – The set of input samples to use during optimization
  - adascale_model_config (AdaScaleModelConfig) – AdaScale model config. Pre-defined configs exist for Llama, Qwen2, Mistral, Qwen3, and Phi3; for other models, construct an AdaScaleModelConfig manually.
  - num_iterations (int) – Number of iterations to optimize for during AdaScale
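Since inputs expects a collection of dictionaries mapping graph input names to NumPy arrays, a calibration set can be sketched as below. This is a minimal illustration using NumPy only; the input name "input_ids" and the shapes are hypothetical placeholders and must match your model's actual ONNX graph inputs.

```python
import numpy as np

# Build a small calibration set of type Collection[Dict[str, np.ndarray]]:
# each element maps ONNX graph input names to one sample's arrays.
# NOTE: "input_ids", the vocabulary size, and the (1, 128) shape are
# hypothetical -- substitute the input names and shapes of your own model.
rng = np.random.default_rng(0)
inputs = [
    {"input_ids": rng.integers(0, 32000, size=(1, 128), dtype=np.int64)}
    for _ in range(32)
]
```

In practice these samples would come from a representative calibration dataset rather than random data, since AdaScale optimizes against the model's actual input distribution.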
- Example usage:

>>> model = DummyModel()
>>> inputs = ...
>>> adascale_model_config = adascale_model_config['llama']
>>> sim = QuantizationSimModel(model)
>>> apply_adascale(sim, inputs, adascale_model_config, num_iterations=num_iterations)
>>> sim.compute_encodings(...)
>>> sim.export(...)
Notes:
- apply_adascale modifies the model's weights in-place.
- compute_encodings should not be called before the apply_adascale call.
- Activation quantizers remain uninitialized throughout this feature, so the user must call compute_encodings afterwards. This ensures that activation encodings are computed with the updated weights taken into account.
Warning: This feature is currently experimental and its API is subject to change.