Diffuser Classes
Pipeline API
QEffTextEncoder
- class QEfficient.diffusers.pipelines.pipeline_module.QEffTextEncoder(model: Module)[source]
Wrapper for text encoder models with ONNX export and QAIC compilation capabilities.
This class handles text encoder models (CLIP, T5) with specific transformations and optimizations for efficient inference on Qualcomm AI hardware. It applies custom PyTorch and ONNX transformations to prepare models for deployment.
- model
The wrapped text encoder model (deep copy of original)
- Type:
nn.Module
- _pytorch_transforms
PyTorch transformations applied before ONNX export
- Type:
List
- _onnx_transforms
ONNX transformations applied after export
- Type:
List
- compile(specializations: List[Dict], **compiler_options) None[source]
Compile the ONNX model for Qualcomm AI hardware.
- Parameters:
specializations (List[Dict]) – Model specialization configurations
**compiler_options – Additional compiler options (e.g., num_cores, aic_num_of_activations)
- export(inputs: Dict, output_names: List[str], dynamic_axes: Dict, export_dir: str | None = None, export_kwargs: Dict = {}) str[source]
Export the text encoder model to ONNX format.
- Parameters:
inputs (Dict) – Example inputs for ONNX export
output_names (List[str]) – Names of model outputs
dynamic_axes (Dict) – Specification of dynamic dimensions
export_dir (str, optional) – Directory to save ONNX model
export_kwargs (Dict, optional) – Additional export arguments
- Returns:
Path to the exported ONNX model
- Return type:
str
- property get_model_config: Dict
Get the model configuration as a dictionary.
- Returns:
The configuration dictionary of the underlying text encoder model
- Return type:
Dict
- get_onnx_params() Tuple[Dict, Dict, List[str]][source]
Generate ONNX export configuration for the text encoder.
Creates example inputs, dynamic axes specifications, and output names tailored to the specific text encoder type (CLIP vs T5).
- Returns:
example_inputs (Dict): Sample inputs for ONNX export
dynamic_axes (Dict): Specification of dynamic dimensions
output_names (List[str]): Names of model outputs
- Return type:
Tuple containing
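Example
A minimal usage sketch based on the signatures documented above. The CLIP checkpoint name is illustrative, and the specialization keys passed to compile() are assumptions rather than part of this API.
>>> from transformers import CLIPTextModel
>>> from QEfficient.diffusers.pipelines.pipeline_module import QEffTextEncoder
>>> # Wrap a CLIP text encoder (T5 encoders are handled the same way)
>>> clip = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
>>> qeff_text_encoder = QEffTextEncoder(clip)
>>> # Build the export configuration: example inputs, dynamic axes, and output names
>>> example_inputs, dynamic_axes, output_names = qeff_text_encoder.get_onnx_params()
>>> onnx_path = qeff_text_encoder.export(
...     inputs=example_inputs,
...     output_names=output_names,
...     dynamic_axes=dynamic_axes,
...     export_dir="./onnx_exports",
... )
>>> # Compile for QAIC; the specialization dictionary shown here is hypothetical
>>> qeff_text_encoder.compile(
...     specializations=[{"batch_size": 1, "seq_len": 77}],
...     num_cores=16,
... )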
QEffUNet
- class QEfficient.diffusers.pipelines.pipeline_module.QEffUNet(model: Module)[source]
Wrapper for UNet models with ONNX export and QAIC compilation capabilities.
This class handles UNet models with specific transformations and optimizations for efficient inference on Qualcomm AI hardware. UNet is commonly used in diffusion models for image generation tasks.
- model
The wrapped UNet model
- Type:
nn.Module
- _pytorch_transforms
PyTorch transformations applied before ONNX export
- Type:
List
- _onnx_transforms
ONNX transformations applied after export
- Type:
List
- compile(specializations: List[Dict], **compiler_options) None[source]
Compile the ONNX model for Qualcomm AI hardware.
- Parameters:
specializations (List[Dict]) – Model specialization configurations
**compiler_options – Additional compiler options
- export(inputs: Dict, output_names: List[str], dynamic_axes: Dict, export_dir: str | None = None, export_kwargs: Dict = {}) str[source]
Export the UNet model to ONNX format.
- Parameters:
inputs (Dict) – Example inputs for ONNX export
output_names (List[str]) – Names of model outputs
dynamic_axes (Dict) – Specification of dynamic dimensions
export_dir (str, optional) – Directory to save ONNX model
export_kwargs (Dict, optional) – Additional export arguments
- Returns:
Path to the exported ONNX model
- Return type:
str
- property get_model_config: Dict
Get the model configuration as a dictionary.
- Returns:
The configuration dictionary of the underlying UNet model
- Return type:
Dict
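Example
An illustrative sketch, assuming a diffusers UNet2DConditionModel from a Stable Diffusion-style checkpoint. Because QEffUNet does not document a get_onnx_params helper, the example inputs, shapes, and the "out_sample" output name below are assumptions supplied by the caller.
>>> import torch
>>> from diffusers import UNet2DConditionModel
>>> from QEfficient.diffusers.pipelines.pipeline_module import QEffUNet
>>> unet = UNet2DConditionModel.from_pretrained("path/to/diffusion/model", subfolder="unet")
>>> qeff_unet = QEffUNet(unet)
>>> # Example inputs keyed by the UNet forward-argument names (illustrative shapes)
>>> example_inputs = {
...     "sample": torch.randn(1, 4, 64, 64),
...     "timestep": torch.tensor([1.0]),
...     "encoder_hidden_states": torch.randn(1, 77, 768),
... }
>>> onnx_path = qeff_unet.export(
...     inputs=example_inputs,
...     output_names=["out_sample"],
...     dynamic_axes={"sample": {0: "batch_size"}, "encoder_hidden_states": {0: "batch_size"}},
...     export_dir="./onnx_exports",
... )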
QEffVAE
- class QEfficient.diffusers.pipelines.pipeline_module.QEffVAE(model: Module, type: str)[source]
Wrapper for Variational Autoencoder (VAE) models with ONNX export and QAIC compilation.
This class handles VAE models with specific transformations and optimizations for efficient inference on Qualcomm AI hardware. VAE models are used in diffusion pipelines for encoding images to latent space and decoding latents back to images.
- model
The wrapped VAE model (deep copy of original)
- Type:
nn.Module
- type
VAE operation type (“encoder” or “decoder”)
- Type:
str
- _pytorch_transforms
PyTorch transformations applied before ONNX export
- Type:
List
- _onnx_transforms
ONNX transformations applied after export
- Type:
List
- compile(specializations: List[Dict], **compiler_options) None[source]
Compile the ONNX model for Qualcomm AI hardware.
- Parameters:
specializations (List[Dict]) – Model specialization configurations
**compiler_options – Additional compiler options
- export(inputs: Dict, output_names: List[str], dynamic_axes: Dict, export_dir: str | None = None, export_kwargs: Dict = {}) str[source]
Export the VAE model to ONNX format.
- Parameters:
inputs (Dict) – Example inputs for ONNX export
output_names (List[str]) – Names of model outputs
dynamic_axes (Dict) – Specification of dynamic dimensions
export_dir (str, optional) – Directory to save ONNX model
export_kwargs (Dict, optional) – Additional export arguments
- Returns:
Path to the exported ONNX model
- Return type:
str
- property get_model_config: Dict
Get the model configuration as a dictionary.
- Returns:
The configuration dictionary of the underlying VAE model
- Return type:
Dict
- get_onnx_params(latent_height: int = 32, latent_width: int = 32) Tuple[Dict, Dict, List[str]][source]
Generate ONNX export configuration for the VAE decoder.
- Parameters:
latent_height (int) – Height of latent representation (default: 32)
latent_width (int) – Width of latent representation (default: 32)
- Returns:
example_inputs (Dict): Sample inputs for ONNX export
dynamic_axes (Dict): Specification of dynamic dimensions
output_names (List[str]): Names of model outputs
- Return type:
Tuple containing
- get_video_onnx_params() Tuple[Dict, Dict, List[str]][source]
Generate ONNX export configuration for the video VAE decoder.
- Returns:
example_inputs (Dict): Sample inputs for ONNX export
dynamic_axes (Dict): Specification of dynamic dimensions
output_names (List[str]): Names of model outputs
- Return type:
Tuple containing
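Example
A minimal decoder-side sketch based on the signatures documented above. The checkpoint path is a placeholder, and the 64x64 latent size assumes the usual 8x VAE downsampling for a 512x512 image.
>>> from diffusers import AutoencoderKL
>>> from QEfficient.diffusers.pipelines.pipeline_module import QEffVAE
>>> vae = AutoencoderKL.from_pretrained("path/to/diffusion/model", subfolder="vae")
>>> qeff_vae_decoder = QEffVAE(vae, "decoder")
>>> example_inputs, dynamic_axes, output_names = qeff_vae_decoder.get_onnx_params(
...     latent_height=64, latent_width=64
... )
>>> onnx_path = qeff_vae_decoder.export(
...     inputs=example_inputs,
...     output_names=output_names,
...     dynamic_axes=dynamic_axes,
...     export_dir="./onnx_exports",
... )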
QEffFluxTransformerModel
- class QEfficient.diffusers.pipelines.pipeline_module.QEffFluxTransformerModel(model: Module)[source]
Wrapper for Flux Transformer2D models with ONNX export and QAIC compilation capabilities.
This class handles Flux Transformer2D models with specific transformations and optimizations for efficient inference on Qualcomm AI hardware. Flux uses a transformer-based diffusion architecture instead of traditional UNet, with dual transformer blocks and adaptive layer normalization (AdaLN) for conditioning.
- model
The wrapped Flux transformer model
- Type:
nn.Module
- _pytorch_transforms
PyTorch transformations applied before ONNX export
- Type:
List
- _onnx_transforms
ONNX transformations applied after export
- Type:
List
- compile(specializations: List[Dict], **compiler_options) None[source]
Compile the ONNX model for Qualcomm AI hardware.
- Parameters:
specializations (List[Dict]) – Model specialization configurations
**compiler_options – Additional compiler options (e.g., num_cores, aic_num_of_activations)
- export(inputs: Dict, output_names: List[str], dynamic_axes: Dict, export_dir: str | None = None, export_kwargs: Dict = {}, use_onnx_subfunctions: bool = False) str[source]
Export the Flux transformer model to ONNX format.
- Parameters:
inputs (Dict) – Example inputs for ONNX export
output_names (List[str]) – Names of model outputs
dynamic_axes (Dict) – Specification of dynamic dimensions
export_dir (str, optional) – Directory to save ONNX model
export_kwargs (Dict, optional) – Additional export arguments (e.g., export_modules_as_functions)
use_onnx_subfunctions (bool) – Whether to export transformer blocks as ONNX functions for better modularity and potential optimization
- Returns:
Path to the exported ONNX model
- Return type:
str
- property get_model_config: Dict
Get the model configuration as a dictionary.
- Returns:
The configuration dictionary of the underlying Flux transformer model
- Return type:
Dict
- get_onnx_params(batch_size: int = 1, seq_length: int = 256, cl: int = 4096) Tuple[Dict, Dict, List[str]][source]
Generate ONNX export configuration for the Flux transformer.
Creates example inputs for all Flux-specific inputs including hidden states, text embeddings, timestep conditioning, and AdaLN embeddings.
- Parameters:
batch_size (int) – Batch size for example inputs (default: 1, via FLUX_ONNX_EXPORT_BATCH_SIZE)
seq_length (int) – Text sequence length (default: 256, via FLUX_ONNX_EXPORT_SEQ_LENGTH)
cl (int) – Compressed latent dimension (default: 4096, via FLUX_ONNX_EXPORT_COMPRESSED_LATENT_DIM)
- Returns:
example_inputs (Dict): Sample inputs for ONNX export
dynamic_axes (Dict): Specification of dynamic dimensions
output_names (List[str]): Names of model outputs
- Return type:
Tuple containing
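Example
A usage sketch following the documented get_onnx_params and export signatures; the FLUX.1-schnell checkpoint is the same one used in the pipeline examples below.
>>> from diffusers import FluxTransformer2DModel
>>> from QEfficient.diffusers.pipelines.pipeline_module import QEffFluxTransformerModel
>>> transformer = FluxTransformer2DModel.from_pretrained(
...     "black-forest-labs/FLUX.1-schnell", subfolder="transformer"
... )
>>> qeff_transformer = QEffFluxTransformerModel(transformer)
>>> example_inputs, dynamic_axes, output_names = qeff_transformer.get_onnx_params(
...     batch_size=1, seq_length=256, cl=4096
... )
>>> onnx_path = qeff_transformer.export(
...     inputs=example_inputs,
...     output_names=output_names,
...     dynamic_axes=dynamic_axes,
...     export_dir="./onnx_exports",
...     use_onnx_subfunctions=True,  # export transformer blocks as ONNX functions
... )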
QEffWanUnifiedTransformer
- class QEfficient.diffusers.pipelines.pipeline_module.QEffWanUnifiedTransformer(unified_transformer)[source]
Wrapper for WAN Unified Transformer with ONNX export and QAIC compilation capabilities.
This class handles the unified WAN transformer model that combines high and low noise transformers into a single model for efficient deployment. Based on the timestep shape, the model dynamically selects between high and low noise transformers during inference.
The wrapper applies specific transformations and optimizations for efficient inference on Qualcomm AI hardware, particularly for video diffusion models.
- model
The QEffWanUnifiedWrapper model that combines high/low noise transformers
- Type:
nn.Module
- _pytorch_transforms
PyTorch transformations applied before ONNX export
- Type:
List
- _onnx_transforms
ONNX transformations applied after export
- Type:
List
- compile(specializations, **compiler_options) None[source]
Compile the ONNX model for Qualcomm AI hardware.
- Parameters:
specializations (List[Dict]) – Model specialization configurations
**compiler_options – Additional compiler options (e.g., num_cores, aic_num_of_activations)
- export(inputs: Dict, output_names: List[str], dynamic_axes: Dict, export_dir: str | None = None, export_kwargs: Dict = {}, use_onnx_subfunctions: bool = False) str[source]
Export the Wan transformer model to ONNX format.
- Parameters:
inputs (Dict) – Example inputs for ONNX export
output_names (List[str]) – Names of model outputs
dynamic_axes (Dict) – Specification of dynamic dimensions
export_dir (str, optional) – Directory to save ONNX model
export_kwargs (Dict, optional) – Additional export arguments (e.g., export_modules_as_functions)
use_onnx_subfunctions (bool) – Whether to export transformer blocks as ONNX functions for better modularity and potential optimization
- Returns:
Path to the exported ONNX model
- Return type:
str
- property get_model_config: Dict
Get the model configuration as a dictionary.
- Returns:
The configuration dictionary of the underlying Wan transformer model
- Return type:
Dict
- get_onnx_params()[source]
Generate ONNX export configuration for the Wan transformer.
Creates example inputs for all Wan-specific inputs including hidden states, text embeddings, and timestep conditioning.
- Returns:
example_inputs (Dict): Sample inputs for ONNX export
dynamic_axes (Dict): Specification of dynamic dimensions
output_names (List[str]): Names of model outputs
- Return type:
Tuple containing
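Example
A rough sketch of the standalone export flow. It assumes the unified transformer is obtained from a loaded QEffWanPipeline (documented below), since the unified wrapper is normally constructed by the pipeline; in practice QEffWanPipeline.export() drives this step for you.
>>> from QEfficient.diffusers.pipelines.wan import QEffWanPipeline
>>> pipeline = QEffWanPipeline.from_pretrained("path/to/wan/model")
>>> qeff_wan_transformer = pipeline.transformer  # QEffWanUnifiedTransformer built by the pipeline
>>> example_inputs, dynamic_axes, output_names = qeff_wan_transformer.get_onnx_params()
>>> onnx_path = qeff_wan_transformer.export(
...     inputs=example_inputs,
...     output_names=output_names,
...     dynamic_axes=dynamic_axes,
...     use_onnx_subfunctions=True,
... )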
Model Classes
QEffWanPipeline
- class QEfficient.diffusers.pipelines.wan.pipeline_wan.QEffWanPipeline(model, **kwargs)[source]
QEfficient-optimized WAN pipeline for high-performance text-to-video generation on Qualcomm AI hardware.
This pipeline provides an optimized implementation of the WAN diffusion model specifically designed for deployment on Qualcomm AI Cloud (QAIC) devices. It extends the original HuggingFace WAN model with QEfficient-optimized components that can be exported to ONNX format and compiled into Qualcomm Program Container (QPC) files for efficient video generation.
The pipeline supports the complete WAN workflow, including:
- UMT5 text encoding for rich semantic understanding
- A unified transformer architecture that combines multiple transformer stages into a single optimized model
- VAE decoding for final video output
- Performance monitoring and hardware optimization
- text_encoder
UMT5 text encoder for semantic text understanding (TODO: QEfficient optimization)
- unified_wrapper
Wrapper combining transformer stages
- Type:
QEffWanUnifiedWrapper
- transformer
Optimized unified transformer for denoising
- vae_decode
VAE decoder for latent-to-video conversion
- modules
Dictionary of pipeline modules for batch operations
- Type:
Dict[str, Any]
- model
Original HuggingFace WAN model reference
- Type:
WanPipeline
- tokenizer
Text tokenizer for preprocessing
- scheduler
Diffusion scheduler for timestep management
Example
>>> from QEfficient.diffusers.pipelines.wan import QEffWanPipeline
>>> pipeline = QEffWanPipeline.from_pretrained("path/to/wan/model")
>>> videos = pipeline(
...     prompt="A cat playing in a garden",
...     height=480,
...     width=832,
...     num_frames=81,
...     num_inference_steps=4
... )
>>> # Save generated video
>>> videos.images[0].save("generated_video.mp4")
- compile(compile_config: str | None = None, parallel: bool = False, height: int = 192, width: int = 320, num_frames: int = 81, use_onnx_subfunctions: bool = False) str[source]
Compiles the ONNX graphs of the different model components for deployment on Qualcomm AI hardware.
This method takes the exported ONNX path(s) of the transformer and compiles them into an optimized format for inference, driven by a JSON-based configuration.
- Parameters:
compile_config (str, optional) – Path to a JSON configuration file containing compilation settings, device mappings, and optimization parameters. If None, uses the default configuration.
parallel (bool, default=False) – Compilation mode selection. True compiles modules in parallel using a ThreadPoolExecutor for faster processing; False compiles modules sequentially for lower resource usage.
height (int, default=192) – Target image height in pixels.
width (int, default=320) – Target image width in pixels.
num_frames (int, default=81) – Target number of frames in pixel space.
use_onnx_subfunctions (bool, default=False) – Whether to export models with ONNX subfunctions before compilation if not already exported.
- Raises:
RuntimeError – If compilation fails for any module or if QAIC compiler is not available
FileNotFoundError – If ONNX models haven’t been exported or config file is missing
ValueError – If configuration parameters are invalid
OSError – If there are issues with file I/O during compilation
Example
>>> pipeline = QEffWanPipeline.from_pretrained("path/to/wan/model")
>>> # Sequential compilation with default config
>>> pipeline.compile(height=480, width=832, num_frames=81)
>>>
>>> # Parallel compilation with custom config
>>> pipeline.compile(
...     compile_config="/path/to/custom_config.json",
...     parallel=True,
...     height=480,
...     width=832,
...     num_frames=81
... )
- property do_classifier_free_guidance
Determine if classifier-free guidance should be used.
- Returns:
True if CFG should be applied based on current guidance scales
- Return type:
bool
- export(export_dir: str | None = None, use_onnx_subfunctions: bool = False) str[source]
Export all pipeline modules to ONNX format for deployment preparation.
This method systematically exports the unified transformer to ONNX format with video-specific configurations including temporal dimensions, dynamic axes, and optimization settings. The export process prepares the model for subsequent compilation to QPC format for efficient inference on QAIC hardware.
- Parameters:
export_dir (str, optional) – Target directory for saving ONNX model files. If None, uses the default export directory structure. The directory will be created if it doesn’t exist.
use_onnx_subfunctions (bool, default=False) – Whether to enable ONNX subfunction optimization for supported modules. This can optimize the graph structure and improve compilation efficiency for complex models like the transformer.
- Returns:
Absolute path to the export directory containing all ONNX model files.
- Return type:
str
- Raises:
RuntimeError – If ONNX export fails for any module
OSError – If there are issues creating the export directory or writing files
ValueError – If module configurations are invalid
Example
>>> pipeline = QEffWanPipeline.from_pretrained("path/to/wan/model")
>>> export_path = pipeline.export(
...     export_dir="/path/to/export",
...     use_onnx_subfunctions=True
... )
- classmethod from_pretrained(pretrained_model_name_or_path: str | PathLike | None, **kwargs)[source]
Load a pretrained WAN model from HuggingFace Hub or local path and wrap it with QEfficient optimizations.
This class method provides a convenient way to instantiate a QEffWanPipeline from a pretrained WAN model. It automatically loads the base WanPipeline model in float32 precision on CPU and wraps all components with QEfficient-optimized versions for QAIC deployment.
- Parameters:
pretrained_model_name_or_path (str or os.PathLike) – Either a HuggingFace model identifier or a local path to a saved WAN model directory. Should contain transformer, transformer_2, text_encoder, and VAE components.
**kwargs – Additional keyword arguments passed to WanPipeline.from_pretrained().
- Returns:
A fully initialized pipeline instance with QEfficient-optimized components, ready for export, compilation, and inference on QAIC devices.
- Return type:
QEffWanPipeline
- Raises:
ValueError – If the model path is invalid or model cannot be loaded
OSError – If there are issues accessing the model files
RuntimeError – If model initialization fails
Example
>>> # Load from HuggingFace Hub
>>> pipeline = QEffWanPipeline.from_pretrained("path/to/wan/model")
>>>
>>> # Load from local path
>>> pipeline = QEffWanPipeline.from_pretrained("/local/path/to/wan")
>>>
>>> # Load with custom cache directory
>>> pipeline = QEffWanPipeline.from_pretrained(
...     "wan-model-id",
...     cache_dir="/custom/cache/dir"
... )
QEffFluxPipeline
- class QEfficient.diffusers.pipelines.flux.pipeline_flux.QEffFluxPipeline(model, *args, **kwargs)[source]
QEfficient-optimized Flux pipeline for high-performance text-to-image generation on Qualcomm AI hardware.
This pipeline provides an optimized implementation of the Flux diffusion model specifically designed for deployment on Qualcomm AI Cloud (QAIC) devices. It wraps the original HuggingFace Flux model components with QEfficient-optimized versions that can be exported to ONNX format and compiled into Qualcomm Program Container (QPC) files for efficient inference.
The pipeline supports the complete Flux workflow, including:
- Dual text encoding with CLIP and T5 encoders
- Transformer-based denoising with adaptive layer normalization
- VAE decoding for final image generation
- Performance monitoring and optimization
- text_encoder
Optimized CLIP text encoder for pooled embeddings
- Type:
QEffTextEncoder
- text_encoder_2
Optimized T5 text encoder for sequence embeddings
- Type:
QEffTextEncoder
- transformer
Optimized Flux transformer for denoising
- Type:
QEffFluxTransformerModel
- modules
Dictionary of all pipeline modules for batch operations
- Type:
Dict[str, Any]
- model
Original HuggingFace Flux model reference
- Type:
FluxPipeline
- tokenizer
CLIP tokenizer for text preprocessing
- scheduler
Diffusion scheduler for timestep management
Example
>>> from QEfficient.diffusers.pipelines.flux import QEffFluxPipeline
>>> pipeline = QEffFluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell")
>>> images = pipeline(
...     prompt="A beautiful sunset over mountains",
...     height=512,
...     width=512,
...     num_inference_steps=28
... )
>>> images.images[0].save("generated_image.png")
- compile(compile_config: str | None = None, parallel: bool = False, height: int = 512, width: int = 512, use_onnx_subfunctions: bool = False) None[source]
Compile ONNX models into optimized QPC format for deployment on Qualcomm AI hardware.
- Parameters:
compile_config (str, optional) – Path to a JSON configuration file containing compilation settings, device mappings, and optimization parameters. If None, uses the default configuration from get_default_config_path().
parallel (bool, default=False) – Compilation mode selection. True compiles modules in parallel using a ThreadPoolExecutor for faster processing; False compiles modules sequentially for lower resource usage.
height (int, default=512) – Target image height in pixels.
width (int, default=512) – Target image width in pixels.
use_onnx_subfunctions (bool, default=False) – Whether to export models with ONNX subfunctions before compilation.
- Raises:
RuntimeError – If compilation fails for any module or if QAIC compiler is not available
FileNotFoundError – If ONNX models haven’t been exported or config file is missing
ValueError – If configuration parameters are invalid
OSError – If there are issues with file I/O during compilation
Example
>>> pipeline = QEffFluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell")
>>> # Sequential compilation with default config
>>> pipeline.compile(height=1024, width=1024)
>>>
>>> # Parallel compilation with custom config
>>> pipeline.compile(
...     compile_config="/path/to/custom_config.json",
...     parallel=True,
...     height=512,
...     width=512
... )
- encode_prompt(prompt: str | List[str], prompt_2: str | List[str] | None = None, num_images_per_prompt: int = 1, prompt_embeds: FloatTensor | None = None, pooled_prompt_embeds: FloatTensor | None = None, max_sequence_length: int = 512)[source]
Encode text prompts using Flux’s dual text encoder architecture.
Flux employs both CLIP and T5 encoders for comprehensive text understanding:
- CLIP provides pooled embeddings for global semantic conditioning
- T5 provides detailed sequence embeddings for fine-grained text control
- Parameters:
prompt (str or List[str]) – Primary prompt(s) for both encoders
prompt_2 (str or List[str], optional) – Secondary prompt(s) for T5. If None, uses primary prompt
num_images_per_prompt (int) – Number of images to generate per prompt
prompt_embeds (torch.FloatTensor, optional) – Pre-computed T5 embeddings
pooled_prompt_embeds (torch.FloatTensor, optional) – Pre-computed CLIP pooled embeddings
max_sequence_length (int) – Maximum sequence length for T5 tokenization
- Returns:
- (prompt_embeds, pooled_prompt_embeds, text_ids, encoder_perf_times)
prompt_embeds (torch.Tensor): T5 sequence embeddings [batch*num_images, seq_len, 4096]
pooled_prompt_embeds (torch.Tensor): CLIP pooled embeddings [batch*num_images, 768]
text_ids (torch.Tensor): Position IDs for text tokens [seq_len, 3]
encoder_perf_times (List[float]): Performance times [CLIP_time, T5_time]
- Return type:
tuple
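Example
A hedged sketch of calling encode_prompt directly; in normal use the pipeline's __call__ invokes it for you. The export and compile steps are shown because the text encoders presumably need to be compiled before they can run on QAIC.
>>> pipeline = QEffFluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell")
>>> pipeline.export()
>>> pipeline.compile(height=512, width=512)
>>> prompt_embeds, pooled_prompt_embeds, text_ids, encoder_perf_times = pipeline.encode_prompt(
...     prompt="A beautiful sunset over mountains",
...     prompt_2=None,            # T5 falls back to the primary prompt
...     num_images_per_prompt=1,
...     max_sequence_length=512,
... )
>>> prompt_embeds.shape           # T5 sequence embeddings: [batch * num_images, seq_len, 4096]
>>> pooled_prompt_embeds.shape    # CLIP pooled embeddings: [batch * num_images, 768]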
- export(export_dir: str | None = None, use_onnx_subfunctions: bool = False) str[source]
Export all pipeline modules to ONNX format for deployment preparation.
This method systematically exports each pipeline component (CLIP text encoder, T5 text encoder, Flux transformer, and VAE decoder) to ONNX format. Each module is exported with its specific configuration including dynamic axes, input/output specifications, and optimization settings.
The export process prepares the models for subsequent compilation to QPC format, enabling efficient inference on QAIC hardware. ONNX subfunctions can be used for certain modules to optimize memory usage and performance.
- Parameters:
export_dir (str, optional) – Target directory for saving ONNX model files. If None, uses the default export directory structure based on model name and configuration. The directory will be created if it doesn’t exist.
use_onnx_subfunctions (bool, default=False) – Whether to enable ONNX subfunction optimization for supported modules. This can optimize the graph and improve compilation efficiency for models like the transformer.
- Returns:
Absolute path to the export directory containing all ONNX model files. Each module has its own subdirectory with the exported ONNX file.
- Return type:
str
- Raises:
RuntimeError – If ONNX export fails for any module
OSError – If there are issues creating the export directory or writing files
ValueError – If module configurations are invalid
Note
- All models are exported in float32 precision for maximum compatibility
- Dynamic axes are configured to support variable batch sizes and sequence lengths
- The export process may take several minutes depending on model size
- Exported ONNX files can be large (several GB for a complete pipeline)
Example
>>> pipeline = QEffFluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell")
>>> export_path = pipeline.export(
...     export_dir="/path/to/export",
...     use_onnx_subfunctions=True
... )
>>> print(f"Models exported to: {export_path}")
- classmethod from_pretrained(pretrained_model_name_or_path: str | PathLike | None, **kwargs)[source]
Load a pretrained Flux model from HuggingFace Hub or local path and wrap it with QEfficient optimizations.
This class method provides a convenient way to instantiate a QEffFluxPipeline from a pretrained Flux model. It automatically loads the base FluxPipeline model in float32 precision on CPU and wraps all components with QEfficient-optimized versions for QAIC deployment.
- Parameters:
pretrained_model_name_or_path (str or os.PathLike) – Either a HuggingFace model identifier (e.g., “black-forest-labs/FLUX.1-schnell”) or a local path to a saved model directory.
**kwargs – Additional keyword arguments passed to FluxPipeline.from_pretrained().
- Returns:
A fully initialized pipeline instance with QEfficient-optimized components, ready for export, compilation, and inference on QAIC devices.
- Return type:
QEffFluxPipeline
- Raises:
ValueError – If the model path is invalid or model cannot be loaded
OSError – If there are issues accessing the model files
RuntimeError – If model initialization fails
Example
>>> # Load from HuggingFace Hub
>>> pipeline = QEffFluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell")
>>>
>>> # Load from local path
>>> pipeline = QEffFluxPipeline.from_pretrained("/path/to/local/flux/model")
>>>
>>> # Load with custom cache directory
>>> pipeline = QEffFluxPipeline.from_pretrained(
...     "black-forest-labs/FLUX.1-dev",
...     cache_dir="/custom/cache/dir"
... )