feeday/Z-Image-Turbo-GGUF

feeday

Text-to-image

GGUF-quantized versions of Tongyi-MAI's Z-Image Turbo, a text-to-image generation model. This repository provides lower-precision variants for running the model with GGUF-compatible tools such as ComfyUI-GGUF and Diffusers, reducing download size and memory requirements compared with the full base model.

How to use

Install Diffusers from GitHub and load the local GGUF transformer into ZImagePipeline:
pip install git+https://github.com/huggingface/diffusers

from diffusers import ZImagePipeline, ZImageTransformer2DModel, GGUFQuantizationConfig
import torch

prompt = "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."
height = 1024
width = 1024
seed = 42

#hf_path = "https://huggingface.co/jayn7/Z-Image-Turbo-GGUF/blob/main/z_image_turbo-Q3_K_M.gguf"
# Raw string, so the Windows backslashes (e.g. "\t") are not read as escape sequences
local_path = r"path\to\local\model\z_image_turbo-Q3_K_M.gguf"

transformer = ZImageTransformer2DModel.from_single_file(
    local_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    dtype=torch.bfloat16,
)

pipeline = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    transformer=transformer,
    dtype=torch.bfloat16,
).to("cuda")

# [Optional] Attention Backend
# Diffusers uses SDPA by default. Switch to Custom attention backend for better efficiency if supported:
#pipeline.transformer.set_attention_backend("_sage_qk_int8_pv_fp16_triton") # Enable Sage Attention
#pipeline.transformer.set_attention_backend("flash") # Enable Flash-Attention-2
#pipeline.transformer.set_attention_backend("_flash_3") # Enable Flash-Attention-3

# [Optional] Model Compilation
# Compiling the DiT model accelerates inference, but the first run will take longer to compile.
#pipeline.transformer.compile()

# [Optional] CPU Offloading
# Enable CPU offloading for memory-constrained devices.
#pipeline.enable_model_cpu_offload()

images = pipeline(
    prompt=prompt,
    num_inference_steps=9, # This actually results in 8 DiT forwards
    guidance_scale=0.0, # Guidance should be 0 for the Turbo models
    height=height,
    width=width,
    generator=torch.Generator("cuda").manual_seed(seed)
).images[0]

images.save("zimage.png")
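
The commented-out `hf_path` above is a Hugging Face "blob" URL, while the example itself loads from a local file. As a minimal sketch, assuming the usual `https://huggingface.co/<user>/<repo>/blob/<revision>/<file>` URL shape and a revision name without slashes, such a URL can be split into the pieces a download call needs. `split_hf_url` is a hypothetical helper written for illustration; it is not part of Diffusers or huggingface_hub.

```python
from urllib.parse import urlparse


def split_hf_url(url: str) -> tuple[str, str, str]:
    """Split a Hugging Face file URL into (repo_id, revision, filename).

    Hypothetical helper: assumes the path looks like
    <user>/<repo>/<blob|resolve>/<revision>/<file path>.
    """
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 5 or parts[2] not in ("blob", "resolve"):
        raise ValueError(f"Not a Hugging Face file URL: {url}")
    repo_id = "/".join(parts[:2])    # "<user>/<repo>"
    revision = parts[3]              # branch or tag, assumed to contain no "/"
    filename = "/".join(parts[4:])   # path of the file inside the repo
    return repo_id, revision, filename


hf_path = "https://huggingface.co/jayn7/Z-Image-Turbo-GGUF/blob/main/z_image_turbo-Q3_K_M.gguf"
repo_id, revision, filename = split_hf_url(hf_path)
print(repo_id, revision, filename)
```

The three values could then be passed to `huggingface_hub.hf_hub_download(repo_id=..., filename=..., revision=...)`, which returns a local file path that can serve as `local_path`.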

For ComfyUI, place the GGUF files in a structure like this:
ComfyUI/
├── models/
│   ├── text_encoders/
│   │   └── qwen_3_4b-Q*.gguf
│   ├── diffusion_models/
│   │   └── z_image_turbo-Q*.gguf
│   └── vae/
│       └── ae.safetensors
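
Before starting ComfyUI it can help to confirm that each file sits in the folder shown in the tree above. The following is a minimal sketch using only the standard library; the folder names and file patterns are copied from that tree, while the function name `missing_comfyui_files` and the `"ComfyUI"` root path are assumptions made for illustration.

```python
from pathlib import Path

# Subfolder of models/ -> file pattern, copied from the directory tree above.
EXPECTED = {
    "text_encoders": "qwen_3_4b-Q*.gguf",
    "diffusion_models": "z_image_turbo-Q*.gguf",
    "vae": "ae.safetensors",
}


def missing_comfyui_files(comfy_root):
    """Return the 'subdir/pattern' entries that have no matching file."""
    models = Path(comfy_root) / "models"
    missing = []
    for subdir, pattern in EXPECTED.items():
        folder = models / subdir
        # A missing folder counts the same as a folder without a matching file.
        if not folder.is_dir() or not any(folder.glob(pattern)):
            missing.append(f"{subdir}/{pattern}")
    return missing


# "ComfyUI" is an assumed install location; adjust it to the actual path.
print(missing_comfyui_files("ComfyUI"))
```

An empty list means every expected file was found; otherwise the list names the folders that still need a file.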

Features

Image generation from text prompts with Z-Image Turbo.
GGUF-quantized weights available in Q3, Q4, Q5, Q6, and Q8.
Compatible with ComfyUI-GGUF and with Diffusers by loading from a local GGUF file.
Based on the Tongyi-MAI/Z-Image-Turbo model.
Includes a reference to the Qwen3-4B text encoder in GGUF format.
Apache 2.0 license, inherited from the Z-Image Turbo model.

Use cases

Generate high-resolution images from text descriptions with Z-Image Turbo.
Run Z-Image Turbo in memory-constrained environments through GGUF quantization.
Integrate the model into ComfyUI workflows for local image generation.
Try out inference with Diffusers by loading GGUF weights from a local file.
Compare quality and file size across the Q3, Q4, Q5, Q6, and Q8 quantizations.
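
For the size half of that comparison, a short script can list whichever quantized files have already been downloaded. This is a minimal sketch, assuming the files follow the `z_image_turbo-Q*.gguf` naming used in the ComfyUI layout; the function name `gguf_sizes` and the folder path are illustrative, and no actual sizes are implied here.

```python
from pathlib import Path


def gguf_sizes(folder, pattern="z_image_turbo-Q*.gguf"):
    """Return (filename, size in GiB) pairs for matching files, smallest first."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    sizes = [(f.name, f.stat().st_size / 1024**3) for f in folder.glob(pattern)]
    return sorted(sizes, key=lambda item: item[1])


# The folder below is an assumption; point it at wherever the GGUF files live.
for name, gib in gguf_sizes("ComfyUI/models/diffusion_models"):
    print(f"{name}: {gib:.2f} GiB")
```

Quality still has to be judged by eye, for example by generating the same prompt and seed with each file.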