a-r-r-o-w/LTX-Video-0.9.1-diffusers

a-r-r-o-w
Text-to-video

Unofficial Diffusers-format weights for Lightricks' LTX-Video 0.9.1. The model generates video from a text prompt and can also animate a starting image guided by a prompt, through the Diffusers pipelines LTXPipeline and LTXImageToVideoPipeline.

How to use

Basic installation:

pip install -U diffusers transformers accelerate

Basic usage with DiffusionPipeline:

import torch
from diffusers import DiffusionPipeline

# DiffusionPipeline reads the repo's model_index.json and loads the matching LTX pipeline
pipe = DiffusionPipeline.from_pretrained(
    "a-r-r-o-w/LTX-Video-0.9.1-diffusers",
    torch_dtype=torch.bfloat16,
)
# switch to "mps" for Apple devices
pipe.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# LTX pipelines return video frames (.frames), not still images (.images)
video = pipe(prompt).frames[0]

Text to video:

import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained(
    "a-r-r-o-w/LTX-Video-0.9.1-diffusers",
    torch_dtype=torch.bfloat16
)
pipe.to("cuda")

prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage"
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

video = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=704,
    height=480,
    num_frames=161,
    num_inference_steps=50,
    decode_timestep=0.03,
    decode_noise_scale=0.025,
).frames[0]

export_to_video(video, "output.mp4", fps=24)
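
The call above renders 161 frames and exports them at 24 fps, so the clip lasts roughly 6.7 seconds. The sketch below shows that arithmetic and how one might pick `num_frames`, `width`, and `height` for a different target. It assumes, which this card does not state, that LTX-Video prefers frame counts of the form 8k + 1 and resolutions divisible by 32; the values used above (161, 704, 480) happen to fit that pattern. The helper names are hypothetical and not part of Diffusers.

```python
def clip_duration_seconds(num_frames: int, fps: int = 24) -> float:
    """Length in seconds of a clip exported at the given frame rate."""
    return num_frames / fps


def frames_for_duration(seconds: float, fps: int = 24, multiple: int = 8) -> int:
    """Frame count of the form multiple * k + 1 closest to the requested duration.

    The 8k + 1 pattern is an assumption about LTX-Video, not something
    stated in this model card.
    """
    k = max(1, round((seconds * fps - 1) / multiple))
    return multiple * k + 1


def snap_resolution(value: int, multiple: int = 32) -> int:
    """Round a width or height to a nearby multiple of 32 (assumed constraint)."""
    return max(multiple, round(value / multiple) * multiple)


if __name__ == "__main__":
    print(clip_duration_seconds(161))  # duration of the example clip at 24 fps
    print(frames_for_duration(4.0))    # frame count for a clip of about 4 seconds
    print(snap_resolution(500))        # nearby size divisible by 32
```

The results can be passed straight to the pipeline call as `num_frames`, `width`, and `height`, with the same `fps` given to `export_to_video`.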

Image to video:

import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = LTXImageToVideoPipeline.from_pretrained(
    "a-r-r-o-w/LTX-Video-0.9.1-diffusers",
    torch_dtype=torch.bfloat16
)
pipe.to("cuda")

image = load_image(
    "https://huggingface.co/datasets/a-r-r-o-w/tiny-meme-dataset-captioned/resolve/main/images/8.png"
)

prompt = "A young girl stands calmly in the foreground, looking directly at the camera, as a house fire rages in the background. Flames engulf the structure, with smoke billowing into the air. Firefighters in protective gear rush to the scene, a fire truck labeled '38' visible behind them. The girl's neutral expression contrasts sharply with the chaos of the fire, creating a poignant and emotionally charged scene."
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

video = pipe(
    image=image,
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=704,
    height=480,
    num_frames=161,
    num_inference_steps=50,
    decode_timestep=0.03,
    decode_noise_scale=0.025,
).frames[0]

export_to_video(video, "output.mp4", fps=24)

Features

Text-to-video generation with Diffusers.
Video generation from an image plus a text prompt via LTXImageToVideoPipeline.
Weights in Safetensors format.
Runs locally on CUDA using bfloat16.
Results can be exported to MP4 with export_to_video.
Not currently deployed on any Hugging Face inference provider.

Use cases

Creating short video clips from detailed text descriptions.
Animating a starting image to follow a scene described in a prompt.
Testing LTX-Video 0.9.1 weights locally within the Diffusers ecosystem.
Generating visual prototypes or test sequences exported as MP4.