finetrainers/pika-dissolve-v0

finetrainers
Text-to-video

A fine-tune of CogVideoX-5B trained on the modal-labs/dissolve dataset for text-to-video generation with the PIKA_DISSOLVE effect: objects that disintegrate or dissolve into particles, dust, or fibers that drift upward and then gradually settle.

How to use

Installation and direct use with Diffusers:

pip install -U diffusers transformers accelerate

import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Load the pipeline in bfloat16; switch "cuda" to "mps" on Apple devices
pipe = DiffusionPipeline.from_pretrained("finetrainers/pika-dissolve-v0", torch_dtype=torch.bfloat16).to("cuda")

prompt = "PIKA_DISSOLVE A meticulously detailed tea cup sits centrally on a dark brown circular pedestal. The cup, seemingly made of clay, begins to dissolve from the bottom up. The disintegration process is rapid but not explosive, with a cloud of fine, light tan dust forming and rising in a swirling, almost ethereal column that expands outwards before slowly descending. The dust particles are individually visible as they float, and the overall effect is one of delicate disintegration rather than shattering. Finally, only the empty pedestal and the intricately patterned marble floor remain."
# CogVideoX pipelines return video frames (.frames), not images
video = pipe(prompt).frames[0]
export_to_video(video, "output_cup.mp4", fps=8)

Alternative inference code that loads the fine-tuned transformer and plugs it into the base CogVideoX-5b pipeline:

from diffusers import CogVideoXTransformer3DModel, DiffusionPipeline
from diffusers.utils import export_to_video
import torch

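# Load only the fine-tuned transformer weights
transformer = CogVideoXTransformer3DModel.from_pretrained(
    "sayakpaul/pika-dissolve-v0", torch_dtype=torch.bfloat16
)

# Reuse the remaining components (VAE, text encoder, scheduler) from the base model
pipeline = DiffusionPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")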

prompt = """
PIKA_DISSOLVE A slender glass vase, brimming with tiny white pebbles, stands centered on a polished ebony dais. Without warning, the glass begins to dissolve from the edges inward. Wisps of translucent dust swirl upward in an elegant spiral, illuminating each pebble as they drop onto the dais. The gently drifting dust eventually settles, leaving only the scattered stones and faint traces of shimmering powder on the stage.
"""

negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs"

video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=81,
    height=512,
    width=768,
    num_inference_steps=50
).frames[0]

export_to_video(video, "output_vase.mp4", fps=25)
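
The `num_frames`, `height`, `width`, and `fps` values above determine both the clip length and how much work the transformer has to do. The sketch below estimates both. It assumes, without confirmation from this card, that the CogVideoX VAE compresses 4x in time and 8x in space and that the 5B transformer splits latents into 2x2 patches. `clip_stats` is a hypothetical helper, and its divisibility checks are conservative choices of this sketch rather than the pipeline's own validation.

```python
# Sketch: estimate clip duration and CogVideoX latent/token sizes.
# Assumptions (not stated in this card): the VAE compresses 4x in time and
# 8x in space, and the 5B transformer uses 2x2 spatial patches.
VAE_TEMPORAL = 4
VAE_SPATIAL = 8
PATCH = 2


def clip_stats(num_frames: int, height: int, width: int, fps: int) -> dict:
    """Return duration and latent/token counts for a generation request (hypothetical helper)."""
    # Conservative checks: frame counts of the form 4k + 1, sizes divisible by 8 * 2
    if (num_frames - 1) % VAE_TEMPORAL != 0:
        raise ValueError("num_frames should be 4k + 1 (e.g. 49, 81)")
    if height % (VAE_SPATIAL * PATCH) or width % (VAE_SPATIAL * PATCH):
        raise ValueError("height and width should be multiples of 16")
    latent_frames = (num_frames - 1) // VAE_TEMPORAL + 1
    latent_h, latent_w = height // VAE_SPATIAL, width // VAE_SPATIAL
    # Each latent frame is cut into (latent_h / 2) * (latent_w / 2) patches
    tokens = latent_frames * (latent_h // PATCH) * (latent_w // PATCH)
    return {
        "seconds": num_frames / fps,
        "latent_frames": latent_frames,
        "latent_hw": (latent_h, latent_w),
        "video_tokens": tokens,
    }


print(clip_stats(num_frames=81, height=512, width=768, fps=25))
```

Under the same assumptions, the 81-frame, 512x768 request comes to about 3.2 seconds at 25 fps and 32,256 video tokens. The base pipeline's default of 49 frames at 480x720 comes to 17,550 tokens, which helps explain the higher memory use of the setting above.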

Features

Text-to-video generation built on Diffusers and CogVideoX.
Specialized in scenes where objects visually dissolve, such as cups, masks, vases, or paper figures.
Start the prompt with the PIKA_DISSOLVE style token to trigger the learned effect.
Fine-tuned from zai-org/CogVideoX-5b (also published as THUDM/CogVideoX-5b) on the modal-labs/dissolve dataset.
Distributed in Safetensors format and intended for Diffusers workflows.
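
The card asks for the PIKA_DISSOLVE token at the start of the prompt. Below is a minimal sketch of a hypothetical helper, `with_trigger`, that adds the token when it is missing and collapses the extra newlines a triple-quoted prompt (such as the vase example) carries. The whitespace normalization is a choice of this sketch, not a documented requirement of the model.

```python
TRIGGER = "PIKA_DISSOLVE"


def with_trigger(prompt: str, trigger: str = TRIGGER) -> str:
    """Collapse whitespace and ensure the prompt starts with the trigger token (hypothetical helper)."""
    # Flatten newlines and repeated spaces, e.g. from triple-quoted prompts
    text = " ".join(prompt.split())
    # Compare the whole first word so "PIKA_DISSOLVED ..." is not mistaken for the token
    if text.split(" ", 1)[0] == trigger:
        return text
    return f"{trigger} {text}"


print(with_trigger("A paper crane rests on a wooden table and dissolves into fine fibers."))
print(with_trigger("""
PIKA_DISSOLVE A clay mask crumbles into drifting dust.
"""))
```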

Use cases

Create clips of objects dissolving into dust, smoke, fibers, or fine particles.
Produce Pika-style visual effects for creative demos, VFX prototypes, or social media content.
Generate scene variations in which an object disappears, leaving residue, silhouettes, or suspended particles behind.
Experiment with CogVideoX fine-tunes for stylized video effects.
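
One way to explore scene variations is to expand a small grid of objects and end states into prompts with fixed seeds, so that each clip can be regenerated later. The object list, residue phrases, and `variation_jobs` helper below are hypothetical, and the sketch only builds the job list without calling the model.

```python
from itertools import product

# Hypothetical building blocks for the variation grid; edit to taste
OBJECTS = ["ceramic tea cup", "paper mask", "glass vase"]
RESIDUES = [
    "leaving only faint traces of dust on the pedestal",
    "leaving a faint silhouette of particles hanging in the air",
]


def variation_jobs(objects, residues, base_seed=0):
    """Build (prompt, seed, filename) triples for every object/residue pair (hypothetical helper)."""
    jobs = []
    for i, (obj, residue) in enumerate(product(objects, residues)):
        prompt = (
            f"PIKA_DISSOLVE A {obj} sits on a dark pedestal and slowly dissolves "
            f"into fine particles that rise and then settle, {residue}."
        )
        # One deterministic seed and output file per combination
        jobs.append((prompt, base_seed + i, f"dissolve_{i:02d}.mp4"))
    return jobs


for prompt, seed, path in variation_jobs(OBJECTS, RESIDUES):
    print(seed, path, prompt[:60])
```

Each job's seed can then be passed to the pipeline through its `generator` argument (for example `torch.Generator("cuda").manual_seed(seed)`), and the result saved with `export_to_video(video, path, fps=25)`.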