finetrainers/3dgs-v0
finetrainers
Text-to-video
Experimental fine-tune of THUDM/CogVideoX-5b for text-to-video generation with a “3D_dissolve” visual effect: objects or characters with a 3D appearance, surrounded by red sparks or burning particles, that dissolve against dark backgrounds. It was trained on the finetrainers/3dgs-dissolve dataset, and its generalization is known to be limited.
How to use
Installation and basic usage with Diffusers:
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline
# Switch to "mps" on Apple devices.
pipe = DiffusionPipeline.from_pretrained("finetrainers/3dgs-v0", torch_dtype=torch.bfloat16, device_map="cuda")
prompt = "3D_dissolve A small tiger character in a colorful winter outfit appears in a 3D appearance, surrounded by a dynamic burst of red sparks. The sparks swirl around the tiger, creating a dramatic effect as they gradually evaporate into a burst of red sparks, leaving behind a stark black background."
# CogVideoX pipelines return video frames rather than images.
video = pipe(prompt).frames[0]
Inference example for generating a video:
from diffusers import CogVideoXTransformer3DModel, DiffusionPipeline
from diffusers.utils import export_to_video
import torch
# Load the fine-tuned transformer, then plug it into the base CogVideoX-5b pipeline.
transformer = CogVideoXTransformer3DModel.from_pretrained(
    "finetrainers/3dgs-v0", torch_dtype=torch.bfloat16
)
pipeline = DiffusionPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")
prompt = """
3D_dissolve In a 3D appearance, a bookshelf filled with books is surrounded by a burst of red sparks, creating a dramatic and explosive effect against a black background.
"""
negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs"
video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=81,
    height=512,
    width=768,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output.mp4", fps=25)
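As a quick sanity check on the settings above, the clip length follows from the frame count and the export frame rate. This is a minimal sketch in plain Python; `clip_duration_seconds` is a hypothetical helper and not part of Diffusers:

```python
def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Return the playback length of a clip exported at a fixed frame rate."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


# Values taken from the example above: num_frames=81, export fps=25.
duration = clip_duration_seconds(81, 25)
print(f"{duration:.2f} s")  # 3.24 s
```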
Using the extracted LoRA:
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
import torch
pipeline = DiffusionPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16).to("cuda")
# The paths below come from the author's environment; point them at your own copies of the LoRA files.
pipeline.load_lora_weights("/fsx/sayak/finetrainers/cogvideox-crush/extracted_crush_smol_lora_64.safetensors", adapter_name="crush")
pipeline.load_lora_weights("/fsx/sayak/finetrainers/cogvideox-3dgs/extracted_3dgs_lora_64.safetensors", adapter_name="3dgs")
prompts = [
"""In a 3D appearance, a small bicycle is seen surrounded by a burst of fiery sparks, creating a dramatic and intense visual effect against the dark background. The video showcases a dynamic explosion of fiery particles in a 3D appearance, with sparks and embers scattering across the screen against a stark black background.""",
"""In a 3D appearance, a bookshelf filled with books is surrounded by a burst of red sparks, creating a dramatic and explosive effect against a black background.""",
]
negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs, bad physique"
id_token = "3D_dissolve"
for i, prompt in enumerate(prompts):
    video = pipeline(
        prompt=f"{id_token} {prompt}",
        negative_prompt=negative_prompt,
        num_frames=81,
        height=512,
        width=768,
        num_inference_steps=50,
        generator=torch.manual_seed(0),
    ).frames[0]
    export_to_video(video, f"output_{i}.mp4", fps=25)
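The loop above prepends the trigger token by hand. One way to keep that step and the shared settings in one place is to build the call arguments before touching the GPU. This is a sketch in plain Python under stated assumptions: `build_call_kwargs` is a hypothetical helper, and its defaults simply mirror the values used above.

```python
ID_TOKEN = "3D_dissolve"

# Shared settings copied from the example above.
DEFAULT_SETTINGS = {
    "num_frames": 81,
    "height": 512,
    "width": 768,
    "num_inference_steps": 50,
}


def build_call_kwargs(prompt: str, negative_prompt: str = "", id_token: str = ID_TOKEN) -> dict:
    """Return keyword arguments for one pipeline call (hypothetical helper)."""
    text = " ".join(prompt.split())  # collapse newlines and repeated spaces
    if not text.startswith(id_token):
        text = f"{id_token} {text}"  # prepend the trigger token only once
    return {"prompt": text, "negative_prompt": negative_prompt, **DEFAULT_SETTINGS}


kwargs = build_call_kwargs("In a 3D appearance, a bookshelf is surrounded by a burst of red sparks.")
print(kwargs["prompt"])
```

The resulting dictionary could then be passed as `pipeline(**kwargs, generator=torch.manual_seed(0))`, which matches the arguments used in the loop above.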
Features
- Text-to-video model based on CogVideoX-5b.
- Specialized in scenes with a 3D appearance, red sparks, particle bursts, and a dissolve effect.
- Includes a full checkpoint and a rank-64 LoRA variant that emulates the same effect.
- Compatible with Diffusers and Safetensors weights.
- Uses the trigger token `3D_dissolve` in prompts.
- Experimental checkpoint with known limited generalization.
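To give a rough sense of why a rank-64 LoRA is lighter than a full checkpoint, the sketch below counts parameters for a single linear layer. A LoRA of rank r represents the update to a d_out × d_in weight with two factors of shapes d_out × r and r × d_in. The layer width of 3072 is an assumption chosen for illustration, not a figure from this model card, and both functions are hypothetical helpers.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def dense_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the full weight matrix (bias ignored)."""
    return d_in * d_out


d = 3072   # assumed layer width, for illustration only
rank = 64  # rank stated for the extracted LoRA
lora = lora_param_count(d, d, rank)
dense = dense_param_count(d, d)
print(lora, dense, f"{lora / dense:.1%}")  # 393216 9437184 4.2%
```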
Use cases
- Generating short clips of 3D objects that dissolve into bursts of red sparks.
- Creating Pika-style visual effects built around transformation, dissolution, and particles on a dark background.
- Testing CogVideoX fine-tunes for specific video styles.
- Using a LoRA to apply the 3D_dissolve effect on top of CogVideoX-5b without loading the full checkpoint.