AIDC-AI/Ovis-Image-7B

AIDC-AI

Text-to-image

Ovis-Image-7B is a 7B-parameter text-to-image model optimized for high-quality text rendering. It is designed to generate images with legible typography, correctly spelled text, and good alignment between the linguistic content and the visual layout, even for prompts featuring posters, banners, logos, interfaces, and infographics. It is built on Ovis-U1 and aims to deliver performance close to that of much larger models under more modest computational budgets.

How to use

Installation and basic usage with Diffusers:
pip install -U diffusers transformers accelerate

import torch
from diffusers import DiffusionPipeline

# device_map="cuda" targets NVIDIA GPUs; switch to "mps" on Apple devices
pipe = DiffusionPipeline.from_pretrained(
    "AIDC-AI/Ovis-Image-7B",
    torch_dtype=torch.bfloat16,
    device_map="cuda"
)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
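The comment in the snippet above suggests switching to "mps" on Apple devices. A minimal sketch, assuming one wants to choose the device automatically: `pick_device` is a hypothetical helper (not part of Diffusers), and its two flags are meant to come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a device string for the pipeline (hypothetical helper).

    Prefers CUDA, then Apple's Metal backend (MPS), then falls back to CPU.
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


if __name__ == "__main__":
    try:
        import torch

        device = pick_device(
            torch.cuda.is_available(),
            torch.backends.mps.is_available(),
        )
    except ImportError:
        # torch is not installed, so only the CPU would be usable anyway
        device = "cpu"
    print(device)
```

The returned string can be passed to `pipe.to(...)`, as in the OvisImagePipeline example below. Keeping the availability checks outside the helper makes its decision logic easy to test without a GPU.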

Recommended usage with OvisImagePipeline:
pip install "diffusers>=0.36.0"

import torch
from diffusers import OvisImagePipeline

pipe = OvisImagePipeline.from_pretrained(
    "AIDC-AI/Ovis-Image-7B",
    torch_dtype=torch.bfloat16
)
pipe.to("cuda")

prompt = "A creative 3D artistic render where the text \"OVIS-IMAGE\" is written in a bold, expressive handwritten brush style using thick, wet oil paint. The paint is a mix of vibrant rainbow colors (red, blue, yellow) swirling together like toothpaste or impasto art. You can see the ridges of the brush bristles and the glossy, wet texture of the paint. The background is a clean artist's canvas. Dynamic lighting creates soft shadows behind the floating paint strokes. Colorful, expressive, tactile texture, 4k detail."

image = pipe(
    prompt,
    negative_prompt="",
    num_inference_steps=50,
    guidance_scale=5.0
).images[0]
image.save("ovis_image.png")
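The example prompt wraps the exact text to render ("OVIS-IMAGE") in double quotes and then describes the style around it. A small sketch that follows the same convention, assuming quoting the literal text is a useful way to mark it for the model; `quote_for_render` and `build_text_prompt` are hypothetical helpers, not part of the Ovis-Image or Diffusers API.

```python
def quote_for_render(text: str) -> str:
    """Wrap the exact text to render in double quotes, mirroring the example prompt."""
    return f'"{text.strip()}"'


def build_text_prompt(text: str, scene: str, style: str = "") -> str:
    """Compose a prompt naming the literal text to render (hypothetical helper).

    scene describes the object carrying the text; style adds optional visual
    details. Neither argument comes from the Ovis-Image API.
    """
    parts = [f"{scene} with the text {quote_for_render(text)} written on it"]
    if style:
        parts.append(style)
    return ". ".join(parts) + "."


if __name__ == "__main__":
    prompt = build_text_prompt(
        "GRAND OPENING",
        "A storefront banner",
        "Bold red sans-serif lettering on a white background",
    )
    print(prompt)
    # The string can then be passed to the pipeline loaded above, e.g.
    # image = pipe(prompt, num_inference_steps=50, guidance_scale=5.0).images[0]
```

Generating prompts this way keeps the text to be spelled in one variable, which is convenient when the same layout is rendered with many different strings.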

Installation for inference with PyTorch:
git clone git@github.com:AIDC-AI/Ovis-Image.git
conda create -n ovis-image python=3.10 -y
conda activate ovis-image
cd Ovis-Image
pip install -r requirements.txt
pip install -e .

Running text-to-image generation:
python ovis_image/test.py \
  --model_path AIDC-AI/Ovis-Image-7B/ovis_image.safetensors \
  --vae_path AIDC-AI/Ovis-Image-7B/ae.safetensors \
  --ovis_path AIDC-AI/Ovis-Image-7B/Ovis2.5-2B \
  --image_size 1024 \
  --denoising_steps 50 \
  --cfg_scale 5.0 \
  --prompt "A creative 3D artistic render where the text \"OVIS-IMAGE\" is written in a bold, expressive handwritten brush style using thick, wet oil paint. The paint is a mix of vibrant rainbow colors (red, blue, yellow) swirling together like toothpaste or impasto art. You can see the ridges of the brush bristles and the glossy, wet texture of the paint. The background is a clean artist's canvas. Dynamic lighting creates soft shadows behind the floating paint strokes. Colorful, expressive, tactile texture, 4k detail."
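The command above needs careful shell escaping because the prompt itself contains double quotes. A sketch that builds the same invocation as an argument list, assuming `weights_dir` is a local copy of AIDC-AI/Ovis-Image-7B laid out as in the paths above; `build_test_command` is a hypothetical helper, and only the flags shown in the command are used.

```python
import shlex
from pathlib import Path


def build_test_command(
    weights_dir: str,
    prompt: str,
    image_size: int = 1024,
    denoising_steps: int = 50,
    cfg_scale: float = 5.0,
) -> list[str]:
    """Build argv for ovis_image/test.py with the flags from the example command.

    weights_dir is assumed to contain ovis_image.safetensors, ae.safetensors
    and the Ovis2.5-2B directory (hypothetical layout taken from the paths above).
    """
    root = Path(weights_dir)
    return [
        "python", "ovis_image/test.py",
        "--model_path", str(root / "ovis_image.safetensors"),
        "--vae_path", str(root / "ae.safetensors"),
        "--ovis_path", str(root / "Ovis2.5-2B"),
        "--image_size", str(image_size),
        "--denoising_steps", str(denoising_steps),
        "--cfg_scale", str(cfg_scale),
        "--prompt", prompt,  # passed verbatim; no shell escaping needed
    ]


if __name__ == "__main__":
    cmd = build_test_command(
        "AIDC-AI/Ovis-Image-7B",
        'A neon sign that reads "OVIS-IMAGE"',
    )
    # Print the equivalent, correctly quoted shell command.
    print(shlex.join(cmd))
    # To launch it from the repository root instead:
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` bypasses the shell, so embedded quotes in the prompt need no escaping. The defaults mirror the values used in both examples (50 steps, scale 5.0), which suggests that `--denoising_steps` and `--cfg_scale` play the roles of `num_inference_steps` and `guidance_scale` in the Diffusers pipeline, though that correspondence is an inference.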

Features

Text-to-image generation with an emphasis on text rendered visibly within the image.
Strong typography rendering in scenarios with multiple text regions, long passages, and layout-sensitive designs.
Compact 7B model that can run on a single high-end GPU with a moderate amount of memory.
Compatible with Diffusers through OvisImagePipeline, with weights in Safetensors format.
Supports English and Chinese prompts, with strong results on the LongText-Bench EN and ZN benchmarks.
Apache-2.0 license.
Evaluated on CVTG-2K, LongText-Bench, DPG-Bench, GenEval, OneIG-EN, and OneIG-ZN.
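To put "moderate memory" in numbers, a back-of-the-envelope sketch, assuming the 7B figure is the full parameter count and counting only the weights: activations, the VAE decode, and framework overhead are ignored, so real peak usage will be higher. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GiB (2**30 bytes).

    Ignores activations, the VAE decode, the CUDA context and allocator
    overhead, so actual peak usage will be higher.
    """
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    params = 7e9  # assumption: "7B" treated as the full parameter count
    for name, nbytes in [("float32", 4), ("bfloat16", 2)]:
        print(f"{name:>8}: {weight_memory_gib(params, nbytes):.1f} GiB")
```

In bfloat16, the dtype used in the examples above, 7e9 parameters at 2 bytes each come to about 13 GiB of weights, half of what float32 would need.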

Use cases

Creating posters, banners, and other graphics with integrated, legible text.
Generating logos or typographic compositions where correct spelling matters.
Producing UI mockups, infographics, and visual layouts with textual content.
Interactive image generation on more accessible hardware than models with tens of billions of parameters require.
Serving batch generation for applications that need text rendered inside images.
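For batch generation, prompts are usually grouped into fixed-size chunks so that each pipeline call fits in GPU memory. A minimal sketch, assuming OvisImagePipeline accepts a list of prompts as most Diffusers pipelines do (not confirmed here); `batched` is a hypothetical helper.

```python
from typing import Iterator, Sequence


def batched(prompts: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most batch_size prompts, in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(prompts), batch_size):
        yield list(prompts[start:start + batch_size])


if __name__ == "__main__":
    prompts = [f'A poster that says "SALE {pct}%"' for pct in (10, 20, 30, 40, 50)]
    for batch in batched(prompts, batch_size=2):
        # With a loaded pipeline (assumption: list prompts are accepted):
        # images = pipe(batch, num_inference_steps=50, guidance_scale=5.0).images
        print(len(batch), batch[0])
```

On Python 3.12 and later, `itertools.batched` does the same job (yielding tuples); the helper is kept here because the conda environment above uses Python 3.10.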