Oblix/yolov10b-doclaynet_ONNX_document-layout-analysis

Oblix

Object detection

Document layout analysis with YOLOv10b, exported to ONNX. The model detects layout elements in document page images, such as titles, section headers, text blocks, list items, page headers and page footers, and runs in JavaScript through Transformers.js. The repository includes the ONNX weights and usage examples to make integration easier.

How to use

If you haven't already, install the Transformers.js JavaScript library from NPM:
npm i @huggingface/transformers

Example: run document layout detection with Oblix/yolov10b-doclaynet_ONNX_document-layout-analysis.
import { AutoModel, AutoProcessor, RawImage } from "@huggingface/transformers";

// Load the model (full-precision fp32 ONNX weights) and its processor
const model = await AutoModel.from_pretrained(
  "Oblix/yolov10b-doclaynet_ONNX_document-layout-analysis",
  {
    dtype: "fp32"
  }
);
const processor = await AutoProcessor.from_pretrained(
  "Oblix/yolov10b-doclaynet_ONNX_document-layout-analysis"
);

// Read a sample page image and preprocess it for the model
const url =
  "https://huggingface.co/DILHTWD/documentlayoutsegmentation_YOLOv8_ondoclaynet/resolve/main/sample1.png";
const image = await RawImage.read(url);
const { pixel_values, reshaped_input_sizes } = await processor(image);

// Run object detection
const { output0 } = await model({ images: pixel_values });
const predictions = output0.tolist()[0];

const threshold = 0.35; // Minimum confidence score for a detection to be reported
const [newHeight, newWidth] = reshaped_input_sizes[0]; // Resized height and width
const [xs, ys] = [image.width / newWidth, image.height / newHeight]; // x and y resize scales
for (const [xmin, ymin, xmax, ymax, score, id] of predictions) {
  if (score < threshold) continue;

  // Convert to original image coordinates
  const bbox = [xmin * xs, ymin * ys, xmax * xs, ymax * ys]
    .map((x) => x.toFixed(2))
    .join(", ");
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  const label = (model.config as any).id2label[id];
  console.log(`Found "${label}" at [${bbox}] with score ${score.toFixed(2)}.`);
}
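The filtering and rescaling done in the loop above can also be factored into a pure function, which makes the step reusable and testable without loading the model. This is a minimal sketch that assumes each row of `output0` has the layout `[xmin, ymin, xmax, ymax, score, classId]`, as the destructuring in the example implies. `Detection` and `toDetections` are hypothetical names, not part of Transformers.js.

```typescript
// Hypothetical result type (not part of Transformers.js).
interface Detection {
  label: string;
  score: number;
  box: [number, number, number, number]; // [xmin, ymin, xmax, ymax] in original-image pixels
}

/**
 * Converts raw rows [xmin, ymin, xmax, ymax, score, classId] (assumed layout)
 * into labeled boxes in original-image coordinates, dropping low scores.
 */
function toDetections(
  predictions: number[][],
  id2label: Record<number, string>,
  scale: { xs: number; ys: number },
  threshold = 0.35,
): Detection[] {
  const detections: Detection[] = [];
  for (const [xmin, ymin, xmax, ymax, score, id] of predictions) {
    if (score < threshold) continue; // same cut-off rule as the example loop
    detections.push({
      // Fall back to a generic name if the class id is missing from the config
      label: id2label[id] ?? `class_${id}`,
      score,
      // Undo the processor's resize with the per-axis scale factors
      box: [xmin * scale.xs, ymin * scale.ys, xmax * scale.xs, ymax * scale.ys],
    });
  }
  return detections;
}
```

With the variables from the example, it could be called as `toDetections(predictions, (model.config as any).id2label, { xs, ys })`.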

Result
Found "Text" at [53.75, 478.56, 623.46, 562.13] with score 0.98.
Found "Text" at [54.20, 593.64, 609.42, 637.15] with score 0.98.
Found "Text" at [53.98, 715.41, 621.06, 759.33] with score 0.98.
Found "Text" at [53.98, 247.44, 610.82, 277.49] with score 0.97.
Found "Title" at [53.64, 75.40, 551.96, 159.72] with score 0.97.
Found "List-item" at [55.56, 761.62, 607.48, 792.06] with score 0.97.
Found "List-item" at [56.05, 657.97, 614.57, 701.79] with score 0.97.
Found "Text" at [54.10, 195.40, 221.43, 211.88] with score 0.96.
Found "Text" at [54.25, 169.14, 95.17, 186.22] with score 0.95.
Found "Text" at [54.15, 222.11, 98.62, 237.74] with score 0.95.
Found "Text" at [53.73, 429.63, 412.82, 446.28] with score 0.95.
Found "Page-header" at [308.98, 10.07, 605.53, 34.59] with score 0.95.
Found "Section-header" at [54.18, 338.87, 102.68, 355.16] with score 0.95.
Found "List-item" at [55.75, 793.91, 519.29, 810.43] with score 0.95.
Found "Section-header" at [54.20, 453.01, 145.02, 469.42] with score 0.94.
Found "Text" at [56.76, 309.85, 316.43, 325.71] with score 0.93.
Found "List-item" at [55.62, 812.37, 445.03, 829.42] with score 0.92.
Found "Page-footer" at [308.43, 907.93, 374.03, 922.28] with score 0.92.
Found "Section-header" at [53.70, 567.21, 75.24, 584.85] with score 0.91.
Found "Text" at [56.26, 289.47, 415.46, 306.48] with score 0.80.
Found "Text" at [54.11, 365.35, 623.46, 407.97] with score 0.79.
Found "List-item" at [55.77, 638.84, 382.47, 655.46] with score 0.60.
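The results above are listed in descending order of score, not by position on the page. For layout analysis it is often more useful to walk the regions in reading order. The sketch below uses a deliberately naive heuristic for single-column pages: it drops page headers and footers (an assumption about what counts as body content) and sorts the remaining boxes by top edge, then by left edge. `LabeledBox`, `DEFAULT_SKIP` and `readingOrder` are hypothetical names; multi-column pages would need a column-aware heuristic.

```typescript
// Hypothetical box type matching the printed results (not part of Transformers.js).
interface LabeledBox {
  label: string;
  score: number;
  box: [number, number, number, number]; // [xmin, ymin, xmax, ymax]
}

// Labels treated as page furniture rather than body content (assumption).
const DEFAULT_SKIP = new Set<string>(["Page-header", "Page-footer"]);

/**
 * Naive single-column reading order: drop skipped labels, then sort by
 * top edge (ymin) and break ties by left edge (xmin). Input is not mutated.
 */
function readingOrder(items: LabeledBox[], skip: Set<string> = DEFAULT_SKIP): LabeledBox[] {
  return items
    .filter((d) => !skip.has(d.label))
    .sort((a, b) => a.box[1] - b.box[1] || a.box[0] - b.box[0]);
}
```

Passing an empty set, as in `readingOrder(detections, new Set<string>())`, keeps every label and only reorders the boxes.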

Features

Object detection
ONNX implementation
Transformers.js compatibility

Use cases

Document layout analysis
Document layout segmentation
Object detection in documents