Oblix/yolov10b-doclaynet_ONNX_document-layout-analysis
Oblix
Object detection
Document layout analysis with YOLOv10b, exported to ONNX. The model detects layout elements in document images and runs in JavaScript through Transformers.js. The repository includes ONNX weights and a usage example.
Usage
If you haven't already, install the Transformers.js JavaScript library from NPM:
npm i @huggingface/transformers
Example: Run object detection with Oblix/yolov10b-doclaynet_ONNX_document-layout-analysis.
import { AutoModel, AutoProcessor, RawImage } from "@huggingface/transformers";

// Load the model and its processor
const model = await AutoModel.from_pretrained(
  "Oblix/yolov10b-doclaynet_ONNX_document-layout-analysis",
  {
    dtype: "fp32",
  }
);
const processor = await AutoProcessor.from_pretrained(
  "Oblix/yolov10b-doclaynet_ONNX_document-layout-analysis"
);

// Read the sample image and preprocess it
const url =
  "https://huggingface.co/DILHTWD/documentlayoutsegmentation_YOLOv8_ondoclaynet/resolve/main/sample1.png";
const image = await RawImage.read(url);
const { pixel_values, reshaped_input_sizes } = await processor(image);

// Run object detection
const { output0 } = await model({ images: pixel_values });
const predictions = output0.tolist()[0];

const threshold = 0.35;
const [newHeight, newWidth] = reshaped_input_sizes[0]; // Resized height and width
const [xs, ys] = [image.width / newWidth, image.height / newHeight]; // x and y scale factors

for (const [xmin, ymin, xmax, ymax, score, id] of predictions) {
  // Skip low-confidence detections
  if (score < threshold) continue;

  // Convert to original image coordinates
  const bbox = [xmin * xs, ymin * ys, xmax * xs, ymax * ys]
    .map((x) => x.toFixed(2))
    .join(", ");
  console.log(
    // eslint-disable-next-line @typescript-eslint/no-explicit-any
    `Found "${(model.config as any).id2label[id]}" at [${bbox}] with score ${score.toFixed(
      2
    )}.`
  );
}
Result
Found "Text" at [53.75, 478.56, 623.46, 562.13] with score 0.98.
Found "Text" at [54.20, 593.64, 609.42, 637.15] with score 0.98.
Found "Text" at [53.98, 715.41, 621.06, 759.33] with score 0.98.
Found "Text" at [53.98, 247.44, 610.82, 277.49] with score 0.97.
Found "Title" at [53.64, 75.40, 551.96, 159.72] with score 0.97.
Found "List-item" at [55.56, 761.62, 607.48, 792.06] with score 0.97.
Found "List-item" at [56.05, 657.97, 614.57, 701.79] with score 0.97.
Found "Text" at [54.10, 195.40, 221.43, 211.88] with score 0.96.
Found "Text" at [54.25, 169.14, 95.17, 186.22] with score 0.95.
Found "Text" at [54.15, 222.11, 98.62, 237.74] with score 0.95.
Found "Text" at [53.73, 429.63, 412.82, 446.28] with score 0.95.
Found "Page-header" at [308.98, 10.07, 605.53, 34.59] with score 0.95.
Found "Section-header" at [54.18, 338.87, 102.68, 355.16] with score 0.95.
Found "List-item" at [55.75, 793.91, 519.29, 810.43] with score 0.95.
Found "Section-header" at [54.20, 453.01, 145.02, 469.42] with score 0.94.
Found "Text" at [56.76, 309.85, 316.43, 325.71] with score 0.93.
Found "List-item" at [55.62, 812.37, 445.03, 829.42] with score 0.92.
Found "Page-footer" at [308.43, 907.93, 374.03, 922.28] with score 0.92.
Found "Section-header" at [53.70, 567.21, 75.24, 584.85] with score 0.91.
Found "Text" at [56.26, 289.47, 415.46, 306.48] with score 0.80.
Found "Text" at [54.11, 365.35, 623.46, 407.97] with score 0.79.
Found "List-item" at [55.77, 638.84, 382.47, 655.46] with score 0.60.
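The threshold-and-rescale step from the example can also be isolated as a pure function, which makes it easier to test without loading the model. The following is a minimal sketch, not part of Transformers.js: the names `RawPrediction`, `Detection`, and `toDetections` are hypothetical, and the input values at the bottom are made up for illustration.

```typescript
// One raw prediction row: [xmin, ymin, xmax, ymax, score, classId]
type RawPrediction = [number, number, number, number, number, number];

interface Detection {
  label: string;
  score: number;
  // [xmin, ymin, xmax, ymax] in original-image pixels
  box: [number, number, number, number];
}

// Hypothetical helper: filters rows by score and maps boxes from the
// resized input back to the original image size.
function toDetections(
  predictions: RawPrediction[],
  originalSize: { width: number; height: number },
  resizedSize: [number, number], // [height, width], same order as reshaped_input_sizes[0]
  id2label: Record<number, string>,
  threshold = 0.35
): Detection[] {
  const [newHeight, newWidth] = resizedSize;
  const xs = originalSize.width / newWidth; // horizontal scale factor
  const ys = originalSize.height / newHeight; // vertical scale factor

  const detections: Detection[] = [];
  for (const [xmin, ymin, xmax, ymax, score, id] of predictions) {
    if (score < threshold) continue; // drop low-confidence rows
    detections.push({
      label: id2label[id] ?? `class_${id}`, // fall back if the id is unknown
      score,
      box: [xmin * xs, ymin * ys, xmax * xs, ymax * ys],
    });
  }
  return detections;
}

// Made-up input: a 1280x960 image that was resized to 640x480.
const sample = toDetections(
  [
    [10, 20, 110, 60, 0.9, 0],
    [0, 0, 50, 50, 0.1, 1], // below the threshold, so it is dropped
  ],
  { width: 1280, height: 960 },
  [480, 640],
  { 0: "Text", 1: "Title" }
);
console.log(sample);
```

With the real model, the arguments would presumably be `output0.tolist()[0]`, the `RawImage`, `reshaped_input_sizes[0]`, and `model.config.id2label`, as in the example.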
Features
- Object detection
- ONNX implementation
- Compatible with Transformers.js
Use cases
- Document layout analysis
- Document layout segmentation
- Object detection in documents
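For layout analysis, detections are usually consumed grouped by class and ordered down the page. The sketch below shows one simple way to do that. `LayoutRegion` and `groupByLabel` are hypothetical names, not part of the model or of Transformers.js, and sorting by the top edge is an assumption that only approximates reading order on single-column pages. The three regions are taken from the result listing above.

```typescript
interface LayoutRegion {
  label: string;
  score: number;
  // [xmin, ymin, xmax, ymax] in original-image pixels
  box: [number, number, number, number];
}

// Hypothetical helper: sorts regions top-to-bottom (then left-to-right)
// and groups them by label, preserving that order inside each group.
function groupByLabel(regions: LayoutRegion[]): Record<string, LayoutRegion[]> {
  const sorted = regions
    .slice() // copy so the caller's array is not mutated
    .sort((a, b) => a.box[1] - b.box[1] || a.box[0] - b.box[0]);

  const groups: Record<string, LayoutRegion[]> = {};
  for (const region of sorted) {
    const bucket = groups[region.label] ?? [];
    bucket.push(region);
    groups[region.label] = bucket;
  }
  return groups;
}

// Three detections copied from the result listing.
const pageRegions: LayoutRegion[] = [
  { label: "Text", score: 0.98, box: [53.75, 478.56, 623.46, 562.13] },
  { label: "Text", score: 0.97, box: [53.98, 247.44, 610.82, 277.49] },
  { label: "Title", score: 0.97, box: [53.64, 75.4, 551.96, 159.72] },
];

const grouped = groupByLabel(pageRegions);
for (const label of Object.keys(grouped)) {
  console.log(label, (grouped[label] ?? []).map((r) => r.box[1]));
}
```

Multi-column pages would need a column-aware ordering instead of a plain sort on the top edge.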