engineai/immensa_embeddings

engineai
Sentence Similarity

This is a sentence-transformers model fine-tuned from AI-Growth-Lab/PatentSBERTa. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

How to use

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference:

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer('engineai/immensa_embeddings')

# Run inference
sentences = [
    'The air compressor device according to claim 4 with two opposite side wings formed on the pole body and extending between the front end and the flange of the pole body along the axis defining two opposite wide faces and two opposite narrow faces with a area smaller than that of the wide faces with two recessions respectively formed in the side wings adjacent to the front end of the pole body with the engaging ring mounted in the recessions with the engaging ring including front and rear flanges spaced along the axis and an annular engaging groove formed between the front and rear flanges with a buffer ring mounted in the engaging groove of the engaging ring and being in contact with the inner wall of the second end of the axial hole.',
    'Accordingly the sealing and inflating assembly includes an air compressing device including an improved tire repairing container for quickly coupling and attaching and securing to an outlet tube of the air compressor and for quickly disengaging from the air compressor and for allowing the tire sealing preparation to be effectively supplied to seal and inflate the inflatable objects and for easily and quickly and changeably attaching and securing to the outlet tube of the air compressor.',
    'Thus since the path delay from interface 102 to unit 2 is known to be 1 and interface 102 is on the same unit as interface 101 and the path delay between interface 101 and unit 0 is known 1 the path delay between interface 001 and unit 2 can be entered as 112.The path delay table then becomes as shown in Table 3 immediately below.Table 3Dest Unit IdUnit 0 1 2 3 40 IF 001 0 1 2 255 2550 IF 002 0 255 2 1 21 IF 101 1 0 255 2 2551 IF 102 255 0 1 2 22 IF 201 2 1 0 255 2552 IF 202 255 255 0 2 12 IF 203 2 255 0 1 23 IF 301 1 2 255 0 2553 IF 302 255 2 1 0 23 IF 303 255 255 2 0 14 IF 401 255 2 1 2 04 IF 402 2 255 2 1 0'
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
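
Since the similarity function for this model is cosine similarity, the score matrix above can be understood with a small sketch that needs no model download. The toy 3-dimensional vectors and the helper names cosine_similarity and similarity_matrix are assumptions made for illustration; the real embeddings have 768 dimensions, and this is not the library's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Pairwise cosine similarities, one row per input vector."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Hypothetical stand-ins for model.encode(...) output.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
]

matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(value, 3) for value in row])
```

The diagonal is 1 because every vector is maximally similar to itself, and vector length does not matter: the third vector has norm 2 but still scores 1 against itself and 0 against the two vectors orthogonal to it.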

Features

Model fine-tuned from AI-Growth-Lab/PatentSBERTa
Maximum sequence length: 512 tokens
Output dimensionality: 768 dimensions
Similarity function: Cosine similarity
Sentence Transformers
Safetensors
mpnet
Feature extraction
Generated from Trainer
Dataset size: 2,545,432
Loss: MultipleNegativesRankingLoss
Evaluation of results against the state of the art
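
The loss named above, MultipleNegativesRankingLoss, treats every other positive in a batch as a negative for a given anchor. The following is a minimal pure-Python sketch of that idea, not the library's code: the function name mnr_loss and the toy 2-dimensional vectors are hypothetical, and the scale of 20.0 is assumed from the library default rather than taken from this model's training configuration, which is not stated here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i].

    All other positives in the batch act as in-batch negatives.
    The scale value is an assumption, not a documented setting of this model.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        scores = [scale * cosine(anchor, positive) for positive in positives]
        # Numerically stable log-sum-exp over the row of scores.
        top = max(scores)
        log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
        losses.append(log_sum_exp - scores[i])
    return sum(losses) / len(losses)


# Hypothetical toy batch: each anchor points roughly at its own positive.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.1], [0.1, 1.0]]

aligned_loss = mnr_loss(anchors, positives)
shuffled_loss = mnr_loss(anchors, positives[::-1])
print(aligned_loss < shuffled_loss)
```

When each anchor is closest to its own positive the loss is near zero; pairing anchors with the wrong positives drives it up, which is the signal that pushes matching texts together during fine-tuning.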

Use cases

Semantic textual similarity
Semantic search
Paraphrase mining
Text classification
Clustering of sentences and paragraphs
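
As an illustration of the semantic search use case, the sketch below ranks a small corpus against a query by cosine similarity and keeps the top matches. The vectors are hypothetical stand-ins for embeddings from model.encode, and semantic_search here is an illustrative helper written for this example, not the library utility of the same name.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def semantic_search(query_vec, corpus_vecs, top_k=2):
    """Return (corpus_index, score) pairs for the top_k most similar vectors."""
    scored = [(index, cosine(query_vec, vec)) for index, vec in enumerate(corpus_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical 3-dimensional stand-ins for 768-dimensional embeddings.
corpus_vecs = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.1],
]
query_vec = [1.0, 0.0, 0.0]

hits = semantic_search(query_vec, corpus_vecs)
for index, score in hits:
    print(index, round(score, 3))
```

In practice the corpus embeddings would be computed once and stored, so that only the query needs to be encoded at search time.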