engineai/immensa_embeddings
engineai
Sentence Similarity
This is a sentence-transformers model fine-tuned from AI-Growth-Lab/PatentSBERTa. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
How to use
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference:
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer('engineai/immensa_embeddings')
# Run inference
sentences = [
'The air compressor device according to claim 4 with two opposite side wings formed on the pole body and extending between the front end and the flange of the pole body along the axis defining two opposite wide faces and two opposite narrow faces with a area smaller than that of the wide faces with two recessions respectively formed in the side wings adjacent to the front end of the pole body with the engaging ring mounted in the recessions with the engaging ring including front and rear flanges spaced along the axis and an annular engaging groove formed between the front and rear flanges with a buffer ring mounted in the engaging groove of the engaging ring and being in contact with the inner wall of the second end of the axial hole.',
'Accordingly the sealing and inflating assembly includes an air compressing device including an improved tire repairing container for quickly coupling and attaching and securing to an outlet tube of the air compressor and for quickly disengaging from the air compressor and for allowing the tire sealing preparation to be effectively supplied to seal and inflate the inflatable objects and for easily and quickly and changeably attaching and securing to the outlet tube of the air compressor.',
'Thus since the path delay from interface 102 to unit 2 is known to be 1 and interface 102 is on the same unit as interface 101 and the path delay between interface 101 and unit 0 is known 1 the path delay between interface 001 and unit 2 can be entered as 112.The path delay table then becomes as shown in Table 3 immediately below.Table 3Dest Unit IdUnit 0 1 2 3 40 IF 001 0 1 2 255 2550 IF 002 0 255 2 1 21 IF 101 1 0 255 2 2551 IF 102 255 0 1 2 22 IF 201 2 1 0 255 2552 IF 202 255 255 0 2 12 IF 203 2 255 0 1 23 IF 301 1 2 255 0 2553 IF 302 255 2 1 0 23 IF 303 255 255 2 0 14 IF 401 255 2 1 2 04 IF 402 2 255 2 1 0'
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
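The similarity function listed for this model is cosine similarity, so the score matrix is made of pairwise cosines between embedding vectors. The sketch below reproduces that computation in NumPy on toy 4-dimensional vectors that stand in for the real 768-dimensional embeddings. The function name `cosine_similarity_matrix` and the toy data are illustrative assumptions, not part of the model or the library.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    # Scale every row to unit length so the dot product equals the cosine.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T

# Toy vectors (assumption): placeholders for rows returned by model.encode().
toy_embeddings = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
])

scores = cosine_similarity_matrix(toy_embeddings, toy_embeddings)
print(scores.shape)  # (3, 3)
```

Each diagonal entry is 1 because every vector is compared with itself, and orthogonal vectors score 0 regardless of their length.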
Features
- Fine-tuned from AI-Growth-Lab/PatentSBERTa
- Maximum sequence length: 512 tokens
- Output dimensionality: 768 dimensions
- Similarity function: cosine similarity
- Sentence Transformers
- Safetensors
- mpnet
- Feature extraction
- Generated from Trainer
- Dataset size: 2,545,432
- Loss: MultipleNegativesRankingLoss
- Results evaluated against the state of the art
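To give a rough sense of the loss named in the list, here is a minimal NumPy sketch of the idea behind a multiple-negatives ranking loss: within a batch of (anchor, positive) pairs, every other positive acts as a negative for a given anchor, and cross-entropy is applied over scaled cosine similarities. It illustrates the technique only and is not the library's implementation. The function name `mnr_loss`, the toy vectors, and the `scale=20.0` value are assumptions made for this example.

```python
import numpy as np

def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Cross-entropy over in-batch cosine similarities (illustrative sketch).

    Row i of `positives` is the correct match for row i of `anchors`;
    every other row in the batch is treated as a negative.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)  # shape: (batch, batch)
    # Numerically stable log-softmax along each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct "class" for row i is column i, i.e. the diagonal.
    return float(-np.mean(np.diag(log_probs)))

# Toy batch (assumption): two orthogonal anchors.
anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
aligned_loss = mnr_loss(anchors, anchors.copy())        # positives match anchors
swapped_loss = mnr_loss(anchors, anchors[::-1].copy())  # positives are mismatched
print(aligned_loss < swapped_loss)  # True
```

The loss is close to zero when each anchor is most similar to its own positive and grows when another item in the batch scores higher.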
Use cases
- Semantic textual similarity
- Semantic search
- Paraphrase mining
- Text classification
- Clustering of sentences and paragraphs
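As an illustration of the semantic search use case, the sketch below ranks a small corpus against a query by cosine similarity and returns the top matches. It assumes the vectors have already been produced, for example by `model.encode`. The 3-dimensional toy vectors and the helper name `top_k_matches` are hypothetical and only keep the example self-contained.

```python
import numpy as np

def top_k_matches(query_vec: np.ndarray, corpus_vecs: np.ndarray, k: int = 2):
    """Return (corpus_index, cosine_score) pairs for the k best matches."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    scores = c @ q                   # cosine score of each corpus row
    order = np.argsort(-scores)[:k]  # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors (assumption): stand-ins for 768-dimensional embeddings.
corpus_vecs = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
query_vec = np.array([1.0, 0.0, 0.0])

matches = top_k_matches(query_vec, corpus_vecs, k=2)
print([index for index, _ in matches])  # [0, 2]
```

For large corpora, an approximate nearest-neighbor index would typically replace the exhaustive dot product used here.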