baconnier/Finance2_embedding_small_en-V1.5

baconnier
Sentence Similarity

This is a sentence-transformers model fine-tuned from BAAI/bge-small-en-v1.5 on the baconnier/finance_dataset_small_private dataset. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Usage

Direct Usage (Sentence Transformers)

# First install the Sentence Transformers library:
pip install -U sentence-transformers

# Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("baconnier/Finance2_embedding_small_en-V1.5")
# Run inference
sentences = [
    'What is industrial production, and how is it measured by the Federal Reserve Board?',
    'Industrial production is a statistic determined by the Federal Reserve Board that measures the total output of all US factories and mines on a monthly basis. The Fed collects data from various government agencies and trade associations to calculate the industrial production index, which serves as an important economic indicator, providing insight into the health of the manufacturing and mining sectors.\nIndustrial production is a monthly statistic calculated by the Federal Reserve Board, measuring the total output of US factories and mines using data from government agencies and trade associations, serving as a key economic indicator for the manufacturing and mining sectors.',
    'Industrial production is a statistic that measures the output of factories and mines in the US. It is released by the Federal Reserve Board every quarter.\nIndustrial production measures factory and mine output, released quarterly by the Fed.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])

Model Details

Model Type: Sentence Transformer
Base Model: BAAI/bge-small-en-v1.5
Maximum Sequence Length: 512 tokens
Output Dimensionality: 384 dimensions
Similarity Function: Cosine Similarity

Use Cases

Semantic textual similarity
Semantic search
Paraphrase mining
Text classification
Clustering
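Several of the use cases above come down to comparing embeddings. Semantic search, for instance, encodes a query and a corpus with the same model and ranks the corpus entries by cosine similarity to the query. Below is a minimal pure-Python sketch of that ranking step; the function name `semantic_search` and the toy 3-dimensional vectors are hypothetical, and for real workloads the library's own `sentence_transformers.util.semantic_search` helper performs this ranking on tensors in batches.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def semantic_search(
    query_emb: list[float], corpus_embs: list[list[float]], top_k: int = 2
) -> list[tuple[int, float]]:
    """Rank corpus entries by cosine similarity to the query; return the top_k (index, score) pairs."""
    scored = [(i, cosine_similarity(query_emb, emb)) for i, emb in enumerate(corpus_embs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:top_k]


# Hypothetical toy vectors; real ones would come from model.encode(...).
query = [0.9, 0.1, 0.0]
corpus = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
]

for idx, score in semantic_search(query, corpus, top_k=2):
    print(idx, round(score, 3))
```

The same scored pairs can be reused for paraphrase mining (keep pairs above a threshold) or as input to a clustering step; only the ranking is shown here.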