Jarbas/all-MiniLM-L6-v2-Q4_K_M-GGUF

Jarbas

Sentence Similarity

This model was converted to GGUF format from sentence-transformers/all-MiniLM-L6-v2 using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

How to use

Use with llama.cpp
Installation
Install llama.cpp through brew (works on Mac and Linux):
brew install llama.cpp

CLI:
Run llama.cpp from the command line:
llama-cli --hf-repo Jarbas/all-MiniLM-L6-v2-Q4_K_M-GGUF --hf-file all-minilm-l6-v2-q4_k_m.gguf -p "The meaning to life and the universe is"

Server:
Run llama-server:
llama-server --hf-repo Jarbas/all-MiniLM-L6-v2-Q4_K_M-GGUF --hf-file all-minilm-l6-v2-q4_k_m.gguf -c 2048
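Because this is an embedding model, a running server is typically queried for sentence vectors rather than generated text. The sketch below assumes the server listens on its default address, http://localhost:8080, and was started with embeddings enabled (in llama.cpp builds this is usually the --embedding flag, which is not part of the command above), so that the OpenAI-compatible /v1/embeddings route is available. The names build_embedding_request, embed and LLAMA_SERVER_URL are hypothetical and used for illustration only.

```shell
# build_embedding_request: hypothetical helper that wraps one sentence
# in the JSON body expected by an OpenAI-style embeddings route.
build_embedding_request() {
  # Escape backslashes and double quotes so the sentence is a valid JSON string.
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"input": "%s"}' "$escaped"
}

# embed: hypothetical helper that posts the request to the server.
# Assumption: the server address defaults to http://localhost:8080 and
# can be overridden through the LLAMA_SERVER_URL environment variable.
embed() {
  curl -s "${LLAMA_SERVER_URL:-http://localhost:8080}/v1/embeddings" \
    -H "Content-Type: application/json" \
    -d "$(build_embedding_request "$1")"
}
```

With the server running, embed "That is a happy person" should return a JSON document containing the embedding vector. The exact response layout depends on the llama.cpp version.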

Alternative steps:

Clone the llama.cpp repository from GitHub.

git clone https://github.com/ggerganov/llama.cpp


Move into the llama.cpp directory and build it with the LLAMA_CURL=1 flag, along with any other hardware-specific flags (for example, LLAMA_CUDA=1 for Nvidia GPUs on Linux).

cd llama.cpp && LLAMA_CURL=1 make


Run inference through the main binary.

./main --hf-repo Jarbas/all-MiniLM-L6-v2-Q4_K_M-GGUF --hf-file all-minilm-l6-v2-q4_k_m.gguf -p "The meaning to life and the universe is"

Or
./server --hf-repo Jarbas/all-MiniLM-L6-v2-Q4_K_M-GGUF --hf-file all-minilm-l6-v2-q4_k_m.gguf -c 2048

Features

GGUF Transformers
BERT architecture
4-bit Q4_K_M quantization

Use cases

Sentence similarity detection
Feature extraction
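
Sentence similarity is commonly scored as the cosine of the angle between two embedding vectors. The following is a minimal sketch of that computation and is not part of llama.cpp: cosine_similarity is a hypothetical helper, and the toy 3-dimensional vectors stand in for the 384-dimensional embeddings that all-MiniLM-L6-v2 produces.

```shell
# cosine_similarity: hypothetical helper, not part of llama.cpp.
# Takes two space-separated vectors of equal length and prints
# their cosine similarity with four decimal places.
cosine_similarity() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR == 1 { n = NF; for (i = 1; i <= NF; i++) a[i] = $i }
    NR == 2 {
      if (NF != n) { print "length mismatch" > "/dev/stderr"; exit 1 }
      for (i = 1; i <= NF; i++) {
        dot += a[i] * $i      # running dot product
        na  += a[i] * a[i]    # squared norm of the first vector
        nb  += $i * $i        # squared norm of the second vector
      }
      if (na == 0 || nb == 0) { print "zero vector" > "/dev/stderr"; exit 1 }
      printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
    }'
}

# Toy vectors; real inputs would be embeddings of two sentences.
cosine_similarity "1 0 0" "1 0 0"   # same direction, prints 1.0000
cosine_similarity "1 0 0" "0 1 0"   # orthogonal, prints 0.0000
```

Scores close to 1 indicate sentences with similar meaning, while scores near 0 indicate unrelated ones.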