Ray2333/reward-model-Mistral-7B-instruct-Unified-Feedback


This reward model fine-tunes mistralai/Mistral-7B-Instruct-v0.2 on the llm-blender/Unified-Feedback dataset. It achieves an accuracy of 0.7740 on the test sets, making it a good proxy for modeling human preferences, and it can be used to align LLMs. The Unified-Feedback dataset aggregates diverse preference data from earlier open-source datasets, including: openai/summarize_from_feedback, openai/webgpt_comparisons, Dahoas/instruct-synthetic-prompt-responses, Anthropic/hh-rlhf, lmsys/chatbot_arena_conversations, openbmb/UltraFeedback, argilla/ultrafeedback-binarized-preferences-cleaned, and berkeley-nest/Nectar.

How to use

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('Ray2333/reward-model-Mistral-7B-instruct-Unified-Feedback')
reward_model = AutoModelForSequenceClassification.from_pretrained(
    'Ray2333/reward-model-Mistral-7B-instruct-Unified-Feedback',
    num_labels=1, torch_dtype=torch.float16,
    device_map=0,
)
message = [
    {'role': 'user', 'content': "I'm going to go out to a movie, but I need someone to chat with my daughter and pretend to be me while she's home alone.  But I can't do that while I'm at the movie.  Can you help by impersonating me by chat with her?"},
    {'role': 'assistant', 'content': "Sorry, I'm not comfortable impersonating you in that way.  I'm not willing to behave so dishonestly.  Maybe you can just find a way to bring her to the movie, or you can find a babysitter?"}
]
message_template = tokenizer.apply_chat_template(message, tokenize=False)
# it will look like this: " [INST] I'm going to go out to a movie, but I need someone to chat with my daughter and pretend to be me while she's home alone.  But I can't do that while I'm at the movie.  Can you help by impersonating me by chat with her? [/INST]Sorry, I'm not comfortable impersonating you in that way.  I'm not willing to behave so dishonestly.  Maybe you can just find a way to bring her to the movie, or you can find a babysitter?"

kwargs = {"padding": True, "truncation": True, "return_tensors": "pt"}
tokens = tokenizer.encode_plus(message_template, **kwargs)

with torch.no_grad():
    reward_tensor = reward_model(
        tokens["input_ids"].to(reward_model.device),
        attention_mask=tokens["attention_mask"].to(reward_model.device),
    ).logits.reshape(-1)
    reward = reward_tensor.cpu().detach().item()
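A common way to use a scalar reward like this at inference time is best-of-n reranking: score several candidate responses to the same prompt and keep the highest-scoring one. The sketch below is an illustration, not part of the model card; `get_reward` stands in for any callable that wraps the scoring steps above, and the toy scorer at the end is purely for demonstration.

```python
def pick_best(prompt, candidates, get_reward):
    """Best-of-n reranking: return the candidate with the highest reward.

    `get_reward` is any callable that maps a chat-format message list
    to a scalar score (e.g. a wrapper around the reward model above).
    """
    scored = []
    for response in candidates:
        messages = [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": response},
        ]
        scored.append((get_reward(messages), response))
    # keep the response with the largest reward
    return max(scored, key=lambda pair: pair[0])[1]


# Toy scorer for illustration only: prefers longer answers.
toy_reward = lambda messages: len(messages[-1]["content"])
best = pick_best("Hi", ["ok", "Hello, how can I help?"], toy_reward)
# best == "Hello, how can I help?"
```

In practice you would generate the candidates with a policy model and pass a real wrapper around the reward model instead of the toy scorer.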

Features

Text classification
Built on the Transformers library
Safetensors support
Trained on the llm-blender/Unified-Feedback dataset
AutoTrain compatible
Text-generation inference compatible
Inference endpoints compatible
MIT license
Region: US

Use cases

Modeling human preferences for LLMs
Aligning large language models (LLMs)
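The reported 0.7740 test accuracy is typically a pairwise preference accuracy: the fraction of (chosen, rejected) pairs in which the model assigns the chosen response a higher reward. A minimal sketch of that metric, assuming the per-pair rewards have already been computed:

```python
def pairwise_accuracy(chosen_rewards, rejected_rewards):
    """Fraction of pairs where the chosen response outscores the rejected one."""
    assert len(chosen_rewards) == len(rejected_rewards)
    wins = sum(c > r for c, r in zip(chosen_rewards, rejected_rewards))
    return wins / len(chosen_rewards)


acc = pairwise_accuracy([1.2, 0.3, -0.5, 2.0], [0.8, 0.9, -1.0, 1.5])
# 3 of 4 pairs are correctly ordered -> 0.75
```

Only the ordering of the two rewards matters, so the raw reward scale is irrelevant to this metric.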