Ray2333/Gemma-2B-rewardmodel-baseline
Text Classification
This is a reward model (based on Gemma-2b-it) trained with the Bradley-Terry (BT) loss on the weqweasdas/preference_dataset_mixture2_and_safe_pku dataset. This reward model is especially useful if you need a good, small reward model for LLMs. You can also refer to Ray2333/GRM-Gemma-2B-sftreg for a better 2B reward model trained with hidden-state regularization.
How to use
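The Bradley-Terry (BT) loss mentioned above trains the model to assign a higher scalar reward to the preferred (chosen) response than to the rejected one. A minimal sketch in plain Python (the function name and example rewards are illustrative, not taken from the actual training code):

```python
import math

def bt_loss(r_chosen, r_rejected):
    # Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).
    # The loss shrinks as the reward gap (chosen minus rejected) grows.
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# A well-separated pair incurs less loss than a tied pair.
print(bt_loss(2.0, -1.0))  # small loss: chosen is clearly preferred
print(bt_loss(0.0, 0.0))   # log(2) ~ 0.693: the model is indifferent
```

During training, this loss is averaged over a batch of (chosen, rejected) pairs, which pushes the classification head to output higher logits for preferred responses.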
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('Ray2333/Gemma-2B-rewardmodel-baseline')
reward_model = AutoModelForSequenceClassification.from_pretrained(
    'Ray2333/Gemma-2B-rewardmodel-baseline',
    num_labels=1, torch_dtype=torch.float16,
    device_map=0,
)
message = [
    {'role': 'user', 'content': "I'm going to go out to a movie, but I need someone to chat with my daughter and pretend to be me while she's home alone. But I can't do that while I'm at the movie. Can you help by impersonating me by chat with her?"},
    {'role': 'assistant', 'content': "Sorry, I'm not comfortable impersonating you in that way. I'm not willing to behave so dishonestly. Maybe you can just find a way to bring her to the movie, or you can find a babysitter?"},
]
message_template = tokenizer.apply_chat_template(message, tokenize=False)
# it will look like this: " user\nI'm going to go out to a movie, but I need someone to chat with my daughter and pretend to be me while she's home alone. But I can't do that while I'm at the movie. Can you help by impersonating me by chat with her? \n model\nSorry, I'm not comfortable impersonating you in that way. I'm not willing to behave so dishonestly. Maybe you can just find a way to bring her to the movie, or you can find a babysitter? \n".
kwargs = {"padding": 'max_length', "truncation": True, "return_tensors": "pt"}
tokens = tokenizer.encode_plus(message_template, **kwargs)
with torch.no_grad():
    # encode_plus with return_tensors="pt" already returns batched (1, seq_len) tensors
    reward_tensor = reward_model(tokens["input_ids"].to(reward_model.device), attention_mask=tokens["attention_mask"].to(reward_model.device)).logits.reshape(-1)
    reward = reward_tensor.cpu().detach().item()
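Once you can score a single conversation as above, a common use is best-of-n sampling: score several candidate responses to the same prompt and keep the highest-reward one. A hypothetical helper (the function name and reward values are illustrative; in practice each reward would come from the scoring loop above):

```python
def pick_preferred(candidates):
    # candidates: list of (response_text, scalar_reward) pairs, where each
    # reward was produced by the reward model for that response.
    # Returns the response with the highest reward.
    return max(candidates, key=lambda pair: pair[1])[0]

candidates = [
    ("Sure, I'll impersonate you while you're out.", -1.8),   # illustrative reward
    ("Sorry, I'm not comfortable impersonating you.", 2.3),   # illustrative reward
]
print(pick_preferred(candidates))  # the refusal scores higher, so it is selected
```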
Features
- Text Classification
- Transformers
- Safetensors
- Compatible con AutoTrain
- Inference Endpoints
Use cases
- Reward model for LLMs
- Detecting unethical behavior in assistant-user interactions