Ray2333/Gemma-2B-rewardmodel-baseline
Text Classification
This is a reward model (based on Gemma-2b-it) trained with the Bradley-Terry (BT) loss on the weqweasdas/preference_dataset_mixture2_and_safe_pku dataset. This reward model is especially useful if you need a good, small reward model for LLMs. You can also refer to Ray2333/GRM-Gemma-2B-sftreg for a better 2B reward model trained with hidden-state regularization.
How to use
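The Bradley-Terry (BT) loss mentioned above trains the model to assign a higher scalar reward to the preferred (chosen) response than to the rejected one. A minimal sketch in plain Python (the function name and example rewards are illustrative, not taken from the actual training code):

```python
import math

def bt_loss(r_chosen, r_rejected):
    # Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).
    # The loss shrinks as the reward gap (chosen minus rejected) grows.
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# A well-separated pair incurs less loss than a tied pair.
print(bt_loss(2.0, -1.0))  # small loss: chosen is clearly preferred
print(bt_loss(0.0, 0.0))   # log(2) ~ 0.693: the model is indifferent
```

During training, this loss is averaged over a batch of (chosen, rejected) pairs, which pushes the classification head to output higher logits for preferred responses.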
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('Ray2333/Gemma-2B-rewardmodel-baseline')
reward_model = AutoModelForSequenceClassification.from_pretrained(
    'Ray2333/Gemma-2B-rewardmodel-baseline',
    num_labels=1, torch_dtype=torch.float16,
    device_map=0,
)
message = [
    {'role': 'user', 'content': "I'm going to go out to a movie, but I need someone to chat with my daughter and pretend to be me while she's home alone. But I can't do that while I'm at the movie. Can you help by impersonating me by chat with her?"},
    {'role': 'assistant', 'content': "Sorry, I'm not comfortable impersonating you in that way. I'm not willing to behave so dishonestly. Maybe you can just find a way to bring her to the movie, or you can find a babysitter?"},
]
message_template = tokenizer.apply_chat_template(message, tokenize=False)
# it will look like this: " user\nI'm going to go out to a movie, but I need someone to chat with my daughter and pretend to be me while she's home alone. But I can't do that while I'm at the movie. Can you help by impersonating me by chat with her? \n model\nSorry, I'm not comfortable impersonating you in that way. I'm not willing to behave so dishonestly. Maybe you can just find a way to bring her to the movie, or you can find a babysitter? \n".
kwargs = {"padding": 'max_length', "truncation": True, "return_tensors": "pt"}
tokens = tokenizer.encode_plus(message_template, **kwargs)
with torch.no_grad():
    # encode_plus with return_tensors="pt" already returns batched (1, seq_len) tensors
    reward_tensor = reward_model(tokens["input_ids"].to(reward_model.device), attention_mask=tokens["attention_mask"].to(reward_model.device)).logits.reshape(-1)
    reward = reward_tensor.cpu().detach().item()
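Once you can score a single conversation as above, a common use is best-of-n sampling: score several candidate responses to the same prompt and keep the highest-reward one. A hypothetical helper (the function name and reward values are illustrative; in practice each reward would come from the scoring loop above):

```python
def pick_preferred(candidates):
    # candidates: list of (response_text, scalar_reward) pairs, where each
    # reward was produced by the reward model for that response.
    # Returns the response with the highest reward.
    return max(candidates, key=lambda pair: pair[1])[0]

candidates = [
    ("Sure, I'll impersonate you while you're out.", -1.8),   # illustrative reward
    ("Sorry, I'm not comfortable impersonating you.", 2.3),   # illustrative reward
]
print(pick_preferred(candidates))  # the refusal scores higher, so it is selected
```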
Features
- Text Classification
- Transformers
- Safetensors
- Compatible con AutoTrain
- Inference Endpoints
Use cases
- Reward model for LLMs
- Detecting unethical behavior in assistant-user interactions