---
license: mit
---
Reward model pretrained on openai/webgpt_comparison and a human-feedback summarization dataset. Unlike the other electra-large model, this model is trained with a rank loss and one additional dataset.

Results on the validation set are much more stable than usual.

See this [wandb run](https://wandb.ai/theblackcat102/reward-model/runs/1d4e4oi2?workspace=) for more details.
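The rank loss mentioned above usually refers to the pairwise ranking objective used in RLHF-style reward modeling: for each comparison pair, the loss is `-log(sigmoid(r_chosen - r_rejected))`, which pushes the reward of the preferred response above the rejected one. A minimal sketch in plain Python (the reward values below are hypothetical, not from this model):

```python
import math

def pairwise_rank_loss(chosen_rewards, rejected_rewards):
    """Average pairwise ranking loss over comparison pairs.

    -log(sigmoid(c - r)) == log(1 + exp(-(c - r)))
    """
    losses = [
        math.log(1.0 + math.exp(-(c - r)))
        for c, r in zip(chosen_rewards, rejected_rewards)
    ]
    return sum(losses) / len(losses)

# Hypothetical scalar rewards from the model for chosen vs. rejected answers
loss = pairwise_rank_loss([1.2, 0.5], [0.3, -0.1])
```

The loss shrinks toward zero as the margin between chosen and rejected rewards grows, so a well-separated pair contributes little gradient.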