safe-o1-7b / README.md
dayone3nder's picture
Update README.md
e976148 verified
|
raw
history blame
1.48 kB
---
license: cc-by-4.0
language:
- en
base_model: Qwen/Qwen2.5-7B-Instruct
---
# Safe-o1 Model Card πŸ€–βœ¨
## Model Overview πŸ“
`Safe-o1` is an innovative language model that introduces a **self-monitoring thinking process** to detect and filter unsafe content, achieving more robust safety performance πŸš€.
---
## Features and Highlights 🌟
- **Safety First** πŸ”’: Through a self-monitoring mechanism, it detects potential unsafe content in the thinking process in real-time, ensuring outputs consistently align with ethical and safety standards.
- **Enhanced Robustness** πŸ’‘: Compared to traditional models, `Safe-o1` performs more stably in complex scenarios, reducing unexpected "derailments."
- **User-Friendly** 😊: Designed to provide users with a trustworthy conversational partner, suitable for various application scenarios, striking a balance between helpfulness and harmfulness.
![](https://github.com/D4YON3/images/blob/main/figs_2025-04-03%20214712.png?raw=true)
---
## Usage πŸš€
You can load `Safe-o1` using the Hugging Face `transformers` library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("PKU-Alignment/Safe-o1")
model = AutoModelForCausalLM.from_pretrained("PKU-Alignment/Safe-o1")
input_text = "Hello, World!"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```