PKU-Alignment
/

safe-o1-7b

Model card Files Files and versions Community

safe-o1-7b / README.md

dayone3nder's picture

Update README.md

e976148 verified about 2 months ago

|

1.48 kB

	---
	license: cc-by-4.0
	language:
	- en
	base_model: Qwen/Qwen2.5-7B-Instruct
	---

	# Safe-o1 Model Card 🤖✨

	## Model Overview 📝
	`Safe-o1` is an innovative language model that introduces a self-monitoring thinking process to detect and filter unsafe content, achieving more robust safety performance 🚀.

	---

	## Features and Highlights 🌟
	- Safety First 🔒: Through a self-monitoring mechanism, it detects potential unsafe content in the thinking process in real-time, ensuring outputs consistently align with ethical and safety standards.
	- Enhanced Robustness 💡: Compared to traditional models, `Safe-o1` performs more stably in complex scenarios, reducing unexpected "derailments."
	- User-Friendly 😊: Designed to provide users with a trustworthy conversational partner, suitable for various application scenarios, striking a balance between helpfulness and harmfulness.
	![](https://github.com/D4YON3/images/blob/main/figs_2025-04-03%20214712.png?raw=true)

	---

	## Usage 🚀
	You can load `Safe-o1` using the Hugging Face `transformers` library:

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	tokenizer = AutoTokenizer.from_pretrained("PKU-Alignment/Safe-o1")
	model = AutoModelForCausalLM.from_pretrained("PKU-Alignment/Safe-o1")

	input_text = "Hello, World!"
	inputs = tokenizer(input_text, return_tensors="pt")
	outputs = model.generate(**inputs)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))

	```