---
library_name: transformers
license: openrail++
datasets:
- textdetox/multilingual_paradetox
- chameleon-lizard/synthetic-multilingual-paradetox
language:
- en
- ru
- uk
- am
- de
- es
- zh
- ar
- hi
pipeline_tag: text2text-generation
---

# Model Card for tox-mt0-xl

A fine-tune of the mt0-xl model for the text toxification task.

## Model Details

### Model Description

This is a fine-tune of the mt0-xl model for the text toxification task: it rewrites a non-toxic sentence into a toxic one. It can be used to generate synthetic data from non-toxic examples.

- **Developed by:** Nikita Sushko
- **Model type:** mT5-XL
- **Language(s) (NLP):** English, Russian, Ukrainian, Amharic, German, Spanish, Chinese, Arabic, Hindi
- **License:** OpenRAIL++
- **Finetuned from model:** mt0-xl

## Uses

This model is intended to be used for synthetic data generation from non-toxic examples.

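The sketch below shows one way synthetic pairs might be collected. Each non-toxic sentence is wrapped in the prompt template from the getting-started example, and the model's rewrite is stored next to it. The helper name and the `neutral`/`toxic`/`language` keys are hypothetical. The `generate` argument is assumed to behave like a transformers `text2text-generation` pipeline called on a list of prompts, returning one `{'generated_text': ...}` dict per prompt in input order.

```python
from typing import Callable, Dict, List

# Prompt template taken from this model card's getting-started example.
PROMPT_TEMPLATE = (
    "Rewrite the following text in {language} the most toxic and obscene "
    "version possible: {text}"
)


def make_synthetic_pairs(
    generate: Callable[[List[str]], List[Dict[str, str]]],
    neutral_texts: List[str],
    language: str,
) -> List[Dict[str, str]]:
    """Pair each non-toxic sentence with the model's toxic rewrite (hypothetical helper)."""
    prompts = [PROMPT_TEMPLATE.format(language=language, text=t) for t in neutral_texts]
    # Assumed: one {'generated_text': ...} dict per prompt, in input order.
    outputs = generate(prompts)
    if len(outputs) != len(neutral_texts):
        raise ValueError("expected exactly one generation per input sentence")
    return [
        {"neutral": neutral, "toxic": out["generated_text"], "language": language}
        for neutral, out in zip(neutral_texts, outputs)
    ]
```

With the `pipe` object from the getting-started snippet, `make_synthetic_pairs(pipe, ["That's disappointing."], "English")` returns one pair per input sentence. Taking the generator as a parameter means the helper can be tested without downloading the model.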
### Direct Use

The model can be used directly for text toxification.

### Out-of-Scope Use

Using the model to generate toxic versions of sentences for any purpose other than synthetic data generation is out of scope.

## Bias, Risks, and Limitations

Because this model produces toxic rewrites of its inputs, it can be misused to increase the toxicity of generated texts.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import transformers

checkpoint = 'chameleon-lizard/tox-mt0-xl'

tokenizer = transformers.AutoTokenizer.from_pretrained(checkpoint)
# device_map="auto" requires the `accelerate` package to be installed.
model = transformers.AutoModelForSeq2SeqLM.from_pretrained(checkpoint, torch_dtype='auto', device_map="auto")

pipe = transformers.pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=512,
    truncation=True,
)

language = 'English'
text = "That's dissapointing."

# The prompt must be an f-string; otherwise {language} and {text} reach the model literally.
print(pipe(f'Rewrite the following text in {language} the most toxic and obscene version possible: {text}')[0]['generated_text'])
# Resulting text: "That's dissapointing, you stupid ass bitch."
```

For best performance, use the prompt format shown above. If the target language is omitted from the prompt, the model may respond in an arbitrary language.
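The model is sensitive to the prompt format and needs an explicit target language, so a small helper can build prompts consistently. The sketch below reuses the template from the example above and checks the target language against the nine languages listed in this card. The function and constant names are hypothetical. It also assumes that language names are passed in English, as in the `'English'` example.

```python
# Languages listed in this model card, spelled in English as in the example above (assumed format).
SUPPORTED_LANGUAGES = frozenset({
    "English", "Russian", "Ukrainian", "Amharic", "German",
    "Spanish", "Chinese", "Arabic", "Hindi",
})

# Prompt template taken from the getting-started example.
PROMPT_TEMPLATE = (
    "Rewrite the following text in {language} the most toxic and obscene "
    "version possible: {text}"
)


def build_toxification_prompt(text: str, language: str) -> str:
    """Fill the prompt template, rejecting languages this card does not list (hypothetical helper)."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"unsupported language {language!r}; expected one of {sorted(SUPPORTED_LANGUAGES)}"
        )
    return PROMPT_TEMPLATE.format(language=language, text=text)
```

For example, `pipe(build_toxification_prompt("That's disappointing.", "German"))[0]['generated_text']`. Failing fast on an unlisted or misspelled language avoids silently receiving output in an arbitrary language.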