	---
	license: apache-2.0
	datasets:
	- agentlans/text-quality-v2
	language:
	- en
	base_model:
	- microsoft/deberta-v3-base
	pipeline_tag: text-classification
	---
	# DeBERTa v3 for Text Quality Assessment

	## Model Details

	- Model Architecture: DeBERTa v3 (xsmall and base variants)
	- Task: Text quality assessment (regression)
	- Training Data: Text Quality Meta-Analysis Dataset at [agentlans/text-quality-v2](https://huggingface.co/datasets/agentlans/text-quality-v2)
	- Output: Single continuous value representing text quality

	## Intended Use

	These models are designed to assess the quality of English text, where "quality" refers to legible sentences that are not spam and contain useful information. They can be used for:

	- Content moderation
	- Spam detection
	- Information quality assessment
	- Text filtering

	## Usage

	The models accept text input and return a single continuous value representing the assessed quality. Higher values indicate higher perceived quality. The snippet below shows example usage with the `base` model.

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	model_name="agentlans/deberta-v3-base-quality-v2"

	# Load the tokenizer and model, then move the model to GPU if available (otherwise CPU)
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForSequenceClassification.from_pretrained(model_name)
	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
	model = model.to(device)

	def quality(text):
	    """Score a string or a list of strings with the model.
	    The regression logits are interpreted as the combined quality score for each text.
	    Always returns a list with one score per input text."""
	    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(device)
	    with torch.no_grad():
	        # squeeze(-1) drops only the label dimension, so a single input still yields a list
	        logits = model(**inputs).logits.squeeze(-1).cpu()
	    return logits.tolist()

	# Example usage
	text = [x.strip() for x in """
	Congratulations! You've won a $1,000 gift card! Click here to claim your prize now!!!
	Page 1 2 3 4 5 Next Last>>
	Urgent: Your account has been compromised! Click this link to verify your identity and secure your account immediately!!!
	Today marks a significant milestone in our journey towards sustainability! 🌍✨ We’re excited to announce our partnership with local organizations to plant 10,000 trees in our community this fall. Join us in making a positive impact on our environment!
	In recent years, the impact of climate change has become increasingly evident, affecting ecosystems and human livelihoods across the globe.
	The mitochondria is the powerhouse of the cell.
	Exclusive discount on Super MitoMax Energy Boost! Recharge your mitochondria today!
	Everyone is talking about this new diet that guarantees weight loss without exercise!
	Discover five tips for improving your productivity while working from home.
	""".strip().split("\n")]

	result = quality(text)
	for x, s in zip(text, result):
	    print(f"Text: {x}\nQuality: {round(s, 2)}\n")
	```

	Example output for the `base` size model:
	```
	Text: Congratulations! You've won a $1,000 gift card! Click here to claim your prize now!!!
	Quality: -1.25

	Text: Page 1 2 3 4 5 Next Last>>
	Quality: -1.54

	Text: Urgent: Your account has been compromised! Click this link to verify your identity and secure your account immediately!!!
	Quality: -2.01

	Text: Today marks a significant milestone in our journey towards sustainability! 🌍✨ We’re excited to announce our partnership with local organizations to plant 10,000 trees in our community this fall. Join us in making a positive impact on our environment!
	Quality: -1.72

	Text: In recent years, the impact of climate change has become increasingly evident, affecting ecosystems and human livelihoods across the globe.
	Quality: 0.45

	Text: The mitochondria is the powerhouse of the cell.
	Quality: 1.32

	Text: Exclusive discount on Super MitoMax Energy Boost! Recharge your mitochondria today!
	Quality: -1.16

	Text: Everyone is talking about this new diet that guarantees weight loss without exercise!
	Quality: -0.27

	Text: Discover five tips for improving your productivity while working from home.
	Quality: -0.42
	```

	## Performance Metrics

	Root mean squared error (RMSE) on the 20% held-out evaluation set:
	- DeBERTa v3 xsmall: 0.6296
	- DeBERTa v3 base: 0.5038

	The base model achieves a lower RMSE than the xsmall variant on this evaluation set.
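
RMSE is the square root of the mean squared difference between predicted and reference scores, so lower is better. A minimal sketch of the calculation follows; the function name `rmse` and the toy numbers are illustrative and are not taken from the evaluation set.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Root mean squared error between two equal-length score sequences."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if len(predicted) == 0:
        raise ValueError("at least one score is required")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values: the errors are 0, 0 and -3, so RMSE = sqrt(9 / 3) = sqrt(3)
toy_rmse = rmse([1.0, 0.0, -2.0], [1.0, 0.0, 1.0])
print(round(toy_rmse, 4))
```

The same function can be applied to model scores and your own reference labels to track performance on a specific use case.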

	## Limitations and Biases

	- The models are trained on a specific dataset and may not generalize well to all types of text or domains.
	- "Quality" is a subjective concept, and the models' assessments may not align with all human judgments.
	- The models may exhibit biases present in the training data. For example, they show a bias against self-help, promotional, and public relations material.
	- They do not assess factual correctness or grammatical accuracy.

	## Ethical Considerations

	- These models should not be used as the sole determinant for content moderation or censorship.
	- Care should be taken to avoid reinforcing existing biases in content selection or promotion.
	- The models' outputs should be interpreted as suggestions rather than definitive judgments.

	## Caveats and Recommendations

	- Use these models in conjunction with other tools and human oversight for content moderation.
	- Regularly evaluate the models' performance on your specific use case and data.
	- Be aware that the models may not perform equally well across all text types or domains.
	- Consider fine-tuning the models on domain-specific data for improved performance in specialized applications.
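
One way to combine the scores with human oversight is to act automatically only on clear-cut cases and send everything in between to a reviewer. The sketch below assumes a hypothetical `triage` helper with hypothetical cutoffs (`reject_below=-1.0`, `accept_above=0.5`); neither comes from the model or its training setup, and the cutoffs would need to be calibrated on your own data.

```python
from typing import Dict, List, Sequence


def triage(
    texts: Sequence[str],
    scores: Sequence[float],
    reject_below: float = -1.0,  # hypothetical cutoff
    accept_above: float = 0.5,   # hypothetical cutoff
) -> Dict[str, List[str]]:
    """Split texts into accept / review / reject buckets by quality score."""
    if len(texts) != len(scores):
        raise ValueError("texts and scores must have the same length")
    if reject_below > accept_above:
        raise ValueError("reject_below must not exceed accept_above")
    buckets: Dict[str, List[str]] = {"accept": [], "review": [], "reject": []}
    for text, score in zip(texts, scores):
        if score < reject_below:
            buckets["reject"].append(text)
        elif score > accept_above:
            buckets["accept"].append(text)
        else:
            # Ambiguous scores are left for a human to decide
            buckets["review"].append(text)
    return buckets


# Scores taken from the example output earlier in this card
buckets = triage(
    [
        "Page 1 2 3 4 5 Next Last>>",
        "Discover five tips for improving your productivity while working from home.",
        "The mitochondria is the powerhouse of the cell.",
    ],
    [-1.54, -0.42, 1.32],
)
print({name: len(items) for name, items in buckets.items()})
```

Widening the gap between the two cutoffs sends more texts to review, which trades reviewer time for fewer automated mistakes.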