---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-Coder-0.5B
datasets:
- alamios/DeepSeek-R1-Distill-Qwen-32B-Conversations
pipeline_tag: text-generation
library_name: transformers
tags:
- code
- qwen
- qwen2.5
- qwen-coder
- codeqwen
- deepseek
---

# DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B

**Updated to v1**

This model is trained on the CODE outputs of [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) and is meant to be used only as a draft model for speculative decoding.
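
For reference, here is a minimal sketch of speculative (assisted) decoding with transformers. The draft repo id is assumed from this card's title, and the dtype/device settings are illustrative only; the unquantized 32B target needs far more VRAM than the GGUF setup described below.

```python
# Minimal sketch of assisted generation with transformers.
# The draft repo id is assumed from the card title; adjust if it differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
draft_id = "alamios/DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.bfloat16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Write a Python function that checks whether a number is prime."
inputs = tokenizer(prompt, return_tensors="pt").to(target.device)

# assistant_model enables assisted generation: the draft proposes tokens
# and the target verifies them, so output quality matches the target model.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```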

This draft model is aimed specifically at owners of a single RTX 3090/4090: it lets you run the DeepSeek-R1-Distill-Qwen-32B-Q4_K_M GGUF version with 16k context and speeds up generation without sacrificing additional context length or model quality.
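
A llama.cpp launch for that GGUF setup might look like the sketch below. File names are placeholders, and flag spellings can vary between llama.cpp versions, so check `llama-server --help` for your build.

```bash
# Illustrative launch; file names are placeholders and flag spellings may
# differ across llama.cpp versions (check `llama-server --help`).
# -m: target model, -md: draft model, -c: context size,
# -ngl/-ngld: GPU layers for target/draft, --draft-max: draft tokens per step.
llama-server \
  -m DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf \
  -md DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B.gguf \
  -c 16384 -ngl 99 -ngld 99 --draft-max 16
```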

# Data info

The data consists of code tasks collected from various datasets. The model was trained for 2 epochs on 2.5k unique examples, for a total of 7.6 million tokens per epoch.

Since data generation was done using spare GPU time, I may publish a further-trained version later.