---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-Coder-0.5B
pipeline_tag: text-generation
library_name: transformers
tags:
- code
- qwen
- qwen2.5
- qwen-coder
- codeqwen
- deepseek
---

# DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B

This model was trained on CODE outputs of <a href="https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B">deepseek-ai/DeepSeek-R1-Distill-Qwen-32B</a> and is meant to be used only as a draft model for speculative decoding.

It is specifically intended for users of a single RTX 3090/4090 (24 GB VRAM): it lets you run the Q4_K_M GGUF version of DeepSeek-R1-Distill-Qwen-32B with 16k context while speeding up generation, without sacrificing context length or model quality.
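
As a rough sketch of how a draft model is wired into speculative decoding, the example below uses the `assistant_model` argument of transformers' `generate`. It is illustrative only: the draft repo id is a placeholder for this model's Hub path, and loading the full-precision 32B target this way needs far more memory than the GGUF setup described above, which remains the intended use.

```python
# Minimal sketch of speculative (assisted) decoding with transformers.
# Assumptions: the draft repo id below is a placeholder, and both models
# share the Qwen2.5 vocabulary, which assisted generation requires.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
draft_id = "<your-namespace>/DeepSeek-R1-DRAFT-Qwen2.5-Coder-0.5B"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.bfloat16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Write a Python function that checks whether a string is a palindrome."
inputs = tokenizer(prompt, return_tensors="pt").to(target.device)

# assistant_model enables speculative decoding: the 0.5B draft proposes
# several tokens at a time and the 32B target verifies them in one pass.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```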

# Data info

The data consists of code tasks collected from various datasets. The model was trained for 4 epochs on 1,400 unique examples, for a total of 4,600,000 tokens per epoch.

Since data generation was done using spare GPU time, I may publish a further-trained version later.