kas

shing3232

AI & ML interests

None yet

Recent Activity

upvoted an article 6 days ago

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

upvoted a paper 11 days ago

Hogwild! Inference: Parallel LLM Generation via Concurrent Attention

upvoted a paper 16 days ago

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

Organizations

None yet

shing3232's activity

upvoted an article 6 days ago

Article

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Sep 18, 2024

• 236

upvoted a paper 11 days ago

Hogwild! Inference: Parallel LLM Generation via Concurrent Attention

Paper • 2504.06261 • Published 17 days ago • 104

upvoted a paper 16 days ago

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

Paper • 2504.05118 • Published 18 days ago • 25

liked a model 27 days ago

SakuraLLM/Sakura-GalTransl-7B-v3

Updated 14 days ago • 13.7k • 54

liked a model about 1 month ago

webbigdata/ALMA-7B-Ja-V2

Text Generation • Updated Nov 3, 2024 • 316 • 18

New activity in agentica-org/DeepScaleR-1.5B-Preview 2 months ago

I have difficulty triggering the thinking process

#12 opened 2 months ago by

shing3232

New activity in tencent/Tencent-Hunyuan-Large 6 months ago

What hardware configuration does this model need to run?

#13 opened 6 months ago by

demo001s

updated a model 6 months ago

shing3232/Sakura-1.5B-Qwen2.5-v1.0-GGUF-IMX

Updated Nov 8, 2024 • 20 • 1

upvoted a collection 7 months ago

Qwen2.5-Coder

Collection

Code-specific model series based on Qwen2.5 • 40 items • Updated Nov 28, 2024 • 308

liked a model 10 months ago

UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3

Text Generation • Updated Jul 1, 2024 • 7.78k • 123

updated a model 11 months ago

shing3232/sakura-14b-qwen2beta-v0.9.2-IMX

Updated May 31, 2024 • 9 • 3

New activity in SakuraLLM/Sakura-14B-Qwen2beta-v0.9.2-GGUF 11 months ago

CUDA can't run the BF16 model?

#1 opened 11 months ago by

NeuronAstate

New activity in Qwen/Qwen1.5-7B-Chat-GGUF 11 months ago

Please post f16 quantization.

#1 opened 11 months ago by

ZeroWw

liked a model 11 months ago

shing3232/sakura-14b-qwen2beta-v0.9.2-IMX

Updated May 31, 2024 • 9 • 3

upvoted a paper 12 months ago

BASS: Batched Attention-optimized Speculative Sampling

Paper • 2404.15778 • Published Apr 24, 2024 • 10

New activity in Qwen/CodeQwen1.5-7B-Chat about 1 year ago

What are the differences between this and Qwen/CodeQwen1.5-7B

#5 opened about 1 year ago by

Kalemnor

liked a model about 1 year ago

databricks/dbrx-instruct

Text Generation • Updated Apr 19, 2024 • 9.18k • 1.11k

New activity in Qwen/Qwen1.5-MoE-A2.7B-Chat about 1 year ago

How does this version's 28G GPU memory consumption compare with the 14B model?

#7 opened about 1 year ago by

william0014

upvoted a paper about 1 year ago

ReFT: Representation Finetuning for Language Models

Paper • 2404.03592 • Published Apr 4, 2024 • 98

New activity in Qwen/qwen1.5-MoE-A2.7B-Chat-demo about 1 year ago

How is the inference so fast in this free hardware space?

#1 opened about 1 year ago by

mahiatlinux