Kimi-VL-A3B Collection: Moonshot's efficient MoE VLMs, exceptional at agent tasks, long context, and thinking • 6 items • Updated 9 days ago
Gemma 3 QAT Collection: Quantization-Aware Trained (QAT) Gemma 3 checkpoints. These checkpoints preserve quality comparable to half precision while using 3x less memory (see the memory sketch after this list) • 15 items • Updated 3 days ago
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources • Paper 2504.00595 • Published 20 days ago
ShieldGemma Collection: a family of models for text and image content moderation • 4 items • Updated 18 days ago
Article: Training and Finetuning Reranker Models with Sentence Transformers v4 • 26 days ago
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion • Paper 2503.11576 • Published Mar 14
Vision Language Models Quantization Collection: Vision Language Models (VLMs) quantized by Neural Magic • 20 items • Updated Mar 4
LLM2CLIP Collection: LLM2CLIP makes SOTA pretrained CLIP models even more SOTA • 11 items • Updated 3 days ago
Article: Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM • Mar 12
olmOCR Collection: a document recognition pipeline for efficiently converting documents into plain text (olmocr.allenai.org) • 4 items • Updated Mar 19
Phi-4 Collection: the Phi-4 family of small language and multi-modal models • 9 items • Updated 3 days ago
Ovis2 Collection: our latest advancement in multi-modal large language models (MLLMs) • 15 items • Updated 27 days ago
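As a rough illustration of the "3x less memory" figure quoted for the Gemma 3 QAT checkpoints above, here is a minimal back-of-the-envelope sketch. It assumes a 27B-parameter model, a bf16 half-precision baseline, and 4-bit weights with one fp16 scale per block of 32 weights; none of these specifics come from the collection description, and actual savings depend on which tensors stay in higher precision and on runtime overhead.

```python
# Back-of-the-envelope check of the "3x less memory" claim for 4-bit QAT
# checkpoints versus a half-precision (bf16) baseline.
# Assumptions (illustrative, not from the collection page): 27B parameters,
# 4-bit weights, one fp16 scale shared by each block of 32 weights.

def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory used by the weights alone, in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

n_params = 27e9                                   # assumed parameter count
bf16_gb = weight_memory_gb(n_params, 16)          # half-precision baseline
int4_bits = 4 + 16 / 32                           # 4-bit weight + amortized fp16 scale
int4_gb = weight_memory_gb(n_params, int4_bits)

print(f"bf16 weights: {bf16_gb:.1f} GB")          # ~54.0 GB
print(f"int4 (QAT) weights: {int4_gb:.1f} GB")    # ~15.2 GB
print(f"reduction: {bf16_gb / int4_gb:.1f}x")     # ~3.6x on weights alone
```

On the weights alone this gives roughly a 3.6x reduction, which lands near the quoted 3x once embeddings or norms kept in higher precision and runtime overhead are added back.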