Gyanateet Dutta

Ryukijano

https://ryukijano.github.io

AI & ML interests

Computer graphics, general artificial intelligence, model merging, massive ASR for data collection, 3D ML, on-device ML, quantization, model judging, ML in the browser, healthcare applications, education, and the intersection of art and ML.

Ryukijano's activity

upvoted a paper 1 day ago

WORLDMEM: Long-term Consistent World Simulation with Memory

Paper • 2504.12369 • Published 4 days ago • 28

upvoted a paper 9 days ago

DDT: Decoupled Diffusion Transformer

Paper • 2504.05741 • Published 13 days ago • 71

upvoted a paper 12 days ago

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

Paper • 2504.05118 • Published 14 days ago • 24

upvoted a collection 15 days ago

TxGemma Release

Collection of open models to accelerate the development of therapeutics. • 5 items • Updated 18 days ago • 49

upvoted a paper 17 days ago

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 178

upvoted a paper 18 days ago

Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data

Paper • 2503.21694 • Published 24 days ago • 16

upvoted a collection about 1 month ago

💫StarVector Models

StarVector is a multimodal LLM for Scalable Vector Graphics (SVG) generation, producing structured SVG code directly from images and text. • 2 items • Updated Mar 20 • 93

upvoted an article 2 months ago

Open-source DeepResearch – Freeing our search agents

Article • Feb 4 • 1.22k

upvoted a collection 3 months ago

Eagle 2

Eagle 2 is a family of frontier vision-language models with vision-centric design. The model supports 4K HD input, long-context video, and grounding. • 9 items • Updated 6 days ago • 31

upvoted a collection 4 months ago

VILA: On Pre-training for Visual Language Models

10 items • Updated 4 days ago • 53

upvoted 2 papers 5 months ago

LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published Nov 15, 2024 • 124

Grounding Image Matching in 3D with MASt3R

Paper • 2406.09756 • Published Jun 14, 2024 • 1

upvoted 3 collections 6 months ago

Sparsh

Models and datasets for Sparsh: Self-supervised touch representations for vision-based tactile sensing • 15 items • Updated Oct 24, 2024 • 12

MobileLLM

Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 9 items • Updated Nov 27, 2024 • 113

Stable Diffusion 3.5

6 items • Updated Jan 9 • 154

upvoted a paper 6 months ago

Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention

Paper • 2410.10774 • Published Oct 14, 2024 • 26

upvoted a paper 7 months ago

MonoFormer: One Transformer for Both Diffusion and Autoregression

Paper • 2409.16280 • Published Sep 24, 2024 • 18