Kaviyarasan V (kaveeshwaran)
0 followers · 10 following
v-kaviyarasan-v-2a111525a
AI & ML interests
I want to be an AI Developer
Recent Activity

New activity in huggingface/HuggingDiscussions: [FEEDBACK] Notifications (2 days ago)

Replied to philschmid's post (2 days ago):
Gemini 2.5 Flash is here! We're excited to launch our first hybrid-reasoning Gemini model. In 2.5 Flash, developers can turn thinking off.

**TL;DR:**
- 🧠 Controllable "thinking" with a thinking budget of up to 24k tokens
- 🌌 1 million-token multimodal input context for text, image, video, audio, and PDF
- 🛠️ Function calling, structured output, Google Search & code execution
- 🏦 $0.15 per 1M input tokens; $0.60 (thinking off) or $3.50 (thinking on) per 1M output tokens (thinking tokens are billed as output tokens)
- 💡 Knowledge cutoff of January 2025
- 🚀 Rate limits: free tier 10 RPM, 500 requests/day
- 🏅 Outperforms 2.0 Flash on every benchmark

Try it ⬇️ https://aistudio.google.com/prompts/new_chat?model=gemini-2.5-flash-preview-04-17
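The pricing rule above (thinking tokens billed as output tokens, with a higher output rate when thinking is on) can be turned into a back-of-the-envelope cost estimate. A minimal sketch, using only the prices quoted in the post; the function name `flash_cost` is illustrative, not part of any SDK:

```python
# Hedged sketch: estimate Gemini 2.5 Flash cost from the prices in the post
# ($0.15 per 1M input tokens; $0.60 or $3.50 per 1M output tokens with
# thinking off/on; thinking tokens are billed as output tokens).
def flash_cost(input_tokens: int, output_tokens: int,
               thinking_tokens: int = 0, thinking: bool = False) -> float:
    """Return an estimated cost in USD."""
    out_rate = 3.50 if thinking else 0.60
    # Thinking tokens count toward output, per the post.
    billed_output = output_tokens + thinking_tokens
    return input_tokens / 1e6 * 0.15 + billed_output / 1e6 * out_rate

# Example: 100k input tokens, 5k output tokens, thinking off.
print(round(flash_cost(100_000, 5_000), 4))  # 0.018
```

Note how turning thinking on changes the bill twice: extra thinking tokens are added to the output count, and the whole output is charged at the higher rate.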
Reacted with 🔥 to m-ric's post (2 days ago):
New king of open VLMs: InternVL3 takes Qwen 2.5's crown! 👑

InternVL has been a wildly successful series of models, and the latest iteration has just taken back the crown thanks to its superior, natively multimodal vision training pipeline.

➡️ Most vision language models (VLMs) these days are built like Frankenstein's monster: take a good text-only Large Language Model (LLM) backbone and stitch a specific vision transformer (ViT) on top of it. The training is then sequential 🔢:
1. Freeze the LLM weights while you train only the ViT to work with the LLM part, then
2. Unfreeze all weights and train everything to work together.

💫 The Shanghai Lab decided to challenge this paradigm with an approach they call "native". For each of their model sizes, they still start from a good LLM (mostly the Qwen-2.5 series; did I tell you I'm a huge fan of Qwen? ❤️) and stitch on the ViT, but they don't freeze anything: they train all weights together on interleaved text and image understanding data in a single pre-training phase 🎨.

They claim this results in more seamless interactions between modalities, and the results prove them right: they took the crown of top VLMs, at nearly all sizes, from their Qwen-2.5 parents. 👑
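The contrast between the two recipes comes down to which parameters receive gradients in each phase. A minimal PyTorch sketch with toy stand-in modules (`TinyViT` and `TinyLLM` are illustrative names, not InternVL3 code), showing the freeze/unfreeze bookkeeping behind the sequential recipe versus joint "native" training:

```python
# Hedged sketch of the two training regimes described in the post,
# using requires_grad to control which weights are trainable.
import torch.nn as nn

class TinyViT(nn.Module):
    """Toy stand-in for a vision encoder."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(16, 8)
    def forward(self, x):
        return self.proj(x)

class TinyLLM(nn.Module):
    """Toy stand-in for a text-only LLM backbone."""
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(8, 4)
    def forward(self, h):
        return self.head(h)

vit, llm = TinyViT(), TinyLLM()

# "Frankenstein" recipe, stage 1: freeze the LLM, train only the ViT.
for p in llm.parameters():
    p.requires_grad = False
stage1_params = [p for p in list(vit.parameters()) + list(llm.parameters())
                 if p.requires_grad]

# Stage 2 (and the "native" recipe from the very start): unfreeze
# everything and train all weights jointly on interleaved data.
for p in llm.parameters():
    p.requires_grad = True
joint_params = list(vit.parameters()) + list(llm.parameters())

print(len(stage1_params), len(joint_params))  # ViT-only params vs. all params
```

In the native regime the optimizer sees `joint_params` from step one, so vision and text weights co-adapt during a single pre-training phase instead of across two stages.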
Organizations
None yet
Models (1)
kaveeshwaran/distilbert-base-uncased-finetuned-sst-2-english · Updated Feb 25
Datasets (1)
kaveeshwaran/face_recog-doc · Updated 3 days ago · 30