Kaviyarasan V (kaveeshwaran)
0 followers · 10 following
v-kaviyarasan-v-2a111525a
AI & ML interests
I want to be an AI Developer
Recent Activity

New activity in huggingface/HuggingDiscussions: [FEEDBACK] Notifications (2 days ago)

Replied to philschmid's post (2 days ago):
Gemini 2.5 Flash is here! We're excited to launch our first hybrid-reasoning Gemini model. In 2.5 Flash, developers can turn thinking off.

**TL;DR:**
- 🧠 Controllable "thinking" with a thinking budget of up to 24k tokens
- 🌌 1 million-token multimodal input context for text, image, video, audio, and PDF
- 🛠️ Function calling, structured output, Google Search & code execution
- 🏦 $0.15 per 1M input tokens; $0.60 (thinking off) or $3.50 (thinking on) per 1M output tokens (thinking tokens are billed as output tokens)
- 💡 Knowledge cutoff of January 2025
- 🚀 Rate limits: free tier 10 RPM, 500 requests/day
- 🏅 Outperforms 2.0 Flash on every benchmark

Try it ⬇️ https://aistudio.google.com/prompts/new_chat?model=gemini-2.5-flash-preview-04-17
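The pricing rule above (thinking tokens billed as output tokens, with a higher output rate when thinking is on) can be turned into a back-of-the-envelope cost estimate. A minimal sketch, using only the prices quoted in the post; the function name `flash_cost` is illustrative, not part of any SDK:

```python
# Hedged sketch: estimate Gemini 2.5 Flash cost from the prices in the post
# ($0.15 per 1M input tokens; $0.60 or $3.50 per 1M output tokens with
# thinking off/on; thinking tokens are billed as output tokens).
def flash_cost(input_tokens: int, output_tokens: int,
               thinking_tokens: int = 0, thinking: bool = False) -> float:
    """Return an estimated cost in USD."""
    out_rate = 3.50 if thinking else 0.60
    # Thinking tokens count toward output, per the post.
    billed_output = output_tokens + thinking_tokens
    return input_tokens / 1e6 * 0.15 + billed_output / 1e6 * out_rate

# Example: 100k input tokens, 5k output tokens, thinking off.
print(round(flash_cost(100_000, 5_000), 4))  # 0.018
```

Note how turning thinking on changes the bill twice: extra thinking tokens are added to the output count, and the whole output is charged at the higher rate.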
Reacted with 🔥 to m-ric's post (2 days ago):
New king of open VLMs: InternVL3 takes Qwen 2.5's crown! 👑

InternVL has been a wildly successful series of models, and the latest iteration has just taken back the crown thanks to its superior, natively multimodal vision training pipeline.

➡️ Most vision language models (VLMs) these days are built like Frankenstein's monster: take a good text-only Large Language Model (LLM) backbone and stitch a specific vision transformer (ViT) on top of it. The training is then sequential 🔢:
1. Freeze the LLM weights while you train only the ViT to work with the LLM part, then
2. Unfreeze all weights and train everything to work together.

💫 The Shanghai Lab decided to challenge this paradigm with an approach they call "native". For each of their model sizes, they still start from a good LLM (mostly the Qwen-2.5 series; did I tell you I'm a huge fan of Qwen? ❤️) and stitch on the ViT, but they don't freeze anything: they train all weights together on interleaved text and image understanding data in a single pre-training phase 🎨.

They claim this results in more seamless interactions between modalities, and the results prove them right: they took the crown of top VLMs, at nearly all sizes, from their Qwen-2.5 parents. 👑
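The contrast between the two recipes comes down to which parameters receive gradients in each phase. A minimal PyTorch sketch with toy stand-in modules (`TinyViT` and `TinyLLM` are illustrative names, not InternVL3 code), showing the freeze/unfreeze bookkeeping behind the sequential recipe versus joint "native" training:

```python
# Hedged sketch of the two training regimes described in the post,
# using requires_grad to control which weights are trainable.
import torch.nn as nn

class TinyViT(nn.Module):
    """Toy stand-in for a vision encoder."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(16, 8)
    def forward(self, x):
        return self.proj(x)

class TinyLLM(nn.Module):
    """Toy stand-in for a text-only LLM backbone."""
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(8, 4)
    def forward(self, h):
        return self.head(h)

vit, llm = TinyViT(), TinyLLM()

# "Frankenstein" recipe, stage 1: freeze the LLM, train only the ViT.
for p in llm.parameters():
    p.requires_grad = False
stage1_params = [p for p in list(vit.parameters()) + list(llm.parameters())
                 if p.requires_grad]

# Stage 2 (and the "native" recipe from the very start): unfreeze
# everything and train all weights jointly on interleaved data.
for p in llm.parameters():
    p.requires_grad = True
joint_params = list(vit.parameters()) + list(llm.parameters())

print(len(stage1_params), len(joint_params))  # ViT-only params vs. all params
```

In the native regime the optimizer sees `joint_params` from step one, so vision and text weights co-adapt during a single pre-training phase instead of across two stages.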
Organizations
None yet
Models (1)
kaveeshwaran/distilbert-base-uncased-finetuned-sst-2-english · Updated Feb 25
Datasets (1)
kaveeshwaran/face_recog-doc · Updated 3 days ago · 30