
Vilhelm von Ehrenheim

vonehrenheim

AI & ML interests

None yet

Recent Activity

updated a model 3 months ago
QAdottech/qwen2.5-7b-custom
liked a model 3 months ago
convergence-ai/proxy-lite-3b
updated a model 4 months ago
QAdottech/qwen2.5-7b

Organizations

EQT Motherbrain, QA.tech

vonehrenheim's activity

upvoted an article about 1 year ago
Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

By Leyo and 2 others
reacted to m-ric's post with 👀 about 1 year ago
The return of the RNNs ⚔ New Mamba-based architecture "Jamba"

Since Google released BERT in 2018, the Transformer architecture has taken over machine learning thanks to its attention mechanism, which lets models focus on the important parts of the input. But attention computation is quadratic in the input length.
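To see where that quadratic cost comes from, here is a minimal NumPy sketch of single-head scaled dot-product attention (illustrative only, not from the post): the seq_len × seq_len score matrix is what makes compute and memory grow quadratically with sequence length.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Toy single-head attention: q, k, v each have shape (seq_len, d)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                    # (seq_len, seq_len) -- quadratic in seq_len
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ v                               # (seq_len, d)

# Doubling the sequence length quadruples the number of attention scores.
for seq_len in (1_000, 2_000, 4_000):
    q = k = v = np.random.randn(seq_len, 64)
    _ = scaled_dot_product_attention(q, k, v)
    print(f"{seq_len} tokens -> {seq_len * seq_len:,} attention scores")
```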

💫 The Mamba paper, published in December 2023, announced the return of RNN-style models: it drops attention entirely and instead adds a selection mechanism meant to reproduce the "focus" ability of attention, in an architecture whose compute requirements grow only linearly with input length!
🤔 Would this work? We had yet to see a large Mamba model match the performance of attention-based Transformers.

💥 But now it's done! A (Mamba + Transformers) hybrid just beat Transformers!

The AI21 Labs team just released Jamba.
They insert a few Transformer layers to inject some attention into a big pile of Mamba layers, thus getting the best of both worlds.

TL;DR:
🏗️ New MoE architecture: 4 Jamba blocks, each made of 7 Mamba layers for every 1 Transformer layer (a layer-schedule sketch follows this list).
🏋️ 52B parameters, 12B active at inference: this reduction is enabled by Mixture of Experts, similar to Mixtral (47B parameters, 13B active).
🏎️ Speed: 3x throughput. Jamba is much faster than similar-sized Transformer models on long contexts.
📏 Context length: 140K tokens on a single 80GB A100!
💪 Performance: state-of-the-art for this size. The small injection of attention seems sufficient, since Jamba beats the open-source reference Mixtral-8x7B on many benchmarks!
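A rough sketch of how such a layer schedule could be laid out, using only the numbers above (4 blocks, 7 Mamba layers per 1 attention layer). The moe_every=2 spacing and the naming are assumptions for illustration, not AI21's actual code.

```python
def jamba_layer_schedule(n_blocks=4, layers_per_block=8, attn_every=8, moe_every=2):
    """Build a hypothetical Jamba-style stack: mostly Mamba layers, a sparse
    sprinkling of attention layers, and MoE replacing some of the MLPs."""
    schedule = []
    for block in range(n_blocks):
        for layer in range(layers_per_block):
            mixer = "attention" if (layer + 1) % attn_every == 0 else "mamba"
            mlp = "moe" if (layer + 1) % moe_every == 0 else "dense"
            schedule.append(f"block{block}.layer{layer}: {mixer} + {mlp} MLP")
    return schedule

for entry in jamba_layer_schedule():
    print(entry)
```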

Try it here 👉 ai21labs/Jamba-v0.1
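A minimal sketch of loading the checkpoint with Hugging Face transformers, assuming a transformers version that supports the Jamba architecture and that accelerate is installed for device_map; dtype and generation settings are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # spread the 52B parameters across available devices
)

inputs = tokenizer("The return of the RNNs:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```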