Today in Privacy & AI Tooling - introducing a nifty new tool to examine where data goes in open-source apps on 🤗
HF Spaces host tons (100Ks!) of cool demos leveraging or examining AI systems - and because most of them are OSS, we can see exactly how they handle user data 👀
That requires actually reading the code though, which isn't always easy or quick! Good news: code LMs have gotten pretty good at automatic review, so we can offload some of the work - here I'm using Qwen/Qwen2.5-Coder-32B-Instruct to generate the reports, and it works pretty well.
The app works in four stages (a minimal sketch of the first two is below):
1. Download all code files
2. Use the code LM to generate a detailed report pointing to the code where data is transferred or (AI-)processed (screen 1)
3. Summarize the app's main functionality and data journeys (screen 2)
4. Build a Privacy TLDR from those inputs
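For the curious, here's roughly what stages 1 and 2 could look like with huggingface_hub - a minimal sketch, where the Space id, prompt, and report format are simplified stand-ins of my own, not the actual app's implementation:

```python
# Sketch: download a Space's code files and ask a code LM for a data-flow report.
# Assumptions: SPACE_ID, the prompt, and the report format are illustrative,
# not the actual app's implementation.
from huggingface_hub import HfApi, hf_hub_download, InferenceClient

SPACE_ID = "user/some-space"  # hypothetical Space to review
CODE_EXTS = (".py", ".js", ".ts")

api = HfApi()
files = [f for f in api.list_repo_files(SPACE_ID, repo_type="space")
         if f.endswith(CODE_EXTS)]

# Stage 1: download all code files
sources = {}
for fname in files:
    path = hf_hub_download(SPACE_ID, fname, repo_type="space")
    with open(path, encoding="utf-8") as fh:
        sources[fname] = fh.read()

# Stage 2: ask the code LM where user data is transferred or (AI-)processed
client = InferenceClient("Qwen/Qwen2.5-Coder-32B-Instruct")
code_blob = "\n\n".join(f"# FILE: {name}\n{src}" for name, src in sources.items())
report = client.chat_completion(
    messages=[{"role": "user", "content":
        "Review this app's code and list every place where user data is sent "
        "to a remote service or processed by an AI model. Cite the file and "
        "code location for each finding.\n\n" + code_blob}],
    max_tokens=1024,
)
print(report.choices[0].message.content)
```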
It comes with a bunch of pre-reviewed apps/Spaces - great to see how many process data locally or through (private) HF endpoints 🤗
Just tested something this morning that feels kind of game-changing for how we publish, discover, and consume news with AI: connecting Claude directly to the New York Times through MCP (Model Context Protocol).
Picture this: you ask Claude about a topic, and it instantly pulls verified and trusted NYT content - no more guessing if the info is accurate.
The cool part? Publishers stay in control of what they share via API, and users get fast, reliable access through the AI tools they already use. Instead of scraping random stuff off the web, we get a future where publishers actively shape how their journalism shows up in AI.
It's still a bit technical to set up right now, but this could get super simple soon - like installing apps on your phone, but for your chatbot. And you keep the brand connection, too.
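To make "a bit technical" concrete, here's a minimal sketch of what the publisher side can look like using the official MCP Python SDK - the server name, the tool, and the use of NYT's public Article Search API are my assumptions for illustration, not how the actual integration works:

```python
# Sketch of a publisher-side MCP server exposing article search to a chatbot.
# Assumptions: the server name, tool, and use of NYT's public Article Search
# API are illustrative; the real NYT/Claude setup may look different.
import os

import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("nyt-news")  # hypothetical server name

@mcp.tool()
def search_articles(query: str) -> str:
    """Search NYT articles and return headlines with links."""
    resp = httpx.get(
        "https://api.nytimes.com/svc/search/v2/articlesearch.json",
        params={"q": query, "api-key": os.environ["NYT_API_KEY"]},
        timeout=30,
    )
    resp.raise_for_status()
    docs = resp.json()["response"]["docs"]
    return "\n".join(f"- {d['headline']['main']}: {d['web_url']}" for d in docs)

if __name__ == "__main__":
    mcp.run()  # Claude Desktop can launch this via its MCP config
```

Once a server like this exists, pointing Claude Desktop at it is a short entry in its MCP config file - that's the part that could become "app install" simple.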
Not saying it solves everything, but it's definitely a new way to distribute content - and maybe even find some fresh value in the middle of this whole news + AI shakeup. Early movers will have a head start.
Curious what folks think: could MCPs be a real opportunity for journalism?
multimodal
> Moonshot AI released Kimi VL Thinking, the first working open-source multimodal reasoning model, and Kimi VL Instruct - both 16B MoEs with 3B active params (OS)
> InternVL3 released, based on Qwen2.5VL - 7 ckpts with various sizes (1B to 78B)

LLMs
> NVIDIA released Llama-3_1-Nemotron-Ultra-253B-v1, an LLM built on Llama 405B for reasoning, chat and tool use
> Agentica released DeepCoder-14B-Preview, a fine-tuned version of DeepSeek-R1-Distilled-Qwen-14B on problem-test pairs, along with the compiled dataset
> Zyphra/ZR1-1.5B is a new small reasoning LLM built on R1-Distill-1.5B (OS)
> Skywork-OR1-32B-Preview is a new reasoning model by Skywork

Image Generation
> HiDream released three new models - HiDream I1 Dev, I1 Full, and I1 Fast - for image generation (OS)
Want AI that truly understands your country's culture? Public institutions are sitting on the next AI revolution - and here's the practical guide to unlock it.
I've had fascinating conversations recently about sovereign AI, with people trying to solve this recurring question: "How do we build AI that truly understands our culture?"
This guide by @evijit and @yjernite brings lots of insight into this question. It's not just about throwing data at models. It's about pairing cultural expertise with technical infrastructure in ways we're just starting to figure out.
An example? The National Library of Norway already has 150+ AI models on Hugging Face. They're not just digitizing books - they're building AI that thinks in Norwegian, understands Norwegian values, and serves Norwegian citizens.
This is sovereign AI in practice: technology that understands your culture, values, and languages.
Especially loved the practical examples of how to do this (a tiny sketch of the document-to-dataset step follows below):
- Real examples from museums, libraries, and government agencies
- How to convert complex documents (PDFs, PowerPoints) into ML-ready formats
- Code templates for processing public data
- Technical recipes for sharing datasets on open platforms
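To give a flavor of the document-to-dataset step, here's a minimal sketch assuming pypdf for text extraction and placeholder paths and repo names - the guide's own templates may well differ:

```python
# Sketch: turn a folder of PDFs into an ML-ready dataset on the Hub.
# Assumptions: the folder, repo id, and plain-text extraction approach
# are placeholders; the guide's templates may differ.
from pathlib import Path

from pypdf import PdfReader
from datasets import Dataset

records = []
for pdf_path in Path("public_docs").glob("*.pdf"):  # hypothetical folder
    reader = PdfReader(pdf_path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    records.append({"source": pdf_path.name, "text": text})

ds = Dataset.from_list(records)
ds.push_to_hub("my-institution/public-documents")  # hypothetical repo id
```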
The stakes? Citizens' ability to leverage their collective digital intelligence.
The technology is ready. The infrastructure exists. The guide shows exactly how to use it. What's needed is your cultural expertise to shape these tools.
Ai2 just released OLMoTrace and it's a game-changer for transparency. You can literally see where an AI's responses come from in its training data - in real time.
For journalists, researchers studying hallucinations, and anyone who needs to trust their AI, this is like getting X-ray vision into AI systems. When the model made claims, I could instantly verify them against original sources. When it hallucinated, I could see why.
You can finally 1) understand how LLMs actually work and 2) verify if what they're saying is true. No more blind trust.
This pushes the open data movement to the next level.